Sanskrit Code Converter

This site is intended for programmers only.
No technical support is given to non-programmers.

- KOKO - Code Converter for Sanskrit
- DudenSkt - Sorting Program for Sanskrit


KOKO - Code Converter

To make the conversion from one 8-bit encoding to another as easy and as fast as possible, Ulrich Stiehl has made the code converter KOKO (Kode-Konverter) available free of charge. KOKO is programmed in fast assembly language and was formerly used by companies in the German printing industry. It is provided here for converting Sanskrit text files such as those available from GRETIL.

- The English-language manual of KOKO is downloadable as KokoEngl.pdf.
- The German-language manual of KOKO is downloadable as KokoDeut.pdf.

The assembler program KOKO, including ready-to-run batchfiles, is available as KOKO.zip (only 67 kilobytes).

The following KOKO conversion tables are included:

a) Conversion from various 8-bit encodings to URW Palladio SKT encoding:

     1. CSXG-SKT.TAB (Conversion from CSXG encoding to Palladio SKT) - GRETIL CSX text files
     2. REEG-SKT.TAB (Conversion from REEG encoding to Palladio SKT) - GRETIL REE text files
     3. GGM-SKT.TAB (Conversion from GGM encoding to Palladio SKT) - Grantha Mandira text files
     4. US-SKT.TAB (Conversion from US encoding to Palladio SKT) - Searchable US-encoded files
     5. IT-SKT.TAB (Conversion from IT encoding to Palladio SKT) - Itranslator transliteration files

b) Conversion from Palladio SKT encoding to ITX encoding for Itranslator:

     6. SKT-ITX.TAB (Conversion from Palladio SKT to ITX encoding)
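The idea of a table-driven conversion between two 8-bit encodings can be sketched in a few lines of Python. This is only an illustration of the principle and not KOKO's own method: the real .TAB format is described in the KOKO manual and supports search-and-replace of longer strings, which a plain 256-entry lookup table cannot do. The function names and the two byte values below are hypothetical and do not correspond to any of the encodings listed above.

```python
def build_table(mapping: dict[int, int]) -> bytes:
    """Return a 256-byte lookup table; bytes not in mapping stay unchanged."""
    table = bytearray(range(256))
    for src, dst in mapping.items():
        table[src] = dst
    return bytes(table)


def convert(data: bytes, table: bytes) -> bytes:
    """Replace every byte of data by its entry in the lookup table."""
    return data.translate(table)


# Hypothetical code points, chosen only for illustration.
DEMO_TABLE = build_table({0xE0: 0x83, 0xE1: 0x84})

if __name__ == "__main__":
    print(convert(b"a\xe0b\xe1", DEMO_TABLE))
```

Because each input byte maps to exactly one output byte, such a table only covers conversions between two 8-bit encodings (group a above). A conversion to ITX (group b) produces several output characters for one input character and therefore needs real string replacement.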


1. Installation of KOKO

Windows programs are extremely big and extremely slow, while DOS assembly language programs are extremely small and extremely fast. Therefore the search-and-replace routines of KOKO are about one hundred times faster than those contained in Windows word processing programs. To enable fully automatic conversions, batchfiles (startable with a mouse click) are supplied with KOKO. For these batchfiles to run properly, the user must create exactly the following directories:

c:\koko
c:\koko\in
c:\koko\out
c:\koko\temp  (see section 5 below)
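For those who prefer to create the directories by script, the following Python sketch builds the same tree. The function name is hypothetical, and the root directory is passed as a parameter so that the sketch can be tried out anywhere; the supplied batchfiles themselves expect exactly c:\koko.

```python
from pathlib import Path

# Subdirectories required by the supplied batchfiles.
KOKO_SUBDIRS = ("in", "out", "temp")


def create_koko_dirs(root: str) -> list[Path]:
    """Create root and its in/out/temp subdirectories; existing ones are kept."""
    base = Path(root)
    created = []
    for path in [base] + [base / name for name in KOKO_SUBDIRS]:
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

On a Windows machine the call would be create_koko_dirs(r"c:\koko").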

The packed file KOKO.zip must be unzipped into the directory c:\koko, which will then contain the following files:

KOKOX.EXE
CSXG-SKT.BAT + CSXG-SKT.TAB
REEG-SKT.BAT + REEG-SKT.TAB
GGM-SKT.BAT + GGM-SKT.TAB
US-SKT.BAT + US-SKT.TAB
IT-SKT.BAT + IT-SKT.TAB
SKT-ITX.BAT + SKT-ITX.TAB

For those who want to develop their own file conversion tables, the development tables ASC-256.TAB and BIN-256.TAB as well as the debugging tables ASC-STAT.TAB and BIN-STAT.TAB for examining files with unknown encodings are also contained in KOKO.zip. For further details on developing conversion tables see the English or German KOKO manual.


2. Conversion to URW Palladio SKT encoding via batchfiles

Example: Conversion of REE-encoded GRETIL files to Palladio SKT files:

1. Copy REEG-encoded TEXTFILES from GRETIL into c:\koko\in
2. Double-click on c:\koko\REEG-SKT.BAT
3. After a few seconds, all files are converted to c:\koko\out
4. Import these files from c:\koko\out into your word processor
5. Format the entire text in the URW Palladio SKT font. That's it!


3. Conversion to ITX via batchfiles

In order to convert Classical Sanskrit encoded in Palladio SKT to Sanskrit encoded in native Devanagari (Sanskrit 99 font), the batchfile SKT-ITX.BAT generating ITX files (Indian Text Exchange files) via KOKO is provided. Do the following:

1. Copy Palladio SKT encoded TEXTFILES into c:\koko\in
2. Start c:\koko\SKT-ITX.BAT with a mouse click
3. After a few seconds, all files are converted to c:\koko\out
4. Import ITX-encoded files from c:\koko\out into Itranslator
5. Click on Convert to Devanagari. That's it!

Note 1: SKT-ITX.TAB is designed for Classical Sanskrit, not for Vedic Sanskrit.
Note 2: TEXTFILES must be clean ANSI textfiles whose lines end with Carriage Return + Linefeed (CR LF).
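The line-ending requirement of Note 2 can be checked before conversion. The Python sketch below is an assumption about what "clean" means here: it only tests that every line break is exactly CR LF and does not verify that the bytes are valid ANSI text. Both function names are hypothetical.

```python
def has_clean_crlf(data: bytes) -> bool:
    """Return True if every line break in data is exactly CR LF."""
    # After removing all CR LF pairs, no stray CR or LF may remain.
    rest = data.replace(b"\r\n", b"")
    return b"\r" not in rest and b"\n" not in rest


def to_crlf(data: bytes) -> bytes:
    """Normalize CR LF, lone CR and lone LF line breaks to CR LF."""
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", b"\r\n")
```

A file read in binary mode can be passed to has_clean_crlf first and rewritten with to_crlf only if the check fails.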

To avoid unexpected results, the following chart shows how KOKO converts from SKT to ITX:

[Chart: Conversion to ITX encoded textfiles]

Step 1: As the first step, KOKO uses SKT-ITX.TAB to reduce URW Palladio SKT encoded textfiles to the characters marked GREEN in the above chart: lowercase letters (only those required by Sanskrit) and lowercase diacritics of Classical Sanskrit, figures (0-9), space, Avagraha (') and Danda (|).

- White: These characters are removed altogether (punctuation marks unknown to Devanagari, Polish diacritics, etc.)
- Grey: These characters are removed altogether (Tamil diacritics)
- Yellow: These dieresis characters are removed, but they could be converted by adjusting table SKT-ITX.TAB.
- Blue: These characters, not used in Classical Sanskrit, are removed altogether, but they could be converted by KOKO.
- Red: Candrabindu-m is converted to Anusvara, Anunasika variants are converted to Anusvara + l.
- Magenta: These characters are converted to their respective green characters as follows:
- - Uppercase characters are converted to their respective lowercase characters
- - Characters with intonational and/or stress accents are converted to their respective non-accented characters
- - Liquid characters with underring are converted to their respective characters with underdot
- - Anusvara with overdot is converted to Anusvara with underdot
- - Long/short e/o are converted to their unaccented equivalents
- - Visarga variants (Jihvamuliya/Upadhmaniya) are converted to ordinary Visarga, etc.
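The reduction rules above can be illustrated with a Python sketch that works on Unicode transliteration instead of Palladio SKT byte values. It is not a reproduction of SKT-ITX.TAB: the set of permitted characters, the function name and the choice of Unicode combining marks are assumptions made for this example, and the Candrabindu, Anunasika and Visarga rules are left out.

```python
import unicodedata

ACUTE, GRAVE = "\u0301", "\u0300"            # stress/intonation accents
RING_BELOW, DOT_BELOW = "\u0325", "\u0323"
DOT_ABOVE, MACRON, BREVE = "\u0307", "\u0304", "\u0306"
ACCENT_BASES = set("aeiourl")                # vowels incl. vocalic r and l

# Assumed "green" set: Sanskrit lowercase letters, figures, space, ' and |.
ALLOWED = set(unicodedata.normalize(
    "NFC", "aāiīuūṛṝḷḹeokgṅcjñṭḍṇtdnpbmyrlvśṣshṃḥ0123456789 '|\r\n"))


def reduce_text(text: str) -> str:
    """Lowercase, simplify diacritics and drop all characters not in ALLOWED."""
    out = []
    base = ""
    for ch in unicodedata.normalize("NFD", text.lower()):
        if not unicodedata.combining(ch):
            base = ch                        # remember the current base letter
            out.append(ch)
        elif ch in (ACUTE, GRAVE) and base in ACCENT_BASES:
            continue                         # accented vowel -> plain vowel
        elif ch == RING_BELOW:
            out.append(DOT_BELOW)            # underring -> underdot
        elif ch == DOT_ABOVE and base == "m":
            out.append(DOT_BELOW)            # overdot anusvara -> underdot
        elif ch in (MACRON, BREVE) and base in ("e", "o"):
            continue                         # long/short e/o -> plain e/o
        else:
            out.append(ch)                   # keep all other diacritics
    composed = unicodedata.normalize("NFC", "".join(out))
    return "".join(c for c in composed if c in ALLOWED)
```

For example, reduce_text("Rāmaḥ, Śivaḥ!") yields "rāmaḥ śivaḥ". The accent rule is restricted to vowel bases because the acute on ś is part of the letter and must survive.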

Programmers who have carefully studied the KOKO manual and who are acquainted with the ITX encoding scheme used by Itranslator may adapt SKT-ITX.TAB to their special requirements, e.g. to allow for conversion of Vedic Sanskrit texts.

Step 2: As the second step, KOKO converts the remaining green characters to ITX encoding, as used by Itranslator.

Step 3: As the third step, the resulting ITX file is opened by Itranslator and converted to Itranslator Devanagari.


4. Conversion from URW Palladio SKT to Itranslator Devanagari

The following example depicts the three steps of conversion from URW Palladio SKT to Itranslator Devanagari:

[Figure: Conversion to ITX encoding]


5. Conversion from URW Palladio IT to ITX Encoding DIRECTLY in ONE Step

1. IT-SKT.BAT converts from IT transliteration to SKT transliteration.
2. SKT-ITX.BAT converts from SKT transliteration to ITX encoding.

These two steps can be easily combined in ONE batchfile using a TEMP directory as follows:

rem Delete old files (only *.txt, see the note below)
c:
cd\koko\out
del *.txt
cd\koko\temp
del *.txt
rem Convert new files
rem Step 1: IT to SKT, from c:\koko\in to c:\koko\TEMP
cd\koko\in
for %%f in (*.*) do c:\koko\kokox c:\koko\IT-SKT.TAB c:\koko\in\%%f c:\koko\TEMP\%%f /a
rem Step 2: SKT to ITX, from c:\koko\TEMP to c:\koko\out
cd\koko\temp
for %%f in (*.*) do c:\koko\kokox c:\koko\SKT-ITX.TAB c:\koko\TEMP\%%f c:\koko\out\%%f /a

Note: The above batchfile assumes that all files have the extension ".txt", so that the old files can be deleted without a confirmation prompt.
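To make the data flow of the batchfile above easier to follow, the Python sketch below builds the two command lines that the two "for" loops produce for a single file. It only constructs strings and does not run KOKO. The function name is hypothetical; the command layout (table, input file, output file, /a) is copied from the batchfile above.

```python
from pathlib import PureWindowsPath

KOKO = PureWindowsPath(r"c:\koko")


def two_step_commands(filename: str,
                      first_table: str = "IT-SKT.TAB",
                      second_table: str = "SKT-ITX.TAB") -> list[str]:
    """Return the kokox command lines for in -> TEMP -> out for one file."""
    exe = KOKO / "kokox"
    src = KOKO / "in" / filename
    tmp = KOKO / "TEMP" / filename      # intermediate SKT-encoded file
    dst = KOKO / "out" / filename
    return [
        f"{exe} {KOKO / first_table} {src} {tmp} /a",
        f"{exe} {KOKO / second_table} {tmp} {dst} /a",
    ]


if __name__ == "__main__":
    for line in two_step_commands("gita.txt"):
        print(line)
```

The file name gita.txt is only an example. As in the batchfile, file names containing spaces are not handled.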


DudenSkt - Sorting Program

DudenSkt can be used to sort Sanskrit textfiles (e.g. pada indexes with number references) and to sort Sanskrit-English and Sanskrit-German glossaries with or without references. For further details please read the documentation dudenskt.pdf.


Sanskritweb is maintained by Ulrich Stiehl, Heidelberg (Germany)