How to write a Padma conversion file

Submitted by gopal on Wed, 24/12/2008 - 15:18
What is Padma?

Simply put, it’s a text transformation utility that comes as a Firefox extension. Given the required mapping file, it transforms text encoded for a custom font (text written using a proprietary font, such as Eenadu, that doesn’t adhere to standards) to Unicode (a standard designed to support all the scripts in the world). Around 70 such conversion/mapping files already come with Padma, which means it can transform text encoded in 70 different ways to Unicode.
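To make the idea concrete, here is a minimal sketch of what applying a mapping file amounts to, written in JavaScript like Padma’s own mapping files. It is an illustration under simplifying assumptions, not Padma’s actual algorithm: the function and the table are hypothetical, and the table maps font codes straight to Unicode, whereas real Padma mapping files go through Padma’s own codes (the toPadma tables described in Step 3). At each position it tries the longest font sequence first, so a multi-code combination wins over its individual parts.

```javascript
// Illustrative longest-match converter (hypothetical, not Padma's real code).
// `table` maps font-encoded strings to Unicode strings.
function transform(text, table) {
  const keys = Object.keys(table);
  // Longest key length; 0 for an empty table, so everything passes through.
  const maxLen = keys.reduce((m, k) => Math.max(m, k.length), 0);
  let out = "";
  let i = 0;
  while (i < text.length) {
    let matched = false;
    // Try the longest possible chunk first, then shorter ones.
    for (let len = Math.min(maxLen, text.length - i); len > 0; len--) {
      const chunk = text.slice(i, i + len);
      if (Object.prototype.hasOwnProperty.call(table, chunk)) {
        out += table[chunk];
        i += len;
        matched = true;
        break;
      }
    }
    if (!matched) {
      // Unmapped character: copy it unchanged.
      out += text[i];
      i += 1;
    }
  }
  return out;
}

// Hypothetical table built from the two Eenadu codes used in Step 3.
const exampleTable = {
  "\u0046": "\u0C28\u0C40",       // NII -> TELUGU LETTER NA + VOWEL SIGN II
  "\u00A7\u00D6": "\u0C2F\u0C3E", // YAA -> TELUGU LETTER YA + VOWEL SIGN AA
};

// Result is "\u0C28\u0C40\u0C2F\u0C3E" (NII followed by YAA).
console.log(transform("\u0046\u00A7\u00D6", exampleTable));
```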

This tutorial covers everything you need to start writing a new conversion/mapping file for Padma.

A mapping file maps each hex code in the font (the hexadecimal code associated with a letter; for example, the letter “ki” might be \u0045) to its counterpart in the standard. Let’s take eenadu.ttf as an example: we want to transform text written using this font to Unicode. First we need to set up our development environment. This only has to be done once, not each time you write a new conversion/mapping file.

Setting up the development environment
  1. Install Padma, and don’t forget to restart Firefox.
  2. Run the following commands. In all the steps of this tutorial, replace <whatever> with the random prefix of your Firefox profile directory (the part before .default).

cd ~/.mozilla/firefox/<whatever>.default/extensions/{3e*/chrome

unzip padma.jar

vim ~/.mozilla/firefox/<whatever>.default/extensions/{3e*/chrome.manifest

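  3. In chrome.manifest, replace the line “content padma jar:chrome/padma.jar!/content/” with “content padma chrome/content/”. This makes Firefox load Padma’s files from the directory you just unzipped instead of from padma.jar, so your edits take effect.
  4. Save the file.
  5. Install FontForge, a tool for opening font files. On Fedora, just run yum -y install fontforge; on Ubuntu, run sudo apt-get install fontforge.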
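If you prefer to script the chrome.manifest change rather than edit it in vim, it is a one-line substitution. The sketch below is a hypothetical Node.js helper, not part of Padma: it rewrites only lines that start with “content padma ”, pointing them at the unzipped chrome/content/ directory, and leaves every other line alone, just as the manual edit does.

```javascript
// Hypothetical helper: point Padma's chrome registration at the unzipped files.
function unjarManifest(manifestText) {
  return manifestText
    .split("\n")
    .map((line) =>
      line.startsWith("content padma ")
        ? line.replace("jar:chrome/padma.jar!/content/", "chrome/content/")
        : line // other registrations are left untouched
    )
    .join("\n");
}

// Usage with Node.js, run from the directory containing chrome.manifest:
//   const fs = require("fs");
//   const text = fs.readFileSync("chrome.manifest", "utf8");
//   fs.writeFileSync("chrome.manifest", unjarManifest(text));
```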
Writing a new mapping/conversion file

The setup is complete! Writing a new conversion/mapping file, on the other hand, involves the following steps, which you repeat each time you write a new one. As an overview, we’ll:

  1. Create a new mapping file based on an existing mapping file for the same language.
  2. Register the new mapping file (its filename and the class it implements) in Padma’s config files.
  3. Write the mappings.
  4. Test the new mapping file.
  5. Repeat steps 3 and 4 until you are satisfied with the results!

I’ll use Eenadu as the font and Telugu as the language in our running example.

Step 1 : Create a new mapping file
  1. Go to ~/.mozilla/firefox/<whatever>.default/extensions/{3e*/chrome/content/encodings/Telugu/
  2. Make a copy of an existing converter, named after the new font: Eenadu.js in our example. This is the filename you will register in padma.xul in Step 2 (<script type="application/x-javascript" src="encodings/Telugu/Eenadu.js"/>). I copied ShreeTel0900.js to Eenadu.js.
  3. Open the newly created file, Eenadu.js.
  4. Replace every occurrence of Shree_Tel_0900 with Eenadu. This is the class name you will register in Transformer.js in Step 2.
  5. Change fontFace and displayName to appropriate names; both are “Eenadu” in our example. The fontFace value should match the font face name used in the HTML you test with in Step 4.
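The edits in this step are mechanical, so they can also be scripted. The sketch below is a hypothetical helper, not part of Padma, and it assumes the copied file sets fontFace and displayName with lines of the form Shree_Tel_0900.fontFace = "..."; check your copy before relying on it.

```javascript
// Hypothetical helper: turn a copy of ShreeTel0900.js into Eenadu.js.
// Assumes fontFace/displayName are assigned as `Class.prop = "...";`.
function renameConverter(source, oldClass, newClass, fontName) {
  // Rename every occurrence of the old class, e.g. Shree_Tel_0900 -> Eenadu.
  let out = source.split(oldClass).join(newClass);
  // Set the first fontFace and displayName assignments to the new font's name.
  for (const prop of ["fontFace", "displayName"]) {
    const re = new RegExp(`(\\.${prop}\\s*=\\s*)"[^"]*"`);
    out = out.replace(re, (_, prefix) => `${prefix}"${fontName}"`);
  }
  return out;
}

// Usage with Node.js:
//   const fs = require("fs");
//   const src = fs.readFileSync("ShreeTel0900.js", "utf8");
//   fs.writeFileSync("Eenadu.js",
//     renameConverter(src, "Shree_Tel_0900", "Eenadu", "Eenadu"));
```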
Step 2 : Update padma’s config files
  1. Open ~/.mozilla/firefox/<whatever>.default/extensions/{3e*/chrome/content/padma.xul
  2. You will see a few lines like the following:

<script type="application/x-javascript" src="encodings/Telugu/TeluguLipi.js"/>
<script type="application/x-javascript" src="encodings/Telugu/TCSMith.js"/>
<script type="application/x-javascript" src="encodings/Telugu/TeluguFont.js"/>
<script type="application/x-javascript" src="encodings/Telugu/SuriTln.js"/>

  3. These lines tell Padma the paths of the mapping files. Append a line with the same syntax, changing the filename appropriately. Afterwards, it should look something like this:

<script type="application/x-javascript" src="encodings/Telugu/TeluguLipi.js"/>
<script type="application/x-javascript" src="encodings/Telugu/TCSMith.js"/>
<script type="application/x-javascript" src="encodings/Telugu/TeluguFont.js"/>
<script type="application/x-javascript" src="encodings/Telugu/SuriTln.js"/>
<script type="application/x-javascript" src="encodings/Telugu/Eenadu.js"/>

  4. Save the file. Make the same change in padmaMailOverlay.xul.
  5. Open ~/.mozilla/firefox/<whatever>.default/extensions/{3e*/chrome/content/transformers/Transformer.js
  6. You’ll find a few lines like the following:

Transformer.dynFont_AAADurgax    = 63;
Transformer.dynFont_AAADurgaxx   = 64;
Transformer.dynFont_Amudham      = 65;
Transformer.dynFont_ShreeDev0714 = 66;
Transformer.dynFont_Unknown      = 67;

  7. Locate the line with Transformer.dynFont_Unknown and note the number assigned to it. Just before this line, add a line for the new font (its name in place of Unknown) and assign it that number. Then increment the number assigned to Transformer.dynFont_Unknown by one. Afterwards, it looks something like this:

Transformer.dynFont_AAADurgax    = 63;
Transformer.dynFont_AAADurgaxx   = 64;
Transformer.dynFont_Amudham      = 65;
Transformer.dynFont_ShreeDev0714 = 66;
Transformer.dynFont_Eenadu       = 67;
Transformer.dynFont_Unknown      = 68;

  8. In the same file, you’ll find lines that tell Padma which class implements each font mapping, like:

Transformer.dynFont_Class[Transformer.dynFont_ShreeTel0900] = Shree_Tel_0900;
Transformer.dynFont_Class[Transformer.dynFont_Hemalatha]    = Hemalatha;
Transformer.dynFont_Class[Transformer.dynFont_ShreeTel0902] = Shree_Tel_0902;
Transformer.dynFont_Class[Transformer.dynFont_Tikkana]      = Tikkana;

  9. Add a line for our new class, Eenadu in our example, so that it becomes:

Transformer.dynFont_Class[Transformer.dynFont_ShreeTel0900] = Shree_Tel_0900;
Transformer.dynFont_Class[Transformer.dynFont_Hemalatha]    = Hemalatha;
Transformer.dynFont_Class[Transformer.dynFont_ShreeTel0902] = Shree_Tel_0902;
Transformer.dynFont_Class[Transformer.dynFont_Tikkana]      = Tikkana;
Transformer.dynFont_Class[Transformer.dynFont_Eenadu]       = Eenadu;
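Renumbering dynFont_Unknown by hand is easy to get wrong, so here is a sketch of the registration edits as a function. It is hypothetical (not part of Padma) and assumes the Unknown entry sits on one line of the form Transformer.dynFont_Unknown = N;. It returns the updated Transformer.js text together with the dynFont_Class line and the padma.xul script tag you still need to paste in; alignment padding is not preserved.

```javascript
// Hypothetical helper for the Transformer.js / padma.xul registration edits.
function registerFont(transformerSrc, fontName, className, scriptPath) {
  const unknownRe = /^Transformer\.dynFont_Unknown\s*=\s*(\d+);/m;
  const m = transformerSrc.match(unknownRe);
  if (!m) throw new Error("Transformer.dynFont_Unknown not found");
  const id = Number(m[1]);
  // The new font takes Unknown's old number; Unknown moves up by one.
  const newSrc = transformerSrc.replace(
    unknownRe,
    () =>
      `Transformer.dynFont_${fontName} = ${id};\n` +
      `Transformer.dynFont_Unknown = ${id + 1};`
  );
  return {
    transformerSrc: newSrc,
    classLine: `Transformer.dynFont_Class[Transformer.dynFont_${fontName}] = ${className};`,
    xulLine: `<script type="application/x-javascript" src="${scriptPath}"/>`,
  };
}

// Example: registerFont(src, "Eenadu", "Eenadu", "encodings/Telugu/Eenadu.js")
```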

Step 3 : Writing the mappings
  1. Go back to our new mapping file, Eenadu.js in our example. Skim through it and you’ll see that the letters to be mapped fall into several categories: vowels, consonants and special combinations of them. A combination brings together one or more consonants and vowels, and a font may represent it with a single code, as we’ll see.
  2. Run the following command to open our font file, eenadu.ttf in our example.

fontforge /path/to/eenadu.ttf

  3. In the FontForge window’s menu, go to Encoding and select Compact. Then, in the View menu, select 48 pixel outline. This just makes the window easier to read.
  4. Now select any letter. Some information about it appears in red just below the menu bar. I selected “NII”, and it says something like:

39    (0x0027)  U+0046    F   LATIN   CAPITAL LETTER   F

  5. We only need the third column (U+0046 here) and the name of the letter, in this case “NII”. You can find how Indian language letters are spelled here.
  6. Find the spelling of the letter you selected in the mapping file you created, Eenadu.js in our example. On locating “NII”, I landed at the line that starts with Eenadu.combo_NII. Assign it the code we observed, U+0046 in our example, written as a JavaScript escape. The line becomes:

Eenadu.combo_NII      = "\u0046";

  7. A combination can take several codes. Take YAA, for example: it is a combination of the YA consonant and the AA sign, so its mapping is:

Eenadu.combo_YAA      = "\u00A7\u00D6";

where 00A7 corresponds to the YA consonant and 00D6 to the AA sign.

  8. Do this for all the characters that appear in the font file you opened in FontForge. That completes the assignment part. Next, in the same file, we tell Padma what each font code corresponds to in the standard.
  9. These mappings are found towards the end of the file. Here they are for the two example mappings above:

Eenadu.toPadma[Eenadu.combo_NII]     = Padma.consnt_NA + Padma.vowelsn_II;
Eenadu.toPadma[Eenadu.combo_YAA]     = Padma.consnt_YA + Padma.vowelsn_AA;

  10. Do this for all the mappings you made from the font file. It’s that simple! You only need to know which letters are vowels, vowel signs and consonants; the rest follows intuitively.
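Putting the two example mappings together, a stripped-down fragment of Eenadu.js might look like the sketch below. Two parts are assumptions made so the example runs on its own: the Padma object is a stand-in that uses Telugu Unicode code points (Padma’s real constants live in its own source and may have different values), and toPadma is created here as a plain object, whereas your copied file should already declare it. The fromFontForge helper is hypothetical too; it turns FontForge’s U+XXXX notation into the character it names, which is a handy cross-check on the \u escapes you type.

```javascript
// Stand-in for Padma's constants (hypothetical values: Telugu Unicode).
const Padma = {
  consnt_NA: "\u0C28",  // TELUGU LETTER NA
  consnt_YA: "\u0C2F",  // TELUGU LETTER YA
  vowelsn_AA: "\u0C3E", // TELUGU VOWEL SIGN AA
  vowelsn_II: "\u0C40", // TELUGU VOWEL SIGN II
};

// Hypothetical helper: "U+0046" -> "F"; several codes are concatenated.
function fromFontForge(...codes) {
  return codes
    .map((c) => String.fromCodePoint(parseInt(c.replace(/^U\+/, ""), 16)))
    .join("");
}

function Eenadu() {}
Eenadu.toPadma = {}; // a copied mapping file should already declare this

// Font codes, read from FontForge's third column.
Eenadu.combo_NII = "\u0046";
Eenadu.combo_YAA = "\u00A7\u00D6"; // YA consonant + AA sign

// What each font code means, in Padma's terms.
Eenadu.toPadma[Eenadu.combo_NII] = Padma.consnt_NA + Padma.vowelsn_II;
Eenadu.toPadma[Eenadu.combo_YAA] = Padma.consnt_YA + Padma.vowelsn_AA;
```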
Step 4 : Testing
  1. To test the file, collect some encoded data in an HTML file and wrap all of it in a font tag like:

<font face="Eenadu"> ---->data<---- </font>

  2. Replace Eenadu with whatever fontFace name you specified in the mapping/conversion file.
  3. Open that file in Firefox, right-click and select “Transform to unicode”. Check whether it works.
  4. If there are any errors, try to locate the letters that cause them.
  5. khexedit is a handy tool for seeing which hex codes cause the problem. You can install it on Fedora by running yum -y install khexedit and on Ubuntu with sudo apt-get install khexedit.
  6. Paste the wrongly converted or unconverted text into a file and open it with khexedit. It shows the corresponding hex codes, so you can go back and correct them in the mapping file.
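If khexedit isn’t available, a few lines of JavaScript give a similar view. This hypothetical helper lists the code point of each character in a pasted string, in the same U+XXXX notation FontForge shows, so the output can be compared directly with the codes in your mapping file. Unlike a hex editor, it shows characters rather than the raw bytes of the file’s encoding (for example, UTF-8).

```javascript
// Hypothetical helper: list each character's code point as U+XXXX.
function hexCodes(text) {
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// Paste the unconverted text between the quotes.
console.log(hexCodes("F\u00A7\u00D6").join(" ")); // U+0046 U+00A7 U+00D6
```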

Once you are satisfied with the result, you can submit the file here to be included in the next version of Padma! Note that you submit a new mapping file by reporting its absence as a bug.

In case you find any of the above instructions difficult to follow or incomplete, please do let me know. Best wishes!

Do you know how I can update

Do you know how I can update the Unicode character set for Malayalam in Padma? I'm trying to convert Manorama's chillu characters into Unicode chillus. The current mapping is old and converts them to a sequence of Unicode characters that are no longer valid.

Thanks in advance!

You can contact the Unicode

You can contact the Unicode Consortium through the Proposals and Updates section at http://unicode.org


I am trying to convert

I am trying to convert text in the Telugu Unicode font GAUTAMI to the Telugu Anu font Priyaanka, without success.

If Unicode cannot be converted directly to other fonts, is there a middle way: converting Unicode first to RTS (phonetic) and then to Anu? Please let me know.

I am a journalist (one of my profiles is here: http://users6.jabry.com/vakkalanka/ ) and have written many books (see the cover pages at http://users6.jabry.com/vakkalanka/books).

I have a lot of Unicode content to be published in print.

I would be very grateful for your support. Waiting for your early reply.

First I would highly

First, I would highly recommend the diverse set of Telugu Unicode fonts released this year. They are very good and should be sufficient for most publications. Get them from here: http://teluguvijayam.org/fonts.html

Though I don't advocate re-encoding UTF-8 Unicode text into a proprietary font encoding, I can understand the urge to publish in fancier fonts. If you are a programmer, you can use the PHP port of Padma and script a new file that reverse-converts the Unicode text.
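The core of such a reverse conversion is inverting a mapping table. The sketch below is a hypothetical JavaScript helper (not part of Padma or its PHP port): it swaps keys and values and reports Unicode sequences claimed by more than one font sequence, since a reverse converter has to pick one of them. The inverted table can then be applied with longest-match lookup, as in the forward direction; some fonts may also need glyph reordering that simple inversion does not handle.

```javascript
// Hypothetical helper: invert a font->Unicode table into Unicode->font.
function invertTable(table) {
  const reverse = {};
  const conflicts = [];
  for (const [fontSeq, unicodeSeq] of Object.entries(table)) {
    if (Object.prototype.hasOwnProperty.call(reverse, unicodeSeq)) {
      // Two font sequences produce the same Unicode text; keep the first.
      conflicts.push(unicodeSeq);
    } else {
      reverse[unicodeSeq] = fontSeq;
    }
  }
  return { reverse, conflicts };
}

const { reverse, conflicts } = invertTable({
  "\u0046": "\u0C28\u0C40",       // hypothetical font code for NII
  "\u00A7\u00D6": "\u0C2F\u0C3E", // hypothetical font codes for YAA
});
```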

Sir thank you for

Sir, thank you for the explanation. I have a problem that may be outside the scope of Padma's custom encodings: PDFs with non-Unicode Telugu encoding, like the following:
http://www.eenadu.net/Magzines/Annadata/anna26.pdf

Is it possible to write a Padma font mapping file for the above PDF by extracting the embedded fonts from the PDF file itself? Text search is becoming hard in non-Unicode Telugu PDFs. My request to you is to help with some YouTube illustrations of writing Padma mapping files, or other more interactive illustrations of custom encodings.

