Currently, this script expects your computerized lexicon to be in the same format as mine. If you would like to use this script with a different format, email me and I'll try to add support for it (assuming it's a consistent format that a computer program can easily parse; if you keep your language information in prose format, I can't help ya ;)).



At the command line, run:

perl ./ < source-file.txt > interlinearized-file.html


See the interlinearized texts in the Writing section of Arthaey's website:


You must begin each interlinear source file with a configuration section, which defines the names of the languages used and specifies where the lexicon file and the dictionary HTML page are located. For example:

      L0 = LanguageToBeInterlinearized
      L1 = SmoothTranslationLang
      L2 = OtherSmoothTranslationLang
      dictionary = ../www/dictionary.html
      lexicon = saved-lexicon

The language codes must be L0..L9, and L0 must be the language whose lines are to be interlinearized. You must define dictionary to be the relative path to the HTML version of your dictionary (morphemes will be linked to $dictionary#$morpheme). You must also define lexicon to be the relative path to the FreezeThaw-saved version of your lexicon.
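
To make these rules concrete, here is a minimal Python sketch of how a configuration section like the one above could be read and checked. It is not the script's actual code (the script itself is Perl). The names parse_config and morpheme_link are hypothetical, and the sketch assumes one "key = value" pair per line.

```python
import re

def parse_config(text):
    """Parse 'key = value' lines into language names and file paths (hypothetical helper)."""
    languages, paths = {}, {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a 'key = value' line: {line!r}")
        key, value = key.strip(), value.strip()
        if re.fullmatch(r"L[0-9]", key):
            languages[key] = value          # language codes must be L0..L9
        elif key in ("dictionary", "lexicon"):
            paths[key] = value              # relative paths, kept as written
        else:
            raise ValueError(f"unknown config key: {key!r}")
    if "L0" not in languages:
        raise ValueError("L0 (the language to interlinearize) must be defined")
    for required in ("dictionary", "lexicon"):
        if required not in paths:
            raise ValueError(f"{required} must be defined")
    return languages, paths

def morpheme_link(dictionary, morpheme):
    """Build the $dictionary#$morpheme link target described above."""
    return f"{dictionary}#{morpheme}"

config_text = """
      L0 = LanguageToBeInterlinearized
      L1 = SmoothTranslationLang
      dictionary = ../www/dictionary.html
      lexicon = saved-lexicon
"""
languages, paths = parse_config(config_text)
print(languages["L0"])
print(morpheme_link(paths["dictionary"], "bat"))
```

Rejecting unknown keys is a choice made for this sketch only; the real script may be more lenient.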

You may optionally include extra words in a "temporary lexicon" section, before the interlinear text itself. Words defined here will override words in the lexicon defined in the config section (although only for this one text). Use the same format as for your main lexicon (which currently must be SIL Shoebox's format). Proper names are the most likely thing to be defined here. For example:

      \lx Arthei
      \ph 'Ar\Te
      \ps prop
      \ge Arthaey

Interlinear Markup

After the <config> ... </config> section comes the interlinear text. These lines begin with one of the Ln language codes defined in the configuration section, followed by a colon and whitespace, and then the text itself. For the L0 line, you will further mark the text up so that it can be properly broken down into morphemes and automatically glossed.

Place | at the end of each morpheme. To select a morpheme's sense that isn't the first one, append the sense's number directly after the pipe. Thus, bat and bat|1 will gloss to the first meaning of the word bat, and bat|2 will gloss to the second meaning of the word bat. The order of a word's senses is determined by their order of entry in the lexicon.
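
The pipe-and-number convention can be sketched in Python as follows. This is only an illustration, not the script's code: split_morphemes and gloss are hypothetical names, the glosses for bat are invented, and the sketch assumes that no morpheme begins with a digit (otherwise the digit would be read as a sense number).

```python
import re

def split_morphemes(word):
    """Split markup such as 'un|bat|2' into (morpheme, sense_number) pairs."""
    # Splitting on a capturing group keeps the digits after each pipe:
    # 'un|bat|2' -> ['un', '', 'bat', '2', '']
    parts = re.split(r"\|(\d*)", word)
    texts, senses = parts[0::2], parts[1::2]
    pairs = []
    for i, text in enumerate(texts):
        if not text:
            continue  # empty remainder after the final pipe
        digits = senses[i] if i < len(senses) else ""
        pairs.append((text, int(digits) if digits else 1))  # default: first sense
    return pairs

def gloss(word, lexicon):
    """Look up each morpheme's chosen sense; senses are numbered from 1."""
    return [lexicon[morpheme][number - 1] for morpheme, number in split_morphemes(word)]

# Invented glosses, listed in their (hypothetical) order of entry in the lexicon.
lexicon = {"bat": ["animal", "club"]}
print(gloss("bat", lexicon), gloss("bat|1", lexicon), gloss("bat|2", lexicon))
```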

Surround with { and } any characters that belong in the final orthographic version but aren't part of the dictionary form of the morpheme. These characters will be displayed in the final version, but will not be used to look up the glosses of morphemes. (Punctuation marks will need to be included in curly braces, for example.)

Add parts of morphemes that have been left out of the final orthographic version with [ and ]. These characters will not be displayed in the final version, but they will be used to look up the glosses of morphemes.
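
The two kinds of brackets are mirror images of each other, which a short Python sketch may make clearer. The function names and the sample string are invented for illustration, and the sketch assumes brackets are never nested; it is not how the script itself is written.

```python
import re

def display_form(marked):
    """What the reader sees: keep {...} contents, drop [...] contents."""
    text = re.sub(r"\[[^\]]*\]", "", marked)
    return text.replace("{", "").replace("}", "")

def lookup_form(marked):
    """What is looked up in the lexicon: drop {...} contents, keep [...] contents."""
    text = re.sub(r"\{[^}]*\}", "", marked)
    return text.replace("[", "").replace("]", "")

marked = "abc[d]{!}"  # invented example: "d" is left out of the spelling, "!" is punctuation
print(display_form(marked))
print(lookup_form(marked))
```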

A # will become a newline (HTML <br/>), and two ## together will become a new paragraph tag (HTML <p/>) in the big orthographic version.
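
A minimal Python sketch of this substitution, assuming it is a plain text replacement (expand_breaks is a hypothetical name, and the sketch does not try to protect a # that appears inside passed-through HTML):

```python
def expand_breaks(text):
    """Turn ## into a paragraph tag and a single # into a line break."""
    # Replace the double marker first so "##" is not read as two single breaks.
    return text.replace("##", "<p/>").replace("#", "<br/>")

print(expand_breaks("first line#second line##new paragraph"))
```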

To preserve the case of a particular word, prefix it with ^. This is most useful for proper names.

Any HTML (or anything, really) between < and > will be passed verbatim to the big orthographic version of the text, although not to the line-by-line orthographic version.

Links to each line's line number are automatically placed at the very beginning of each line. Normally, this is what you want. Sometimes, however, you will want more explicit control over the link's placement: for example, HTML headings will otherwise cause a line break between the link and the line itself. Anywhere an @ appears in a line, it will be replaced by the link to the line number.
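
The placement rule can be sketched in Python like this. The name place_link and the link markup are placeholders made up for the example (the real link HTML is whatever the script generates), and prepending the link with no separator is an assumption.

```python
def place_link(line, link):
    """Put the line-number link at each @, or at the start if there is no @."""
    if "@" in line:
        return line.replace("@", link)
    return link + line

link = '<a href="#line3">3</a>'  # placeholder markup, not the script's real output
print(place_link("<h2>@Title</h2>", link))
print(place_link("ordinary line", link))
```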