What is it?

lex2xml is a Perl script that converts a Shoebox .lex file into XML. It reads the .lex file from STDIN and outputs the XML to STDOUT.

Note that the current version is written specifically for my conlang Asha'ille. To use it for your own language, you will have to modify the script in the following places:

  • replace @hierarchy with your own word-categories
  • replace <lexicon> attributes with your own values
  • replace <person> node with your own values
  • replace x-asha with your own language tag (x-asha is a private-use tag, not an ISO code; use your language's ISO 639 code if it has one)
  • replace Ashaille.pm with your own language-specific module, which should define your own versions of the following functions (rename them as needed):
    • kateinu and kateinu_sort, which implement the conlang-specific romanization schemes
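
As a rough idea of what such language-specific functions could do, here is a minimal sketch in Python (not the Perl module itself). Everything in it is hypothetical: the romanize and sort_key names, the substitution table, and the alphabet order are invented for an imaginary conlang and do not reflect how kateinu or kateinu_sort actually work.

```python
# Hypothetical digraph-to-letter table for an imaginary conlang.
ROMANIZATION = {"sh": "š", "th": "þ", "ng": "ŋ"}

# Hypothetical alphabetical order used for sorting.
ORDER = "abcdefghijklmnŋoprsštþuvwz"


def romanize(word, table=ROMANIZATION):
    """Replace each digraph in the table with its single-letter form."""
    keys = sorted(table, key=len, reverse=True)  # try longest keys first
    out = []
    i = 0
    while i < len(word):
        for key in keys:
            if word.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            # No table entry matched here: copy the character unchanged.
            out.append(word[i])
            i += 1
    return "".join(out)


def sort_key(word):
    """Sort by position in ORDER; unknown characters sort last."""
    return tuple(
        ORDER.index(ch) if ch in ORDER else len(ORDER)
        for ch in romanize(word)
    )


words = sorted(["tha", "ta", "sha", "sa"], key=sort_key)
```

Keeping the display form and the sort key in two separate functions mirrors the split between kateinu and kateinu_sort named above.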

Requirements

  1. Perl
  2. Text::Shoebox and Text::Shoebox::Lexicon modules
  3. CXS module
  4. Lexicon.pm
  5. Ashaille.pm

Download

The current version of lex2xml can be downloaded here (last modified July 25, 2011).

Usage

Run the Perl script like so:

lex2xml < dictionary.lex > dictionary.xml

See lexicon-update.sh for how to use this script to go from Shoebox .lex file to the "pretty" dictionary and thesaurus that I use for Asha'ille.

Examples

Example dictionary.lex input:

\lx caea
\ph 'ke.A
\ps n
\ge world
\sd fe:nature
\et fv:aea
\dc 19/Nov/2002
\dt 16/Feb/2004

\lx jhurla
\ph 'Zur\lA
\ps interj
\ge hello
\xv Vedá aró jhurla vel das.
\xe Good morning, everyone.
\xt 17/Dec/2004
\ue Cannot be used to greet someone you don't know
\dc before 17/Dec/2004
\dt 17/Dec/2004
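
To make the structure of this input concrete, here is a minimal Python sketch of reading such records. It is only an illustration, not how lex2xml or Text::Shoebox parse the file. It assumes, as in the example above, that records are separated by blank lines and that each field line starts with a backslash marker followed by a space; the parse_lex name is made up for this sketch.

```python
def parse_lex(text):
    """Return a list of records; each record is a list of (marker, value) pairs."""
    records = []
    current = []
    for line in text.splitlines():
        line = line.rstrip()
        if not line:
            # Assumption: a blank line ends the current record.
            if current:
                records.append(current)
                current = []
            continue
        if line.startswith("\\"):
            # "\ge world" -> marker "ge", value "world"
            marker, _, value = line[1:].partition(" ")
            current.append((marker, value.strip()))
        elif current:
            # Assumption: a line without a marker continues the previous field.
            marker, value = current[-1]
            current[-1] = (marker, value + " " + line.strip())
    if current:
        records.append(current)
    return records


SAMPLE = r"""\lx caea
\ph 'ke.A
\ps n
\ge world

\lx jhurla
\ph 'Zur\lA
\ps interj
\ge hello
"""

records = parse_lex(SAMPLE)
```

Only the first space on a line is treated as the separator, so a backslash inside a value (as in the \ph field of jhurla) is kept as part of the value.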

Example dictionary.xml output:

<?xml version="1.0" encoding="UTF-8" ?>

<lexicon
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation="http://www.arthaey.com/conlang/lexicon lexicon.xsd"

   lexeme-lang="x-asha"
   document-lang="en"
   src="http://www.arthaey.com/conlang/lexicon/lexicon.xml"
>

<person role="author">
   <name>Arthaey Angosii</name>
   <email>arthaey@gmail.com</email>
   <url>http://www.arthaey.com/</url>
</person>

<entry>
   <lexeme>caea</lexeme>
   <lexeme-sort>CAEA</lexeme-sort>
   <ipa>ˈke.ɑ</ipa>
   <cxs>'ke.A</cxs>
   <kateinu>cæa</kateinu>
   <kateinu-sort>cDB</kateinu-sort>
   <word-class>n</word-class>
   <gloss lang="en">world</gloss>
   <gloss-sort>WORLD</gloss-sort>
   <domain lang="en">nature</domain>
   <domain-path>Nature</domain-path>
   <xref type="etymology">aea</xref>
   <date>2002-11-19</date>
</entry>

<entry>
   <lexeme>jhurla</lexeme>
   <lexeme-sort>JHURLA</lexeme-sort>
   <ipa>ˈʒuɹlɑ</ipa>
   <cxs>'Zur\lA</cxs>
   <kateinu>Jurპla</kateinu>
   <kateinu-sort>TGJRB</kateinu-sort>
   <word-class>interj</word-class>
   <gloss lang="en">hello</gloss>
   <gloss-sort>HELLO</gloss-sort>
   <example>
      <text lang="x-asha">Vedá aró jhurla vel das.</text>
      <text lang="en">Good morning, everyone.</text>
   </example>
   <note type="usage">Cannot be used to greet someone you don't know</note>
   <date>2004-12-17</date>
</entry>

</lexicon>
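
Comparing the two examples suggests how some fields map to elements: \lx becomes <lexeme>, \ge becomes <gloss>, and the \dc date appears in ISO form in <date>. The Python sketch below approximates that mapping for a few fields. It is a reading of the examples only, not the script's actual logic; the iso_date and build_entry names are hypothetical, and the uppercase sort forms and the handling of "before 17/Dec/2004" are inferred from the sample output.

```python
import re
import xml.etree.ElementTree as ET

MONTHS = {
    name: number
    for number, name in enumerate(
        ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
        start=1,
    )
}


def iso_date(value):
    """Turn the first DD/Mon/YYYY date found in value into YYYY-MM-DD, else None."""
    match = re.search(r"(\d{1,2})/([A-Za-z]{3})/(\d{4})", value)
    if not match or match.group(2) not in MONTHS:
        return None
    day, month, year = match.groups()
    return f"{year}-{MONTHS[month]:02d}-{int(day):02d}"


def build_entry(fields):
    """Build an <entry> element from a dict of Shoebox markers to values."""
    entry = ET.Element("entry")

    def add(tag, text, **attrs):
        child = ET.SubElement(entry, tag, attrs)
        child.text = text

    add("lexeme", fields["lx"])
    add("lexeme-sort", fields["lx"].upper())  # inferred from the example
    if "ph" in fields:
        add("cxs", fields["ph"])
    if "ps" in fields:
        add("word-class", fields["ps"])
    if "ge" in fields:
        add("gloss", fields["ge"], lang="en")
        add("gloss-sort", fields["ge"].upper())  # inferred from the example
    date = iso_date(fields.get("dc", ""))
    if date:
        add("date", date)
    return entry


entry = build_entry(
    {"lx": "caea", "ph": "'ke.A", "ps": "n", "ge": "world", "dc": "19/Nov/2002"}
)
xml_text = ET.tostring(entry, encoding="unicode")
```

The sketch leaves out the <ipa>, <kateinu>, <domain>, <xref>, <example>, and <note> elements, which depend on the CXS and language-specific modules or on rules the examples do not fully reveal.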