
XSL for interlinear texts

James McElvenny

Interlinearised XML texts can easily be converted into HTML with XSLT stylesheets for display and printing. The link between each morpheme and its interlinear gloss can be preserved in the output by lining the two up in HTML tables. I used this technique to produce human-readable versions of a few texts recorded from La Ode Muhammad Syarif, a speaker of Muna, a language of Indonesia. The example files are: marbles.xml, the original XML-encoded Muna text, in which Syarif describes how boys in Muna play marbles; transcription_stylesheet.xml, an XSLT stylesheet that converts this file, and any other file that follows the same DTD, into HTML; and marbles.html, the output generated from marbles.xml with transcription_stylesheet.xml.

The stylesheet is fairly straightforward; its only notable feature is the use of HTML tables to keep morphemes lined up with their glosses in the output. The XML source file follows the LACITO archive DTD. Documents of this type mark forms and their glosses with the tags <form> and <transl>. These tags can be nested under several parent tags, including <w>, which marks word-level constituents, and <m>, which marks morpheme-level constituents. For example, a morpheme-by-morpheme gloss would be marked up in the following way:
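
As an illustration, here is a minimal sketch of that nesting, assuming simplified lowercase tags with no attributes (the actual LACITO DTD may differ in tag case and attributes) and reusing the morphemes do-, kona and -e from the HTML example in this article. The Python snippet embeds the hypothetical <w> element as a string, reads it with the standard library's xml.etree.ElementTree, and pairs each form with its gloss. The point is that the shared <m> parent is what links a morpheme to its gloss; morpheme_pairs is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: lowercase tags, no attributes.
# The real LACITO DTD may differ in tag case and attributes.
WORD_XML = """
<w>
  <m><form>do-</form><transl>SUB_IMP</transl></m>
  <m><form>kona</form><transl>call</transl></m>
  <m><form>-e</form><transl>OBJ_3_S</transl></m>
</w>
"""


def morpheme_pairs(word):
    """Return a (form, gloss) tuple for each <m> child of a <w> element."""
    pairs = []
    for m in word.findall("m"):
        # The <form> and <transl> inside the same <m> belong together.
        form = m.findtext("form", default="")
        gloss = m.findtext("transl", default="")
        pairs.append((form, gloss))
    return pairs


word = ET.fromstring(WORD_XML.strip())
for form, gloss in morpheme_pairs(word):
    print(f"{form:<6} {gloss}")
```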


The XSLT stylesheet extracts these morphemes and their glosses and places them in an HTML table with the following template:

<xsl:template match="m">
  <!-- One table cell per morpheme: the form, a line break, then its gloss -->
  <td><xsl:value-of select="form"/><br/>
  <xsl:value-of select="transl"/></td>
</xsl:template>

(Note: the enclosing table itself is created in the <xsl:template> for <s>, the parent tag of <w>.)

The template above puts each morpheme and its gloss in a single table cell and separates the two with a line break. As a result, the morphemes appear on one line with their glosses aligned on the line underneath. For example, the HTML generated for the morphemes do-, kona and -e looks like the following:

<tc> <td>do-<br>SUB_IMP</td> </tc> <tc> <td>kona<br>call</td> </tc> <tc> <td>-e<br>OBJ_3_S</td> </tc>
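
For readers without an XSLT processor to hand, the same mapping can be approximated in Python. This is a sketch under stated assumptions, not the author's stylesheet: it assumes the simplified lowercase markup, reuses the three morphemes above, and places every morpheme cell of a sentence in a single <tr>, which is an assumption because the template for <s> is not shown. The helper names morpheme_cell and sentence_table are hypothetical. It builds the cells with xml.etree.ElementTree and serialises them with the "html" method, which writes <br> as a void element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sentence in simplified markup: <s> contains <w>, which contains <m>.
SENTENCE_XML = """
<s>
  <w>
    <m><form>do-</form><transl>SUB_IMP</transl></m>
    <m><form>kona</form><transl>call</transl></m>
    <m><form>-e</form><transl>OBJ_3_S</transl></m>
  </w>
</s>
"""


def morpheme_cell(m):
    """Counterpart of the match="m" template: form, a line break, then the gloss."""
    td = ET.Element("td")
    td.text = m.findtext("form", default="")
    br = ET.SubElement(td, "br")
    # Text that follows a child element is stored in that child's tail.
    br.tail = m.findtext("transl", default="")
    return td


def sentence_table(s):
    """Assumed counterpart of the <s> template: one table, one row,
    one cell per morpheme in document order across all words."""
    table = ET.Element("table")
    row = ET.SubElement(table, "tr")
    for m in s.iter("m"):
        row.append(morpheme_cell(m))
    return table


sentence = ET.fromstring(SENTENCE_XML.strip())
html = ET.tostring(sentence_table(sentence), encoding="unicode", method="html")
print(html)
```

Storing the gloss in the tail of the <br> element is how ElementTree represents text that comes after a child element, which corresponds to the <xsl:value-of select="transl"/> that follows <br/> in the template.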

The technique described above works for any interlinearised XML text, whatever its DTD, provided the markup links each morpheme unambiguously to its gloss.
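
One way to check that a new corpus meets this condition before transforming it is to look for morphemes without a one-to-one pairing of form and gloss. A minimal Python sketch, again assuming the simplified <m>, <form> and <transl> tags; the helper name unlinked_morphemes is hypothetical, and requiring exactly one <transl> per <m> is a simplifying assumption that a schema with glosses in several languages would need to relax.

```python
import xml.etree.ElementTree as ET


def unlinked_morphemes(root):
    """Hypothetical check: return the 0-based positions (document order) of
    <m> elements that lack exactly one <form> and exactly one <transl>."""
    problems = []
    for i, m in enumerate(root.iter("m")):
        if len(m.findall("form")) != 1 or len(m.findall("transl")) != 1:
            problems.append(i)
    return problems


doc = ET.fromstring(
    "<s><w>"
    "<m><form>do-</form><transl>SUB_IMP</transl></m>"
    "<m><form>kona</form></m>"  # deliberately missing its gloss
    "</w></s>"
)
print(unlinked_morphemes(doc))
```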

The Muna text contained in the example files above is copyright La Ode Muhammad Syarif.

© James McElvenny (jamesmce followed by the at sign then stanford dot edu) 20050718


© 2005
Date created: 17 July 2005
Last modified: 22 August 2007