+merlan #flirora's website

Documenting ŊCv9

  1. Documenting ŊCv9
    1. Dictionary format
      1. Proposed format
      2. Data-driven inflections?
    2. Tools for documenting languages from Markdown
      1. CreateGlossFilter.rb

Most of my earlier conlangs have been documented using LaTeX. In fact, I’d imagine many conlangers feel attracted to it because it excels at typesetting printed documents to a degree that few tools of reasonable cost do. However, it has some downsides:

(note the spurious newline between “the” and “perfective”), and searching is similarly broken.

Dictionary format

For a long time, my conlang documentation has used an ad hoc text-based dictionary format, which looks like this:

# relten
: nc
@l riltes

# cfiþar
: nc
leaf, page
\textsf{anten cfjoþes} (lit.~\emph{on the leaves of time}) sometimes

# crîþ
: nt
@s clîþic

# flarþ
: nc
@s flalþic

# trešil
: nc
park, garden, field
\textsf{sividin trešil} lit.~\emph{coward's garden} refuge, sanctuary (usually with a negative connotation)

A Raku script converts these entries into LaTeX code, which is then included in the main grammar. Of course, this format has some downsides that I’d like to fix in an improved one:

Proposed format

Have a metadata section, then a data section after it.

Metadata should include:

Data format:

# headword
: part of speech
< etymology
@tag1 first tag
@tag2 second tag
@tag3 third tag; note: only one value allowed per tag

This can include multiple lines and should support some *basic* markup.

For examples or idioms, as shown below, the phrase in the target language comes first, separated from the translation by an equals sign. An optional explanation can be provided afterwards, separated by a pipe.

% headword sucks = example | explanation
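To make the proposed data section more concrete, here is a minimal Python sketch of a parser for it. It only handles the line prefixes listed above (`#`, `:`, `<`, `@`, `%`) and treats every other non-blank line as definition text. The names `Entry`, `Example`, and `parse_entries` are made up for illustration, the metadata section is not handled, and the sample entries are re-expressed from the old-format examples purely as an assumption about how they might look.

```python
from dataclasses import dataclass, field


@dataclass
class Example:
    phrase: str            # phrase in the target language
    translation: str       # text after the equals sign
    explanation: str = ""  # optional text after the pipe


@dataclass
class Entry:
    headword: str
    pos: str = ""
    etymology: str = ""
    tags: dict = field(default_factory=dict)
    definition: list = field(default_factory=list)
    examples: list = field(default_factory=list)


def parse_entries(text):
    """Parse the data section into a list of Entry objects."""
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("# "):
            current = Entry(headword=line[2:].strip())
            entries.append(current)
        elif current is None:
            raise ValueError(f"content before the first headword: {line!r}")
        elif line.startswith(": "):
            current.pos = line[2:].strip()
        elif line.startswith("< "):
            current.etymology = line[2:].strip()
        elif line.startswith("@"):
            name, _, value = line[1:].partition(" ")
            if name in current.tags:
                # Only one value is allowed per tag.
                raise ValueError(f"duplicate tag {name!r} in {current.headword!r}")
            current.tags[name] = value.strip()
        elif line.startswith("% "):
            body, _, explanation = line[2:].partition("|")
            phrase, sep, translation = body.partition("=")
            if not sep:
                raise ValueError(f"example without '=': {line!r}")
            current.examples.append(
                Example(phrase.strip(), translation.strip(), explanation.strip())
            )
        else:
            current.definition.append(line)
    return entries


# Hypothetical rewrite of two old-format entries in the proposed format.
SAMPLE = """\
# crîþ
: nt
@s clîþic

# trešil
: nc
park, garden, field
% sividin trešil = refuge, sanctuary | lit. coward's garden
"""

for entry in parse_entries(SAMPLE):
    print(entry.headword, entry.pos, entry.tags, entry.definition)
```

Rejecting a repeated tag outright is one way to enforce the one-value-per-tag rule; a real tool might prefer a warning.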

Data-driven inflections?

TL;DR: Probably not going to work well.


Let’s walk through how we can decline V-nouns in ŊCv7.

First, we create a paradigm for V-nouns and set its criteria[1]:

# <external definition>
paradigm "V-noun" {
  criterion word ~ "((?:\*?(?:#|\+\*?|@))?)(.*)(j?[aeiouâêîô])";
  criterion pos ~ "n.*";
  # <rest of paradigm definition>

Then we define the necessary components:

# <rest of paradigm definition>:
component M = word$0;
component N = word$1;
component V0 = word$2;
component V1 = thematic_vowel_derivative_v_1[V0];
component V2 = thematic_vowel_derivative_v_2[V0];
component V3 = thematic_vowel_derivative_v_3[V0];
# <rest of paradigm definition>

Of course, we need to define the appropriate tables[2]:

# <external definition>:
table thematic_vowel_derivative_v_1 {
  a o
  e o
  i jo
  o o
  u u
  ja jo
  je jo
  jo jo
  â ô
  ê ô
  î jô
  ô ô
  jâ jô
  jê jô
  jô jô
# and so on...
}
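For comparison, the same two steps (splitting a word into M, N, and V0, then looking up V1 in the table) can be sketched in Python. This is only an illustration under a few assumptions: the first `(.*)` group is made non-greedy so that a `j` before the final vowel ends up in V0 (otherwise the `ja`, `je`, ... rows of the table could never be reached), the literal `+` is escaped for Python's `re`, and the test words are made-up strings rather than real vocabulary. The function names are hypothetical.

```python
import re

# The rows of thematic_vowel_derivative_v_1 listed above, as plain text.
TABLE_TEXT = """
a o
e o
i jo
o o
u u
ja jo
je jo
jo jo
â ô
ê ô
î jô
ô ô
jâ jô
jê jô
jô jô
"""


def parse_table(text):
    """Turn two-column 'key value' lines into a dict."""
    table = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, value = line.split()
        table[key] = value
    return table


V1_TABLE = parse_table(TABLE_TEXT)

# Mutation marker, stem, thematic vowel (with an optional preceding j).
V_NOUN = re.compile(r"((?:\*?(?:#|\+\*?|@))?)(.*?)(j?[aeiouâêîô])")


def components(word):
    """Return the M, N, V0 and V1 components of a V-noun."""
    match = V_NOUN.fullmatch(word)
    if match is None:
        raise ValueError(f"{word!r} does not look like a V-noun")
    mutation, stem, vowel = match.groups()
    if vowel not in V1_TABLE:
        raise ValueError(f"no table row for thematic vowel {vowel!r}")
    return {"M": mutation, "N": stem, "V0": vowel, "V1": V1_TABLE[vowel]}


# Made-up strings, used only to exercise the pattern.
print(components("tara"))
print(components("+*mirje"))
```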

Then how do we get components for the L and S forms? We have to take these forms from the entry if it lists irregular ones, and derive them regularly otherwise.

# <rest of paradigm definition>:
if criterion l: tag(l) ~ "((?:\*?(?:#|\+\*?|@))?)(.*)(j?[aeiouâêîô])s" {
  L = l$1;
} else if criterion nm: N ~ "(.*)(j?[aeiouâêîô])([^aeiouâêîô]*)" {
  L = nm$0 ~ thematic_vowel_derivative_v_1[nm$1] ~ nm$2;
} else error "N form is corrupted"
# L is defined in this scope because it was defined in both of the
# branches that did not error
if criterion s: tag(s) ~ "((?:\*?(?:#|\+\*?|@))?)(.*)ic" {
  S = s$1;
} else if criterion nm: N ~ "(.*)(j?[aeiouâêîô])([^aeiouâêîô]*)" {
  if criterion nm$2 ~ "[rl]?þ" {
    SB = "ð";
  } else if criterion nm$2 ~ "t|st|s" {
    SB = "d";
  } else {
    SB1 = replace(nm$2, "r", "R");
    SB2 = replace(SB1, "([aoâô])R([^aeiouâêîô])", "\\1r\\2");
    SB = replace(SB2, "R", "l");
  }
  S = nm$1 ~ SB;
} else error "N form is corrupted"
# <rest of paradigm definition>
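The "use the listed form if there is one, otherwise derive it" logic for the L form could look like this in Python. Treat it as a sketch, not a definitive implementation: `derive_l` is a hypothetical name, the patterns are non-greedy versions of the ones above (an assumption, made so that a `j` stays attached to its vowel), and the regular-derivation input is a made-up string. The irregular input `riltes` is the `@l` value from the dictionary sample.

```python
import re

VOWELS = "aeiouâêîô"
MUTATION = r"(?:\*?(?:#|\+\*?|@))?"

# Listed L form: optional mutation marker, stem, vowel, final "s".
L_TAG = re.compile(rf"({MUTATION})(.*?)(j?[{VOWELS}])s")
# N form: anything, the last vowel, then trailing consonants.
N_FORM = re.compile(rf"(.*?)(j?[{VOWELS}])([^{VOWELS}]*)")

V1_TABLE = {
    "a": "o", "e": "o", "i": "jo", "o": "o", "u": "u",
    "ja": "jo", "je": "jo", "jo": "jo",
    "â": "ô", "ê": "ô", "î": "jô", "ô": "ô",
    "jâ": "jô", "jê": "jô", "jô": "jô",
}


def derive_l(n_form, l_tag=None):
    """Return the L stem: from the entry's tag if usable, else derived from N."""
    if l_tag is not None:
        match = L_TAG.fullmatch(l_tag)
        if match is not None:
            return match.group(2)  # the stem, without mutation marker or ending
    match = N_FORM.fullmatch(n_form)
    if match is None:
        raise ValueError("N form is corrupted")
    before, vowel, after = match.groups()
    if vowel not in V1_TABLE:
        raise ValueError(f"no table row for vowel {vowel!r}")
    return before + V1_TABLE[vowel] + after


print(derive_l("tar"))            # regular derivation from a made-up N form
print(derive_l("tar", "riltes"))  # the listed irregular form wins
```

The S form would follow the same shape, with its own tag pattern and regular rule.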

Whoops! We forgot about eclipsis:

# <rest of paradigm definition>:
NE = magic_eclipsis_function(N); # use your imagination
# <rest of paradigm definition>

Finally, we define a table:

# <rest of paradigm definition>:
table {
  rows {
    "nominative" "accusative" "dative" "genitive"
    "locative-temporal" "ablative" "allative" "prolative"
    "instrumental-comitative" "abessive" "semblative I" "semblative II"
  }
  columns {
    "singular" "dual" "plural"
  }
  entries {
    "{M}{N}{0}" "{M}{N}{0}c" "{M}{N}{1}"
    "{M}{N}{0}n" "{M}{N}{0}ŋ" "{M}{N}{1}n"
    "{M}{N}{0}s" "{M}{N}{0}ci" "{M}{N}{1}s"
    "{M}{N}{2}n" "{M}{NE}{2}c" "{M}{NE}{3}n"
    # and so on...
  }
}

Of course, something resembling subroutines would make our lives easier. At this point, we might be better off using a proper programming language, especially when it comes to monosyllabic noun declensions.
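As a rough idea of what the table step looks like in a general-purpose language, here is a short Python sketch that fills in the first three rows of the table from named components. It assumes that `{0}` and `{1}` in the entries stand for the V0 and V1 components (written `{V0}` and `{V1}` here), and it omits the rows that need V2, V3, or NE, since their definitions are not given. The component values in the usage lines are made up.

```python
CASES = ["nominative", "accusative", "dative"]
NUMBERS = ["singular", "dual", "plural"]

# One row per case, one template per number, copied from the entries above.
TEMPLATES = [
    ["{M}{N}{V0}", "{M}{N}{V0}c", "{M}{N}{V1}"],
    ["{M}{N}{V0}n", "{M}{N}{V0}ŋ", "{M}{N}{V1}n"],
    ["{M}{N}{V0}s", "{M}{N}{V0}ci", "{M}{N}{V1}s"],
]


def decline(components):
    """Map (case, number) pairs to inflected forms."""
    return {
        (case, number): template.format(**components)
        for case, row in zip(CASES, TEMPLATES)
        for number, template in zip(NUMBERS, row)
    }


# Made-up components, only to show the mechanics.
forms = decline({"M": "", "N": "tar", "V0": "a", "V1": "o"})
for (case, number), form in forms.items():
    print(f"{case:12} {number:9} {form}")
```

Once the templates are ordinary data and the components come from ordinary functions, irregular paradigms can be handled with plain conditionals instead of new DSL syntax.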

Tools for documenting languages from Markdown

These tools can be found in the source tree of the site’s repository.


CreateGlossFilter.rb

This filter works on HTML sources, looking for <ol> tags with the class ilgloss. It is like Leipzig.js, but it transforms the HTML at compile time instead of making the browser run JavaScript.

Kramdown source:

{: .ilgloss}
1. ! šin-on men-at ŋ\geð-i-þ.
2. @ all-%acc.%sg see-%inf %pfv\fail_to-3%pl-%past
3. "They failed to see anything." **Stay mad, `sed`-users!**

{: .ilgloss}
1. ! šin-o nem-an racr-a.
2. @ all-%nom.%sg any-%acc.%sg know-3%sg
3. $\forall x \exists y: \text{$x$ knows $y$}$


“They failed to see anything.” Stay mad, sed-users!
$\forall x \exists y: \text{$x$ knows $y$}$

(Sorry, but this might look weird if you’re using a text browser or a screen reader.)
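To illustrate the kind of processing such a gloss involves, here is a small Python sketch that pairs up the words of a source line with the words of a gloss line. It is not the Ruby filter itself, and it makes two assumptions: that `%` marks a grammatical abbreviation, and that uppercasing is an acceptable stand-in for however abbreviations are actually rendered. The function names are hypothetical, and the `!` and `@` line prefixes are assumed to have been stripped already.

```python
import re

# Assumption: "%" introduces a grammatical abbreviation such as %acc or %sg.
ABBREVIATION = re.compile(r"%([A-Za-z0-9]+)")


def expand_abbreviations(gloss_word):
    """Replace each %abbr marker with its uppercase form."""
    return ABBREVIATION.sub(lambda m: m.group(1).upper(), gloss_word)


def align(source_line, gloss_line):
    """Pair each source word with its gloss, word by word."""
    source_words = source_line.split()
    gloss_words = gloss_line.split()
    if len(source_words) != len(gloss_words):
        raise ValueError("source and gloss lines have different word counts")
    return [
        (source, expand_abbreviations(gloss))
        for source, gloss in zip(source_words, gloss_words)
    ]


pairs = align(
    "šin-o nem-an racr-a.",
    "all-%nom.%sg any-%acc.%sg know-3%sg",
)
for source, gloss in pairs:
    print(source, gloss)
```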

  1. The syntax here is tentative. 

  2. Of course, we could also support defining multiple related tables in tandem.