A digital sketch grammar of Yawarana (0.0.4)

This is a corpus-based, data-rich digital grammar[1] sketch of Yawarana developed and written by Florian Matter in collaboration with Natalia Cáceres-Arandia and Spike Gildea. It is based on a corpus of texts, collected by Cáceres-Arandia and annotated using uniparser-yawarana.

This is the CLLD version of the grammar, which aims to make accessible all three components of a Boasian trilogy: grammar, texts, and dictionary.

Other available formats:

Visit the GitHub repo for bundled releases and the markup source.

The grammar sketch

...is still under construction and therefore bullet-pointy. It is divided into chapters, browsable under description.

The corpus

The annotated texts are browsable under corpus, and the results of annotating them under morphosyntax. A corpus example illustrates the current features of the corpus and is laid out as follows:

The first line, the object line, is a link to the entire text record ('sentence', 'example'...). The second line contains links to individual word forms. The third line contains links to individual morphs. The fourth line shows POS tags. The link in parentheses leads to the (con-)text of the record. Audio associated with the record is shown below it. There are two translations: an English one (partially auto-translated with DeepL) and the original Spanish one.
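The layout described above can be pictured as a small data structure. The following Python sketch is illustrative only: the `Record` class, its field names, the `render` function, the record ID `demo-1`, and the POS tag `v` are assumptions made for this example and are not taken from the project's actual schema. A separate gloss line is also assumed. Only the word form, its segmentation, and its translation come from the dictionary illustration on this page.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One text record. Field names are hypothetical, not the project's schema."""
    record_id: str
    object_line: str          # first line: the whole record
    word_forms: list          # second line: one entry per word form
    morphs: list              # third line: hyphen-segmented morphs per word
    glosses: list             # assumed extra line: morph-by-morph glosses
    pos_tags: list            # fourth line in the description: POS tags
    translation_en: str = ""
    translation_es: str = ""


def render(record):
    """Return the record as plain text, one aligned column per word."""
    rows = [record.word_forms, record.morphs, record.glosses, record.pos_tags]
    # Each column is as wide as its widest cell.
    widths = [max(len(cell) for cell in column) for column in zip(*rows)]
    lines = [record.object_line]
    for row in rows:
        cells = [cell.ljust(width) for cell, width in zip(row, widths)]
        lines.append("  ".join(cells).rstrip())
    if record.translation_en:
        lines.append("'" + record.translation_en + "'")
    return "\n".join(lines)


# Hypothetical one-word record; the ID and the POS tag "v" are placeholders.
record = Record(
    record_id="demo-1",
    object_line="wïnïjse",
    word_forms=["wïnïjse"],
    morphs=["wïnïj-se"],
    glosses=["sleep-pst"],
    pos_tags=["v"],
    translation_en="s/he slept",
)
print(render(record))
```

Keeping the lines as parallel lists, one entry per word, is what makes the word-by-word alignment (and, on the website, the per-word links) straightforward.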

Words that uniparser-yawarana was unable to parse are glossed with ***. Words with multiple possible analyses (where none has been confirmed manually yet) are glossed with ?.
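These two placeholder glosses make it easy to gauge how much of a text has been analyzed. The sketch below is a minimal illustration, assuming the glosses of a text are available as a plain list of strings, one per word; the function names `parse_status` and `coverage` are hypothetical and not part of uniparser-yawarana.

```python
from collections import Counter

UNPARSED = "***"   # gloss for words the parser could not analyze
AMBIGUOUS = "?"    # gloss for words with several unconfirmed analyses


def parse_status(gloss):
    """Classify a single word gloss as unparsed, ambiguous, or parsed."""
    gloss = gloss.strip()
    if gloss == UNPARSED:
        return "unparsed"
    if gloss == AMBIGUOUS:
        return "ambiguous"
    return "parsed"


def coverage(glosses):
    """Count how many words fall into each parse status."""
    return Counter(parse_status(gloss) for gloss in glosses)


# Made-up gloss list for demonstration; not actual corpus data.
counts = coverage(["sleep-pst", "***", "?", "***"])
print(counts["parsed"], counts["unparsed"], counts["ambiguous"])
```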

The 'dictionary'

The dictionary component of this project is not very sophisticated, since the focus lies on the grammatical description. The Yawarana lexicon is modeled using four kinds of entities: morphemes, morphs, lexemes, and stems, browsable under lexicon. Word forms (in morphosyntax) are forms that occur in the annotated corpus or were uttered in elicitation. They are composed of morphs, which in turn belong to morphemes. Where applicable, they contain a stem, which in turn belongs to a lexeme. To illustrate: the word form wïnïjse ‘s/he slept’ is composed of the morphs wïnïj and -se, which in turn belong to the morphemes wïnïkï ‘sleep’ and -se ‘pst’. The form belongs to the lexeme wïnïkï ‘sleep’.
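The relations between these entities can be sketched as a handful of Python dataclasses. This is a simplified illustration, not the project's actual database schema: the class and field names and the `gloss_line` helper are assumptions, and so is the stem form wïnïj, since the text above only states which lexeme the word form belongs to.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Morpheme:
    form: str
    gloss: str


@dataclass(frozen=True)
class Morph:
    form: str
    morpheme: Morpheme      # every morph belongs to a morpheme


@dataclass(frozen=True)
class Lexeme:
    form: str
    gloss: str


@dataclass(frozen=True)
class Stem:
    form: str
    lexeme: Lexeme          # every stem belongs to a lexeme


@dataclass(frozen=True)
class WordForm:
    form: str
    meaning: str
    morphs: tuple           # the morphs the word form is composed of
    stem: Optional[Stem] = None   # only where applicable


def gloss_line(word):
    """Build a morph-by-morph gloss from the morphemes behind each morph."""
    return "-".join(morph.morpheme.gloss for morph in word.morphs)


sleep = Morpheme("wïnïkï", "sleep")
past = Morpheme("-se", "pst")
lexeme = Lexeme("wïnïkï", "sleep")
word = WordForm(
    form="wïnïjse",
    meaning="s/he slept",
    morphs=(Morph("wïnïj", sleep), Morph("-se", past)),
    stem=Stem("wïnïj", lexeme),   # stem form is an assumption
)
print(gloss_line(word), word.stem.lexeme.form)
```

Separating morphs from morphemes, and stems from lexemes, lets one surface form such as wïnïj point to its more abstract unit without duplicating the gloss.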


  1. Work in progress: a guide to create your own