Free/open-source text-processing technologies for Turkic languages

Jonathan N. Washington (Swarthmore College) and Francis Tyers (National Research University «Higher School of Economics»)

This talk describes the application of free/open-source text-processing technologies to Turkic languages, including morphological analysis and generation, machine translation, and spell checking. We motivate the need for these technologies and for developing them under free/open-source licenses. We give an overview of how some of the technologies are implemented, focusing on the successful strategies for computationally encoding linguistic information used in the Apertium framework. We survey which technologies currently exist for which Turkic languages and what state each one is in. We also describe some outstanding problems in Turkic text processing: linguistic patterns that are difficult to handle computationally, either because the linguistic analysis itself is difficult or because of shortcomings in the computational tools these technologies employ.

We additionally outline how linguistic communities and language experts, especially when working together, can develop or improve text-processing tools for their languages.