

In this position paper, we discuss the unique technological, cultural, practical, and ethical challenges that researchers and indigenous speech community members face when working together to develop language technology to support endangered language documentation and revitalization. We report the perspectives of language teachers, Master Speakers, and elders from indigenous communities, as well as the point of view of academics. We describe an ongoing, fruitful collaboration and make recommendations for future partnerships between academic researchers and language community stakeholders.

As most endangered languages exist only in spoken form, it is crucial for linguists to transcribe them in order to preserve records of linguistic events and support language learning. The urgency of language endangerment has also prompted linguists to incorporate computational speech processing tools into the transcription pipeline.

Languages are classified as low-resource when they lack the quantity of data necessary for training statistical and machine learning tools and models. Causes of resource scarcity vary but can include poor access to the technology needed to develop these resources, a relatively small population of speakers, or a lack of urgency to collect such resources in bilingual populations where the second language is high-resource. As a result, the languages described as low-resource in the literature are as different as Finnish on the one hand, with millions of speakers using it in every imaginable domain, and, on the other, Seneca, with only a small handful of fluent speakers using the language primarily in a restricted domain. While the lack of resources needed to train models unites this disparate group of languages, many other issues cut across the divide between widely spoken low-resource languages and endangered languages.
