By Janill Lema
This post was developed as part of the Columbia University course “Multilingual Technologies and Language Diversity” taught by Smaranda Muresan, PhD, and Isabelle Zaugg, PhD. This cross-disciplinary course offering was a joint effort between the Institute for Comparative Literature and Society and the Department of Computer Science, developed through the generous support of the Collaboratory@Columbia.
It is estimated that a language disappears every fourteen days (Young). One major contributor to language extinction is modern technology, even though it connects people around the world and provides fast access to information. Today’s technological advancements push multilingual individuals toward dominant languages on their communication devices, because low-resource languages lack basic digital tools such as keyboards or operating system support. The extinction of a language not only tears at the community that uses it but also deprives the world of wisdom and unique ideas. It is essential for society to take steps toward revitalizing low-resource and endangered languages through digital resources such as natural language processing tools.
It is estimated that 50% to 90% of languages will be lost during the twenty-first century, partly because of a lack of digital resources that support minority languages and scripts (Zaugg 13). Often, this is because speakers of these minority languages lack the infrastructure or financial resources needed to access technology. As Isabelle Zaugg argues, “digital design has come to support an increasing number of languages, but this process has been largely market-driven, excluding languages of communities too small or poor to represent viable markets” (15). When a language lacks technological support such as a keyboard layout, multilingual communities rely more heavily on dominant languages to communicate online. As a result, children grow up learning and using the dominant language rather than the native language their grandparents speak. Over time, future generations forget not only the language itself but also the knowledge and important stories that come with it.
The loss of a language brings irreversible losses for humanity. Indigenous communities that speak low-resource or endangered languages have distinct understandings of the world and sometimes hold unique knowledge that could greatly benefit humanity. For instance, Richard Schultes, a biology professor at Harvard who is widely recognized as “the father of modern ethnobotany,” studied the medicinal use of plants by indigenous groups in the Amazon rainforest. In the 1940s, during this fieldwork, he discovered the source of curare, a dart poison traditionally used by indigenous groups, which is used today as a muscle relaxant. His students continued his work, and “they have written with authority on the ‘ethnobotanical approach to drug discovery,’ which is, in essence, field work guided by shamans and healers” (Thurman). Because this knowledge is passed down through a community’s language, the extinction of that language leads to the loss of unique cultural knowledge gained through centuries of experimentation and adaptation. As Schultes’s findings show, this knowledge may be important not only for the speakers of a low-resource language but also for the rest of the world.
Furthermore, the loss of a language is a human rights issue. Author Suzanne Romaine states that “the vast majority of today’s threatened languages and cultures are found among socially and politically marginalized and/or subordinated national and ethnic minority groups” (31). Language loss is not a natural process: it results from social, cultural, or political pressures, whether direct or indirect, that more dominant communities impose on the minority community. This is further supported by the observation that “people do not normally give up their cultures or languages willingly” (Romaine 31). For the community that speaks a lost language, the social consequences can include poverty, poor health, drug abuse, and suicide. If 50% to 90% of languages disappear during the twenty-first century, impoverished communities will almost certainly suffer the most while more dominant communities continue to flourish.
Even though technology contributes to the loss of languages, it has also helped revitalize minority languages. One example is the Kamusi Project, “a multilingual online dictionary website [which] has as one of its goals to document the lexicons of endangered and less-resourced languages (LRLs)” (Benjamin & Radetzky 15). Other researchers and engineers can use the recorded data to train models and build tools, such as translation technologies, for low-resource communities.
To further the use of technology in revitalizing languages, software engineers should be taught about technology’s effects on society, including its effects on language diversity. As a Computer Science student at Columbia University, I learned how to optimize code and was introduced to many amazing and innovative technologies, such as artificial intelligence and natural language processing, but I was never taught about the societal implications my lines of code may have. Making future software engineers aware of these issues would help us keep them in mind and could even motivate us to create products that are accessible to all communities.
Languages are disappearing at a rapid pace, and minority communities are the ones most severely affected by their disappearance. Efforts have been made to revitalize some languages, but they are not enough. While researching a presentation on Quechua, I could find only about five papers that described using natural language processing technologies to revitalize the language. Additionally, only a few tools (for example, word embeddings and syntax trees) were available to help students and researchers build products that members of the Quechua-speaking community could use directly. This surprised me, since Quechua is the most widely spoken indigenous language in the Americas. My grandmother is the only person in my family who can speak Quechua. My father grew up hearing his grandmother and mother speak it but was forced by society to communicate in Spanish. As a result, neither my brothers nor I have ever spoken a word of Quechua. It deeply saddens me that, because Quechua is endangered, it could disappear within a few decades along with the rich stories and traditions of my ancestors. By investing effort in tools and products that are accessible to low-resource communities, we can help the world remain rich and culturally diverse.
Benjamin, Martin, and Paula Radetzky. “Small Languages, Big Data: Multilingual Computational Tools and Techniques for the Lexicography of Endangered Languages.” Proceedings of the 2014 Workshop on the Use of Computational Methods in the Study of Endangered Languages, 2014, doi:10.3115/v1/w14-2203.
Romaine, Suzanne. “The Global Extinction of Languages and Its Consequences for Cultural Diversity.” Cultural and Linguistic Minorities in the Russian Federation and the European Union, Multilingual Education, 2015, pp. 31-46, doi:10.1007/978-3-319-10455-3_2.
Thurman, Judith. “Annals of Conservation: A Loss for Words.” The New Yorker, 30 Mar. 2015, https://www.newyorker.com/magazine/2015/03/30/a-loss-for-words.
Young, Holly. “The Digital Language Divide.” The Guardian, Guardian News and Media, labs.theguardian.com/digital-language-divide/.
Zaugg, Isabelle. “Digitizing Ethiopic: Coding for Linguistic Continuity in the Face of Digital Extinction.” 2017, pp. 13-16.