The Incommensurability of Digitization and Language Justice

By Jennifer Lee

This post was developed as part of the Columbia University course “Multilingual Technologies and Language Diversity” taught by Smaranda Muresan, PhD, and Isabelle Zaugg, PhD. This cross-disciplinary course offering was a joint effort between the Institute for Comparative Literature and Society and the Department of Computer Science, developed through the generous support of the Collaboratory@Columbia.

In “Digital Language Death,” András Kornai points out that 95% of today’s languages will never make it to digitization—and this is a generous estimate (Kornai 2). Yet digitization is not the first technological medium that left behind vast swaths of the world’s languages; Lydia Liu notes in “Scripts in Motion: Writing as Imperial Technology” that writing itself has not always been equally accessible to the world’s vast array of languages and that, throughout history, the expansion of empire has been intertwined with a need for widely disseminable languages and the prioritization of certain writing systems over others. Thus, the emergence of inequities in digital writing systems is not a new or unprecedented phenomenon; rather, the privileging of certain (written, codified, and especially Romanized) languages over others is a contemporary manifestation of longstanding inequalities. Specifically, Liu claims that “digital media are transforming the world by turning one of the oldest technologies of world civilization—alphabetic writing—into a new imperial coding system” (382).

Given that technology and the expansion of the digital world have not merely created a new problem but rather exposed and amplified already-existing problems within our world, it is easy to imagine that technology could be leveraged to further transform the role of script and language in new ways. While only about 200 languages may be truly written (McWhorter 63), countless others are spoken and signed in people’s day-to-day lives. Through the advancement of computational infrastructures, increasingly large quantities of not only textual but also visual and audio communication have the potential to be transmitted and stored, potentially enabling the creation of new spaces for the archiving and exchange of oral and signed languages. Yet because digital support is currently prioritized for only a small portion of the world’s languages, it seems unlikely that widespread multilingual technological support for oral and signed languages will happen anytime soon: Kornai estimates that of the 120+ signed languages currently in use, most are unlikely to cross the digital divide. At the same time, because the internet has lowered barriers to accessing, storing, and sharing large quantities of information, it seems possible that the dedicated efforts of a concerned few might yet be able to swim against the tide of information in the digital age.

Similarly, advances in technology give us access to a wider range of tools for studying vast amounts of information and languages that we might otherwise have known little about. In “Neural Decipherment via Minimum-Cost Flow: From Ugaritic to Linear B,” Luo et al. describe using machine learning to decipher extinct languages by comparing them to known, related languages. This is an instance where technology is being used not necessarily to replace what humans do (such as developing machine translation tools to replace the need for human translators), but rather to augment the work of human beings who are trying to understand these extinct languages (Luo et al.). It reveals some of the most hopeful possibilities for how digital technology might be used not only to facilitate multilingual communication but also to better understand communication in marginalized languages.

Just as technology has allowed for the scaling of the problems accompanying the privileging of certain languages and written scripts at the expense of others, I hope that technology might allow us to scale solutions as well. Language-learning technology that can be accessed by large numbers of people could allow for more widespread access to the tools to learn endangered languages, if people cared enough to make them; similarly, new means of communication across large distances create the potential for communities that might otherwise have no means to stay in touch. What needs to be done, moving forward, is the creation of tools for these people to communicate in their own languages, so that they might in fact use their languages more, not less, than they would have if these tools did not exist at all and they had no means of staying in touch.

Yet rather than simply pressing forward with this call to action, I would like to problematize the easy, technologist’s notion that to digitize more languages inherently serves the goal of making the digital sphere more just. Creating more effective, accessible, multilingual technologies does not in and of itself address the problem of linguistic inequity, when knowledge of indigenous and endangered languages can be weaponized against the communities that speak them just as easily as it can be used to empower them. The digital sphere is only one manifestation of this conflict, and as such, the needs and lives of endangered language speakers must be considered as being of paramount importance in understanding how technology might be transformed as a site of language revitalization and digital ascent.

After all, one of the greatest harms of linguistic inequity, greater than anything captured by the quantifiable figures on which much of the call for multilingual technologies has focused (languages lost, speakers remaining, scripts forgotten), is the psychological toll borne by speakers of marginalized languages living in a world where the lives lived in their languages are rendered inconsequential to majority-language speakers. The true battleground of language revitalization is less technological infrastructure than people’s attitudes towards endangered and minority languages. In “The importance of interlinguistic similarity and stable bilingualism when two languages compete,” Mira et al. focus on the importance of bilingual internet users in creating the possibility for the survival of endangered languages, while Holly Young in “The Digital Language Divide” suggests that bilingual users of technology are crucial to bridging online linguistic divides. Kornai is particularly pessimistic about the role of bilingual users, suggesting that “the chances of digital survival for those languages that participate in widespread bilingualism with a thriving alternative… are rather slim” (9). The problem for Kornai is that, given the option, bilingual users will choose the majority language. This illuminates not only the shortcomings of the technology itself but also those of a world that pressures its inhabitants to speak in a majority language. What needs to change is less technological than cultural and political.

It is here that I consider the place of the heritage English-language speaker, who has not only the privilege of being native to the largest lingua franca of the world but also the power of knowing the language upon which coding languages were developed. Such programmers have a responsibility to consider how to respond to the needs and desires of people who were not born into this language. In many ways, this privilege means we have an outsized ability, and thus a responsibility as well, to work with people outside these communities to develop technologies that center marginalized and endangered languages. And here, we bear an imperative to see ourselves not only as people who might help, but also as people who might learn from endangered languages the narrow precariousness and limits of the digital sphere.

Google states that its company’s mission is to “organize the world’s information and make it universally accessible and useful” (“Our Approach to Search”). But do we really want that for all the information in the world? In 2019, Google won a landmark “right to be forgotten” case in the EU, regarding its own right to refuse to de-link data from its search engine (Kelion). This case points towards a lack of protection of the individual’s right to be forgotten by Google, only one among many companies and institutions that engage in the mass surveillance of our digital lives. Yet Google most definitely has yet to organize all the world’s information: of the more than 6,000 languages in use today, only 149—less than 3%—are Google searchable.

Sometimes, when I am writing in Gmail or a word processor and it marks my text with a squiggly underline indicating that I may have wanted to phrase something differently, I am grateful. But other times, I look at it, reread it, and know that what it is asking me to do is erase my distinctive voice, because what I said does not fit into how its algorithms expect me to write. I find myself reflecting on how technology in the age of autocomplete, autoreply, and of understanding every aspect of the human writing process has in some ways made the act of writing itself feel redundant, as these tools push us to conform towards a mean of language production. In this world, I think the most creative processes, the most unbound by the page or the keyboard, might be the ones that exist beyond what is imaginable to algorithmic prediction or computational deciphering. What a blessing it is, then, to be able to envision the potentiality of languages beyond digital ascent.

Works Cited

Kelion, Leo. “Google wins landmark right to be forgotten case.” BBC News. 24 Sep. 2019. Accessed March 3, 2020.

Liu, Lydia H. “Scripts in Motion: Writing as Imperial Technology, Past and Present.” PMLA, vol. 130, no. 2, 2015, pp. 375–383, doi:10.1632/pmla.2015.130.2.375.

Luo, Jiaming, Yuan Cao, and Regina Barzilay. “Neural Decipherment via Minimum-Cost Flow: From Ugaritic to Linear B.” Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, July 2019. Accessed Aug. 5, 2020.

Mira, Jorge, Luís F. Seoane, and Juan J. Nieto. “The importance of interlinguistic similarity and stable bilingualism when two languages compete.” New Journal of Physics, vol. 13, March 2011. Accessed Aug. 5, 2020.

McWhorter, John. “The Cosmopolitan Tongue.” World Affairs, vol. 172, no. 2, Jan. 2009, pp. 61–68, doi:10.3200/wafs.172.2.61-68.

Kornai, András. “Digital Language Death.” PLoS ONE, vol. 8, no. 10, 2013, doi:10.1371/journal.pone.0077056.

“Our Approach to Search.” Google Search, Google. Accessed March 2, 2020.

Young, Holly. “The Digital Language Divide.” The Guardian, Guardian News and Media. Accessed March 2, 2020.
