July 15, 2025

AI and indigenous languages

Today’s AI is the most detailed and accessible memory ever built of what it means to be human. It reflects the culture, values, and purpose of our species.

But it is far from a perfect memory. It does not remember all cultures and languages equally, and it retains far less of those that have been pushed to the margins of the digital world. I had the opportunity to participate in a revealing study on this phenomenon, “The Performance of Artificial Intelligence in the Use of Indigenous American Languages,” produced by LLYC in collaboration with Microsoft and BID Lab.

The findings are striking: generative AI systems produce apparently correct responses to only 54% of queries in indigenous languages. Even when they do respond “correctly,” their texts are four times shorter, and the quality of expression and comprehension barely reaches 2.3 out of 10. The explanation is clear and telling: there is a 91% correlation between the volume of digital content available in a language and the performance AI delivers in it. Without digital presence, there is no learning.

This digital gap is not merely a technological problem; it is also an unprecedented opportunity for inclusion. AI can become a powerful amplifier that gives visibility to indigenous peoples and cultures, helping to reduce the isolation of communities affected by illiteracy and monolingualism. To get there, we identified 21 strategies aimed both at increasing the data available in these languages and at developing enabling technologies. Among the most important are promoting digital conversation in indigenous languages, giving visibility to indigenous influencers, protecting archives of traditions, and developing adapted translation technologies.

The path toward a truly inclusive AI requires a collective effort. That is why we propose creating an international consortium bringing together national and international organizations, cultural institutions, and technology companies. This study is a first step in that direction, but we need more allies to ensure that the AI revolution does not leave behind the millions of people whose languages and traditions are a fundamental part of human heritage.

What is at stake goes beyond the technological: the true measure of the artificial intelligence revolution will not be its ability to process majority languages, but its power to preserve and amplify the voices that have been left out of written history for centuries. An AI that speaks only the languages of economic power is not intelligent; it is simply the digital echo of the same inequalities we have normalized for far too long.
