GSMA and Pleias Launch AI Tool to Improve African Language Recognition

GSMA and Pleias have introduced a new artificial intelligence tool designed to better recognise African languages, in a bid to close a long-standing gap in global AI systems.

The tool, called CommonLingua, is an open-source language identification model. It has been developed under the GSMA’s “AI Language Models in Africa, by Africa, for Africa” initiative, which brings together partners working to improve how African languages are represented in technology.

Africa has more than 2,000 living languages, yet many of them are missing from the data used to train AI systems. This has led to poor performance when machines try to understand or classify African-language content. In many cases, existing systems wrongly label African languages as English or French, especially when texts mix multiple languages.

Before building tools such as chatbots or translation systems in languages like Swahili, Yoruba, or Wolof, AI models must first identify the language correctly. This is where many current systems fall short.

CommonLingua aims to solve this problem. On a new testing standard known as the CommonLID benchmark, the model achieved 83 per cent accuracy, outperforming other leading systems by more than 10 percentage points. It also uses far fewer computing resources, making it cheaper and easier to deploy.

The model supports 334 languages in total, including 61 African languages from several language families, such as Niger-Congo (which includes the Bantu languages), Afro-Asiatic, and Nilo-Saharan. It is designed to work across different writing systems, including Latin, Arabic, Ethiopic, N’Ko, and Tifinagh.

Unlike many existing tools, CommonLingua processes text directly without relying on language-specific rules. This allows it to handle a wide range of scripts more consistently.

Pierre-Carl Langlais, co-founder and chief technology officer at Pleias, said African languages should not be treated as a niche case. He explained that accurate language identification is the first step in building better AI systems for the continent.

The model was trained using open and publicly available data sources, including Wikipedia, scientific publications, and African language datasets such as VOA Africa and WaxalNLP. All data used is released under licences that allow broad access and reuse.

Louis Powell, director of AI initiatives at GSMA, said the lack of basic tools like language identification has slowed progress in African AI. He noted that CommonLingua could help developers build stronger datasets and more inclusive AI systems.

The discussion around improving African-language AI will continue at the GSMA’s MWC26 Kigali event, where industry leaders are expected to explore ways to speed up progress in this area.
