BWE News – USA, World, Tech, AI, Finance, Sports & Entertainment Updates

Africa has thousands of languages. Can we train AI for all of them?

By admin, December 11, 2025


How do you teach a machine a language when there is almost nothing for it to read? This is the problem facing developers across the African continent as they work to train AI to understand and respond to prompts in local languages.

To train a language model, you need data. For languages like English, developers have articles, books, and manuals easily accessible on the Internet. However, for most of Africa’s languages (estimated to number between 1,500 and 3,000), very few written resources are available. Vukosi Marivate, a computer science professor at the University of Pretoria in South Africa, uses the number of Wikipedia articles to illustrate how much data is available. There are over 7 million articles in English. Tigrinya, spoken by about 9 million people in Ethiopia and Eritrea, has 335 articles, and Akan, Ghana’s most widely spoken mother tongue, has none.

Of these thousands of languages, only 42 are currently supported by language models. Of Africa’s 23 scripts and alphabets, only three are supported: Latin, Arabic, and Ge’ez (used in the Horn of Africa). This underdevelopment “comes from a financial perspective,” says Chinasa T. Okolo, founder of Technēcultură, a research institute working to advance global equity in AI. “Even though there are more people who speak Swahili than Finnish, Finland is a better market for companies like Apple and Google.”

Okolo warns that unless more language models are developed, the implications for the entire continent could be dire. “We’re going to continue to see people being excluded from opportunities,” she told CNN. As the continent looks to develop its own AI infrastructure and capabilities, those who do not speak one of these 42 languages risk being left behind.

To avoid this, Okolo says, AI developers across the continent “need to rethink the way they approach model development in the first place.”

This is what Marivate has done. He heads the South African arm of the African Next Voices project, which has made recordings in 18 languages in South Africa, Kenya and Nigeria. Over two years, the three teams collected 9,000 hours of recordings from people of different ages and locations, creating a dataset that AI developers across the continent can use to train their models.

The researchers sometimes gave native speakers scripts to read, but mostly they gave prompts, recorded the responses, and transcribed them. For isiNdebele, a language spoken in South Africa and Zimbabwe, the researchers had great difficulty finding written material, so they relied on government manuals for goat herders to create prompts.

African Next Voices has not collected enough data to train large language models (LLMs) like ChatGPT and Gemini, which can cover thousands of topics in detail. But Marivate said the project focused its recordings on the specific topics he considered most important, such as health and agriculture.

Training a general-purpose model on a small dataset results in high error rates, whereas a small, focused dataset can yield high accuracy within the limited scope of a specialized model, explained Nyalleng Moorosi, a researcher at the Distributed AI Research Institute (DAIR) who is not affiliated with the African Next Voices project.

For her, it’s a matter of “prioritizing error.” “If someone just wants to know what’s going on in downtown Nairobi, mistakes there can be tolerated,” Moorosi says, but mistakes in models dealing with topics such as banking or health care can have serious consequences.

“We need to make sure that the people building these models understand the culture enough to understand the consequences and understand the weight of these errors,” Moorosi told CNN.
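One way to picture “prioritizing error” is to weight the same error rate by how costly a mistake is in each domain. The sketch below is illustrative only: the domain names, the cost figures in `ERROR_COST`, and the `budget` threshold are hypothetical values chosen for the example, not numbers from Moorosi or DAIR.

```python
# Hypothetical cost of one wrong answer per domain (illustrative values only).
ERROR_COST = {"local_news": 1.0, "banking": 20.0, "health": 50.0}


def expected_cost(error_rate: float, domain: str) -> float:
    """Expected cost per query: chance of an error times its assumed cost."""
    return error_rate * ERROR_COST[domain]


def acceptable(error_rate: float, domain: str, budget: float = 1.0) -> bool:
    """A model is usable in a domain only if its expected cost fits the budget."""
    return expected_cost(error_rate, domain) <= budget


if __name__ == "__main__":
    # The same 5% error rate is judged differently depending on the domain.
    for domain in ERROR_COST:
        print(domain, expected_cost(0.05, domain), acceptable(0.05, domain))
```

Under these assumed weights, a model that is wrong 5% of the time would pass for local news but not for banking or health, which is the asymmetry the quote describes.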

Words and symbols have multiple meanings, she says. For example, the St George’s Cross has associations with right-wing politics in the UK, but this is not obvious to someone from Ghana or Lesotho. This problem is especially noticeable in languages with fewer resources. “There is a lot of contextual knowledge, but very little documentation,” she says.

A DAIR investigation found that social media websites were unable to recognize and remove hate speech related to ethnic violence in Ethiopia, in part because automated systems and human moderators were unfamiliar with the slang terminology used.

Without this cultural understanding, Moorosi says, it is impossible for “AI systems to behave and make decisions that are consistent with our beliefs and values.”

While many Africans speak multiple languages, including African and European languages that are already supported by language models, Moorosi believes the goal should be “to make AI accessible in all languages, even those with a single speaker.” All languages deserve expression or preservation.

But lack of data is not the only challenge facing AI developers in Africa. Most African languages have not been codified through dictionaries or grammatical studies. In Kinyarwanda, the language of Rwanda, there are three common ways to spell the country’s name: uRwanda, Urwanda, and uRwanda. Without spelling rules, even the most basic text processing becomes difficult.
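To see why inconsistent spelling complicates basic text processing, consider a simple word counter: it treats each spelling variant as a different word unless the variants are first mapped to one form. The following is a minimal sketch that assumes a hand-built variant table. The names `CANONICAL`, `normalize_token`, and `count_tokens` are hypothetical, and the choice of canonical spelling is arbitrary for illustration, not an official Kinyarwanda orthography.

```python
import unicodedata
from collections import Counter

# Hypothetical variant table: case-folded spelling -> chosen canonical form.
# The canonical form is an arbitrary choice for this example.
CANONICAL = {"urwanda": "uRwanda"}


def normalize_token(token: str) -> str:
    """Map a known spelling variant to its canonical form; leave others as-is."""
    key = unicodedata.normalize("NFC", token).casefold()
    return CANONICAL.get(key, token)


def count_tokens(text: str) -> Counter:
    """Count whitespace-separated tokens after variant normalization."""
    return Counter(normalize_token(tok) for tok in text.split())


if __name__ == "__main__":
    raw = "uRwanda Urwanda uRwanda"
    print(Counter(raw.split()))  # variants are counted as separate words
    print(count_tokens(raw))     # variants are merged under one spelling
```

A table like this has to be compiled by people who know the language, which is part of why the absence of dictionaries and agreed spelling rules matters.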

Another problem is the lack of data centers. The African Union has warned that only 10% of the continent’s data center needs were met in 2024, creating a bottleneck for Africa’s AI ambitions.

The concern for Marivate is that if models are not created for these small languages, they will “die.” If developers were to create a dataset for a language that might not even have a writing system, “the model would have to change,” he added.

The African Next Voices project has just finished collecting and transcribing data. Marivate said that while he is not currently working on a new language, he is already thinking about which language to work on next.


