Customer support and service is one of the hottest areas for voice AI right now. But building a product that sounds human and responds without noticeable lag has proven much more difficult in some markets than in others, and most of the major players were not built with Africa or the Middle East in mind.
AethexAI, a startup founded last year to fill this gap, has raised $3 million in pre-seed funding led by 4DX Ventures with participation from Enza Capital, Dorm Room Fund, Mojo Ventures, and Stanford GSB 26 Fund. Private investors include Stanford University faculty, telecom executives, and AI researchers at Anthropic.
Rather than using existing orchestration tools like Vapi or LiveKit, the company built its own small model and orchestration layer from scratch to handle localized dialects of English, French, and Arabic spoken across its target market. This decision was driven by the unique demands of operating a business in that region, as explained below.
The company is also launching a platform where businesses can try out its technology and sign up for its services, as well as an API and SDK for developers to experiment with its models.
The startup was founded by Mariama Diallo and Ayooluwa Odemuyiwa. CEO Diallo previously worked at Goldman Sachs before joining YC-backed ModelML in a product and growth role. CTO Odemuyiwa graduated from Caltech, worked at Meta, and co-founded the company after attending Stanford Business School. The two wanted to build something for an emerging market and began looking for opportunities.
Companies around the world are racing to deploy AI tools to automate parts of their operations. But that doesn’t always work. In Egypt, the founders discovered that a call center had automated the majority of its calls, but had to roll back the system due to poor results. Several support centers in Africa said they were struggling to find and hire engineers who could automate calls at a reasonable cost.
“The latency and jitter we were seeing on automated calls in this region was outrageous. If we had been the orchestrator, we might have had to use a larger model hosted outside the region, resulting in increased latency. We realized that for this to work, we needed to use a very small model and reduce latency at every step,” Odemuyiwa told TechCrunch about the decision to build the company’s own model and orchestration layer.
AI labs deploying the latest models typically spend millions of dollars training the models and acquiring the data. AethexAI has found a workaround for both. Rather than pursuing the largest model possible, the company decided that a smaller model was sufficient to address latency issues while maintaining accuracy, and developed its own Kora series, with models ranging from 300 million to 1.7 billion parameters. That is a fraction of the size of a typical large language model, and that’s exactly the point.
To train these models, the startup used anonymized recordings from its call center partners. It also shipped hard drives to radio stations across Africa to collect more audio data. To keep costs down, it built a network of university student contributors who annotated the data and pronounced local names. The company says it now handles more than 17,000 calls per day.
On the business side, the company takes care to provide on-site demos and workshops to guide clients new to voice AI through the process and help them identify the best use cases for automation.
“We always tell our customers that we can’t be everything to everyone right now. We’re small, and when we start a conversation with a company, we ask them to pick the one use case that’s most important to them first,” Diallo said.
The startup is open to working in any industry, but for now, the majority of its use cases involve debt collection, customer activation, or KYC (Know Your Customer) requests, the standard identity verification process used by banks and carriers. The company employs forward-deployed engineers on a contract basis to serve local markets and builds channel partnerships with telecom providers to carry its voice AI calls. It says plug-and-play solutions don’t work in these markets at all.
Walter Badoo, co-founder and managing partner of 4DX Ventures, argues that the markets in Africa and the Middle East are fundamentally different from those that most voice AI companies are built to serve.
“Companies in Africa and the Middle East handle around three times the call volume of their Western counterparts, as voice remains the primary channel of customer interaction,” he said. “Existing systems are built for Western markets that feature high-end GPU infrastructure, standard English and European voice environments, and common enterprise workflows in the U.S. and Europe. This creates a significant gap when enterprises need a system that handles dialects, code-switching, unofficial voice patterns, and works with existing telephony infrastructure and within practical price points.”
In other words, companies like ElevenLabs, Deepgram, Sierra, and Cognigy are expanding globally at a fast pace, but the markets they were built for and the markets they now serve are not necessarily the same. Startups like AethexAI are betting that gaps such as regional dialect-specific models, on-the-ground partnerships, and locally built infrastructure represent market openings that the giants have neither the incentive nor the architecture to fill.
