New Delhi: Sarvam AI, a Bengaluru-based startup, has introduced two AI models built for Indian users: Sarvam Audio and Sarvam Vision. The models aim to compete with giants like Google while offering advantages specific to the Indian context.
India runs on voice communication. Many people, including farmers and delivery workers, prefer talking over typing. This inspired Sarvam AI to build systems that excel at handling speech. Its Sarvam Audio model can understand 22 Indian languages, even when speakers switch languages mid-sentence, a common practice in everyday conversation.
The company points to benchmark results to support its claims. On the IndicVoices benchmark, Sarvam Audio surpassed Google’s Gemini-3-Flash and OpenAI’s GPT-4o in transcription accuracy, recording a lower Word Error Rate (WER). WER measures the share of words a system gets wrong when turning speech into text, so a lower value means fewer mistakes.
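For readers unfamiliar with the metric, WER is conventionally computed as the number of word substitutions, deletions, and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. The following is a minimal sketch of that standard calculation, not the IndicVoices evaluation code; the function name and the sample sentences are illustrative assumptions, and real evaluations usually normalize text (case, punctuation, numerals) before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper (hypothetical name), assuming whitespace
    tokenization and no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one wrong word out of five reference words.
print(word_error_rate("the order number is nine", "the order number is no"))
```

In this made-up example a single misheard word in a five-word sentence gives a WER of 0.2, or 20 percent.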
Sarvam Vision has also made waves. Achieving 84.3 percent accuracy on the olmOCR-Bench, it outperformed Gemini 3 Pro and DeepSeek. Its document analysis scored 93.28 percent on the OmniDoc benchmark, showcasing how smaller, focused models can excel against larger competitors when tailored to Indian documents.
A standout feature is Sarvam’s Speech-to-Command capability, allowing direct action from voice inputs. Unlike typical systems that convert speech to text first, Sarvam Audio can execute commands straight from voice. For instance, saying “Nau” in Hindi is correctly interpreted as “9,” avoiding errors that other models might introduce.
To address the realities of Indian work environments, Sarvam Audio uses speaker diarization, identifying up to eight speakers in a single audio stream. This matters in busy call centers, where several voices can share one recording. The model is also designed for 8 kHz telephony audio, making it effective at the lower audio quality typical of customer service calls.
Supported by the IndiaAI Mission and government resources, Sarvam AI’s goal is to provide sovereign AI solutions for India. By developing models locally, the company aims to reduce dependency on foreign technologies, empowering India to shape its digital future.
With Sarvam Audio and Sarvam Vision, the startup is not just an alternative to established tech giants; it is carving out a niche by prioritizing the needs of Indian users. This positions Sarvam AI as a frontrunner in AI tailored for everyday life in India, with the potential to reach more than a billion users.

