Bengaluru-based Sarvam AI has launched a new large language model (LLM), Sarvam-1. The 2-billion-parameter model is optimised to support ten major Indian languages alongside English: Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil, and Telugu, the official release said. The model addresses the technological gap faced by billions of speakers of Indic languages, which have largely been underserved by existing LLMs.
Key Features and Performance Enhancements
Sarvam-1 was built from the ground up to improve two critical areas: token efficiency and data quality. According to the company, traditional multilingual models exhibit high token fertility (the number of tokens needed per word) for Indic scripts, often requiring 4-8 tokens per word compared to 1.4 for English. In contrast, Sarvam-1’s tokeniser achieves improved efficiency, with token fertility rates of just 1.4-2.1 across all supported languages.
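To make the metric concrete, here is a minimal sketch of how token fertility could be measured. The function names and the toy tokeniser are hypothetical stand-ins for illustration; this is not Sarvam-1's tokeniser, and words are simply assumed to be whitespace-separated.

```python
from typing import Callable, Iterable, List


def token_fertility(texts: Iterable[str], tokenize: Callable[[str], List[str]]) -> float:
    """Average number of tokens produced per whitespace-separated word."""
    total_tokens = 0
    total_words = 0
    for text in texts:
        total_words += len(text.split())
        total_tokens += len(tokenize(text))
    if total_words == 0:
        raise ValueError("no words to measure")
    return total_tokens / total_words


def toy_chunk_tokenizer(text: str, chunk: int = 3) -> List[str]:
    """Hypothetical tokeniser: cuts each word into fixed-size character chunks."""
    tokens: List[str] = []
    for word in text.split():
        tokens.extend(word[i:i + chunk] for i in range(0, len(word), chunk))
    return tokens


sample = ["language models need efficient tokenisers"]
fertility = token_fertility(sample, toy_chunk_tokenizer)
print(f"token fertility: {fertility:.2f}")
```

Swapping in a real tokeniser for `toy_chunk_tokenizer` and a sample of text in each language would give per-language figures comparable to those quoted above. A lower value means fewer tokens per word, so more text fits in the same context window.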
Sarvam-2T Corpus
A significant challenge in developing effective language models for Indian languages has been the lack of high-quality training data. “While web-crawled Indic language data exists, it often lacks depth and quality,” Sarvam AI noted.
To address this, the team created Sarvam-2T, a training corpus of approximately 2 trillion tokens, distributed roughly evenly across the ten languages except for Hindi, which makes up about 20 percent of the data. The company said it used advanced synthetic-data-generation techniques to build a high-quality corpus specifically for these Indic languages.
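As a back-of-the-envelope illustration of what those proportions imply, the sketch below assumes the 80 percent of the corpus that is not Hindi is shared equally among the other nine languages. That equal split is an assumption made here for illustration, not a figure from the release.

```python
TOTAL_TOKENS = 2_000_000_000_000  # approximately 2 trillion tokens, per the release
HINDI_SHARE = 0.20                # Hindi's reported share of the corpus
OTHER_LANGUAGES = 9               # the remaining nine Indic languages

# Assumption: the non-Hindi portion is split equally among the other languages.
hindi_tokens = TOTAL_TOKENS * HINDI_SHARE
per_other_language = (TOTAL_TOKENS - hindi_tokens) / OTHER_LANGUAGES

print(f"Hindi: ~{hindi_tokens / 1e9:.0f} billion tokens")
print(f"Each other language: ~{per_other_language / 1e9:.0f} billion tokens")
```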
“The Sarvam-1 model is the first example of an LLM trained from scratch with data, research, and compute being fully in India,” said Pratyush Kumar, Co-Founder, Sarvam. He added, “We expect it to power a range of use cases including voice and messaging agents. This is the beginning of our mission to build full stack sovereign AI. We are deeply excited to be working together with Nvidia towards this mission.”