NANDA

What is NANDA?

Hindi is spoken by 1 billion people in India alone and is the 4th most spoken language worldwide. However, 85% of generative AI training data is in English or other European languages.

With bilingual capabilities spanning Hindi, Hinglish, and English, NANDA helps preserve 35 centuries of history, culture, and literature and unlocks access to the Hindi-speaking world.

Equitable AI access

NANDA empowers India’s scientific, academic, and developer communities by accelerating the growth of a vibrant Hindi-language AI ecosystem and ensuring equitable access to AI across the region.

Key benefits

Why NANDA?

Available open source

NANDA is a 10-billion-parameter, pre-trained and instruction-tuned bilingual large language model for Hindi and English, trained on a dataset containing 65 billion Hindi tokens. It is available open source on Hugging Face for easy downstream development.
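For developers, a minimal sketch of loading an open model from Hugging Face with the `transformers` library might look like the following. The repository ID `MBZUAI/Llama-3-Nanda-10B-Chat`, the `GenerationSettings` helper, and the default sampling values are assumptions made for illustration and are not taken from this page; check the model card for the published ID, the recommended prompt format (instruction-tuned models often expect a chat template), and hardware requirements.

```python
from dataclasses import dataclass

# ASSUMPTION: illustrative repository ID; confirm it on the Hugging Face model card.
ASSUMED_MODEL_ID = "MBZUAI/Llama-3-Nanda-10B-Chat"


@dataclass(frozen=True)
class GenerationSettings:
    """Hypothetical helper holding sampling defaults (values are illustrative)."""
    max_new_tokens: int = 128
    temperature: float = 0.7
    do_sample: bool = True


def generate(prompt: str,
             model_id: str = ASSUMED_MODEL_ID,
             settings: GenerationSettings = GenerationSettings()) -> str:
    """Download the model (if not cached) and return a completion for `prompt`."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # device_map="auto" requires the `accelerate` package to be installed.
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs,
        max_new_tokens=settings.max_new_tokens,
        temperature=settings.temperature,
        do_sample=settings.do_sample,
    )
    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


if __name__ == "__main__":
    # Hindi prompt: "What is the capital of India?"
    print(generate("भारत की राजधानी क्या है?"))
```

A model of this size typically needs a GPU with substantial memory; running the script downloads the full set of weights on first use.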

Bringing AI to everyone

Localization has the power to connect, engage, and inspire by bridging language barriers. With seamless handling and reasoning across bilingual content, NANDA enables organizations to unlock access to the Hindi-speaking world.

Fully bilingual

With bilingual capabilities spanning Hindi, Hinglish, and English, and trained on 2.13 trillion tokens of language data, NANDA delivers the highest performance in Hindi without compromising on English proficiency.

Experienced in bilingual AI

We leverage our AI expertise and in-house experience from building JAIS, the world’s leading open-source Arabic large language model, to continue building models for more underserved languages.