Latest JAIS model iteration shows stronger performance across content generation, summarization, and Arabic-English translation.
JAIS 30B is the newest and most proficient version of Inception’s open-source Arabic Large Language Model (LLM). Featuring 30 billion parameters, this new iteration of JAIS follows the August 2023 release of the 13 billion parameter model, underscoring Inception’s commitment to providing a linguistically rich, culture-focused generative AI experience for the more than 400 million Arabic speakers worldwide.
JAIS, born from the collaboration between Inception, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), the world’s first graduate research university dedicated to AI, and Cerebras Systems, immediately set a benchmark in the Arabic LLM landscape. The model was trained on Condor Galaxy-1 (CG-1), one of the world’s fastest AI supercomputers, with 4 exaFLOPS of training compute, 54 million cores, and 64 nodes, built by G42 in partnership with Cerebras Systems. JAIS 13B went from concept to a fine-tuned, leading open-source model in less than four months. Notably, the production training run for JAIS 13B was completed in 21 days on CG-1.
The new JAIS 30B model was trained on a substantially larger dataset than its predecessor, comprising 126 billion Arabic tokens, 251 billion English tokens, and 50 billion code tokens, and shows improved performance across all key indicators. Its answers are 160% longer and more detailed in Arabic and 233% longer in English, reflecting significant improvements in language generation. The model also performs better in summarization (53% in Arabic and 85% in English) and formatting (130% in Arabic and 134% in English). JAIS 30B now performs on par with monolingual English models and outperforms most open-source models in foundation model evaluations.
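For readers who want the training mix as proportions, the token counts above can be totaled with a few lines of Python. This is a minimal sketch that assumes the three figures quoted here make up the complete dataset; the variable names are illustrative only.

```python
# Token counts in billions, as stated in the announcement.
tokens_billion = {"Arabic": 126, "English": 251, "code": 50}

# Assumption: these three categories are the whole training mix.
total = sum(tokens_billion.values())

# Share of each category, as a percentage rounded to one decimal place.
shares = {
    name: round(100 * count / total, 1)
    for name, count in tokens_billion.items()
}

print(f"Total: {total} billion tokens")
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

Under that assumption the mix totals 427 billion tokens, of which roughly 29.5% are Arabic, 58.8% English, and 11.7% code.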
JAIS 30B’s enhancements have been tested and validated using heuristic, cross-model comparison, and human evaluations, showing that the responses of the model’s fine-tuned iterations outperform those of JAIS 13B 96% of the time in Arabic and 97% in English.
Reaffirming its dedication to responsible and safe AI practices, the development team has also further strengthened its processes and policies to guard against bias and the production of hateful or harmful content by the model, a process made easier by its open-source release.
JAIS’s versatility and unique capabilities in the Arabic language domain have already shown promise in applications across various sectors, including telecommunications, energy, education, and healthcare, as well as in innovative solutions for the marketing communications industry.
JAIS 30B is available for download on Hugging Face.
Hugging Face foundational model: https://huggingface.co/core42/JAIS-30b-v1
Hugging Face chat model: https://huggingface.co/core42/JAIS-30b-chat-v1
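
As a rough illustration of how the models linked above might be loaded, here is a minimal sketch using the Hugging Face `transformers` library. The helper names (`repo_id_from_url`, `load_jais`, `generate`) are hypothetical. It also assumes the repositories ship custom model code, so `trust_remote_code=True` is passed, and that the `accelerate` package is installed so `device_map="auto"` can place the weights. Check the model cards for the officially recommended usage. Loading a 30 billion parameter model downloads tens of gigabytes and needs substantial GPU memory.

```python
from urllib.parse import urlparse

# URLs exactly as published in the announcement.
FOUNDATION_URL = "https://huggingface.co/core42/JAIS-30b-v1"
CHAT_URL = "https://huggingface.co/core42/JAIS-30b-chat-v1"


def repo_id_from_url(url: str) -> str:
    """Turn a Hugging Face model URL into its 'owner/name' repository id."""
    return urlparse(url).path.strip("/")


def load_jais(url: str = FOUNDATION_URL):
    """Load the tokenizer and model for the given URL (hypothetical helper)."""
    # Imported lazily so repo_id_from_url works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo_id = repo_id_from_url(url)
    # Assumption: the repository relies on custom code, hence trust_remote_code.
    tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id,
        device_map="auto",  # assumption: requires the accelerate package
        trust_remote_code=True,
    )
    return tokenizer, model


def generate(tokenizer, model, prompt: str, max_new_tokens: int = 64) -> str:
    """Generate a continuation of the prompt (hypothetical helper)."""
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

A typical call would be `tokenizer, model = load_jais(CHAT_URL)` followed by `generate(tokenizer, model, "عاصمة دولة الإمارات هي")`. The chat model may expect a specific prompt template, which this sketch does not apply.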