JAIS: A BREAKTHROUGH IN ARABIC AI
2024-10-24
By: Basma Balabel
What is Jais? History and Development
The rise of generative Artificial Intelligence has transformed multiple industries, and the introduction of Arabic large language models (LLMs) marks a significant step towards meeting the growing demand for AI tools that cater to Arabic-speaking users. In August 2023, Inception, a G42 company specializing in AI, announced the open-source release of Jais. Jais is a bilingual Arabic-English LLM developed by Inception in collaboration with Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) and Cerebras Systems.
The 13-billion parameter model was trained on a carefully curated dataset of 116 billion Arabic tokens in addition to 279 billion English tokens. The diverse cross-linguistic dataset was sourced from various domains such as social media, academic papers, news, literature, educational materials, and more. The multilingual design of the dataset was aimed at optimizing cross-language transfer, allowing the model to excel in both languages simultaneously.
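To put the language mix in perspective, the short sketch below works out each language's share of the training tokens from the two figures quoted above. The function and variable names are illustrative only and are not taken from any Jais tooling.

```python
def language_shares(token_counts):
    """Return each language's fraction of the total token count."""
    total = sum(token_counts.values())
    return {lang: count / total for lang, count in token_counts.items()}


# Token counts in billions, as quoted in the paragraph above.
counts = {"arabic": 116, "english": 279}
shares = language_shares(counts)

print(f"total: {sum(counts.values())}B tokens")  # total: 395B tokens
for lang, share in shares.items():
    print(f"{lang}: {share:.1%}")                # arabic: 29.4%, english: 70.6%
```

In other words, Arabic makes up a little under a third of the 395 billion tokens, with roughly 2.4 English tokens for every Arabic one.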
To further improve the precision of the model's output, Inception released a 70-billion-parameter version of Jais in August 2024. This release is part of a comprehensive Jais suite of fine-tuned models trained on up to 1.6T tokens of Arabic, English, and code data [4].
What makes Jais stand out among Arabic LLMs?
Jais is by no means the first Arabic language initiative of its kind and is predated by several others, including AraGPT, ArabicBERT, Tashkeela, and more. However, these earlier efforts do not match the combination of features that makes the launch of Jais a breakthrough in Arabic generative AI.
For instance, the diverse dataset Jais is trained on captures the complexities of Arabic's many dialects and nuances, areas where many earlier Arabic LLMs struggled. Moreover, the size of the dataset allows Jais to give more contextually rich and informative responses than previous Arabic LLMs and other well-known LLMs that produce Arabic text, such as ChatGPT. What truly sets Jais apart, though, is its bilingual capability. Unlike many monolingual models, Jais delivers high performance in both Arabic and English, allowing for seamless cross-language applications. Such a feature is crucial in the MENA region, where both languages are widely spoken and used in various domains, from academia to business.
Delving into the technical aspects, Jais 13B incorporates ALiBi (Attention with Linear Biases) position embeddings, which encode position by adding a distance-based penalty to the attention scores rather than adding position vectors to the input. This helps the model extrapolate to inputs longer than those seen during training and so handle longer contexts. Jais also uses the SwiGLU (Swish-Gated Linear Unit) activation function, which increases the expressiveness of the feed-forward layers, and maximal update parametrization, a scheme for scaling initialization and learning rates that keeps training stable as models grow. Together, these choices help the model capture complex relationships within its wide-ranging dataset, further improving the precision and relevance of its responses [2].
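The two mechanisms named above can be illustrated with a minimal pure-Python sketch. This is not the Jais implementation: the function names are hypothetical, the slope formula is the one commonly given for ALiBi with a power-of-two number of heads, and the SwiGLU function omits the final output projection that a full feed-forward layer would apply.

```python
import math


def alibi_slopes(num_heads):
    """Per-head slopes 2^(-8/n), 2^(-16/n), ... (assumes n is a power of two)."""
    if num_heads <= 0 or num_heads & (num_heads - 1) != 0:
        raise ValueError("this sketch only handles power-of-two head counts")
    return [2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]


def alibi_bias(seq_len, slope):
    """Bias added to the attention logit of query i on key j.

    Keys further in the past get a linearly larger penalty; keys in the
    future (j > i) are masked out with -inf, as in causal attention.
    """
    return [
        [slope * (j - i) if j <= i else -math.inf for j in range(seq_len)]
        for i in range(seq_len)
    ]


def swish(z):
    """Swish (SiLU) activation: z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))


def matvec(x, w):
    """Multiply a vector x (length d_in) by a matrix w (d_in x d_out)."""
    return [sum(xi * w[i][k] for i, xi in enumerate(x)) for k in range(len(w[0]))]


def swiglu(x, w_gate, w_up):
    """SwiGLU gating: Swish(x @ w_gate) multiplied elementwise by (x @ w_up)."""
    gate = matvec(x, w_gate)
    up = matvec(x, w_up)
    return [swish(g) * u for g, u in zip(gate, up)]


# Usage: the bias for one head over a 3-token sequence.
for row in alibi_bias(3, alibi_slopes(8)[0]):
    print(row)
```

Because the penalty depends only on the distance between query and key, the same rule applies unchanged to sequences longer than any seen in training, which is the basis of the extrapolation claim above.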
How could Jais help in the development of the MENA region?
The pressing question now is how Jais will contribute to the development of the MENA region. Although a definite answer is still unclear because the model is relatively new, Jais holds the potential to revolutionize access to knowledge across Arabic-speaking communities.
As an open-source, up-to-date Arabic LLM, Jais will give Arabic-speaking users free access to knowledge sources that were previously accessible mainly in English. Its contribution is likely to be most significant among Arabic speakers who lack fluency in English or have limited access to global resources. The model's ability to handle multiple input formats such as voice notes, text, and images also makes it accessible to a wider audience, especially those who struggle to navigate traditional search engines like Google. Another crucial advantage of Jais is its localization. The model is trained on multiple dialects and can capture the nuances of a rich, complex language like Arabic. This is especially important in providing culturally relevant responses and promoting more inclusive and equitable access to AI-driven innovations.
The Future of Jais: Beyond AI Development
Jais can play a vital role in countless areas, and both the public and private sectors can leverage it for various applications. For instance, businesses can use Jais to shape their customer service strategies, improving customer satisfaction and, in turn, profitability. Academic institutions, meanwhile, can use Jais to prepare an up-to-date school curriculum with much less time and effort.
Jais has the potential to revolutionize the health sector in the MENA region by enhancing both operational and medical processes. On the operational side, the model can streamline scheduling and billing, significantly increasing administrative efficiency. This not only saves time and resources but also allows healthcare professionals to focus more on patient care. From a medical standpoint, Jais can assist in disease prediction and facilitate personalized treatment plans. Furthermore, it can transform telehealth by providing immediate access to medical information, supporting remote diagnoses, and offering preliminary consultations, thereby improving overall healthcare delivery.
As the model continues to evolve, Jais has the potential to become a cornerstone for the broader development of AI in the MENA region. Beyond its technical advancements, the model offers a solution to the underrepresentation of Arabic in the rapidly growing world of AI. It paves the way to a future where AI systems are inclusive of all users, regardless of language barriers.
_________________________
[1] Core42. (n.d.). JAIS. https://www.core42.ai/jais.html
[2] Malin, C. (2023, August 31). Will GenAI champion the Arabic language? Middle East AI News. https://www.middleeastainews.com/p/jais-large-language-model-arabic-language
[3] MBZUAI. (2023, August 30). Meet “Jais”, the world’s most advanced Arabic large language model open sourced by G42’s Inception. https://mbzuai.ac.ae/news/meet-jais-the-worlds-most-advanced-arabic-large-language-model-open-sourced-by-g42s-inception/
[4] G42. (n.d.). G42 launches JAIS 70B and 20 other AI Models to Champion Arabic Natural Language Processing. https://www.g42.ai/resources/news/g42-launches-jais-70b-and-20-other-ai-models-champion-arabic-natural-language-processing