EXAONE 3.5-based models are now available as open source. Four months after the release of the EXAONE 3.0-based 7.8B model in August 2024, we are unveiling an even more powerful model lineup.
Since the launch of EXAONE 3.0, we have received a lot of feedback from companies, institutions, academia, and many others. One of their key requests was to provide models of different sizes that could be put to good use for different purposes.
Image 1. Open-sourcing Three EXAONE 3.5 Models
Building on this customer feedback, we are releasing three new models. All three have proven more powerful than comparable global counterparts. The first is the 2.4B model, an ultra-lightweight model for on-device use. Because it can be trained and run inference on-device or on low-end GPUs, it works even in environments without high-end infrastructure. Next is the 7.8B model, a lightweight model for versatile applications; it is the same size as the previous open-source model but delivers better performance. The last is the 32B model, a frontier AI-level high-performance model for customers who place the highest priority on performance.
Our drive to release open-source models will not stop here. We will continue to listen to all feedback on the EXAONE 3.5 models and release better models tailored to the needs of researchers. By doing so, we will contribute to the advancement of AI research and ecosystems and help lay the foundation for AI innovation. We look forward to your diverse feedback on the EXAONE 3.5 models.
Our Expertise in EXAONE 3.5
Training Efficiency
The three EXAONE 3.5 models are characterized by both exceptional performance and affordability. One of the keys to this achievement is our R&D approach, which has boosted efficiency in model training. In the pre-training phase, we sought to improve the performance of the models’ answers and reduce infrastructure costs by removing duplicates and personally identifiable information from datasets. In the post-training phase, we focused on enhancing the usability of the models and their ability to perform new tasks. Supervised Fine-tuning (SFT) and Direct Preference Optimization (DPO) methods were used to enhance their instruction following capabilities and enable the models to reflect user preferences better.
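The report summarized above names SFT and DPO but not the exact recipe; as a rough illustration of how DPO reflects user preferences, the loss for a single preference pair can be sketched as follows (all function and parameter names here are illustrative, not from the EXAONE codebase):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Log-probabilities are summed over response tokens; `beta` controls
    how far the trained policy may drift from the reference model.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)); a zero margin gives exactly log(2)
    return math.log(1.0 + math.exp(-margin))

# A policy that favors the preferred answer more strongly than the
# reference model does yields a positive margin and a loss below log(2).
loss = dpo_loss(-10.0, -14.0, -12.0, -13.0)
```

Minimizing this loss pushes the policy to assign relatively higher probability to the human-preferred response, which is how preference data shapes instruction-following behavior after SFT.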
Decontamination
We also conducted a meticulous decontamination process to increase confidence in the EXAONE 3.5 performance evaluation results. We adopted a decontamination method used by a leading global model and applied it rigorously, comparing the training data against the evaluation datasets in a process repeated ten times.
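The exact matching criterion is not described above; a common approach in published decontamination pipelines is n-gram overlap between training documents and evaluation sets. The following is a minimal sketch under that assumption (the window size `n=8`, the threshold, and all names are hypothetical):

```python
def ngrams(text, n=8):
    """Set of lowercased word n-grams of a document."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def is_contaminated(train_doc, eval_docs, n=8, threshold=0.0):
    """Flag a training document whose n-gram overlap with any
    evaluation document exceeds the threshold."""
    train = ngrams(train_doc, n)
    if not train:
        return False
    for eval_doc in eval_docs:
        overlap = len(train & ngrams(eval_doc, n)) / len(train)
        if overlap > threshold:
            return True
    return False
```

Documents flagged this way would be dropped from the training corpus, so that benchmark scores measure generalization rather than memorization of test items.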
This is why we can confidently present the following details about EXAONE 3.5’s benchmark performance.
Key Takeaways 1. EXAONE 3.5 : A Global Model of Excellence
1. Long Context Understanding : The Top Performance in Four Benchmarks
EXAONE 3.5’s most powerful feature is its improved ability to understand and process long contexts. As Retrieval-Augmented Generation (RAG), which generates answers based on web retrieval results or reference documents, has become more prevalent, it is increasingly important for models to understand long contexts. The EXAONE 3.5 models released at this time are designed to handle contexts of up to 32K tokens, and each has shown superior long-context performance compared to similar-sized global models.
Some models claim to support contexts longer than 32K tokens, but this is often merely a theoretical figure confirmed at the model design stage. EXAONE 3.5, on the other hand, offers an effective context length of 32K tokens, meaning the maximum token length it can actually understand and process, keeping it in line with the latest AI research and utilization trends. In particular, the bilingual EXAONE models have demonstrated top-tier long context understanding performance in both English and Korean.
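To see why an effective 32K-token context matters for RAG in practice, consider how retrieved documents must be packed into the model's window. The sketch below is illustrative only (the greedy strategy, the crude token estimate, and all names are assumptions; a real pipeline would use the model's own tokenizer):

```python
MAX_CONTEXT_TOKENS = 32_000  # EXAONE 3.5's effective context length

def estimate_tokens(text):
    """Crude token estimate (~0.75 English words per token);
    a real pipeline would count with the model's tokenizer."""
    return int(len(text.split()) / 0.75)

def pack_context(documents, prompt, reserve_for_answer=1_000):
    """Greedily pack retrieved documents into the context window,
    keeping room for the prompt and the generated answer."""
    budget = MAX_CONTEXT_TOKENS - estimate_tokens(prompt) - reserve_for_answer
    packed, used = [], 0
    for doc in documents:  # assumed sorted by retrieval relevance
        cost = estimate_tokens(doc)
        if used + cost > budget:
            break
        packed.append(doc)
        used += cost
    return packed
```

A model whose advertised context exceeds what it can actually attend to would silently degrade on the later documents in such a packed prompt, which is why effective rather than theoretical context length is the figure that matters for RAG workloads.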
Image 2. Performance comparison results of EXAONE 3.5 - On four benchmarks representing long context scenarios
(Models that do not support context lengths longer than 16K are excluded from the results.)
2. Instruction Following Capabilities : The Highest Scores Across Seven Benchmarks
One of the most important aspects of the EXAONE development journey is the actual usability of the models. Throughout the research and development process, we have focused on giving EXAONE performance high enough to increase human productivity and work efficiency at real-world industrial sites. In the EXAONE 3.5 Technical Report, performance in terms of real-world usability is described under "Real-World Use Cases." A total of seven benchmarks were used, and all three EXAONE 3.5 models ranked first in average instruction following score, significantly outperforming global models of the same size. Their instruction following performance excels not only in English but also in Korean.
Image 3. Performance comparison results of EXAONE 3.5 - On seven benchmarks representing real-world use case scenarios
3. Business Partnerships : Uncovering New Opportunities
As for AI services, it is now time to go beyond demonstrating the potential of AI as a technology and instead prove its usability and build business models around AI services. For this reason, LG AI Research has been entering into partnerships with both Korean and global companies to achieve tangible business results. In Korea, we are in discussions with companies that have their own proprietary software, such as Polaris Office and Hancom, about how their services can be enhanced with EXAONE 3.5-based AI services. We are currently pursuing a PoC project to incorporate EXAONE 3.5-based AI services into Hancom Office, which has high adoption rates among public institutions. When completed, this project is expected to help government and public institutions innovate their work efficiency.
Key Takeaways 2. EXAONE's Enhanced Features
1. General Domain : Competitive Results on Nine Benchmarks Compared to SOTA Open Models
EXAONE 3.5 also demonstrates exceptional performance in mathematical and coding capabilities. Across a total of nine general benchmarks, the 2.4B model notably achieved the highest average score, outperforming global models of the same size, while the 7.8B and 32B models also ranked among the top performers in average score.
Image 4. Performance comparison results of EXAONE 3.5 - On nine benchmarks representing general scenarios.
Bold scores indicate the best performance, and underlined scores mean the second best.
2. Responsible AI : Open and Transparent Disclosure of Information
Throughout the development of EXAONE 3.5, we have strived to fulfill our commitment to responsible AI. While releasing a lineup of models in various sizes as open source can contribute to AI research and ecosystem development, it also presents potential risks, such as unintended inequality toward socially disadvantaged groups, the generation of harmful content, and malicious misuse. To proactively identify and prevent these risks, we conducted an AI Ethical Impact Assessment, reviewing potential risks throughout the entire AI lifecycle. Guided by the LG AI Ethics Principles, we have pursued research and development with a strong commitment to ethical standards.
The ethical evaluation of EXAONE 3.5 revealed both strengths and areas for improvement. All three models demonstrated outstanding performance in filtering out hate speech and illegal content. However, we identified a need to address biases related to regions and occupations, particularly in the 2.4B model. We chose to disclose the evaluation results transparently because we believe that advancing AI ethics requires the open sharing of information. By making these findings public, we aim to encourage more active research into AI ethics among researchers. LG AI Research will continue to lead in the field of AI ethics, building on this foundation.