TLDR: This paper introduces Domain Adaptive Continual Pretraining (DACP) as a method to enhance small large language models (sLLMs) for specific industrial applications like telecommunications and finance. DACP continually pretrains sLLMs on domain-specific data while mixing in replay datasets to prevent forgetting of general knowledge. Experiments show that DACP-applied sLLMs achieve significant performance gains in their target domains, often outperforming larger general models, while remaining cost-efficient and preserving general capabilities. Real-world evaluations confirm its practical utility in tasks such as customer service summarization and RAG-based QA systems.
The rise of open-source large language models (LLMs) has opened new doors for enterprise applications. However, many organizations struggle with the extensive infrastructure required to deploy and maintain these massive models. This has made small large language models (sLLMs) a practical alternative, despite their inherent performance limitations compared to larger counterparts.
A promising approach to overcome these limitations is Domain Adaptive Continual Pretraining (DACP). While DACP has been explored for domain adaptation, its real-world utility in commercial settings has been less understood. A recent study validates the effectiveness of applying a DACP-based method across various foundation models and service domains, demonstrating its potential for industrial applications.
The core idea behind DACP is to continually pretrain a general-purpose LLM using a corpus of domain-specific, unlabeled data. This method offers a compelling alternative to training models from scratch, which is often prohibitively expensive and time-consuming. Through extensive experiments and real-world evaluations, the study shows that sLLMs enhanced with DACP achieve significant performance gains in their target domains while successfully preserving their general capabilities. This makes DACP a cost-efficient and scalable solution for enterprise-level deployment.
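To make the mechanics concrete, here is a minimal sketch of what the continual-pretraining step could look like with the Hugging Face transformers and datasets libraries, assuming a causal LM and a plain-text domain corpus. The model name, file path, and hyperparameters below are illustrative placeholders, not settings taken from the paper.

```python
# Minimal sketch of continual pretraining on an unlabeled domain corpus.
# Model name, file path, and hyperparameters are illustrative, not from the paper.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import load_dataset

model_name = "meta-llama/Llama-3.1-8B"   # placeholder foundation model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # ensure a pad token exists for batching
model = AutoModelForCausalLM.from_pretrained(model_name)

# Unlabeled domain text (e.g., telecom manuals, call transcripts), one document per line.
domain = load_dataset("text", data_files={"train": "telco_corpus.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = domain.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="dacp-telco", per_device_train_batch_size=1,
                           gradient_accumulation_steps=32, num_train_epochs=1,
                           learning_rate=1e-5, bf16=True),
    train_dataset=tokenized,
    # Standard causal-LM objective: next-token prediction, no masking.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```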
One of the key challenges in continual learning, including DACP, is catastrophic forgetting—where the model loses previously acquired general knowledge when learning new domain-specific information. To mitigate this, the researchers incorporated replay datasets, which consist of widely adopted public corpora like FineWeb, Common Crawl, Wikipedia, and GitHub Code. For enhanced Korean language performance, a substantial portion of Korean corpora from sources like AIHub and NIKL was also included. A preliminary study revealed that a 50% replay ratio effectively balances the retention of general capabilities with improvements in domain performance, a ratio adopted for the full DACP corpus.
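As an illustration of the replay mechanism, the sketch below mixes domain text and general-purpose "replay" text at a 50/50 sampling ratio using the datasets library. The file names are stand-ins for the actual Telco corpus and the public replay sources mentioned above.

```python
# A minimal sketch of building the DACP training mix with a 50% replay ratio:
# half domain text, half general replay text, to limit catastrophic forgetting.
from datasets import load_dataset, interleave_datasets

domain = load_dataset("text", data_files="telco_corpus.txt")["train"]
replay = load_dataset("text", data_files="general_replay.txt")["train"]  # e.g., FineWeb/Wikipedia samples

# Sample from each source with equal probability (50/50), matching the replay
# ratio the study found to balance domain gains and general-knowledge retention.
dacp_corpus = interleave_datasets(
    [domain, replay],
    probabilities=[0.5, 0.5],
    seed=42,
    stopping_strategy="all_exhausted",
)
print(dacp_corpus[0])
```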
The DACP process focuses on acquiring domain knowledge and is applied before instruction tuning. Therefore, DACP-applied models require a subsequent post-training step, such as instruction and alignment tuning, to restore their ability to follow instructions and effectively utilize the newly acquired domain knowledge. Public instruction datasets like Tulu 3 and AIHub, along with synthetic data, were used for this crucial step.
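A rough sketch of how instruction pairs might be prepared for this post-training step is shown below. The file name, field names, and prompt template are hypothetical; production setups would normally use the model's own chat template.

```python
# Sketch of the post-DACP instruction-tuning data preparation: supervised
# fine-tuning on (instruction, response) pairs restores instruction following.
import json
from datasets import Dataset

def format_example(ex):
    # Simple illustrative prompt template; real setups typically use the chat template.
    return {"text": f"### Instruction:\n{ex['instruction']}\n\n### Response:\n{ex['response']}"}

with open("instruction_data.jsonl") as f:   # e.g., Tulu 3 / AIHub / synthetic pairs
    pairs = [json.loads(line) for line in f]

sft_dataset = Dataset.from_list(pairs).map(format_example)
# The formatted "text" field can then be tokenized and trained with the same
# causal-LM Trainer setup used for DACP, typically at a lower learning rate.
```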
The study conducted comprehensive benchmark evaluations across multiple domains, including Telco and Finance, and various foundation models like LLaMA, Qwen, and EXAONE. The results consistently showed that DACP significantly improved domain-specific performance across all models. For instance, Telco DACP models demonstrated improvements ranging from 51% to 69% on Telco benchmarks, while general-domain performance remained largely stable, confirming successful domain adaptation without substantial knowledge loss. Similarly, finance-adapted models outperformed larger general sLLMs in their targeted domain.
Beyond benchmarks, the practical utility of DACP was evaluated in real-world Telco applications. These included a Telco-domain LLM deployed to support customer service agents with call summarization, as well as a network equipment QA system. Human evaluations showed that the DACP-applied models significantly outperformed baseline models on summarization. For the network QA system, the DACP-enhanced model drastically reduced failure rates, demonstrating improved domain-specific understanding and retrieval-augmented generation (RAG) performance. Notably, smaller DACP-applied models even outperformed larger general models, highlighting their practicality for deployments with infrastructure or service-level constraints.
In the financial domain, DACP with RAG was tested on tasks involving specialized terminology and complex concepts. The DACP-applied model achieved a 73.61% Mean Reciprocal Rank (MRR), outperforming both vanilla and post-trained baseline models by a significant margin, underscoring the effectiveness of domain adaptation in specialized financial applications.
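For reference, MRR averages the reciprocal rank of the first relevant passage across queries. Below is a small, self-contained sketch of the computation with made-up inputs; it is not the paper's evaluation code.

```python
# Compute Mean Reciprocal Rank (MRR) for a retriever: for each query, take the
# 1-based rank of the first relevant passage (None if nothing relevant retrieved).
def mean_reciprocal_rank(first_relevant_ranks):
    scores = [1.0 / r if r is not None else 0.0 for r in first_relevant_ranks]
    return sum(scores) / len(scores)

# Example: relevant passage found at ranks 1 and 2; not found for the third query.
print(mean_reciprocal_rank([1, 2, None]))  # -> 0.5
```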
In conclusion, this research presents a robust recipe for implementing DACP with mid-scale domain corpora, demonstrating its applicability and efficiency for industrial use cases. The methodology enables companies to deploy high-performing, domain-adapted sLLMs at lower cost, even with limited inference infrastructure, without relying on larger models to meet service-quality requirements. The approach holds up across variations in domain, parameter size, and foundation model type, promising an improved user experience in real-world services. For more details, you can refer to the full paper here.


