TLDR: Emission-GPT is a specialized large language model agent designed to address the challenges of fragmented and complex atmospheric emission knowledge and data. Built on a vast, curated knowledge base of over 10,000 documents, it offers accurate domain-specific question answering, interactive data analysis, and context-aware emission factor recommendations through natural language. The system integrates retrieval-augmented generation (RAG) and function calling, demonstrating superior performance compared to general-purpose LLMs in extracting insights, analyzing trends, and automating workflows for emission inventory development and environmental assessment.
Understanding and managing air pollutant and greenhouse gas emissions are crucial for improving air quality and combating climate change. However, the information related to emissions is often scattered and highly technical, making it difficult for non-experts to access and interpret. Traditional methods for compiling emission data are also inefficient, posing significant challenges for both research and environmental management.
To tackle these issues, researchers have developed Emission-GPT, a sophisticated language model agent specifically designed for the atmospheric emissions domain. This AI tool is built upon a comprehensive knowledge base containing over 10,000 documents, including official standards, detailed reports, practical guidebooks, and peer-reviewed scientific literature. Emission-GPT uses advanced techniques like prompt engineering and question completion to provide precise answers to domain-specific questions.
A standout feature of Emission-GPT is that users can query and analyze emission data in natural language. They can ask questions to retrieve and visualize emission inventories, break down the contributions of different sources, and get emission factor recommendations tailored to specific scenarios. A case study in Guangdong Province showed that Emission-GPT can extract key insights, such as the distribution of point sources and trends across sectors, directly from raw data using simple prompts.
The system’s architecture is modular and designed for extensibility, which helps automate tasks that traditionally required extensive manual effort. This positions Emission-GPT as a foundational tool for developing next-generation emission inventories and conducting scenario-based environmental assessments.
How Emission-GPT Works
Emission-GPT operates through a multi-stage pipeline. When a user submits a query, the system first classifies it into one of two categories: emission-related knowledge or emission-related data analysis. For knowledge-based questions, a specialized language model uses Retrieval-Augmented Generation (RAG) to pull relevant information from the knowledge base and formulate an answer. For data analysis queries, another language model constructs API-level requests and SQL-like queries to interact with the backend emission inventory and emission factor databases. The data analysis path includes built-in optimization for failed data retrievals and can visualize the results for the user.
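The routing step described above can be sketched in a few lines of Python. This is a minimal illustration, not the system's actual code: the real classifier is a language model, so the keyword check here is only a stand-in, and every name (`classify_query`, `answer_knowledge`, `answer_data`, `route`, the keyword list) is hypothetical.

```python
# Hypothetical keyword list; the actual system classifies queries with a
# language model, so this check is only a placeholder for that step.
DATA_KEYWORDS = ("plot", "chart", "trend", "inventory", "total", "compare")


def classify_query(query: str) -> str:
    """Return 'data_analysis' or 'knowledge' for a user query."""
    text = query.lower()
    if any(keyword in text for keyword in DATA_KEYWORDS):
        return "data_analysis"
    return "knowledge"


def answer_knowledge(query: str) -> str:
    # Placeholder for the RAG branch (retrieve documents, then generate).
    return f"[RAG] answer for: {query}"


def answer_data(query: str) -> str:
    # Placeholder for the branch that builds API or SQL-like requests.
    return f"[DATA] result for: {query}"


# One handler per query category.
HANDLERS = {"knowledge": answer_knowledge, "data_analysis": answer_data}


def route(query: str) -> str:
    """Classify the query, then hand it to the matching branch."""
    return HANDLERS[classify_query(query)](query)


if __name__ == "__main__":
    print(route("What is an emission factor?"))
    print(route("Plot the NOx trend by sector"))
```

Keeping the classifier separate from the handlers means either branch can be swapped out without touching the other, which matches the modular design the article describes.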
The knowledge base itself is a curated collection of 10,332 authoritative documents, gathered and organized by 24 doctoral and master's students over a month. It includes journal articles, policy documents, and scholarly books in both Chinese and English. The collection covers major source sectors such as industry, agriculture, and biomass burning, and key pollutants such as CO2, NOx, and PM2.5, across various geographic scales.
The RAG framework within Emission-GPT transforms user queries into vectors to retrieve semantically relevant information from this knowledge base. It uses models like Qwen-plus for context segmentation and BGE-M3 for generating dense vector representations. This approach ensures factual accuracy and contextual relevance, even supporting multi-turn conversations by embedding previous interactions into new queries.
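To make the retrieval idea concrete, here is a small sketch of vector-based lookup with multi-turn context. It is an assumption-laden toy: the `embed` function is a bag-of-words counter standing in for a dense encoder such as BGE-M3, the three text chunks are invented, and joining earlier turns onto the query is just one simple way to carry conversation history.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a dense encoder like BGE-M3."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list, history: tuple = (), k: int = 2) -> list:
    """Return the k chunks most similar to the query plus earlier turns."""
    # Multi-turn support: earlier turns are joined onto the new query so the
    # query vector carries the conversation context.
    query_vec = embed(" ".join([*history, query]))
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


# Invented example chunks, not taken from the real knowledge base.
CHUNKS = [
    "nox emission factors for coal-fired power plants",
    "pm2.5 emissions from biomass burning in agriculture",
    "co2 inventory methods for cement production",
]

if __name__ == "__main__":
    print(retrieve("biomass burning pm2.5", CHUNKS, k=1))
    print(retrieve("what about factors", CHUNKS,
                   history=("nox coal-fired power plants",), k=1))
```

In the second call the follow-up question alone is too vague to match anything useful; it is the embedded history that steers retrieval back to the power plant chunk.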
Emission Factor Recommendations and Data Analysis
Emission factors (EFs) are critical for accurate emission estimates, but their selection can be time-consuming and require deep expertise. Emission-GPT simplifies this with a generative AI-powered recommendation tool. It uses a two-stage retrieval and evaluation framework: first, matching user-specified source attributes with official guidelines, and then performing a semantic search across peer-reviewed literature, ranking candidates based on criteria like data representativeness and methodological reliability.
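A rough sketch of such a two-stage recommender might look like the following. Several things here are assumptions made for illustration: the `EFCandidate` fields, the equal weighting of the two scores, the pollutant filter standing in for semantic search, and the choice to list guideline matches ahead of literature candidates. All values are placeholders, not real emission factors.

```python
from dataclasses import dataclass


@dataclass
class EFCandidate:
    origin: str                # hypothetical label, e.g. "guideline" or a paper
    sector: str
    pollutant: str
    value: float               # placeholder number, not a real emission factor
    representativeness: float  # assumed score in [0, 1]
    reliability: float         # assumed score in [0, 1]


def match_guidelines(attrs: dict, guidelines: list) -> list:
    """Stage 1: exact match of source attributes against official guidelines."""
    return [c for c in guidelines
            if c.sector == attrs["sector"] and c.pollutant == attrs["pollutant"]]


def rank_literature(attrs: dict, literature: list) -> list:
    """Stage 2: filter literature candidates, then rank them by score."""
    # The pollutant filter is a stand-in for semantic search.
    relevant = [c for c in literature if c.pollutant == attrs["pollutant"]]
    # Equal weighting of the two criteria is an assumption.
    return sorted(relevant,
                  key=lambda c: (c.representativeness + c.reliability) / 2,
                  reverse=True)


def recommend(attrs: dict, guidelines: list, literature: list, top_k: int = 3) -> list:
    """Guideline matches first, followed by ranked literature candidates."""
    combined = match_guidelines(attrs, guidelines) + rank_literature(attrs, literature)
    return combined[:top_k]


GUIDELINES = [EFCandidate("guideline", "power", "NOx", 1.0, 0.9, 0.9)]
LITERATURE = [
    EFCandidate("paper A", "power", "NOx", 1.2, 0.5, 0.6),
    EFCandidate("paper B", "power", "NOx", 0.8, 0.9, 0.8),
    EFCandidate("paper C", "cement", "PM2.5", 0.3, 0.9, 0.9),
]

if __name__ == "__main__":
    request = {"sector": "power", "pollutant": "NOx"}
    for candidate in recommend(request, GUIDELINES, LITERATURE):
        print(candidate.origin, candidate.value)
```

Splitting the two stages into separate functions keeps the scoring rule easy to change, for example to weight methodological reliability more heavily than representativeness.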
The toolchain also allows for interactive data analysis. Users can ask natural language questions about pollutant types, spatial and temporal dimensions, and source categories. The system then autonomously identifies appropriate functions, retrieves relevant inventory data, and generates easy-to-understand visual outputs like stacked bar charts and pie charts. This capability significantly lowers the technical barrier to data access and analysis, making complex environmental diagnostics accessible without manual coding.
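The function-calling pattern behind this can be illustrated with a short sketch. It assumes the language model emits a JSON object naming a function and its arguments, which a dispatcher then executes. The tool name `sector_shares`, the JSON shape, and the inventory rows are all invented for the example, and charting is left out so the code has no dependencies.

```python
import json
from collections import defaultdict

# Made-up inventory rows for illustration only.
INVENTORY = [
    {"sector": "industry", "pollutant": "NOx", "year": 2020, "tonnes": 120.0},
    {"sector": "transport", "pollutant": "NOx", "year": 2020, "tonnes": 80.0},
    {"sector": "industry", "pollutant": "PM2.5", "year": 2020, "tonnes": 30.0},
]


def sector_shares(pollutant: str, year: int) -> dict:
    """Share of each sector in total emissions of one pollutant for one year."""
    totals = defaultdict(float)
    for row in INVENTORY:
        if row["pollutant"] == pollutant and row["year"] == year:
            totals[row["sector"]] += row["tonnes"]
    grand_total = sum(totals.values())
    if not grand_total:
        return {}
    return {sector: amount / grand_total for sector, amount in totals.items()}


# Registry of functions the model is allowed to call (hypothetical).
TOOLS = {"sector_shares": sector_shares}


def dispatch(call_json: str) -> dict:
    """Execute a model-produced call such as {"name": ..., "arguments": {...}}."""
    call = json.loads(call_json)
    return TOOLS[call["name"]](**call["arguments"])


if __name__ == "__main__":
    # In a real system, the language model would produce this JSON string.
    request = '{"name": "sector_shares", "arguments": {"pollutant": "NOx", "year": 2020}}'
    print(dispatch(request))
```

The returned shares are the kind of result that could then be passed to a plotting library to draw the pie or stacked bar charts mentioned above.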
Performance and Future Outlook
Evaluations showed that Emission-GPT generates accurate and relevant responses, especially when provided with appropriate context. Human expert evaluations further confirmed that it outperforms general-purpose models such as GPT-4o and DeepSeek R1 in accuracy, citation quality, and relevance, particularly on more complex tasks. The full research paper is available at arXiv:2510.02359.
While Emission-GPT represents a significant leap forward, the researchers acknowledge areas for future enhancement. These include expanding its capabilities beyond textual documents to structured datasets, numerical time series, and geospatial imagery, integrating a knowledge graph for more complex reasoning, automating the knowledge base updating process, and enabling the processing of visual content within documents. As emission science continues to evolve, Emission-GPT is poised to become an even more robust platform for environmental research, policy-making, and real-world decision support.


