TLDR: LAFA is a novel system that integrates LLM-agent-based data analytics with federated analytics (FA). It allows users to pose complex natural language queries over decentralized, privacy-sensitive data. LAFA uses a hierarchical multi-agent architecture to decompose queries, map them to FA operations, and optimize these operations to reduce redundancy and improve efficiency, ensuring privacy-preserving computation and delivering accurate results.
In the rapidly evolving landscape of data analytics, Large Language Models (LLMs) have emerged as powerful tools, capable of interpreting complex natural language queries and automating data analysis tasks. However, a significant challenge remains: these LLM-agent-based systems typically operate with centralized data access, which raises considerable privacy concerns, especially with stringent regulations like GDPR and CCPA.
The Dual Challenge: Privacy and Accessibility
On the other hand, Federated Analytics (FA) offers a robust solution for privacy-preserving computation across distributed data sources. In FA, raw data never leaves the client device; instead, only privacy-preserving intermediate results are shared with a central server. While FA excels at privacy, it traditionally lacks support for natural language input, requiring structured, machine-readable queries that demand specialized expertise.
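To make the FA idea concrete, here is a minimal sketch of a federated mean: each client shares only a noised (sum, count) pair, never its raw values. The function names, the Gaussian noise mechanism, and the `sigma` parameter are illustrative assumptions, not the paper's actual protocol (which also involves encryption).

```python
import random

def client_update(values, sigma=1.0):
    # Runs on the client: only a noise-perturbed local aggregate
    # (not the raw values) is ever sent to the server.
    noisy_sum = sum(values) + random.gauss(0.0, sigma)
    noisy_count = len(values) + random.gauss(0.0, sigma)
    return noisy_sum, noisy_count

def federated_mean(client_datasets, sigma=1.0):
    # Runs on the server: combines privacy-preserving intermediates only.
    total_sum, total_count = 0.0, 0.0
    for values in client_datasets:
        s, c = client_update(values, sigma)
        total_sum += s
        total_count += c
    return total_sum / total_count

# Three hypothetical clients, each holding a few salary records.
clients = [[50_000, 60_000], [55_000], [70_000, 65_000, 52_000]]
estimate = federated_mean(clients, sigma=0.0)  # sigma=0 disables noise for a sanity check
```

With `sigma > 0` the result becomes a differentially private estimate; the trade-off between noise scale and accuracy is the usual DP calibration question.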
Introducing LAFA: Bridging the Gap
To address this critical divide, researchers have introduced LAFA (Agentic LLM-Driven Federated Analytics), a pioneering system that seamlessly integrates LLM-agent-based data analytics with Federated Analytics. LAFA is designed to accept natural language queries and transform them into optimized, executable FA workflows, all while maintaining strong privacy protections over decentralized data.
How LAFA Works: A Hierarchical Multi-Agent System
LAFA employs a sophisticated hierarchical multi-agent architecture to manage the complexity of natural language queries and FA operations. This system comprises several key agents:
- Coarse-grained Planner Agent: This agent is responsible for the initial breakdown of complex natural language queries into smaller, manageable sub-queries. For instance, a query asking for both the average salary in a university and the salary difference between professors and Ph.D. students would be split into multiple distinct analytical intents.
- Fine-grained Planner Agent: Once sub-queries are identified, this agent maps each one into a preliminary Directed Acyclic Graph (DAG) of FA operations. It leverages prior structural knowledge of valid FA pipelines, ensuring that each step adheres to correct privacy-preserving semantics, such as preprocessing, encryption, aggregation, noise addition, decryption, and postprocessing.
- DAG Optimizer Agent: A crucial component, the optimizer agent takes all preliminary DAGs and merges them into a single, optimized DAG. Its primary role is to eliminate redundant operations, such as repeated data access, encryption, or aggregation across overlapping sub-queries. This significantly reduces computational and communication overhead, which is particularly vital in large-scale federated environments. It achieves this by identifying common operations and by partitioning clients into groups with similar features, so that calculations are performed more efficiently.
- Answerer Agent: After the optimized FA pipeline is executed, the answerer agent composes the final results into a coherent, natural language response for the querier, ensuring a user-friendly experience.
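The optimizer's core idea can be sketched as deduplicating identical nodes across per-sub-query DAGs. The dictionary-based DAG encoding and the node labels below are illustrative assumptions for the salary example, not LAFA's actual internal representation.

```python
def merge_dags(dags):
    # Merge several DAGs (node id -> list of dependency ids) into one,
    # keeping a single copy of any node that appears in multiple sub-queries.
    merged = {}
    for dag in dags:
        for node, deps in dag.items():
            merged.setdefault(node, deps)
    return merged

# Sub-query 1: overall average salary.
avg_all = {
    "access:salary": [],
    "encrypt:salary": ["access:salary"],
    "aggregate:sum_count": ["encrypt:salary"],
    "post:mean_all": ["aggregate:sum_count"],
}
# Sub-query 2: professor vs. Ph.D. student salary difference.
diff_groups = {
    "access:salary": [],
    "encrypt:salary": ["access:salary"],
    "aggregate:sum_count": ["encrypt:salary"],
    "post:group_diff": ["aggregate:sum_count"],
}

merged = merge_dags([avg_all, diff_groups])
# The shared access/encrypt/aggregate steps collapse to one node each,
# so 8 operations shrink to 5; only the post-processing nodes differ.
```

In a federated setting, each eliminated aggregation node saves a full round of client communication, which is where the bulk of the efficiency gain comes from.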
Enhanced Efficiency and Accuracy
Experiments demonstrate that LAFA consistently outperforms traditional prompting strategies. It achieves significantly higher execution plan success rates, ensuring that queries are correctly understood and translated into valid FA operations. Furthermore, LAFA substantially reduces resource-intensive FA operations, leading to more efficient data processing. The DAG optimizer, in particular, plays a vital role in this efficiency, minimizing repeated steps and offloading complexity to lightweight post-processing calculations.
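As a toy illustration of that offloading idea (with hypothetical numbers, not results from the paper): a single FA round can return per-group (sum, count) aggregates, from which the overall average and the group difference are both derived locally, instead of running a separate federated computation for each statistic.

```python
# One FA round returns per-group (salary sum, headcount) aggregates.
groups = {
    "professor": (1_800_000.0, 20),
    "phd_student": (900_000.0, 30),
}

# Lightweight post-processing: everything else is cheap local arithmetic.
means = {g: s / c for g, (s, c) in groups.items()}
overall_mean = sum(s for s, _ in groups.values()) / sum(c for _, c in groups.values())
salary_gap = means["professor"] - means["phd_student"]
```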
LAFA represents a significant step forward in making privacy-preserving data analytics more accessible and efficient, allowing users to interact with decentralized data using natural language without compromising privacy. For a deeper dive into the technical details, you can read the full research paper here.


