TLDR: The P–C–G (Planner–Caller–Generator) architecture is a small-scale language model (SLM)-based agent system optimized for Korean tool use. It separates tasks into specialized Planner, Caller, and Generator modules, employing an initial planning strategy and a Korean-first value policy to enhance efficiency and reliability. The system demonstrates competitive accuracy and end-to-end quality in Korean tool-use scenarios, including single-chain, multi-chain, and constraint-awareness tasks, while significantly reducing token usage and maintaining acceptable latency compared to larger models.
Large Language Models (LLMs) have made incredible strides in understanding and generating human language, but their immense size comes with significant drawbacks: high computational costs, substantial resource requirements, and a tendency to sometimes generate incorrect or fabricated information, known as hallucinations. To tackle these issues, researchers are increasingly turning to Small-scale Language Models (SLMs) and Agentic AI systems. Agentic AI combines language models with external tools, like search engines or APIs, allowing them to access up-to-date information and perform complex, multi-step tasks more reliably.
However, most existing Agentic AI systems still heavily rely on large LLMs, inheriting their cost and latency problems. Furthermore, there’s a notable gap in systems specifically optimized for non-English languages, particularly Korean, where unique challenges arise from frequent language switching and a lack of standardized tool specifications.
Introducing P–C–G: An Optimized Agent Architecture for Korean Tool Use
A new research paper introduces an innovative SLM-based agent architecture called Planner–Caller–Generator (P–C–G), specifically designed to overcome these limitations and excel in Korean tool-use environments. The core idea behind P–C–G is to break down complex tasks into three specialized roles, each handled by a dedicated SLM module:
- Planner: This module analyzes a user’s request and creates an initial, comprehensive plan detailing which tools to use and in what order. It focuses on efficient planning upfront to minimize unnecessary steps.
- Caller: Following the Planner’s instructions, the Caller prepares and executes the tool calls. A key feature here is the “Korean-first value policy,” which ensures that values passed to tools remain in Korean by default, preventing errors caused by unintended language conversions. It also validates parameters against tool schemas to ensure correctness.
- Generator: Once the tools have been called and results gathered, the Generator module takes these outputs, integrates them, and formats them into a clear, natural language response that directly addresses the user’s original intent.
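The role separation above can be sketched as a three-stage pipeline. This is an illustrative mock-up, not the paper's implementation: the function bodies, the example tool name `search_restaurant`, and the sample values are all assumptions standing in for the actual SLM calls.

```python
from dataclasses import dataclass

# Hypothetical sketch of the P-C-G pipeline. Each stage would be backed by
# a dedicated SLM; here the model calls are stubbed with fixed outputs.

@dataclass
class ToolCall:
    name: str
    args: dict

def planner(query: str) -> list[ToolCall]:
    # Planner: emits the full tool plan up front (stubbed).
    return [ToolCall("search_restaurant", {"location": "강남역"})]

def caller(plan: list[ToolCall]) -> list[dict]:
    # Caller: executes each planned call against a tool registry (stubbed).
    # Note the argument value stays in Korean, per the Korean-first policy.
    return [{"tool": c.name, "result": "스타벅스 강남점"} for c in plan]

def generator(query: str, results: list[dict]) -> str:
    # Generator: composes a natural-language answer from tool outputs (stubbed).
    return f"'{query}'에 대한 결과: {results[0]['result']}"

query = "강남역 근처 카페"
answer = generator(query, caller(planner(query)))
```

Because each stage has a narrow contract (query → plan → results → answer), each module can be a small model specialized for just its step.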
This role-separated design, combined with an “initial planning + limited on-demand replanning” strategy, aims to achieve LLM-comparable performance while significantly reducing token usage and maintaining acceptable response times. Unlike traditional systems that might repeatedly call a planner for each step, P–C–G plans once and only replans if absolutely necessary, such as when a tool execution fails or produces unexpected results.
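The "initial planning + limited on-demand replanning" control flow might look like the following. The function names and the replan budget of one are assumptions for illustration; the paper describes the strategy, not this exact code.

```python
# Minimal sketch: plan once up front, re-invoke the Planner only when a
# tool call fails, and only up to a fixed budget (assumed here to be 1).

MAX_REPLANS = 1  # limited on-demand replanning budget (assumption)

def execute_with_replanning(query, plan_fn, call_fn):
    """Plan once; replan only after a tool failure, up to MAX_REPLANS."""
    plan = plan_fn(query)            # single upfront planning call
    results, failed = call_fn(plan)  # failed: list of tools that errored
    replans = 0
    while failed and replans < MAX_REPLANS:
        plan = plan_fn(query)        # replan only because execution failed
        results, failed = call_fn(plan)
        replans += 1
    return results
```

Compared with per-step planner invocation, this keeps planner token usage roughly constant in the common case where the initial plan executes cleanly.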
Addressing Korean-Specific Challenges
The paper highlights the critical need for robust tool use in Korean contexts. Unintended Korean-to-English switching in tool arguments can lead to execution failures or distorted queries for Korean databases. The P–C–G architecture directly addresses this with its Korean-first value policy and rigorous schema/value co-validation within the Caller module. This ensures language consistency and reduces errors, which is crucial for services dealing with Korean user inputs, names, and locations.
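A Korean-first value check combined with schema validation could be sketched as below. The schema format, parameter names, and the Hangul-presence heuristic are assumptions made for illustration, not the paper's actual validation logic.

```python
import re

# Illustrative schema/value co-validation: check required and unknown
# parameters against a tool schema, and flag string values that should be
# Korean but contain no Hangul (i.e., drifted into English).

HANGUL = re.compile(r"[가-힣]")  # precomposed Hangul syllable block

def validate_call(args: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the call passes."""
    problems = []
    for name, spec in schema.items():
        if spec.get("required") and name not in args:
            problems.append(f"missing required parameter: {name}")
    for name, value in args.items():
        if name not in schema:
            problems.append(f"unknown parameter: {name}")
        elif schema[name].get("korean") and isinstance(value, str) \
                and not HANGUL.search(value):
            # Korean-first policy: reject unintended Korean-to-English switches
            problems.append(f"expected Korean value for: {name}")
    return problems

schema = {"location": {"required": True, "korean": True}}
assert validate_call({"location": "강남역"}, schema) == []
assert validate_call({"location": "Gangnam"}, schema) == \
    ["expected Korean value for: location"]
```

Catching a romanized value like "Gangnam" before execution matters because a Korean-language database query for that string may silently return wrong or empty results rather than fail loudly.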
Rigorous Evaluation and Promising Results
To validate P–C–G, the researchers created a comprehensive Korean tool-use dataset covering various scenarios: single-tool calls (Single-chain), multi-step chained calls (Multi-chain), situations where required information is missing (Missing Parameters), and cases where a necessary tool is unavailable (Missing Functions). The evaluation used an LLM-as-a-Judge protocol, averaging results over five runs to ensure fairness.
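The five-run averaging step of the evaluation amounts to a simple per-category mean. The category names below come from the paper's scenario list, but the scores are made-up placeholders, not real results.

```python
from statistics import mean

# Toy sketch of the evaluation aggregation: five LLM-as-a-Judge runs per
# scenario category, averaged. Scores here are fabricated placeholders.

runs = {
    "single_chain":      [0.95, 0.96, 0.95, 0.96, 0.96],
    "multi_chain":       [0.33, 0.34, 0.34, 0.34, 0.34],
    "missing_params":    [0.60, 0.62, 0.61, 0.60, 0.62],
    "missing_functions": [0.91, 0.91, 0.92, 0.91, 0.91],
}

averaged = {cat: round(mean(scores), 3) for cat, scores in runs.items()}
```

Averaging over repeated runs dampens the run-to-run variance of both the agent's sampling and the judge model's scoring.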
The results are compelling:
- Efficient Planning: P–C–G achieved the highest “As-planned” rate (92.3%) and a very low “Over-planning” rate (1.6%), demonstrating its ability to create accurate and efficient initial plans.
- High Tool-Use Accuracy: The Caller module showed superior tool-use accuracy at 75.0%, outperforming other models, including larger ones.
- Strong Performance Across Tasks: P–C–G performed exceptionally well in Single-chain tasks (95.6% Call Accuracy, 91.2% Overall) and showed competitive results in Multi-chain tasks (33.8% Call Accuracy, 62.4% Overall), even slightly surpassing GPT-4o-mini in call accuracy for multi-chain scenarios.
- Constraint Awareness: While it had some room for improvement in handling Missing Parameters, P–C–G excelled in identifying and gracefully handling Missing Functions, achieving the highest accuracy at 91.2%.
- High Task Success Rate: The architecture achieved a Task Success Rate (TSR) of 79.7%, on par with much larger models like GPT-4o-mini, indicating its practical effectiveness.
- Inference Efficiency: P–C–G used the fewest tokens among SLMs (4,360.3 tokens on average for correct answers), representing a 12–22% reduction compared to other models, and maintained a competitive response time of 9.1 seconds.
These findings suggest that P–C–G offers a balanced approach, delivering both high accuracy and cost-efficiency. It demonstrates that SLM-based Agentic AI, when designed with specialized roles and optimized for specific linguistic contexts, can be a viable and cost-effective alternative to large, resource-intensive LLMs for real-world applications, especially in Korean environments. For more in-depth information, see the full research paper.
Future Directions
The researchers acknowledge areas for future improvement, such as further reducing Planner latency, enhancing parameter completion capabilities, and developing more robust partial-failure recovery mechanisms. They also plan to address security and privacy concerns, including prompt injection and data exfiltration.