TLDR: The P–C–G (Planner–Caller–Generator) architecture is a small-scale language model (SLM)-based agent system optimized for Korean tool use. It separates tasks into specialized Planner, Caller, and Generator modules, employing an initial planning strategy and a Korean-first value policy to enhance efficiency and reliability. The system demonstrates competitive accuracy and end-to-end quality in Korean tool-use scenarios, including single-chain, multi-chain, and constraint-awareness tasks, while significantly reducing token usage and maintaining acceptable latency compared to larger models.
Large Language Models (LLMs) have made incredible strides in understanding and generating human language, but their immense size comes with significant drawbacks: high computational costs, substantial resource requirements, and a tendency to sometimes generate incorrect or fabricated information, known as hallucinations. To tackle these issues, researchers are increasingly turning to Small-scale Language Models (SLMs) and Agentic AI systems. Agentic AI combines language models with external tools, like search engines or APIs, allowing them to access up-to-date information and perform complex, multi-step tasks more reliably.
However, most existing Agentic AI systems still heavily rely on large LLMs, inheriting their cost and latency problems. Furthermore, there’s a notable gap in systems specifically optimized for non-English languages, particularly Korean, where unique challenges arise from frequent language switching and a lack of standardized tool specifications.
Introducing P–C–G: An Optimized Agent Architecture for Korean Tool Use
A new research paper introduces an innovative SLM-based agent architecture called Planner–Caller–Generator (P–C–G), specifically designed to overcome these limitations and excel in Korean tool-use environments. The core idea behind P–C–G is to break down complex tasks into three specialized roles, each handled by a dedicated SLM module:
- Planner: This module analyzes a user’s request and creates an initial, comprehensive plan detailing which tools to use and in what order. It focuses on efficient planning upfront to minimize unnecessary steps.
- Caller: Following the Planner’s instructions, the Caller prepares and executes the tool calls. A key feature here is the “Korean-first value policy,” which ensures that values passed to tools remain in Korean by default, preventing errors caused by unintended language conversions. It also validates parameters against tool schemas to ensure correctness.
- Generator: Once the tools have been called and results gathered, the Generator module takes these outputs, integrates them, and formats them into a clear, natural language response that directly addresses the user’s original intent.
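The role separation above can be sketched as a three-stage pipeline. This is an illustrative mock-up, not the paper's implementation: the function bodies, the example tool name `search_restaurant`, and the sample values are all assumptions standing in for the actual SLM calls.

```python
from dataclasses import dataclass

# Hypothetical sketch of the P-C-G pipeline. Each stage would be backed by
# a dedicated SLM; here the model calls are stubbed with fixed outputs.

@dataclass
class ToolCall:
    name: str
    args: dict

def planner(query: str) -> list[ToolCall]:
    # Planner: emits the full tool plan up front (stubbed).
    return [ToolCall("search_restaurant", {"location": "강남역"})]

def caller(plan: list[ToolCall]) -> list[dict]:
    # Caller: executes each planned call against a tool registry (stubbed).
    # Note the argument value stays in Korean, per the Korean-first policy.
    return [{"tool": c.name, "result": "스타벅스 강남점"} for c in plan]

def generator(query: str, results: list[dict]) -> str:
    # Generator: composes a natural-language answer from tool outputs (stubbed).
    return f"'{query}'에 대한 결과: {results[0]['result']}"

query = "강남역 근처 카페"
answer = generator(query, caller(planner(query)))
```

Because each stage has a narrow contract (query → plan → results → answer), each module can be a small model specialized for just its step.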
This role-separated design, combined with an “initial planning + limited on-demand replanning” strategy, aims to achieve LLM-comparable performance while significantly reducing token usage and maintaining acceptable response times. Unlike traditional systems that might repeatedly call a planner for each step, P–C–G plans once and only replans if absolutely necessary, such as when a tool execution fails or produces unexpected results.
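The "initial planning + limited on-demand replanning" control flow might look like the following. The function names and the replan budget of one are assumptions for illustration; the paper describes the strategy, not this exact code.

```python
# Minimal sketch: plan once up front, re-invoke the Planner only when a
# tool call fails, and only up to a fixed budget (assumed here to be 1).

MAX_REPLANS = 1  # limited on-demand replanning budget (assumption)

def execute_with_replanning(query, plan_fn, call_fn):
    """Plan once; replan only after a tool failure, up to MAX_REPLANS."""
    plan = plan_fn(query)            # single upfront planning call
    results, failed = call_fn(plan)  # failed: list of tools that errored
    replans = 0
    while failed and replans < MAX_REPLANS:
        plan = plan_fn(query)        # replan only because execution failed
        results, failed = call_fn(plan)
        replans += 1
    return results
```

Compared with per-step planner invocation, this keeps planner token usage roughly constant in the common case where the initial plan executes cleanly.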
Addressing Korean-Specific Challenges
The paper highlights the critical need for robust tool use in Korean contexts. Unintended Korean-to-English switching in tool arguments can lead to execution failures or distorted queries for Korean databases. The P–C–G architecture directly addresses this with its Korean-first value policy and rigorous schema/value co-validation within the Caller module. This ensures language consistency and reduces errors, which is crucial for services dealing with Korean user inputs, names, and locations.
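A Korean-first value check combined with schema validation could be sketched as below. The schema format, parameter names, and the Hangul-presence heuristic are assumptions made for illustration, not the paper's actual validation logic.

```python
import re

# Illustrative schema/value co-validation: check required and unknown
# parameters against a tool schema, and flag string values that should be
# Korean but contain no Hangul (i.e., drifted into English).

HANGUL = re.compile(r"[가-힣]")  # precomposed Hangul syllable block

def validate_call(args: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the call passes."""
    problems = []
    for name, spec in schema.items():
        if spec.get("required") and name not in args:
            problems.append(f"missing required parameter: {name}")
    for name, value in args.items():
        if name not in schema:
            problems.append(f"unknown parameter: {name}")
        elif schema[name].get("korean") and isinstance(value, str) \
                and not HANGUL.search(value):
            # Korean-first policy: reject unintended Korean-to-English switches
            problems.append(f"expected Korean value for: {name}")
    return problems

schema = {"location": {"required": True, "korean": True}}
assert validate_call({"location": "강남역"}, schema) == []
assert validate_call({"location": "Gangnam"}, schema) == \
    ["expected Korean value for: location"]
```

Catching a romanized value like "Gangnam" before execution matters because a Korean-language database query for that string may silently return wrong or empty results rather than fail loudly.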
Rigorous Evaluation and Promising Results
To validate P–C–G, the researchers created a comprehensive Korean tool-use dataset covering various scenarios: single-tool calls (Single-chain), multi-step chained calls (Multi-chain), situations where required information is missing (Missing Parameters), and cases where a necessary tool is unavailable (Missing Functions). The evaluation used an LLM-as-a-Judge protocol, averaging results over five runs to ensure fairness.
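The five-run averaging step of the evaluation amounts to a simple per-category mean. The category names below come from the paper's scenario list, but the scores are made-up placeholders, not real results.

```python
from statistics import mean

# Toy sketch of the evaluation aggregation: five LLM-as-a-Judge runs per
# scenario category, averaged. Scores here are fabricated placeholders.

runs = {
    "single_chain":      [0.95, 0.96, 0.95, 0.96, 0.96],
    "multi_chain":       [0.33, 0.34, 0.34, 0.34, 0.34],
    "missing_params":    [0.60, 0.62, 0.61, 0.60, 0.62],
    "missing_functions": [0.91, 0.91, 0.92, 0.91, 0.91],
}

averaged = {cat: round(mean(scores), 3) for cat, scores in runs.items()}
```

Averaging over repeated runs dampens the run-to-run variance of both the agent's sampling and the judge model's scoring.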
The results are compelling:
- Efficient Planning: P–C–G achieved the highest “As-planned” rate (92.3%) and a very low “Over-planning” rate (1.6%), demonstrating its ability to create accurate and efficient initial plans.
- High Tool-Use Accuracy: The Caller module showed superior tool-use accuracy at 75.0%, outperforming other models, including larger ones.
- Strong Performance Across Tasks: P–C–G performed exceptionally well in Single-chain tasks (95.6% Call Accuracy, 91.2% Overall) and showed competitive results in Multi-chain tasks (33.8% Call Accuracy, 62.4% Overall), even slightly surpassing GPT-4o-mini in call accuracy for multi-chain scenarios.
- Constraint Awareness: While it had some room for improvement in handling Missing Parameters, P–C–G excelled in identifying and gracefully handling Missing Functions, achieving the highest accuracy at 91.2%.
- High Task Success Rate: The architecture achieved a Task Success Rate (TSR) of 79.7%, on par with much larger models like GPT-4o-mini, indicating its practical effectiveness.
- Inference Efficiency: P–C–G used the fewest tokens among SLMs (4,360.3 tokens on average for correct answers), representing a 12–22% reduction compared to other models, and maintained a competitive response time of 9.1 seconds.
These findings suggest that P–C–G offers a balanced approach, delivering both high accuracy and cost-efficiency. It demonstrates that SLM-based Agentic AI, when designed with specialized roles and optimized for specific linguistic contexts, can be a viable and cost-effective alternative to large, resource-intensive LLMs for real-world applications, especially in Korean environments. For more in-depth information, see the full research paper.
Future Directions
The researchers acknowledge areas for future improvement, such as further reducing Planner latency, enhancing parameter completion capabilities, and developing more robust partial-failure recovery mechanisms. They also plan to address security and privacy concerns, including prompt injection and data exfiltration.