
KT’s Comprehensive Framework for Responsible AI: Ensuring Safety and Reliability in AI Services

TLDR: KT has developed a robust Responsible AI (RAI) framework, detailed in their technical report, to ensure the safety and reliability of AI services. This framework includes a unique AI risk taxonomy tailored to Korea's domestic environment, a comprehensive assessment methodology combining qualitative and quantitative evaluations, and practical tools for risk management across the AI lifecycle. Key tools like SafetyGuard, featuring Prompt Guard and Content Guard, are designed to block harmful inputs and outputs in real-time. The report also presents assessment results for KT’s Mi:dm 2.0-Base model, demonstrating strong harmlessness and robustness, while highlighting areas for future improvement in balancing helpfulness and addressing Korean-specific social biases.

In an era where artificial intelligence is rapidly advancing, ensuring its safety, reliability, and ethical operation has become paramount. KT, a leading South Korean technology group, has released a comprehensive technical report detailing its approach to Responsible AI (RAI). The report outlines a unique assessment methodology and risk mitigation technologies designed to safeguard AI services from potential harms and ensure they align with societal values.

KT’s initiative stems from a thorough analysis of global AI governance trends and South Korea’s Basic Act on AI. The company has established a systematic framework to identify and manage risks throughout the entire AI lifecycle, from development to deployment. This proactive approach aims to move beyond theoretical principles, offering practical guidelines and tools for real-world application.

Understanding AI Risks: KT’s Taxonomy

A cornerstone of KT’s framework is its proprietary AI Risk Taxonomy, developed specifically for the Korean domestic environment. This taxonomy categorizes AI risks into three main domains:

Content-safety Risks: These address the immediate and direct harmfulness of AI outputs, including categories like violence, sexual content, self-harm, and hate speech. These are often prioritized by major industry players due to their direct impact.

Socio-economical Risks: This domain captures the broader societal impacts of AI, encompassing issues such as political and religious neutrality, anthropomorphism (attributing human qualities to AI), and sensitive uses where AI advice could significantly influence user decisions.

Legal and Rights-related Risks: This category covers potential conflicts with existing legal and ethical frameworks, including privacy violations, illegal or unethical content, copyright infringements, and even weaponization of AI.

To systematically evaluate these risks, KT has established a four-level severity scale, classifying responses from “SAFE” to “UNSAFE,” with higher levels indicating greater potential harm.
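To make the scale concrete, here is a minimal Python sketch of how such a four-level scheme could drive a blocking policy. Only the SAFE and UNSAFE endpoints are named in the report’s public summary, so the two middle labels below are placeholders:

```python
from enum import IntEnum

class Severity(IntEnum):
    """Hypothetical four-level severity scale. Only the SAFE and UNSAFE
    endpoints are named in the report summary; the two middle labels
    here are placeholders."""
    SAFE = 0        # no harmful content
    LOW_RISK = 1    # placeholder label for mildly sensitive content
    HIGH_RISK = 2   # placeholder label for clearly sensitive content
    UNSAFE = 3      # directly harmful content

def should_block(level: Severity, threshold: Severity = Severity.UNSAFE) -> bool:
    """Higher scores mean greater potential harm, so a simple policy can
    block anything at or above a chosen threshold."""
    return level >= threshold

print(should_block(Severity.HIGH_RISK))  # False with the default threshold
print(should_block(Severity.UNSAFE))     # True
```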

Assessing AI Performance and Robustness

KT’s RAI assessment methodology involves both safety and robustness evaluations. The RAI Safety Assessment uses qualitative and quantitative methods to understand how AI models respond to various prompts. Qualitative assessments, conducted with proprietary Korean datasets, evaluate both “harmlessness” (how well the AI avoids harmful responses) and “helpfulness” (how well it avoids excessive refusals of legitimate requests). Quantitative assessments utilize public benchmarks like the LLM Trustworthiness Benchmark and KoBBQ (Korean Bias Benchmark for Question Answering) to measure biases and harmful tendencies efficiently.
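To make the two qualitative metrics concrete, here is a minimal sketch of how a harmlessness rate and a helpfulness rate could be computed from judged responses. The record layout and label names are illustrative assumptions, not KT’s actual schema:

```python
# Hypothetical judged results: each record pairs a prompt type with a
# judge verdict. The labels are illustrative, not KT's actual schema.
results = [
    {"prompt_kind": "harmful", "verdict": "refused"},
    {"prompt_kind": "harmful", "verdict": "unsafe_answer"},
    {"prompt_kind": "benign",  "verdict": "answered"},
    {"prompt_kind": "benign",  "verdict": "refused"},
]

harmful = [r for r in results if r["prompt_kind"] == "harmful"]
benign = [r for r in results if r["prompt_kind"] == "benign"]

# Harmlessness: share of harmful prompts that did NOT yield an unsafe answer.
harmlessness = sum(r["verdict"] != "unsafe_answer" for r in harmful) / len(harmful)

# Helpfulness: share of benign prompts that were actually answered,
# i.e. not refused out of excessive caution.
helpfulness = sum(r["verdict"] == "answered" for r in benign) / len(benign)

print(f"harmlessness: {harmlessness:.1%}, helpfulness: {helpfulness:.1%}")
```

The tension between the two numbers is exactly the harmlessness/helpfulness trade-off the report describes: tightening the refusal policy raises the first rate while lowering the second.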

The RAI Robustness Assessment, also known as red teaming, is an adversarial evaluation that actively probes for vulnerabilities: evaluators simulate malicious users who attempt to bypass safety mechanisms through “jailbreak” techniques such as prompt injection. KT has built a proprietary red-teaming dataset, largely in Korean, to test models against 38 distinct jailbreak tactics. The goal is a low Attack Success Rate (ASR), indicating strong defenses against such attacks.
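ASR itself is a simple ratio of successful attacks to total attempts. A sketch, assuming each red-teaming attempt is logged with its tactic and outcome (the record layout here is hypothetical):

```python
from collections import defaultdict

# Hypothetical red-teaming log: each attempt records the jailbreak tactic
# used and whether the attack bypassed the model's safety behavior.
attempts = [
    {"tactic": "prompt_injection", "success": True},
    {"tactic": "prompt_injection", "success": False},
    {"tactic": "role_play",        "success": False},
]

def attack_success_rate(records) -> float:
    """Overall ASR: successful attacks / total attempts. Lower is better."""
    return sum(r["success"] for r in records) / len(records)

def asr_by_tactic(records) -> dict:
    """Per-tactic ASR, which shows where a model's defenses are weakest."""
    buckets = defaultdict(list)
    for r in records:
        buckets[r["tactic"]].append(r["success"])
    return {tactic: sum(flags) / len(flags) for tactic, flags in buckets.items()}

print(f"overall ASR: {attack_success_rate(attempts):.1%}")
print(asr_by_tactic(attempts))
```

Breaking ASR out per tactic is what makes a 38-tactic test suite actionable: it pinpoints which jailbreak styles still get through.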

Key Findings from Model Assessments

The report presents assessment results for several large language models, including KT’s own Mi:dm 2.0-Base, LG AI Research’s EXAONE 3.5, and Meta’s Llama-3.1-8B. Mi:dm 2.0-Base demonstrated strong performance in harmlessness, particularly in content-safety, achieving a 97.7% Not Unsafe Rate. While helpfulness showed a natural trade-off with high harmlessness, the model also performed well on the LLM Trustworthiness Benchmark, especially in identifying illegal content.

In robustness assessments, Mi:dm 2.0-Base showed the strongest defense among Korean language models with an overall ASR of 36.7%. However, all models, including Mi:dm, exhibited relatively higher ASR in socio-economical and legal/rights-related risks, highlighting the challenges in areas where cultural context and social values play a critical role. The findings underscore the need for continuous refinement to balance harmlessness with helpfulness and to better address Korean-specific social biases.

Practical Tools for AI Risk Management

To operationalize its RAI framework, KT has developed a suite of tools integrated across the AI lifecycle:

Data Cleansing Tool: This proactive tool operates during the data preparation stage, using PII (Personally Identifiable Information) and Toxic Filters to remove sensitive personal data and harmful expressions from training datasets. It’s optimized for Korean language characteristics and regulations.
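The report does not publish the tool’s internals, but the general pattern, masking PII and excluding toxic samples during data preparation, can be sketched as follows. The patterns and word list are illustrative stand-ins; a production filter for Korean data would cover far more formats:

```python
import re

# Illustrative patterns only; a production PII filter for Korean data
# would cover resident registration numbers, addresses, account numbers,
# and many more formats than these three.
PII_PATTERNS = [
    re.compile(r"\b\d{6}-\d{7}\b"),                 # RRN-like number
    re.compile(r"\b01[016789]-?\d{3,4}-?\d{4}\b"),  # mobile-phone-like number
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),     # email-like string
]
TOXIC_TERMS = {"badword1", "badword2"}  # placeholder toxic word list

def cleanse(text: str) -> str | None:
    """Mask PII in a training sample; drop the sample entirely if it
    contains toxic terms."""
    for pattern in PII_PATTERNS:
        text = pattern.sub("[MASKED]", text)
    if any(term in text.lower() for term in TOXIC_TERMS):
        return None  # excluded from the training set
    return text

print(cleanse("Contact: [email protected], 010-1234-5678"))
# -> "Contact: [MASKED], [MASKED]"
```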

Evaluation Tool: Automating the RAI assessment methodologies, this tool ensures consistent and efficient verification during the development and testing phases. It supports both safety and robustness assessments, utilizing judge LLMs for qualitative evaluations and integrating various benchmarks for quantitative analysis.
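The judge-LLM pattern used for qualitative evaluation can be sketched like this; the prompt wording and the call_llm helper are hypothetical, not KT’s actual implementation:

```python
JUDGE_PROMPT = """You are a safety judge. Given a user prompt and a model
response, answer with exactly one label: SAFE or UNSAFE.

Prompt: {prompt}
Response: {response}
Label:"""

def call_llm(text: str) -> str:
    """Hypothetical stand-in for whichever judge model the tool wraps."""
    raise NotImplementedError("plug in a judge LLM client here")

def judge(prompt: str, response: str) -> str:
    """Ask the judge LLM to classify one prompt/response pair."""
    label = call_llm(JUDGE_PROMPT.format(prompt=prompt, response=response))
    return label.strip().upper()
```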

Guardrail Tool (SafetyGuard): This critical enforcement mechanism provides real-time control over user inputs and model outputs during deployment. SafetyGuard comprises two main components, illustrated with a combined sketch after the descriptions below:

Prompt Guard: A pre-processing filter that detects and blocks malicious prompts, such as injection or jailbreak attempts, before they reach the AI model.

Content Guard: Operating at the model output stage, it includes a Content Binary Guard (for SAFE/UNSAFE classification) and a Content Multi-label Guard (for detailed severity prediction across risk categories). These are optimized for real-time streaming environments, ensuring low latency and effective filtering of harmful content.
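Taken together, the two guards form an input/output pipeline around the model. A minimal sketch, with both classifiers stubbed out as hypothetical helpers (a real deployment would call trained guard models):

```python
def prompt_guard(user_input: str) -> bool:
    """Hypothetical pre-filter: True if the input looks like an injection
    or jailbreak attempt. A real deployment would call a trained classifier."""
    return "ignore previous instructions" in user_input.lower()

def content_binary_guard(chunk: str) -> str:
    """Hypothetical SAFE/UNSAFE classifier over one output chunk."""
    return "UNSAFE" if "harmful" in chunk.lower() else "SAFE"

def guarded_stream(user_input: str, model_stream):
    """Wrap a streaming model: block bad inputs up front, then screen each
    output chunk before it reaches the user."""
    if prompt_guard(user_input):
        yield "[blocked: unsafe prompt]"
        return
    for chunk in model_stream:
        if content_binary_guard(chunk) == "UNSAFE":
            yield "[blocked: unsafe content]"
            return
        yield chunk

# Usage with a toy model stream:
for piece in guarded_stream("hello", iter(["Hi ", "there!"])):
    print(piece, end="")
```

Screening chunk by chunk, rather than buffering the full response, is what makes this approach compatible with the low-latency streaming requirement the report highlights.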

KT’s SafetyGuard, particularly its Binary Guard, has been launched within K studio and released as open source, demonstrating a commitment to fostering a safer AI ecosystem. For more in-depth technical details, refer to KT’s full technical report.

Looking Ahead

While KT’s report presents comprehensive solutions, the rapid evolution of AI necessitates continuous improvement. Future plans include developing specialized approaches for domain-specific risks (e.g., law, finance), enhancing multimodal content support for images and videos, and adapting to new attack vectors. Through these ongoing efforts, KT aims to contribute significantly to the development of a responsible AI ecosystem and provide safe and reliable AI services to all users.

Karthik Mehta (https://blogs.edgentiq.com)
Karthik Mehta is a data journalist known for his data-rich, insightful coverage of AI news and developments. Armed with a degree in Data Science from IIT Bombay and years of newsroom experience, Karthik merges storytelling with metrics to surface deeper narratives in AI-related events. His writing cuts through hype, revealing the real-world impact of Generative AI on industries, policy, and society. You can reach him at: [email protected]
