
KT’s Comprehensive Framework for Responsible AI: Ensuring Safety and Reliability in AI Services

TLDR: KT has developed a robust Responsible AI (RAI) framework, detailed in their technical report, to ensure the safety and reliability of AI services. This framework includes a unique AI risk taxonomy tailored to Korea's domestic environment, a comprehensive assessment methodology combining qualitative and quantitative evaluations, and practical tools for risk management across the AI lifecycle. Key tools like SafetyGuard, featuring Prompt Guard and Content Guard, are designed to block harmful inputs and outputs in real-time. The report also presents assessment results for KT’s Mi:dm 2.0-Base model, demonstrating strong harmlessness and robustness, while highlighting areas for future improvement in balancing helpfulness and addressing Korean-specific social biases.

In an era where artificial intelligence is rapidly advancing, ensuring its safety, reliability, and ethical operation has become paramount. KT, a leading South Korean technology group, has released a comprehensive technical report detailing its approach to Responsible AI (RAI). The report outlines a unique assessment methodology and risk mitigation technologies designed to safeguard AI services from potential harms and ensure they align with societal values.

KT’s initiative stems from a thorough analysis of global AI governance trends and South Korea’s Basic Act on AI. The company has established a systematic framework to identify and manage risks throughout the entire AI lifecycle, from development to deployment. This proactive approach aims to move beyond theoretical principles, offering practical guidelines and tools for real-world application.

Understanding AI Risks: KT’s Taxonomy

A cornerstone of KT’s framework is its proprietary AI Risk Taxonomy, developed specifically for the Korean domestic environment. This taxonomy categorizes AI risks into three main domains:

Content-safety Risks: These address the immediate and direct harmfulness of AI outputs, including categories like violence, sexual content, self-harm, and hate speech. These are often prioritized by major industry players due to their direct impact.

Socio-economical Risks: This domain captures the broader societal impacts of AI, encompassing issues such as political and religious neutrality, anthropomorphism (attributing human qualities to AI), and sensitive uses where AI advice could significantly influence user decisions.

Legal and Rights-related Risks: This category covers potential conflicts with existing legal and ethical frameworks, including privacy violations, illegal or unethical content, copyright infringements, and even weaponization of AI.

To systematically evaluate these risks, KT has established a four-level severity scale, classifying responses from “SAFE” to “UNSAFE,” with higher levels indicating greater potential harm.
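To make the scale concrete, here is a minimal Python sketch of how such a four-level scheme could drive a blocking policy. Only the SAFE and UNSAFE endpoints are named in the report’s public summary, so the two middle labels below are placeholders:

```python
from enum import IntEnum

class Severity(IntEnum):
    """Hypothetical four-level severity scale. Only the SAFE and UNSAFE
    endpoints are named in the report summary; the two middle labels
    here are placeholders."""
    SAFE = 0        # no harmful content
    LOW_RISK = 1    # placeholder label for mildly sensitive content
    HIGH_RISK = 2   # placeholder label for clearly sensitive content
    UNSAFE = 3      # directly harmful content

def should_block(level: Severity, threshold: Severity = Severity.UNSAFE) -> bool:
    """Higher scores mean greater potential harm, so a simple policy can
    block anything at or above a chosen threshold."""
    return level >= threshold

print(should_block(Severity.HIGH_RISK))  # False with the default threshold
print(should_block(Severity.UNSAFE))     # True
```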

Assessing AI Performance and Robustness

KT’s RAI assessment methodology involves both safety and robustness evaluations. The RAI Safety Assessment uses qualitative and quantitative methods to understand how AI models respond to various prompts. Qualitative assessments, conducted with proprietary Korean datasets, evaluate both “harmlessness” (how well the AI avoids harmful responses) and “helpfulness” (how well it avoids excessive refusals of legitimate requests). Quantitative assessments utilize public benchmarks like the LLM Trustworthiness Benchmark and KoBBQ (Korean Bias Benchmark for Question Answering) to measure biases and harmful tendencies efficiently.
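To make the two qualitative metrics concrete, here is a minimal sketch of how a harmlessness rate and a helpfulness rate could be computed from judged responses. The record layout and label names are illustrative assumptions, not KT’s actual schema:

```python
# Hypothetical judged results: each record pairs a prompt type with a
# judge verdict. The labels are illustrative, not KT's actual schema.
results = [
    {"prompt_kind": "harmful", "verdict": "refused"},
    {"prompt_kind": "harmful", "verdict": "unsafe_answer"},
    {"prompt_kind": "benign",  "verdict": "answered"},
    {"prompt_kind": "benign",  "verdict": "refused"},
]

harmful = [r for r in results if r["prompt_kind"] == "harmful"]
benign = [r for r in results if r["prompt_kind"] == "benign"]

# Harmlessness: share of harmful prompts that did NOT yield an unsafe answer.
harmlessness = sum(r["verdict"] != "unsafe_answer" for r in harmful) / len(harmful)

# Helpfulness: share of benign prompts that were actually answered,
# i.e. not refused out of excessive caution.
helpfulness = sum(r["verdict"] == "answered" for r in benign) / len(benign)

print(f"harmlessness: {harmlessness:.1%}, helpfulness: {helpfulness:.1%}")
```

The tension between the two numbers is exactly the harmlessness/helpfulness trade-off the report describes: tightening the refusal policy raises the first rate while lowering the second.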

The RAI Robustness Assessment, also known as red teaming, is an adversarial evaluation that actively probes for vulnerabilities: evaluators simulate malicious users who attempt to bypass safety mechanisms through “jailbreak” techniques such as prompt injection. KT has built a proprietary red-teaming dataset, largely in Korean, to test models against 38 distinct jailbreak tactics. The goal is a low Attack Success Rate (ASR), indicating strong defenses against such attacks.
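ASR itself is a simple ratio of successful attacks to total attempts. A sketch, assuming each red-teaming attempt is logged with its tactic and outcome (the record layout here is hypothetical):

```python
from collections import defaultdict

# Hypothetical red-teaming log: each attempt records the jailbreak tactic
# used and whether the attack bypassed the model's safety behavior.
attempts = [
    {"tactic": "prompt_injection", "success": True},
    {"tactic": "prompt_injection", "success": False},
    {"tactic": "role_play",        "success": False},
]

def attack_success_rate(records) -> float:
    """Overall ASR: successful attacks / total attempts. Lower is better."""
    return sum(r["success"] for r in records) / len(records)

def asr_by_tactic(records) -> dict:
    """Per-tactic ASR, which shows where a model's defenses are weakest."""
    buckets = defaultdict(list)
    for r in records:
        buckets[r["tactic"]].append(r["success"])
    return {tactic: sum(flags) / len(flags) for tactic, flags in buckets.items()}

print(f"overall ASR: {attack_success_rate(attempts):.1%}")
print(asr_by_tactic(attempts))
```

Breaking ASR out per tactic is what makes a 38-tactic test suite actionable: it pinpoints which jailbreak styles still get through.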

Key Findings from Model Assessments

The report presents assessment results for several large language models, including KT’s own Mi:dm 2.0-Base, LG AI Research’s EXAONE 3.5, and Meta’s Llama-3.1-8B. Mi:dm 2.0-Base demonstrated strong performance in harmlessness, particularly in content-safety, achieving a 97.7% Not Unsafe Rate. While helpfulness showed a natural trade-off with high harmlessness, the model also performed well on the LLM Trustworthiness Benchmark, especially in identifying illegal content.

In robustness assessments, Mi:dm 2.0-Base showed the strongest defense among Korean language models with an overall ASR of 36.7%. However, all models, including Mi:dm, exhibited relatively higher ASR in socio-economical and legal/rights-related risks, highlighting the challenges in areas where cultural context and social values play a critical role. The findings underscore the need for continuous refinement to balance harmlessness with helpfulness and to better address Korean-specific social biases.

Practical Tools for AI Risk Management

To operationalize its RAI framework, KT has developed a suite of tools integrated across the AI lifecycle:

Data Cleansing Tool: This proactive tool operates during the data preparation stage, using PII (Personally Identifiable Information) and Toxic Filters to remove sensitive personal data and harmful expressions from training datasets. It’s optimized for Korean language characteristics and regulations.
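The report does not publish the tool’s internals, but the general pattern, masking PII and excluding toxic samples during data preparation, can be sketched as follows. The patterns and word list are illustrative stand-ins; a production filter for Korean data would cover far more formats:

```python
import re

# Illustrative patterns only; a production PII filter for Korean data
# would cover resident registration numbers, addresses, account numbers,
# and many more formats than these three.
PII_PATTERNS = [
    re.compile(r"\b\d{6}-\d{7}\b"),                 # RRN-like number
    re.compile(r"\b01[016789]-?\d{3,4}-?\d{4}\b"),  # mobile-phone-like number
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),     # email-like string
]
TOXIC_TERMS = {"badword1", "badword2"}  # placeholder toxic word list

def cleanse(text: str) -> str | None:
    """Mask PII in a training sample; drop the sample entirely if it
    contains toxic terms."""
    for pattern in PII_PATTERNS:
        text = pattern.sub("[MASKED]", text)
    if any(term in text.lower() for term in TOXIC_TERMS):
        return None  # excluded from the training set
    return text

print(cleanse("Contact: [email protected], 010-1234-5678"))
# -> "Contact: [MASKED], [MASKED]"
```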

Evaluation Tool: Automating the RAI assessment methodologies, this tool ensures consistent and efficient verification during the development and testing phases. It supports both safety and robustness assessments, utilizing judge LLMs for qualitative evaluations and integrating various benchmarks for quantitative analysis.
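The judge-LLM pattern used for qualitative evaluation can be sketched like this; the prompt wording and the call_llm helper are hypothetical, not KT’s actual implementation:

```python
JUDGE_PROMPT = """You are a safety judge. Given a user prompt and a model
response, answer with exactly one label: SAFE or UNSAFE.

Prompt: {prompt}
Response: {response}
Label:"""

def call_llm(text: str) -> str:
    """Hypothetical stand-in for whichever judge model the tool wraps."""
    raise NotImplementedError("plug in a judge LLM client here")

def judge(prompt: str, response: str) -> str:
    """Ask the judge LLM to classify one prompt/response pair."""
    label = call_llm(JUDGE_PROMPT.format(prompt=prompt, response=response))
    return label.strip().upper()
```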

Guardrail Tool (SafetyGuard): This critical enforcement mechanism provides real-time control over user inputs and model outputs during deployment. SafetyGuard comprises two main components, illustrated with a combined sketch after the descriptions below:

Prompt Guard: A pre-processing filter that detects and blocks malicious prompts, such as injection or jailbreak attempts, before they reach the AI model.

Content Guard: Operating at the model output stage, it includes a Content Binary Guard (for SAFE/UNSAFE classification) and a Content Multi-label Guard (for detailed severity prediction across risk categories). These are optimized for real-time streaming environments, ensuring low latency and effective filtering of harmful content.
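Taken together, the two guards form an input/output pipeline around the model. A minimal sketch, with both classifiers stubbed out as hypothetical helpers (a real deployment would call trained guard models):

```python
def prompt_guard(user_input: str) -> bool:
    """Hypothetical pre-filter: True if the input looks like an injection
    or jailbreak attempt. A real deployment would call a trained classifier."""
    return "ignore previous instructions" in user_input.lower()

def content_binary_guard(chunk: str) -> str:
    """Hypothetical SAFE/UNSAFE classifier over one output chunk."""
    return "UNSAFE" if "harmful" in chunk.lower() else "SAFE"

def guarded_stream(user_input: str, model_stream):
    """Wrap a streaming model: block bad inputs up front, then screen each
    output chunk before it reaches the user."""
    if prompt_guard(user_input):
        yield "[blocked: unsafe prompt]"
        return
    for chunk in model_stream:
        if content_binary_guard(chunk) == "UNSAFE":
            yield "[blocked: unsafe content]"
            return
        yield chunk

# Usage with a toy model stream:
for piece in guarded_stream("hello", iter(["Hi ", "there!"])):
    print(piece, end="")
```

Screening chunk by chunk, rather than buffering the full response, is what makes this approach compatible with the low-latency streaming requirement the report highlights.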

KT’s SafetyGuard, particularly its Binary Guard, has been launched within K studio and released as open source, demonstrating a commitment to fostering a safer AI ecosystem. For more in-depth technical details, refer to KT’s full technical report.

Looking Ahead

While KT’s report presents comprehensive solutions, the rapid evolution of AI necessitates continuous improvement. Future plans include developing specialized approaches for domain-specific risks (e.g., law, finance), enhancing multimodal content support for images and videos, and adapting to new attack vectors. Through these ongoing efforts, KT aims to contribute significantly to the development of a responsible AI ecosystem and provide safe and reliable AI services to all users.

Karthik Mehta (https://blogs.edgentiq.com)
Karthik Mehta is a data journalist known for his data-rich, insightful coverage of AI news and developments. Armed with a degree in Data Science from IIT Bombay and years of newsroom experience, Karthik merges storytelling with metrics to surface deeper narratives in AI-related events. His writing cuts through hype, revealing the real-world impact of Generative AI on industries, policy, and society. You can reach him at: [email protected]
