UCR Researchers Fortify AI Safety in Compact Models

TLDR: Researchers at the University of California, Riverside (UCR) have developed a novel method to ensure AI models retain their crucial safety features, even when scaled down for deployment on smaller devices. This innovation addresses the vulnerability of open-source AI models to malicious manipulation when internal processing layers are removed to conserve power, by retraining the model’s core structure to inherently understand and block dangerous content.

RIVERSIDE, CA – As artificial intelligence models increasingly transition from powerful cloud servers to more compact devices like smartphones and automobiles, a critical challenge emerges: maintaining their inherent safety protocols. Researchers at the University of California, Riverside (UCR) have unveiled a groundbreaking method to fortify these AI systems against malicious ‘rewiring’ or manipulation, particularly when models are downsized, ensuring that essential safety features remain robust.

Generative AI models, while powerful, often undergo significant trimming to conserve memory and computational power when adapted for lower-power deployments. This process can inadvertently strip away the very safeguards designed to prevent the generation of harmful content, such as hate speech or instructions for illicit activities. Open-source AI models, which are freely downloadable and modifiable, are especially susceptible to such vulnerabilities due to the absence of continuous cloud monitoring and infrastructure inherent in proprietary systems.

To counter this growing threat, the UCR team has devised an innovative approach that focuses on retraining the internal structure of the AI model itself. Unlike solutions that rely on external filters or software patches, this method fundamentally alters how the model comprehends and identifies risky content. Saketh Bachu, a UCR graduate student and co-lead author of the study, emphasized the core objective: “Our goal was to make sure the model doesn’t forget how to behave safely when it’s been slimmed down.”

The researchers successfully tested their methodology using LLaVA 1.5, a vision language model capable of processing both text and images. The results demonstrated that the model’s ability to detect and block dangerous prompts was preserved, even after key internal processing layers were removed. This internal restructuring ensures that safety is ingrained at a foundational level, rather than being an easily bypassable external layer.

The research team, which includes doctoral students Arindam Dutta, Rohit Lal, and Trishna Chakraborty, along with UCR faculty members Chengyu Song, Yue Dong, and Nael Abu-Ghazaleh, and co-lead author Saketh Bachu, presented their findings in a paper at this year’s International Conference on Machine Learning in Vancouver, Canada. Professor Amit Roy-Chowdhury, also involved in the research, acknowledged the ongoing nature of the work, stating, “There’s still more work to do, but this is a concrete step toward developing AI in a way that’s both open and responsible.”

Also Read:

The ultimate goal of this UCR initiative is to develop techniques that guarantee safety across every internal layer of AI models, thereby enhancing their resilience and reliability in diverse real-world applications.

Financial Sector Fortifies Against Surging AI-Powered Scams

Deloitte’s 2025 Outlook: Navigating Escalating AI Challenges in Human Capital

Salesforce Study Reveals Data Quality is Pivotal for Employee Trust in AI Adoption

Top Executives Sidestep Company AI Guidelines, Fueling Shadow AI Risks

Intel’s Evolving IP Strategy: A Calculated Shift Towards Core AI Innovation

Generative AI Prompts Increased Workforce Surveillance in Indian IT Sector

Financial Sector Fortifies Against Surging AI-Powered Scams

Deloitte’s 2025 Outlook: Navigating Escalating AI Challenges in Human Capital

Salesforce Study Reveals Data Quality is Pivotal for Employee Trust in AI Adoption

Top Executives Sidestep Company AI Guidelines, Fueling Shadow AI Risks

Intel’s Evolving IP Strategy: A Calculated Shift Towards Core AI Innovation

Generative AI Prompts Increased Workforce Surveillance in Indian IT Sector

UCR Researchers Fortify AI Safety in Compact Models

Gen AI News and Updates

AI’s Hyper-Growth Unlocked: OpenAI’s $500B Valuation Forces a Capital Re-evaluation for Investors

PASA Unveils New ‘Data for AI’ Guidance to Foster Responsible Innovation in Pensions Administration

Ghana Navigates Complexities in AI Regulatory Development Amidst Coordination Challenges

AZTECH Introduces Comprehensive AI Training Series to Propel Regional Digital Transformation

HKU Spearheads AI Integration in Hong Kong’s Digital Education Future

OneShield Achieves Landmark Registration Under Cloud Security Alliance AI Controls Matrix, Setting New Industry Standard

SeedAI Leads Utah’s Proactive Initiative for Ethical AI Integration in Business

Bahrain Commended for AI Preparedness in New UNESCO Global Report

U.S. Air Force Secures Skydio Drone Technology for Enhanced Autonomous Operations

Malaysia Forges Ahead with AI Development, Prioritizing Governance and Ethical Frameworks

Contractify Honored as Top Contract Management Solution Provider for 2025 by LegalTech Breakthrough Awards

UNESCO’s 43rd General Conference Concludes with New Leadership and Landmark Ethics Frameworks for Technology

Astreya Unveils New Wave of Enterprise AI Agents to Boost Business Efficiency and Automation

EPAM Honored with Microsoft’s 2025 Innovate with Azure AI Platform Partner of the Year Award for Pioneering AI Solutions

EBU Academy’s School of AI Honored with European Digital Skills Award for Upskilling Media Professionals

Vesl AI Recognized for AI Infrastructure Innovation with ASOCIO Digital Summit Award

Netherlands Unveils Ambitious AI Strategy to Shape Global Governance Frameworks

BRYGE AI Secures Silver Stevie® Award for Groundbreaking Health Tech Product for Women

Prepify AI and ZoraSafe, Inc. Honored with ‘Panelists’ Choice’ Awards at UF Innovate’s GatorPitch in Miami

Subscribe to get the latest news and updates