spot_img
HomeNews & Current EventsUCR Researchers Fortify AI Safety in Compact Models

UCR Researchers Fortify AI Safety in Compact Models

TLDR: Researchers at the University of California, Riverside (UCR) have developed a novel method to ensure AI models retain their crucial safety features, even when scaled down for deployment on smaller devices. This innovation addresses the vulnerability of open-source AI models to malicious manipulation when internal processing layers are removed to conserve power, by retraining the model’s core structure to inherently understand and block dangerous content.

RIVERSIDE, CA – As artificial intelligence models increasingly transition from powerful cloud servers to more compact devices like smartphones and automobiles, a critical challenge emerges: maintaining their inherent safety protocols. Researchers at the University of California, Riverside (UCR) have unveiled a groundbreaking method to fortify these AI systems against malicious ‘rewiring’ or manipulation, particularly when models are downsized, ensuring that essential safety features remain robust.

Generative AI models, while powerful, often undergo significant trimming to conserve memory and computational power when adapted for lower-power deployments. This process can inadvertently strip away the very safeguards designed to prevent the generation of harmful content, such as hate speech or instructions for illicit activities. Open-source AI models, which are freely downloadable and modifiable, are especially susceptible to such vulnerabilities due to the absence of continuous cloud monitoring and infrastructure inherent in proprietary systems.

To counter this growing threat, the UCR team has devised an innovative approach that focuses on retraining the internal structure of the AI model itself. Unlike solutions that rely on external filters or software patches, this method fundamentally alters how the model comprehends and identifies risky content. Saketh Bachu, a UCR graduate student and co-lead author of the study, emphasized the core objective: “Our goal was to make sure the model doesn’t forget how to behave safely when it’s been slimmed down.”

The researchers successfully tested their methodology using LLaVA 1.5, a vision language model capable of processing both text and images. The results demonstrated that the model’s ability to detect and block dangerous prompts was preserved, even after key internal processing layers were removed. This internal restructuring ensures that safety is ingrained at a foundational level, rather than being an easily bypassable external layer.

The research team, which includes doctoral students Arindam Dutta, Rohit Lal, and Trishna Chakraborty, along with UCR faculty members Chengyu Song, Yue Dong, and Nael Abu-Ghazaleh, and co-lead author Saketh Bachu, presented their findings in a paper at this year’s International Conference on Machine Learning in Vancouver, Canada. Professor Amit Roy-Chowdhury, also involved in the research, acknowledged the ongoing nature of the work, stating, “There’s still more work to do, but this is a concrete step toward developing AI in a way that’s both open and responsible.”

Also Read:

The ultimate goal of this UCR initiative is to develop techniques that guarantee safety across every internal layer of AI models, thereby enhancing their resilience and reliability in diverse real-world applications.

Ananya Rao
Ananya Raohttps://blogs.edgentiq.com
Ananya Rao is a tech journalist with a passion for dissecting the fast-moving world of Generative AI. With a background in computer science and a sharp editorial eye, she connects the dots between policy, innovation, and business. Ananya excels in real-time reporting and specializes in uncovering how startups and enterprises in India are navigating the GenAI boom. She brings urgency and clarity to every breaking news piece she writes. You can reach her out at: [email protected]

- Advertisement -

spot_img

Gen AI News and Updates

spot_img

- Advertisement -