TL;DR: This research compares two federated learning approaches for violence detection: LoRA-tuned Vision-Language Models (VLMs) and personalized 3D Convolutional Neural Networks (CNN3D). Both achieve over 90% accuracy, but CNN3D uses significantly less energy while slightly outperforming VLMs on some metrics. VLMs are better for complex contextual reasoning. The study proposes a hybrid model using efficient CNNs for routine tasks and selective VLM activation for complex scenarios, emphasizing sustainable and privacy-aware AI in video surveillance.
Video surveillance systems are increasingly relying on advanced artificial intelligence (AI) to detect and analyze violent incidents in public spaces. However, traditional centralized approaches, where all video data is sent to a central server, raise significant privacy concerns. Additionally, the computational and environmental costs of deploying large AI models at scale are becoming a major point of scrutiny for researchers and regulators alike.
A recent research paper, “Frugal Federated Learning for Violence Detection: A Comparison of LoRA-Tuned VLMs and Personalized CNNs,” explores innovative solutions to these challenges. The study, conducted by Sébastien Thuau, Siba Haidar, Ayush Bajracharya, and Rachid Chelouah, delves into federated learning, a promising approach that allows AI models to be trained across multiple local devices without sharing sensitive raw data. This method enhances privacy and can reduce network load.
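To make the federated idea concrete, here is a minimal sketch of federated averaging (FedAvg), the canonical aggregation step in federated learning: each surveillance site trains locally on its own video, and only model parameters, never raw footage, are sent to the server for weighted averaging. The function names and flat parameter lists are illustrative, not taken from the paper.

```python
def fed_avg(client_weights, client_sizes):
    """Average model parameters across clients, weighting each client
    by the size of its local dataset (FedAvg-style aggregation).

    client_weights: list of per-client parameter lists (same length each)
    client_sizes:   number of local training samples per client
    """
    total = sum(client_sizes)
    num_params = len(client_weights[0])
    return [
        # Weighted sum of parameter i across all clients.
        sum(w[i] * (n / total) for w, n in zip(client_weights, client_sizes))
        for i in range(num_params)
    ]
```

Only these averaged parameters leave each site, which is what gives federated training its privacy and bandwidth advantages over shipping video to a central server.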
The core of the research compares two distinct strategies for violence detection within a federated learning framework: Vision-Language Models (VLMs) fine-tuned with a technique called Low-Rank Adaptation (LoRA), and personalized 3D Convolutional Neural Networks (CNN3D). VLMs are powerful models that can process both visual and textual information, while CNNs are a type of neural network particularly effective for image and video analysis.
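The LoRA idea can be sketched in a few lines: rather than updating a large frozen weight matrix W, training touches only two small low-rank matrices whose product is added to W's output, which is why fine-tuning a multi-billion-parameter VLM becomes tractable on edge-scale hardware. This is a generic illustration of the technique, with assumed names and initialization, not the paper's implementation.

```python
import numpy as np

class LoRALinear:
    """Frozen linear layer with a trainable low-rank (LoRA) update.

    Effective weight is W + scale * (A @ B), where A and B are small:
    A is (d_out, r), B is (r, d_in), with rank r << min(d_out, d_in).
    """

    def __init__(self, W, r, alpha=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W = W                        # frozen pretrained weight
        d_out, d_in = W.shape
        self.A = np.zeros((d_out, r))     # zero init: no change at start
        self.B = rng.normal(size=(r, d_in)) * 0.01
        self.scale = alpha / r

    def forward(self, x):
        # Base (frozen) path plus the low-rank adapter path.
        return self.W @ x + self.scale * (self.A @ (self.B @ x))
```

Because A starts at zero, the adapted model initially behaves exactly like the pretrained one; only A and B are trained and exchanged, shrinking both compute and communication in the federated setting.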
The researchers used LLaVA-7B, a VLM with roughly seven billion parameters, and a more compact CNN3D model with 65.8 million parameters as their representative cases. They evaluated these models not only on their accuracy in detecting violence but also on their calibration (how well their predicted probabilities match true outcomes) and, crucially, their energy consumption and carbon dioxide (CO2) emissions. The experiments were designed to simulate realistic conditions where data is not uniformly distributed across different surveillance locations (known as non-IID settings).
The findings were compelling. Both the LoRA-tuned VLMs and the personalized CNN3Ds achieved high accuracy, exceeding 90% in violence detection. Interestingly, the more compact CNN3D model slightly outperformed the LoRA-tuned VLMs in terms of ROC AUC (a measure of a model's ability to distinguish between classes) and log loss (a measure of how closely predicted probabilities match the actual outcomes), all while consuming significantly less energy. For instance, the CNN3D training consumed only 240 Watt-hours (Wh) and emitted 10.1 grams of CO2 equivalent, which is less than half the energy and CO2 footprint of the LoRA fine-tuning process (570 Wh and 24 grams CO2e).
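For readers unfamiliar with the two reported metrics, both can be computed from scratch in a few lines. The toy labels and scores below are made up for illustration; only the metric definitions are standard.

```python
import math

def log_loss(y_true, p, eps=1e-12):
    """Mean negative log-likelihood of binary labels under probabilities p.
    Lower is better; confident wrong predictions are penalized heavily."""
    return -sum(
        y * math.log(max(pi, eps)) + (1 - y) * math.log(max(1 - pi, eps))
        for y, pi in zip(y_true, p)
    ) / len(y_true)

def roc_auc(y_true, scores):
    """Probability that a randomly chosen positive example is scored
    above a randomly chosen negative one (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

ROC AUC rewards correct ranking of violent versus non-violent clips, while log loss additionally rewards well-calibrated confidence, which is why the paper reports both.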
However, the study also highlighted the unique strengths of VLMs. While more resource-intensive, VLMs remain highly favorable for tasks requiring contextual reasoning and multimodal inference—meaning they can understand complex scenarios by combining visual cues with descriptive prompts. This makes them valuable for situations that demand nuanced understanding beyond simple classification.
The authors propose a “hybrid model” as an optimal solution. This framework suggests using lightweight CNNs for routine violence classification tasks due to their efficiency and strong performance. For more complex or descriptive scenarios, where deeper contextual understanding is needed, VLMs could be selectively activated. This approach offers a balanced way to achieve responsible, resource-aware AI in video surveillance, with potential extensions for real-time, multimodal, and environmentally conscious systems.
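The routing logic behind such a hybrid system can be sketched simply: run the cheap CNN3D on every clip and escalate to the costly VLM only when the CNN's confidence falls in an ambiguous band. The thresholds and function names below are assumptions for illustration, not values from the paper.

```python
def classify_clip(clip, cnn_predict, vlm_describe, low=0.2, high=0.8):
    """Hybrid routing sketch: accept confident CNN scores directly,
    and selectively activate the VLM only for uncertain clips.

    cnn_predict(clip) -> probability of violence (cheap, always run)
    vlm_describe(clip) -> contextual label/description (expensive, rare)
    """
    p_violence = cnn_predict(clip)
    if p_violence >= high:
        return ("violent", p_violence, "cnn3d")
    if p_violence <= low:
        return ("non-violent", p_violence, "cnn3d")
    # Uncertain band: escalate to the VLM for contextual reasoning.
    return (vlm_describe(clip), p_violence, "vlm")
```

Because most surveillance footage is routine, the expensive VLM path fires only on the small fraction of ambiguous clips, which is the source of the energy savings the authors anticipate.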
This research is a significant step forward, being the first comparative study of its kind to emphasize energy efficiency and environmental metrics in federated violence detection using these two distinct AI model types. It provides a reproducible baseline for future work in sustainable and privacy-preserving AI for video surveillance. You can read the full paper here.