
Efficient and Private: A New Approach to Fine-Tuning Vision Transformers for Encrypted Images

TLDR: Researchers propose a novel low-rank adaptation method for fine-tuning Vision Transformers (ViT) that enables privacy-preserving image classification. Unlike existing methods like MeLo, their approach keeps the patch embedding layer trainable, which is crucial for handling encrypted images. This method significantly reduces the number of trainable parameters (to 0.8% of full fine-tuning) while maintaining or even improving accuracy on encrypted data, making ViT training more efficient and privacy-conscious.

In the rapidly evolving world of artificial intelligence, Vision Transformers (ViT) have emerged as powerful tools for various image-related tasks, from classification to object detection. However, training these sophisticated models often involves a process called “full fine-tuning,” which means retraining all the model’s parameters. This approach can be incredibly demanding on computational resources and time, especially with the increasing size of pre-trained models.

To address this challenge, researchers have explored “low-rank adaptation” methods. One such method, LoRA, was initially developed for large language models and later extended to ViT models as MeLo. These methods aim to significantly reduce the number of trainable parameters by freezing most of the pre-trained model’s weights and injecting only a small number of new, trainable components.
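The core idea of low-rank adaptation can be illustrated with a short sketch: a pre-trained linear layer is frozen, and a trainable low-rank update (the product of two small matrices) is added to its output. This is a generic illustration of the LoRA mechanism, not the authors' implementation; the rank and scaling values below are arbitrary.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen pre-trained linear layer plus a trainable low-rank update."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                 # freeze pre-trained weights
        # Rank decomposition matrices: B @ A has shape (out, in) but only
        # rank * (in + out) trainable entries instead of in * out.
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # Frozen path + low-rank update (zero at initialization, since B = 0)
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scale
```

For a 768-dimensional ViT attention projection with rank 4, this replaces roughly 590k trainable weights with about 6k, which is where the dramatic parameter savings come from.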

A critical area where privacy is paramount is image classification, particularly when dealing with sensitive visual information. To protect this data, images can be “perceptually encrypted.” While ViT models trained with encrypted data have been studied for privacy preservation, it wasn’t clear if existing low-rank adaptation methods like MeLo were effective in this specific scenario.

Researchers Haiwei Lin, Shoko Imaizumi, and Hitoshi Kiya from Chiba University and Tokyo Metropolitan University have identified a limitation: MeLo, in its original form, is not effective when training models with perceptually encrypted data. This is because MeLo freezes certain parts of the ViT architecture, specifically the patch and position embedding layers, which are crucial when dealing with pixel-level permutations introduced by encryption.

To overcome this, they propose a novel low-rank adaptation method designed specifically for privacy-preserving learning. Their approach extends MeLo with one key modification: the patch embedding layer is not frozen. Trainable rank decomposition matrices are injected into each layer of the ViT architecture, and the patch embedding layer is also allowed to be updated during fine-tuning.
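The resulting freezing scheme can be sketched as follows: freeze every parameter, then re-enable only the injected LoRA matrices and the patch embedding. The naming conventions here (`lora_` for injected matrices, `patch_embed` for the embedding layer, as in timm-style ViTs) are assumptions for illustration, not the authors' exact code.

```python
import torch
import torch.nn as nn

def mark_trainable(vit: nn.Module) -> int:
    """Freeze all parameters, then re-enable the injected LoRA matrices and
    the patch embedding layer. Returns the resulting trainable parameter count.
    Assumes LoRA parameters contain 'lora_' in their names and the patch
    embedding lives under 'patch_embed' (timm-style naming)."""
    n = 0
    for name, p in vit.named_parameters():
        p.requires_grad = ("lora_" in name) or ("patch_embed" in name)
        if p.requires_grad:
            n += p.numel()
    return n

class ToyViT(nn.Module):
    """Minimal stand-in for a ViT: a patch embedding plus one frozen block."""

    def __init__(self):
        super().__init__()
        self.patch_embed = nn.Conv2d(3, 16, kernel_size=4, stride=4)
        self.block = nn.Linear(16, 16)              # stays frozen
        self.lora_A = nn.Parameter(torch.zeros(2, 16))
        self.lora_B = nn.Parameter(torch.zeros(16, 2))
```

Keeping `patch_embed` trainable is the crucial difference from MeLo: it lets the model relearn how to read patches whose pixels have been permuted by encryption.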

The encryption method used in their study is a block-wise image encryption technique. This involves dividing an image into non-overlapping blocks (matching the ViT’s patch size) and then randomly shuffling pixels within each block across the color channels using a secret key. This ensures that the visual information is protected while still allowing the model to learn from the encrypted data.
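A minimal NumPy sketch of this kind of block-wise pixel shuffling is shown below. Details such as whether every block shares one key-derived permutation are assumptions here; the point is that each block's pixel values are jointly permuted across the three color channels, so the content of every block is preserved but scrambled.

```python
import numpy as np

def encrypt_blockwise(img: np.ndarray, block: int, key: int) -> np.ndarray:
    """Block-wise image encryption sketch: within each block x block patch,
    permute all pixel values jointly across the color channels using a
    permutation derived from a secret key.

    img: (H, W, 3) array with H and W divisible by `block`
    (the block size would match the ViT patch size)."""
    rng = np.random.default_rng(key)                # key seeds the permutation
    perm = rng.permutation(block * block * 3)       # one shared permutation
    h, w, c = img.shape
    out = img.copy()
    for i in range(0, h, block):
        for j in range(0, w, block):
            patch = out[i:i + block, j:j + block].reshape(-1)
            out[i:i + block, j:j + block] = patch[perm].reshape(block, block, c)
    return out
```

Because the permutation is derived from a secret key, the same key always yields the same encryption, and anyone holding the key can invert it; without the key, the visual content of each patch is scrambled.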

The experimental results, conducted on the CIFAR-10 dataset with a ViT pretrained on ImageNet-1K, demonstrate the effectiveness of their new method. While full fine-tuning achieved high accuracy on both plain and encrypted images, it required a massive 82.56 million trainable parameters. MeLo, on the other hand, had a very small parameter count (0.15 million) but showed a significant drop in accuracy on encrypted images (90.05% compared to 96.16% for full fine-tuning).

The proposed method strikes an impressive balance. It achieved an accuracy of 96.35% on encrypted images, slightly outperforming full fine-tuning, while requiring only 0.71 million trainable parameters. This is a remarkable reduction, representing approximately 0.8% of the parameters needed for full fine-tuning. This efficiency makes it a highly practical solution for privacy-preserving image classification.


In conclusion, this research introduces a significant advancement in fine-tuning Vision Transformers for privacy-preserving applications. By intelligently adapting low-rank methods to account for encrypted data, the proposed technique offers a way to maintain high accuracy with substantially fewer computational resources. For more details, you can read the full research paper available at arXiv.

Meera Iyer
Meera Iyer is an AI news editor who blends journalistic rigor with storytelling elegance. Formerly a content strategist at a leading tech firm, Meera now tracks the pulse of India's Generative AI scene, from policy updates to academic breakthroughs. She is particularly focused on bringing nuanced, balanced perspectives to the fast-evolving world of AI-powered tools and media. You can reach her at: [email protected]
