TLDR: This research benchmarks Parameter-Efficient Fine-Tuning (PEFT) methods (LoRA, DoRA, GaLore) on convolutional neural networks (CNNs) for edge devices. It finds that while PEFT reduces computational costs significantly (up to 95% FLOPs reduction), memory efficiency varies, especially for depthwise-separable convolutions. LoRA generally offers the best balance of accuracy and resource efficiency, while GaLore provides robust accuracy with higher computational demands. The study provides insights for selecting PEFT methods based on hardware constraints and application needs for on-device AI updates.
The world of artificial intelligence is rapidly expanding, with powerful deep learning models becoming increasingly common. While large language models (LLMs) often grab headlines, there’s a growing need to deploy these intelligent systems on smaller, resource-constrained devices, often referred to as “edge devices.” Think of your smartphone, smart home devices, or even industrial sensors – these are all examples of edge devices. The challenge lies in efficiently updating these models on-device, given their limited memory and processing power.
A promising solution to this challenge is Parameter-Efficient Fine-Tuning (PEFT). Traditionally, updating a deep learning model involves adjusting millions, or even billions, of parameters, which is computationally intensive and memory-hungry. PEFT methods aim to drastically reduce these costs by only updating a small fraction of the model’s parameters, or by making updates in a more efficient way. While PEFT has been extensively studied and proven effective for large language models, its application to smaller models, particularly convolutional neural networks (CNNs) commonly used on edge devices, has been less explored.
This research paper dives deep into benchmarking and analyzing popular PEFT methods when applied to CNN architectures designed for resource-constrained edge environments. The authors specifically evaluate three prominent PEFT techniques: LoRA (Low-Rank Adaptation), DoRA (Weight-Decomposed Low-Rank Adaptation), and GaLore (Gradient Low-Rank Projection). They compare these methods against traditional full fine-tuning (FFT) and a simpler approach called head-only fine-tuning with batch-normalization (BN+H).
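To make the adapter idea concrete, here is a minimal sketch of how a LoRA-style low-rank branch can be attached to a frozen convolutional layer in PyTorch. This is an illustration of the general technique, not the authors' code: the class name, rank, and scaling choices are assumptions for the example.

```python
import torch
import torch.nn as nn

class LoRAConv2d(nn.Module):
    """Minimal LoRA-style adapter around a frozen Conv2d (illustrative sketch)."""

    def __init__(self, base_conv: nn.Conv2d, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base_conv
        self.base.weight.requires_grad_(False)   # freeze the pretrained weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)

        # Low-rank branch: project down to `rank` channels, then back up with a 1x1 conv.
        self.lora_down = nn.Conv2d(
            base_conv.in_channels, rank,
            kernel_size=base_conv.kernel_size,
            stride=base_conv.stride,
            padding=base_conv.padding,
            bias=False,
        )
        self.lora_up = nn.Conv2d(rank, base_conv.out_channels, kernel_size=1, bias=False)
        nn.init.kaiming_uniform_(self.lora_down.weight)
        nn.init.zeros_(self.lora_up.weight)       # adapter starts as a no-op
        self.scaling = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scaling * self.lora_up(self.lora_down(x))
```

Only the two small adapter convolutions receive gradients, so the weight gradients and optimizer state cover a tiny fraction of the layer's parameters, which is where the update-time savings come from.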
The study focuses on how these PEFT methods perform when updating both standard and depthwise convolutional architectures. Depthwise-separable convolutions (DSCs) are particularly important for edge devices because they significantly reduce the computational cost of inference. The researchers used PyTorch's profiling tools to measure key performance indicators: peak memory usage during training and floating point operations (FLOPs), a proxy for computational cost. They also assessed the accuracy of the updated models.
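For readers who want to reproduce this kind of measurement, the sketch below builds a MobileNet-style depthwise-separable block and profiles one forward/backward pass with torch.profiler. The block definition and tensor sizes are illustrative assumptions; the paper's exact profiler configuration may differ.

```python
import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity

# A MobileNet-style depthwise-separable block: per-channel spatial filtering
# (groups=in_ch) followed by a 1x1 pointwise convolution that mixes channels.
def dsc_block(in_ch, out_ch, stride=1):
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride, padding=1,
                  groups=in_ch, bias=False),               # depthwise
        nn.BatchNorm2d(in_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),  # pointwise
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

model = dsc_block(32, 64)
x = torch.randn(8, 32, 56, 56)   # assumed batch and resolution for illustration

# Profile one forward/backward pass, recording memory usage and FLOP estimates.
with profile(activities=[ProfilerActivity.CPU],
             profile_memory=True, with_flops=True) as prof:
    model(x).sum().backward()

print(prof.key_averages().table(sort_by="self_cpu_memory_usage", row_limit=10))
```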
One of the key findings is that while PEFT methods are highly memory-efficient for LLMs, their memory savings shrink noticeably when applied to depthwise-separable convolution architectures. For these models, the memory required to store intermediate "activations" during training becomes the dominant bottleneck, limiting the overall memory savings. However, when targeting standard convolutional architectures optimized for edge deployment, adapter-based PEFT methods like LoRA and DoRA can dramatically cut the FLOPs needed for model updates, by as much as 95%, meaning far less computation each time the model is adapted.
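A rough back-of-the-envelope illustration of why activations dominate: the feature maps that must be stored for backpropagation scale with batch size, channels, and spatial resolution, regardless of how few parameters are trainable. The sizes below are assumptions for illustration, not numbers from the paper.

```python
# Assumed sizes for a typical expanded feature map in a depthwise-separable network.
batch, ch, h, w = 8, 144, 56, 56
activation_elems = batch * ch * h * w        # one saved activation tensor (~3.6M elements)

# A LoRA-style 1x1 adapter at rank 4 on the same layer.
rank, out_ch = 4, 144
adapter_params = ch * rank + rank * out_ch   # ~1.2K trainable weights

print(f"saved activation elements: {activation_elems:,}")
print(f"trainable adapter params:  {adapter_params:,}")
```

Even though the adapter is thousands of times smaller than a single saved feature map, that feature map still has to be kept in memory for the backward pass, which caps the memory savings PEFT can deliver on these architectures.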
The paper also highlights interesting trade-offs between the different PEFT methods. LoRA generally offers a good balance between accuracy and resource consumption, especially for standard CNNs like ResNet-18, where it achieved up to 67% peak memory reduction compared to full fine-tuning. However, for depthwise convolution models, LoRA’s memory reduction was less pronounced. DoRA, while offering similar inference benefits to LoRA, introduced a higher memory overhead during training due to its more complex computational process.
GaLore, another method evaluated, showed consistent accuracy comparable to full fine-tuning across various models and tasks. However, it tends to be more computationally expensive than LoRA and DoRA, particularly due to its reliance on Singular Value Decomposition (SVD) for gradient updates, which can introduce a FLOPs overhead. The study also found that GaLore was not as memory-efficient as LoRA for the CNNs evaluated, contrary to some findings in LLMs.
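The core idea behind GaLore is to compress each weight matrix's gradient into a low-rank subspace via SVD, keep the optimizer state in that subspace, and project the update back to the full weight shape. The sketch below is a simplified illustration of that idea, not the authors' implementation: in the real method the projection matrix is refreshed only periodically and Adam runs on the projected gradient, and the function names and sizes here are hypothetical.

```python
import torch

def galore_project(grad: torch.Tensor, rank: int):
    """Compress a 2-D weight gradient into a rank-`rank` subspace via SVD
    (for a conv layer, the 4-D gradient would first be reshaped to 2-D)."""
    U, S, Vh = torch.linalg.svd(grad, full_matrices=False)
    P = U[:, :rank]               # projection matrix; the SVD is the FLOPs overhead
    low_rank_grad = P.T @ grad    # (rank x n): what the optimizer state actually tracks
    return P, low_rank_grad

def galore_project_back(P: torch.Tensor, low_rank_update: torch.Tensor):
    # Map the low-rank update back to the full weight shape, so the weight
    # itself still receives full-rank updates over time.
    return P @ low_rank_update

# Toy usage with an assumed 256x512 weight gradient and rank 8.
g = torch.randn(256, 512)
P, g_r = galore_project(g, rank=8)
full_update = galore_project_back(P, -1e-3 * g_r)   # stand-in for an optimizer step
print(g_r.shape, full_update.shape)                  # (8, 512) and (256, 512)
```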
The research also explored the impact of the "rank" hyperparameter, which sets the dimensionality of the low-rank updates and thus the adaptation capacity of the PEFT method. Surprisingly, higher ranks didn't always lead to better accuracy, especially when the pre-trained model already performed well on a task. In some cases, lower ranks provided a smoother optimization landscape, leading to better results. Conversely, for tasks where the pre-trained model performed poorly, GaLore, with its full-rank weight updates, often outperformed LoRA and DoRA, suggesting that adapter-based methods can struggle with large shifts from the pre-training objective at very low ranks.
In summary, this study provides crucial guidance for developers and researchers working on AI for edge devices. It underscores that the choice of PEFT method should be carefully considered based on the specific hardware constraints, desired performance, and application needs. While LoRA often provides the best balance of efficiency and accuracy, especially for standard CNNs, GaLore offers more robust accuracy at the cost of higher computational demands. DoRA, in these edge-optimized CNN scenarios, did not show a significant advantage over LoRA. This work is a valuable step towards making advanced AI models more accessible and adaptable on the myriad of devices at the “edge” of our networks. You can find more details in the full research paper available here: From LLMs to Edge: Parameter-Efficient Fine-Tuning on Edge Devices.


