TLDR: This research investigates how simultaneously adjusting both memory and computing frequencies on resource-constrained devices can significantly improve energy efficiency and reduce inference time for Deep Neural Networks (DNNs). Moving beyond traditional methods that focus solely on computing frequency, the paper proposes and validates a model-based, data-driven approach for joint frequency scaling. Simulation results demonstrate that this combined optimization can lead to substantial energy savings in both local and cooperative DNN inference scenarios, offering a more effective balance between performance and power consumption on edge devices.
Deep neural networks (DNNs) are everywhere, from image recognition to intelligent content generation. However, deploying these complex models on devices with limited resources, like those found at the edge of networks, often leads to significant challenges: high latency and substantial energy consumption. Traditionally, researchers have tackled these issues using a technique called Dynamic Voltage and Frequency Scaling (DVFS), which primarily adjusts the computing frequency of processors to balance performance and energy use.
However, a crucial aspect often overlooked is the adjustment of memory frequency. This paper highlights that memory frequency also plays a significant role in how quickly a DNN can perform inference and how much energy it consumes. The authors argue that by jointly scaling both memory and computing frequencies, a more energy-efficient DNN inference can be achieved.
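In practice, frequency scaling on Linux-based edge devices is driven through sysfs knobs. As a minimal sketch, the snippet below pins a CPU core to a fixed frequency using the standard Linux cpufreq interface; the exact paths and available steps vary by platform, and memory (EMC) frequency on Jetson boards is exposed through platform-specific clock nodes rather than cpufreq.

```python
from pathlib import Path

CPU0 = Path("/sys/devices/system/cpu/cpu0/cpufreq")

def set_cpu_khz(khz: int) -> None:
    """Pin cpu0 to a fixed frequency via the standard Linux cpufreq interface.

    Requires root and the 'userspace' governor; valid steps are listed in
    scaling_available_frequencies. Memory (EMC) frequency on Jetson boards
    is controlled through platform-specific clock nodes, not cpufreq.
    """
    (CPU0 / "scaling_governor").write_text("userspace")
    (CPU0 / "scaling_setspeed").write_text(str(khz))

print((CPU0 / "scaling_available_frequencies").read_text())
set_cpu_khz(1_400_000)  # e.g. 1.4 GHz, if the platform offers that step
```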
The authors begin by investigating the combined impact of memory and computing frequency scaling on inference time and energy consumption, using an approach that blends model-based analysis with real-world measurements. By fitting the model's parameters on data from various DNN models, they offer a preliminary analysis showing the effects of adjusting both frequencies simultaneously. Simulations covering both local and cooperative inference scenarios further validate that joint scaling effectively reduces device energy consumption.
One of the key findings is that memory frequency scaling alone can lead to a significant reduction in average inference time. For instance, on a Jetson Xavier NX, VGG19 saw a 74% reduction, while ResNet152 experienced a 59% reduction. This underscores the importance of considering memory frequency, which has often been neglected in previous studies.
The paper introduces a novel formulation of inference time that accounts for both memory and computing frequencies. Fitted to real-world measurements from devices such as the Jetson TX1 and Jetson Orin Nano, the model matches observations closely, with R-squared values above 0.99. The authors also analyze power consumption, finding that it grows approximately cubically with both memory and computing frequency, with computing frequency generally having the stronger impact.
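The paper's exact equations are not reproduced in this summary, but the general approach can be sketched: assume an illustrative inverse-frequency form for inference time (time scales with cycles divided by frequency, one term per resource) and the approximately cubic form the paper reports for power, then fit both by least squares. All measurements, coefficients, and the precise functional forms below are assumptions for illustration, not the paper's published model.

```python
import numpy as np

# Hypothetical measurements: computing frequency (GHz), memory frequency (GHz),
# inference time (s), power (W). Values are illustrative, not from the paper.
f_c = np.array([0.5, 0.8, 1.1, 1.4, 0.5, 1.4])
f_m = np.array([0.8, 0.8, 1.6, 1.6, 1.6, 0.8])
t   = np.array([0.47, 0.34, 0.23, 0.20, 0.40, 0.28])
p   = np.array([2.1, 3.4, 6.8, 9.9, 4.0, 7.5])

# T(f_c, f_m) ~ a/f_c + b/f_m + c  -- linear in (a, b, c), so least squares works.
X_t = np.column_stack([1.0 / f_c, 1.0 / f_m, np.ones_like(f_c)])
(a, b, c), *_ = np.linalg.lstsq(X_t, t, rcond=None)

# P(f_c, f_m) ~ alpha*f_c^3 + beta*f_m^3 + gamma  -- the approximate cubic dependence.
X_p = np.column_stack([f_c**3, f_m**3, np.ones_like(f_c)])
(alpha, beta, gamma), *_ = np.linalg.lstsq(X_p, p, rcond=None)

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - residual SS / total SS."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

print("time model R^2: ", r_squared(t, X_t @ np.array([a, b, c])))
print("power model R^2:", r_squared(p, X_p @ np.array([alpha, beta, gamma])))
```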
In practical simulations, the researchers compared three policies for frequency adjustment: computing-prior (fixing computing frequency to maximum and adjusting memory), memory-prior (fixing memory frequency to maximum and adjusting computing), and joint scaling (adjusting both simultaneously). For local inference on a Jetson TX1, the joint scaling policy achieved an average energy reduction of 10% for VGG19 and 23% for DenseNet121 compared to the other policies. While the benefits were sometimes limited on devices like the Jetson Orin Nano due to restricted frequency ranges, the overall conclusion is that joint scaling offers more opportunities for energy reduction while meeting performance deadlines.
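A minimal sketch of the three policies, reusing the illustrative time and power models above with hypothetical coefficients: each policy searches its allowed frequency pairs for the lowest-energy setting that still meets the deadline. Because joint scaling searches a superset of the other two policies' settings, its best feasible energy can never be worse.

```python
import numpy as np

# Hypothetical coefficients for the illustrative models above (not from the paper).
a, b, c = 0.15, 0.12, 0.02           # T(f_c, f_m) = a/f_c + b/f_m + c          (s)
alpha, beta, gamma = 2.8, 0.9, 0.6   # P(f_c, f_m) = alpha*f_c^3 + beta*f_m^3 + gamma (W)

F_C = np.linspace(0.5, 1.4, 10)      # assumed computing-frequency steps (GHz)
F_M = np.array([0.8, 1.2, 1.6])      # assumed memory-frequency steps (GHz)
DEADLINE = 0.30                      # assumed per-inference latency budget (s)

def infer_time(fc, fm):
    return a / fc + b / fm + c

def energy(fc, fm):                  # energy per inference = time * power
    return infer_time(fc, fm) * (alpha * fc**3 + beta * fm**3 + gamma)

def best_energy(pairs):
    """Lowest energy over candidate (f_c, f_m) pairs that meet the deadline."""
    feasible = [energy(fc, fm) for fc, fm in pairs if infer_time(fc, fm) <= DEADLINE]
    return min(feasible, default=float("inf"))

computing_prior = best_energy([(F_C.max(), fm) for fm in F_M])         # f_c pinned at max
memory_prior    = best_energy([(fc, F_M.max()) for fc in F_C])         # f_m pinned at max
joint           = best_energy([(fc, fm) for fc in F_C for fm in F_M])  # both adjustable

# Joint scaling explores a superset of the other policies' settings,
# so its best feasible energy is never higher.
print(f"computing-prior: {computing_prior:.2f} J, "
      f"memory-prior: {memory_prior:.2f} J, joint: {joint:.2f} J")
```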
Even in cooperative inference scenarios, where tasks can be offloaded to an edge server, joint scaling proves beneficial. When deadlines are tight and local inference becomes necessary, the joint scaling policy can significantly reduce energy consumption. However, if the deadline is extremely close to the minimum inference time a device can achieve, the gains from joint scaling might become less pronounced.
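A hedged sketch of the cooperative case: the device picks whichever feasible option, offloading to the edge server or running locally at jointly scaled frequencies, costs it the least energy, and falls back to local execution when the deadline rules offloading out. The decision rule and all numbers are illustrative assumptions, not the paper's exact formulation.

```python
def choose_execution(deadline_s, offload, local_options):
    """Pick the device-energy-minimal option that meets the deadline.

    offload:       (time_s, device_energy_j) for sending the task to the edge
                   server; device-side energy here is mostly transmission.
    local_options: [(time_s, device_energy_j), ...] for each jointly scaled
                   (f_c, f_m) setting, e.g. from a grid search as above.
    """
    candidates = [("offload", *offload)] + [("local", t, e) for t, e in local_options]
    feasible = [(e, name) for name, t, e in candidates if t <= deadline_s]
    if not feasible:
        return "infeasible", None
    e, name = min(feasible)
    return name, e

# With a loose deadline, offloading may win; when it tightens, local inference
# with joint scaling takes over (all values hypothetical).
print(choose_execution(0.50, offload=(0.40, 0.8), local_options=[(0.25, 1.6), (0.35, 1.1)]))
print(choose_execution(0.30, offload=(0.40, 0.8), local_options=[(0.25, 1.6), (0.35, 1.1)]))
```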
In conclusion, this paper provides a comprehensive study of the synergy between memory and computing frequencies for DNN inference. By formulating a realistic model of inference time and energy consumption grounded in real-world measurements, the authors demonstrate that jointly scaling these two frequencies is a powerful strategy for energy-efficient DNN inference on resource-constrained edge devices. For more technical details, you can refer to the full research paper here.


