TLDR: This research investigates how simultaneously adjusting both memory and computing frequencies on resource-constrained devices can significantly improve energy efficiency and reduce inference time for Deep Neural Networks (DNNs). Moving beyond traditional methods that focus solely on computing frequency, the paper proposes and validates a model-based, data-driven approach for joint frequency scaling. Simulation results demonstrate that this combined optimization can lead to substantial energy savings in both local and cooperative DNN inference scenarios, offering a more effective balance between performance and power consumption on edge devices.
Deep neural networks (DNNs) are everywhere, from image recognition to intelligent content generation. However, deploying these complex models on devices with limited resources, like those found at the edge of networks, often leads to significant challenges: high latency and substantial energy consumption. Traditionally, researchers have tackled these issues using a technique called Dynamic Voltage and Frequency Scaling (DVFS), which primarily adjusts the computing frequency of processors to balance performance and energy use.
However, a crucial aspect often overlooked is the adjustment of memory frequency. This paper highlights that memory frequency also plays a significant role in how quickly a DNN can perform inference and how much energy it consumes. The authors argue that by jointly scaling both memory and computing frequencies, a more energy-efficient DNN inference can be achieved.
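In practice, frequency scaling on Linux-based edge devices is driven through sysfs knobs. As a minimal sketch, the snippet below pins a CPU core to a fixed frequency using the standard Linux cpufreq interface; the exact paths and available steps vary by platform, and memory (EMC) frequency on Jetson boards is exposed through platform-specific clock nodes rather than cpufreq.

```python
from pathlib import Path

CPU0 = Path("/sys/devices/system/cpu/cpu0/cpufreq")

def set_cpu_khz(khz: int) -> None:
    """Pin cpu0 to a fixed frequency via the standard Linux cpufreq interface.

    Requires root and the 'userspace' governor; valid steps are listed in
    scaling_available_frequencies. Memory (EMC) frequency on Jetson boards
    is controlled through platform-specific clock nodes, not cpufreq.
    """
    (CPU0 / "scaling_governor").write_text("userspace")
    (CPU0 / "scaling_setspeed").write_text(str(khz))

print((CPU0 / "scaling_available_frequencies").read_text())
set_cpu_khz(1_400_000)  # e.g. 1.4 GHz, if the platform offers that step
```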
The authors begin by investigating the combined impact of memory and computing frequency scaling on inference time and energy consumption, using an approach that blends model-based analysis with real-world measurements. By fitting the model's parameters on data from various DNN models, they offer a preliminary analysis showing the effects of adjusting both frequencies simultaneously. Simulations covering both local and cooperative inference scenarios further validate that joint scaling effectively reduces device energy consumption.
One of the key findings is that memory frequency scaling alone can lead to a significant reduction in average inference time. For instance, on a Jetson Xavier NX, VGG19 saw a 74% reduction, while ResNet152 experienced a 59% reduction. This underscores the importance of considering memory frequency, which has often been neglected in previous studies.
The paper introduces a novel formulation of inference time that accounts for both memory and computing frequencies. Fitted to real-world measurements from devices such as the Jetson TX1 and Jetson Orin Nano, the model matches observations closely, with R-squared values above 0.99. The authors also analyze power consumption, finding that it grows approximately cubically with both memory and computing frequency, with computing frequency generally having the stronger impact.
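The paper's exact equations are not reproduced in this summary, but the general approach can be sketched: assume an illustrative inverse-frequency form for inference time (time scales with cycles divided by frequency, one term per resource) and the approximately cubic form the paper reports for power, then fit both by least squares. All measurements, coefficients, and the precise functional forms below are assumptions for illustration, not the paper's published model.

```python
import numpy as np

# Hypothetical measurements: computing frequency (GHz), memory frequency (GHz),
# inference time (s), power (W). Values are illustrative, not from the paper.
f_c = np.array([0.5, 0.8, 1.1, 1.4, 0.5, 1.4])
f_m = np.array([0.8, 0.8, 1.6, 1.6, 1.6, 0.8])
t   = np.array([0.47, 0.34, 0.23, 0.20, 0.40, 0.28])
p   = np.array([2.1, 3.4, 6.8, 9.9, 4.0, 7.5])

# T(f_c, f_m) ~ a/f_c + b/f_m + c  -- linear in (a, b, c), so least squares works.
X_t = np.column_stack([1.0 / f_c, 1.0 / f_m, np.ones_like(f_c)])
(a, b, c), *_ = np.linalg.lstsq(X_t, t, rcond=None)

# P(f_c, f_m) ~ alpha*f_c^3 + beta*f_m^3 + gamma  -- the approximate cubic dependence.
X_p = np.column_stack([f_c**3, f_m**3, np.ones_like(f_c)])
(alpha, beta, gamma), *_ = np.linalg.lstsq(X_p, p, rcond=None)

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - residual SS / total SS."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

print("time model R^2: ", r_squared(t, X_t @ np.array([a, b, c])))
print("power model R^2:", r_squared(p, X_p @ np.array([alpha, beta, gamma])))
```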
In practical simulations, the researchers compared three policies for frequency adjustment: computing-prior (fixing computing frequency to maximum and adjusting memory), memory-prior (fixing memory frequency to maximum and adjusting computing), and joint scaling (adjusting both simultaneously). For local inference on a Jetson TX1, the joint scaling policy achieved an average energy reduction of 10% for VGG19 and 23% for DenseNet121 compared to the other policies. While the benefits were sometimes limited on devices like the Jetson Orin Nano due to restricted frequency ranges, the overall conclusion is that joint scaling offers more opportunities for energy reduction while meeting performance deadlines.
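A minimal sketch of the three policies, reusing the illustrative time and power models above with hypothetical coefficients: each policy searches its allowed frequency pairs for the lowest-energy setting that still meets the deadline. Because joint scaling searches a superset of the other two policies' settings, its best feasible energy can never be worse.

```python
import numpy as np

# Hypothetical coefficients for the illustrative models above (not from the paper).
a, b, c = 0.15, 0.12, 0.02           # T(f_c, f_m) = a/f_c + b/f_m + c          (s)
alpha, beta, gamma = 2.8, 0.9, 0.6   # P(f_c, f_m) = alpha*f_c^3 + beta*f_m^3 + gamma (W)

F_C = np.linspace(0.5, 1.4, 10)      # assumed computing-frequency steps (GHz)
F_M = np.array([0.8, 1.2, 1.6])      # assumed memory-frequency steps (GHz)
DEADLINE = 0.30                      # assumed per-inference latency budget (s)

def infer_time(fc, fm):
    return a / fc + b / fm + c

def energy(fc, fm):                  # energy per inference = time * power
    return infer_time(fc, fm) * (alpha * fc**3 + beta * fm**3 + gamma)

def best_energy(pairs):
    """Lowest energy over candidate (f_c, f_m) pairs that meet the deadline."""
    feasible = [energy(fc, fm) for fc, fm in pairs if infer_time(fc, fm) <= DEADLINE]
    return min(feasible, default=float("inf"))

computing_prior = best_energy([(F_C.max(), fm) for fm in F_M])         # f_c pinned at max
memory_prior    = best_energy([(fc, F_M.max()) for fc in F_C])         # f_m pinned at max
joint           = best_energy([(fc, fm) for fc in F_C for fm in F_M])  # both adjustable

# Joint scaling explores a superset of the other policies' settings,
# so its best feasible energy is never higher.
print(f"computing-prior: {computing_prior:.2f} J, "
      f"memory-prior: {memory_prior:.2f} J, joint: {joint:.2f} J")
```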
Even in cooperative inference scenarios, where tasks can be offloaded to an edge server, joint scaling proves beneficial. When deadlines are tight and local inference becomes necessary, the joint scaling policy can significantly reduce energy consumption. However, if the deadline is extremely close to the minimum inference time a device can achieve, the gains from joint scaling might become less pronounced.
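A hedged sketch of the cooperative case: the device picks whichever feasible option, offloading to the edge server or running locally at jointly scaled frequencies, costs it the least energy, and falls back to local execution when the deadline rules offloading out. The decision rule and all numbers are illustrative assumptions, not the paper's exact formulation.

```python
def choose_execution(deadline_s, offload, local_options):
    """Pick the device-energy-minimal option that meets the deadline.

    offload:       (time_s, device_energy_j) for sending the task to the edge
                   server; device-side energy here is mostly transmission.
    local_options: [(time_s, device_energy_j), ...] for each jointly scaled
                   (f_c, f_m) setting, e.g. from a grid search as above.
    """
    candidates = [("offload", *offload)] + [("local", t, e) for t, e in local_options]
    feasible = [(e, name) for name, t, e in candidates if t <= deadline_s]
    if not feasible:
        return "infeasible", None
    e, name = min(feasible)
    return name, e

# With a loose deadline, offloading may win; when it tightens, local inference
# with joint scaling takes over (all values hypothetical).
print(choose_execution(0.50, offload=(0.40, 0.8), local_options=[(0.25, 1.6), (0.35, 1.1)]))
print(choose_execution(0.30, offload=(0.40, 0.8), local_options=[(0.25, 1.6), (0.35, 1.1)]))
```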
In conclusion, this paper provides a comprehensive study of the synergy between memory and computing frequencies for DNN inference. By formulating a realistic model of inference time and energy consumption grounded in real-world measurements, the authors demonstrate that jointly scaling these two frequencies is a powerful strategy for energy-efficient DNN inference on resource-constrained edge devices. For more technical details, you can refer to the full research paper here.


