
Optimizing AI on the Edge: A Deep Dive into Performance and Power Consumption

TLDR: This research paper evaluates the performance and energy consumption of various AI models (traditional ML, neural networks, and large language models) on different edge devices like Raspberry Pi, Intel Neural Compute Stick, NVIDIA Jetson Nano, and Google Coral USB. It highlights trade-offs between accuracy, inference time, power, and memory usage, demonstrating that specialized hardware and optimized frameworks significantly boost performance. The study also shows how tuning parameters like input size and batch size can further optimize energy efficiency, providing crucial insights for sustainable edge AI deployments.

The world of Artificial Intelligence (AI) is rapidly moving from centralized cloud data centers to smaller, more localized devices, a shift known as Edge AI. This change is driven by the need for real-time decision-making in applications like autonomous vehicles, robotics, and smart industries. Processing data directly on devices like IoT sensors and industrial equipment offers significant advantages, including reduced latency, enhanced privacy, improved bandwidth efficiency, and a more resilient, decentralized system.

The Challenge of Sustainability

Despite the growing adoption of AI on edge devices, a critical question remains: how sustainable are these AI inferences? With billions of AI-enabled edge devices expected to be deployed globally, their collective energy consumption is becoming a major concern. While advancements in hardware like systems-on-chip (SoCs) and AI accelerators have made it possible to run AI on resource-constrained devices, there's been a lack of comprehensive studies on their performance and energy usage. This gap makes it difficult for developers to make informed decisions about which devices and models to choose for their applications.

A Comprehensive Study on Edge AI Performance

A recent research paper, titled “On the Sustainability of AI Inferences in the Edge,” by Ghazal Sobhani, Md. Monzurul Amin Ifath, Tushar Sharma, and Israat Haque from Dalhousie University, aims to fill this crucial knowledge gap. The study rigorously characterizes the performance of various AI models—including traditional machine learning (ML), neural networks (NN), and large language models (LLMs)—on widely used edge devices. These devices include the Raspberry Pi (RPi), Intel Neural Compute Stick (INCS), NVIDIA Jetson Nano (NJn), and Google Coral USB (GCU).

The researchers analyzed key trade-offs among model F1 score (a measure of accuracy), inference time (how quickly a model processes data), inference power (the rate at which energy is drawn during processing), and memory usage. They also explored how hardware and software optimizations, along with external parameter tuning, can balance model performance and resource usage for practical edge AI deployments.
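To make those metrics concrete, here is a minimal, hypothetical sketch of how per-inference latency and memory can be profiled in Python. It is an illustrative stand-in rather than the authors' published scripts: predict_fn is a placeholder for any model's inference call, and power would be sampled separately with an external meter or an on-board sensor.

```python
import time
import tracemalloc

import numpy as np

def profile_inference(predict_fn, inputs, runs=50, warmup=5):
    """Measure mean/p95 inference latency and peak Python-level memory.

    predict_fn and inputs are placeholders for the model and data
    under test; energy is not captured here and would come from a
    separate power-measurement channel.
    """
    for _ in range(warmup):  # warm caches and lazy initializers before timing
        predict_fn(inputs)

    tracemalloc.start()
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        predict_fn(inputs)
        latencies.append(time.perf_counter() - start)
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    return {
        "mean_latency_s": float(np.mean(latencies)),
        "p95_latency_s": float(np.percentile(latencies, 95)),
        "peak_mem_mb": peak_bytes / 1e6,
    }
```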

Key Findings Across Devices and Models

The study revealed several important insights:

Domain-Specific Hardware and Software Boost Edge AI

While all devices showed similar F1 scores (accuracy) across different model types, their performance in terms of speed, power, and memory varied significantly. The NVIDIA Jetson Nano and Google Coral USB consistently delivered faster inference times, especially for more complex models, thanks to their specialized hardware (a GPU and an EdgeTPU, respectively). The Raspberry Pi, being a general-purpose device, was more energy-efficient for simpler tasks but lagged in speed and memory handling for demanding AI workloads. Optimized software played a crucial role as well: NVIDIA's TensorRT on the Jetson Nano and Google's LiteRT targeting the EdgeTPU each substantially enhanced performance on their respective hardware.
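For a flavor of what running through such an optimized path looks like, here is a minimal sketch of invoking a LiteRT (formerly TensorFlow Lite) model through the Coral EdgeTPU delegate. The model file name is a placeholder, the model must first be compiled with Google's edgetpu_compiler, and this is not code from the paper.

```python
import numpy as np
from tflite_runtime.interpreter import Interpreter, load_delegate

# "model_edgetpu.tflite" is a placeholder for a model already
# compiled for the EdgeTPU; libedgetpu.so.1 is the Linux delegate.
interpreter = Interpreter(
    model_path="model_edgetpu.tflite",
    experimental_delegates=[load_delegate("libedgetpu.so.1")],
)
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed one input matching the model's expected shape and dtype,
# then run a single accelerated inference.
dummy = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], dummy)
interpreter.invoke()
result = interpreter.get_tensor(output_details[0]["index"])
```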

Performance and Resource Usage Trade-offs

The Jetson Nano, particularly with TensorRT optimizations, emerged as the top performer, offering the best balance of speed and memory efficiency. The Raspberry Pi, while energy-efficient, struggled with more complex models due to its resource constraints. The Intel Neural Compute Stick and Google Coral USB provided a good balance, with the Coral excelling in speed and the Neural Compute Stick in power consumption for specific models. For large language models, the Jetson Nano could handle longer inputs efficiently, whereas the Raspberry Pi and Intel NCS were limited to shorter sequences. Notably, TinyBERT consistently outperformed Phi-2-orange in memory and power efficiency, making it a better choice for highly constrained environments.
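On the TensorRT side, the usual flow builds an optimized engine from a model export once and then reuses it for inference. The TensorRT 8.x-style sketch below is illustrative: the ONNX file name and the FP16 flag are assumptions, not the study's exact configuration.

```python
import tensorrt as trt

# Build a TensorRT engine from an ONNX export (TensorRT 8.x API).
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:  # placeholder model path
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # half precision suits the Nano's GPU

# Serialize the optimized engine so later inference runs skip rebuilding it.
with open("model.trt", "wb") as f:
    f.write(builder.build_serialized_network(network, config))
```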

Tuning Parameters for Better Energy Usage

The study also highlighted the importance of tuning inference parameters for deep and large models. For image processing tasks, increasing input image resolution improved accuracy but significantly increased inference time and power consumption, especially on devices like the Raspberry Pi. The Jetson Nano, with its powerful GPU, maintained stable performance even at higher resolutions. Similarly, the optimal batch size (the number of inputs processed simultaneously) varied by device, with the Jetson Nano performing best at larger batches and more constrained devices requiring smaller ones.
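A straightforward way to reproduce this kind of tuning is a sweep over resolution and batch size, timing inference at each grid point. In the hypothetical sketch below, run_model stands in for any framework-specific inference call and read_power_w for a power readout (such as the Jetson's on-board INA3221 sensor or an external meter); both are assumptions for illustration.

```python
import itertools
import time

import numpy as np

RESOLUTIONS = [224, 320, 480]   # input side lengths to try
BATCH_SIZES = [1, 4, 8]         # batch sizes to try

def sweep(run_model, read_power_w, runs=20):
    """Grid-sweep resolution x batch size, recording per-image latency."""
    results = []
    for res, batch in itertools.product(RESOLUTIONS, BATCH_SIZES):
        x = np.random.rand(batch, res, res, 3).astype(np.float32)
        start = time.perf_counter()
        for _ in range(runs):
            run_model(x)
        elapsed = time.perf_counter() - start
        results.append({
            "resolution": res,
            "batch": batch,
            "latency_per_image_ms": 1e3 * elapsed / (runs * batch),
            "power_w": read_power_w(),  # sampled from a sensor or meter
        })
    return results
```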

For large language models, inference efficiency was highly sensitive to input token length (the number of tokens, or word pieces, fed to the model) and token window size (how much context the model considers); as noted above, only the Jetson Nano comfortably handled longer contexts. These findings underscore that careful, model-aware, and hardware-specific parameter tuning is essential for achieving responsive and sustainable edge AI inference.
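To observe the token-length effect on a small transformer directly, one can time the model at increasing truncation lengths. The sketch below uses a publicly available TinyBERT checkpoint as a stand-in; the exact variant benchmarked in the paper is not specified here, so treat the model name as an assumption.

```python
import time

import torch
from transformers import AutoModel, AutoTokenizer

# Assumed checkpoint for illustration; the paper's exact TinyBERT
# variant may differ.
MODEL = "huawei-noah/TinyBERT_General_4L_312D"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL).eval()

text = "edge inference " * 256  # long dummy input to truncate from

for max_len in (32, 64, 128, 256):
    batch = tokenizer(text, truncation=True, max_length=max_len,
                      return_tensors="pt")
    with torch.no_grad():
        start = time.perf_counter()
        model(**batch)
    print(f"{max_len:4d} tokens: {time.perf_counter() - start:.4f} s")
```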


Looking Ahead

This research provides valuable guidelines for selecting the right combination of AI models, edge devices, frameworks, and parameters for successful edge AI deployments. The authors have made their measurement scheme and associated scripts publicly available to support reproducibility and broader adoption. Future work aims to develop fully automated measurement and parameter tuning systems that can intelligently select optimal configurations based on user requirements, further advancing the field of sustainable Edge AI. To learn more about this research, you can access the full paper here: On the Sustainability of AI Inferences in the Edge.

Karthik Mehta
https://blogs.edgentiq.com
Karthik Mehta is a data journalist known for his data-rich, insightful coverage of AI news and developments. Armed with a degree in Data Science from IIT Bombay and years of newsroom experience, Karthik merges storytelling with metrics to surface deeper narratives in AI-related events. His writing cuts through hype, revealing the real-world impact of Generative AI on industries, policy, and society. You can reach him at: [email protected]
