TLDR: This research explores Split Learning (SL) for running deep learning models on ultra-low-power IoT devices. It benchmarks a MobileNetV2 model split across two ESP32-S3 boards, comparing wireless communication protocols (ESP-NOW, BLE, UDP, TCP). The study finds that splitting the model after ‘block_16_project_BN’ yields a small intermediate tensor (5.66 kB). ESP-NOW offers the lowest overall round-trip time (3.7s) due to minimal setup, while UDP provides the fastest transmission latency (1.4ms). BLE and TCP show higher latencies due to MTU limits and overheads, respectively. The paper provides empirical data for efficient TinyML + SL deployments.
In the rapidly evolving world of Artificial Intelligence, a significant challenge lies in deploying powerful deep learning models on tiny, resource-constrained devices like those found in the Internet of Things (IoT). These devices, often operating on minimal power, have tight memory and processing limitations that make direct execution of complex AI tasks difficult.
A promising solution to this challenge is Split Learning (SL), a technique where a deep learning model is divided into parts. The initial layers of the model run on the low-power sensor device, while the remaining, more computationally intensive layers are offloaded to a companion device, such as another microcontroller, a gateway, or a nearby edge server. This approach helps to preserve data privacy and reduce bandwidth usage by only exchanging intermediate data (activations) between devices.
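The data flow described above can be sketched in a few lines. This is a minimal, hypothetical illustration of split inference (the two "layers" are stand-ins, not the paper's MobileNetV2): device 1 computes the head of the model, only the intermediate activation crosses the link, and device 2 computes the tail.

```python
# Minimal sketch of split inference: "device 1" runs the head of the model
# and only the intermediate activation crosses the wireless link; "device 2"
# runs the remaining, heavier layers. Both functions are hypothetical
# stand-ins, purely to illustrate the data flow.

def device1_head(image):
    # Stand-in for the initial layers (e.g. feature extraction).
    return [2 * px for px in image]       # intermediate activation

def device2_tail(activation):
    # Stand-in for the computationally intensive tail (e.g. classification).
    return sum(activation)                # scalar "prediction"

image = [1, 2, 3, 4]
activation = device1_head(image)          # computed on the sensor node
# ... activation is serialized and sent over ESP-NOW / BLE / UDP / TCP ...
prediction = device2_tail(activation)     # computed on the companion node
print(prediction)  # 20
```

Note that the raw image never leaves device 1; only the (typically smaller) activation is transmitted, which is the source of both the privacy and the bandwidth benefit.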
Despite its potential, the performance of split learning, especially concerning the impact of low-power wireless communication protocols, has remained largely unexplored on constrained microcontrollers. To address this gap, a recent experimental study built the first end-to-end TinyML + SL testbed using Espressif ESP32-S3 boards. The goal was to benchmark the over-the-air performance of split learning TinyML in real-world edge/IoT environments.
The researchers utilized a MobileNetV2 image recognition model, a lightweight convolutional neural network, which was quantized to 8-bit integers to further reduce its size and computational demands. This model was then partitioned and delivered to the ESP32-S3 nodes using over-the-air updates. A crucial aspect of the study involved testing different wireless communication methods for exchanging the intermediate activations between the split model parts. These methods included ESP-NOW, Bluetooth Low Energy (BLE), and traditional UDP/IP and TCP/IP, allowing for a direct comparison on identical hardware.
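Eight-bit quantization of the kind mentioned above generally maps floating-point values to int8 through a scale and a zero-point. The sketch below shows that affine scheme in its simplest form; the actual scales and zero-points used for the paper's MobileNetV2 are not reported here, so these values are purely illustrative.

```python
# Sketch of affine int8 quantization (scale + zero-point), the general
# scheme behind 8-bit model compression. The scale and zero-point below
# are illustrative, not the values used in the study.

def quantize(values, scale, zero_point):
    # Map each float to a clamped signed 8-bit integer.
    return [max(-128, min(127, round(v / scale) + zero_point)) for v in values]

def dequantize(q, scale, zero_point):
    # Recover an approximation of the original floats.
    return [(x - zero_point) * scale for x in q]

weights = [-1.0, 0.0, 0.5, 1.0]
scale, zero_point = 1.0 / 127, 0          # symmetric range [-1, 1]
q = quantize(weights, scale, zero_point)
print(q)  # [-127, 0, 64, 127]
```

Storing int8 instead of float32 cuts the model and activation footprint by roughly 4x, which is what makes both on-device execution and small over-the-air activation transfers feasible.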
The study revealed several key insights into the performance of split learning. Measurements showed that splitting the MobileNetV2 model after the ‘block_16_project_BN’ layer was particularly effective. This split point generated a compact 5.66 kB tensor of intermediate data, which could traverse the wireless link very quickly. Over UDP, this transfer took only 3.2 ms, yielding a steady-state round-trip latency of 5.8 seconds, more than 20 times faster than sending the entire raw image to a remote server for full inference.
Among the communication protocols, ESP-NOW demonstrated the most favorable overall round-trip time (RTT), achieving 3.7 seconds. This is largely due to its minimal setup time (around 48 ms) and its peer-to-peer architecture, which bypasses the overhead of a full IP stack, making it well suited to low-latency applications with small data transfers. While ESP-NOW's small maximum payload (250 bytes per frame) can increase transmission delay for larger payloads, its overall RTT efficiency was superior.
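The 250-byte payload cap makes the fragmentation cost easy to estimate. The quick calculation below counts the ESP-NOW frames needed for the split-point activation; it assumes "5.66 kB" means 5660 bytes (a decimal-kilobyte interpretation, which the source does not spell out).

```python
import math

# How many ESP-NOW frames does the split-point activation need?
# ESP-NOW carries at most 250 bytes of payload per frame; the tensor size
# assumes 5.66 kB = 5660 bytes (decimal interpretation, an assumption).

ESP_NOW_MAX_PAYLOAD = 250   # bytes per ESP-NOW frame
tensor_bytes = 5660         # intermediate activation after block_16_project_BN

frames = math.ceil(tensor_bytes / ESP_NOW_MAX_PAYLOAD)
print(frames)  # 23
```

Roughly two dozen frames per inference is still small, which is why ESP-NOW's low setup cost dominates and it wins on overall RTT despite the tighter per-frame limit.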
UDP, a connectionless protocol, achieved the lowest transmission latency for intermediate activations (as low as 1.4 ms with 2 packets for the ‘Block 15 project layer’ split). However, it lacks built-in reliability. TCP, while offering reliable data delivery, incurred higher latency due to its inherent overheads like connection setup (a three-way handshake), acknowledgments, and retransmissions. Bluetooth Low Energy (BLE), despite its energy efficiency, significantly increased latency, stretching beyond 10 seconds. This was primarily attributed to its limited data rate and a smaller Maximum Transmission Unit (MTU) of 512 bytes, leading to more packet fragmentation and overhead during transmission.
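UDP's low latency comes precisely from what it omits: no handshake, no acknowledgments, no retransmission. The loopback sketch below shows an activation shipped as bare datagrams; it runs on a desktop Python socket stack, not the ESP32's lwIP stack, and the chunk size and payload are illustrative.

```python
import socket

# Sketch of shipping an activation over UDP: connectionless, no handshake,
# no delivery guarantee. Runs over loopback; the ESP32 equivalent would use
# lwIP sockets, which this does not attempt to mirror.

CHUNK = 1400                              # stay under a typical 1500-byte MTU
payload = bytes(range(256)) * 12          # 3072-byte dummy "activation"

rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))                 # OS picks a free port
rx.settimeout(2.0)                        # don't hang if a datagram is lost
addr = rx.getsockname()

tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
for i in range(0, len(payload), CHUNK):
    tx.sendto(payload[i:i + CHUNK], addr)  # one datagram per chunk

received = b""
for _ in range(-(-len(payload) // CHUNK)):  # ceil(len / CHUNK) datagrams
    data, _ = rx.recvfrom(2048)
    received += data
tx.close(); rx.close()
print(len(received))  # 3072
```

On a lossy wireless link the receive loop above could stall or reassemble out of order, which is exactly the reliability gap the article notes; TCP closes that gap at the cost of handshake, ACK, and retransmission latency.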
The research also highlighted a pronounced compute imbalance: the first IoT device (Device 1), which runs the initial part of the model and interfaces with the camera, required far more inference time (3053.75 ms) than the second device (Device 2), which primarily performed classification (437.0 ms), roughly a sevenfold difference.
In conclusion, this experimental study provides valuable empirical evidence for implementing split learning on ultra-low-power edge/IoT nodes. It demonstrates that careful selection of the model split point and the communication protocol can drastically impact the end-to-end latency. The findings suggest that ESP-NOW is highly efficient for low-latency communication of small data transfers on ESP32 devices, while UDP offers the lowest transmission latency. The choice of protocol, however, should always consider trade-offs in reliability, data size, and energy consumption. This work lays a foundation for future dynamic and adaptive frameworks for split TinyML inference, aiming to optimize performance based on real-time network conditions and device resources. You can find more details about this research paper here: An Experimental Study of Split-Learning TinyML on Ultra-Low-Power Edge/IoT Nodes.


