Enhancing AI Trust and Speed: A New Approach to Early Exit Neural Networks

TLDR: SPEED is a novel framework for Deep Neural Networks (DNNs) that integrates Early Exits (EE) with Selective Prediction (SP) to significantly improve both inference latency and trustworthiness. It introduces Deferral Classifiers (DCs) at each layer to identify ‘hard’ samples that might lead to overconfident, incorrect predictions, deferring them to an expert. This method reduces the risk of wrong predictions by 50% and achieves a 2.05x speedup, while also demonstrating strong robustness to domain changes across various NLP and image classification tasks.

Deep Neural Networks (DNNs) have become indispensable in many applications, from image recognition to natural language processing. However, their deployment in critical areas, such as medical diagnosis or autonomous driving, faces two significant hurdles: the time they take to make a prediction (inference latency) and their trustworthiness. A central threat to trustworthiness is overconfidence, where a DNN assigns high confidence to a wrong prediction.

To tackle the latency issue, a technique called Early Exit (EE) DNNs was developed. These networks are designed to allow simpler data samples to exit from intermediate layers if the model is sufficiently confident in its prediction. This avoids processing every sample through the entire network, thereby saving computational resources and speeding up inference.
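The early-exit rule described above can be sketched in a few lines. This is only an illustrative sketch, not the authors' code: the function name `early_exit_predict`, the 0.9 threshold, and the use of the maximum softmax probability as the confidence score are assumptions made for the example.

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def early_exit_predict(per_layer_logits, threshold=0.9):
    """Return (predicted_class, exit_layer) for one sample.

    per_layer_logits: one list of logits per exit, ordered from the
    shallowest exit to the final layer. The threshold value is an
    assumption for this sketch, not a value taken from the paper.
    """
    last = len(per_layer_logits) - 1
    for layer, logits in enumerate(per_layer_logits):
        probs = softmax(logits)
        confidence = max(probs)
        # Exit as soon as the exit is confident enough; the final layer
        # always produces a prediction.
        if confidence >= threshold or layer == last:
            return probs.index(confidence), layer

# A sample that is ambiguous at the first exit but clear at the second.
prediction, exit_layer = early_exit_predict([[0.1, 0.2], [0.0, 5.0]])
```

In this toy call the first exit is close to a coin flip, so the sample continues and leaves at the second exit.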

However, EE DNNs inherit the overconfidence problem from standard DNNs. In fact, they can be even more susceptible to it, especially in earlier layers where the model has less refined information. If a model exits early based on a false sense of confidence, it can lead to incorrect and untrustworthy decisions. Traditional Selective Prediction (SP) methods, which allow a model to abstain from making a prediction when uncertain, often fall short here because they typically rely on confidence scores that can be misleading, failing to detect instances of ‘fake confidence’.

Introducing SPEED: A Novel Approach

To address these critical challenges, researchers Divya Jyoti Bajpai and Manjesh Kumar Hanawal have proposed a new framework called SPEED: Selective Prediction for Early Exit DNNs. This innovative approach combines the efficiency of Early Exits with a refined Selective Prediction strategy to enhance both the accuracy and speed of DNNs.

The core of SPEED lies in its use of ‘Deferral Classifiers’ (DCs) at each intermediate layer of the DNN. Unlike existing methods that primarily rely on the model’s confidence score, DCs are designed to assess the ‘hardness’ of a sample. If a DC identifies a sample as hard – meaning the model might be confused or, more critically, exhibiting fake confidence – it defers that sample to an ‘expert’. This expert could be a more powerful, larger model or even a human expert, ensuring that challenging cases receive appropriate attention rather than being misclassified with high certainty by the early layers of the DNN.

If a sample is deemed ‘easy’ by the DC, it then proceeds to the Exit Classifier (EC) at that layer. If the EC is sufficiently confident, the sample exits with a prediction. Otherwise, it is passed to the next layer, and the process repeats. This intelligent routing prevents unnecessary computation for easy samples and, crucially, avoids erroneous high-confidence predictions for hard ones.
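The routing just described (deferral check first, then the exit check, then the next layer) might look roughly like the sketch below. Everything named here is hypothetical: `speed_route`, the two thresholds, and the representation of DCs and ECs as plain callables are assumptions for illustration. The sketch also assumes that a sample reaching the final layer without being deferred is always predicted there, which the description above does not specify.

```python
from typing import Callable, List, Optional, Sequence, Tuple

def speed_route(
    hidden_states: Sequence[Sequence[float]],
    deferral_classifiers: Sequence[Callable[[Sequence[float]], float]],
    exit_classifiers: Sequence[Callable[[Sequence[float]], List[float]]],
    hard_threshold: float = 0.5,   # assumed cutoff on the DC's "hard" score
    exit_threshold: float = 0.9,   # assumed cutoff on the EC's confidence
) -> Tuple[str, Optional[int], int]:
    """Return (decision, predicted_class, layer) for one sample.

    hidden_states are precomputed here for simplicity; a real model
    would compute each layer only if the sample has not yet left.
    """
    last = len(hidden_states) - 1
    layers = zip(hidden_states, deferral_classifiers, exit_classifiers)
    for layer, (h, dc, ec) in enumerate(layers):
        # Step 1: the Deferral Classifier scores how 'hard' the sample is.
        if dc(h) >= hard_threshold:
            return "defer", None, layer
        # Step 2: the Exit Classifier predicts if it is confident enough.
        probs = ec(h)
        confidence = max(probs)
        if confidence >= exit_threshold or layer == last:
            return "exit", probs.index(confidence), layer
    raise ValueError("at least one layer is required")

# Toy stand-ins for trained classifiers (hypothetical values).
easy_dcs = [lambda h: 0.1, lambda h: 0.1]
hard_dcs = [lambda h: 0.8, lambda h: 0.1]
ecs = [lambda h: [0.6, 0.4], lambda h: [0.05, 0.95]]
states = [[0.0], [0.0]]

easy_result = speed_route(states, easy_dcs, ecs)
hard_result = speed_route(states, hard_dcs, ecs)
```

In the toy calls, the easy sample passes both deferral checks and exits at the second layer, while the hard sample is handed to the expert at the first layer without any prediction being made.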

How SPEED Learns and Performs

A key aspect of SPEED is how it trains the DCs. Training samples are labeled ‘easy’ or ‘hard’ according to their average true-class confidence across all layers of a pre-trained EE DNN. This allows DCs to learn specific patterns associated with samples that are genuinely difficult or that tend to generate fake confidence. By training DCs separately, SPEED maintains the performance of the main DNN backbone while improving its robustness to changes in data distribution (domain shifts).
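As a rough illustration of this labeling step, the sketch below averages each sample's true-class confidence over the layers and compares it with a cutoff. The function name `label_hardness` and the 0.5 cutoff are assumptions for the example; the article does not state how the split point is chosen.

```python
def label_hardness(true_class_confidences, cutoff=0.5):
    """Label each training sample as 'easy' or 'hard'.

    true_class_confidences: for every sample, a list holding the
    probability that each exit of a pre-trained early-exit model
    assigns to the sample's true class.
    cutoff: assumed split point, not a value from the paper.
    """
    labels = []
    for per_layer in true_class_confidences:
        average = sum(per_layer) / len(per_layer)
        labels.append("easy" if average >= cutoff else "hard")
    return labels

# Two toy samples scored at three exits each (made-up numbers).
labels = label_hardness([
    [0.90, 0.95, 0.99],  # true class favored at every exit
    [0.20, 0.30, 0.40],  # true class never favored
])
```

The resulting labels would then serve as training targets for the DCs, separately from the backbone.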

The benefits of SPEED are significant. The research demonstrates that it can reduce the risk of wrong predictions by 50% and achieve an impressive 2.05 times speedup compared to a standard DNN that processes all samples through its final layer. This efficiency gain is due to both early exiting for easy samples and early deferral for hard ones, preventing wasted computational resources.
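To make these two quantities concrete, the sketch below computes a speedup and a selective risk on toy data. It rests on assumptions not stated in the article: speedup is approximated as total layers divided by the average number of layers actually executed (treating every layer as equally costly), and risk is taken as the error rate among samples the model did not defer, which is the usual selective-prediction definition. The numbers are invented for illustration and are not results from the paper.

```python
def speedup(layers_used, total_layers):
    """Approximate speedup over always running all layers.

    layers_used: layers executed per sample before it exited or was
    deferred. Assumes each layer costs the same.
    """
    average = sum(layers_used) / len(layers_used)
    return total_layers / average

def risk_and_coverage(outcomes):
    """Return (risk, coverage) from (deferred, correct) pairs.

    risk: fraction of wrong predictions among non-deferred samples.
    coverage: fraction of samples the model predicted on itself.
    """
    accepted = [correct for deferred, correct in outcomes if not deferred]
    coverage = len(accepted) / len(outcomes)
    risk = accepted.count(False) / len(accepted) if accepted else 0.0
    return risk, coverage

# Toy data for a 12-layer model (made-up values).
toy_speedup = speedup([3, 6, 12, 3], total_layers=12)
toy_risk, toy_coverage = risk_and_coverage([
    (False, True),   # predicted, correct
    (False, False),  # predicted, wrong
    (True, False),   # deferred to the expert
    (False, True),   # predicted, correct
])
```

Deferring a sample early lowers the average number of layers executed just as an early exit does, which is why both mechanisms contribute to the speedup.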

Furthermore, SPEED proves to be robust to domain shifts. This means that a model trained on one type of data (e.g., movie reviews) can perform well on a different but related domain (e.g., hotel reviews) without significant retraining. This generalization capability is vital for real-world applications where data distributions can vary.

The paper also provides a theoretical analysis, establishing conditions under which the error rate of the DCs can guarantee that the overall model’s risk remains below a specified threshold, further solidifying the method’s reliability.

Experimental Validation

The effectiveness of SPEED was tested across various tasks, including sentiment classification, entailment classification, and natural language inference (NLI), using datasets such as SST-2, IMDB, Yelp, MNLI, and SNLI. It was also evaluated on image classification using the CIFAR-10 and Caltech-256 datasets. In all experiments, SPEED consistently outperformed existing baselines in managing risk and improving inference speed, both within the original training domain and when applied to new, unseen domains.

In conclusion, SPEED offers a promising solution for deploying trustworthy and efficient DNNs in sensitive applications. By intelligently identifying and handling samples that might lead to overconfident errors, it ensures that AI systems ‘know what they don’t know’, leading to more reliable and faster decision-making. You can find more details about this research in the paper: Know What You Don’t Know: Selective Prediction for Early Exit DNNs.
