TLDR: ENSI is a new framework that enables efficient and non-interactive secure inference for Large Language Models (LLMs) using homomorphic encryption. It addresses the challenges of privacy-preserving AI by co-designing cryptographic protocols with LLM architecture, specifically integrating the CKKS scheme with BitNet. Key innovations include optimized, multiplication-free matrix multiplications, a retraining-free sigmoid attention mechanism to replace complex softmax, and embedding the costly bootstrapping operation within RMSNorm to drastically reduce its frequency. Experimental results show significant speedups for core operations and a substantial reduction in bootstrapping overhead, while maintaining high accuracy comparable to plaintext inference.
Large Language Models (LLMs) like LLaMA and GPT have transformed artificial intelligence, offering personalized responses through services where users access powerful models via cloud APIs. However, this convenience comes with a significant privacy challenge: LLMs often process sensitive user data, and without robust security, this information could be inadvertently exposed.
This is where secure inference comes in: a cryptographic approach that lets a server compute directly on encrypted user data, so the plaintext is never exposed. Two main techniques are used for this: Secure Multi-Party Computation (SMPC) and Homomorphic Encryption (HE). SMPC requires multiple rounds of communication between parties, whereas HE supports non-interactive computation, making it well suited to distributed environments while offering strong privacy protection.
Despite its promise, applying Homomorphic Encryption to secure inference for large language models has been incredibly difficult. LLMs demand vast computational resources for high-dimensional matrix multiplications and complex self-attention mechanisms. Furthermore, the sophisticated activation functions commonly used in LLMs are notoriously hard to implement efficiently in HE environments. Traditional encoding strategies also add overhead, and the most time-consuming operation, ‘Bootstrapping’ (which refreshes encrypted data to prevent noise accumulation), occurs more frequently as models grow larger, creating significant bottlenecks.
Introducing ENSI: A Co-Designed Solution
A new research paper, “ENSI: Efficient Non-Interactive Secure Inference for Large Language Models”, introduces a novel framework called ENSI. This framework tackles these challenges by co-designing cryptographic protocols with the LLM architecture itself. ENSI integrates the RNS-CKKS homomorphic encryption scheme with BitNet, a lightweight LLM variant, to significantly reduce the computational complexity of encrypted operations.
Key Innovations for Efficient Secure Inference
ENSI brings several crucial innovations to make secure LLM inference practical:
Optimized Encoding and Matrix Multiplications: The framework uses an optimized encoding strategy that works seamlessly with BitNet. For ‘Plaintext-Ciphertext Matrix Multiplication’ (PCMM), where model weights are plaintext and user data is ciphertext, ENSI leverages BitNet’s ternary weights (values of -1, 0, or 1) to eliminate explicit multiplication operations, replacing them with much faster additions and subtractions. This results in an approximate 5.8 to 8 times speedup compared to state-of-the-art methods. For ‘Ciphertext-Ciphertext Matrix Multiplication’ (CCMM), which is crucial for attention mechanisms, ENSI introduces an innovative element extraction mechanism inspired by the ‘baby-step giant-step’ algorithm, drastically reducing the number of costly rotation operations.
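To make the PCMM idea concrete, here is a minimal plaintext sketch of a multiplication-free matrix product with ternary weights: for each output column, inputs with weight +1 are added and inputs with weight -1 are subtracted, while zero weights are skipped. The function name and NumPy formulation are illustrative, not ENSI's actual encrypted implementation, which operates on CKKS ciphertext slots.

```python
import numpy as np

def ternary_matmul(x, w_ternary):
    """Compute x @ w_ternary without scalar multiplications.

    w_ternary has entries in {-1, 0, 1} (BitNet-style weights), so each
    output column is just a sum of +1-weighted input columns minus a sum
    of -1-weighted ones. This mirrors how ENSI-style PCMM can replace
    homomorphic multiplications with cheaper additions/subtractions.
    """
    n, d_out = x.shape[0], w_ternary.shape[1]
    out = np.zeros((n, d_out))
    for j in range(d_out):
        plus = w_ternary[:, j] == 1    # inputs to add
        minus = w_ternary[:, j] == -1  # inputs to subtract
        # additions and subtractions only -- no multiplications
        out[:, j] = x[:, plus].sum(axis=1) - x[:, minus].sum(axis=1)
    return out
```

On plaintext the savings are modest, but under HE each avoided ciphertext multiplication also avoids its noise growth and rescaling cost, which is where the reported 5.8 to 8 times speedup comes from.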
Retraining-Free Secure Softmax Evaluation: The softmax function, vital for attention mechanisms, is a major computational hurdle under homomorphic encryption. Traditional methods either use computationally expensive high-degree polynomial approximations or require retraining the model with HE-friendly alternatives. ENSI pioneers the integration of the ‘Sigmoid Attention’ mechanism as a direct, retraining-free replacement for softmax. Sigmoid is simpler to encrypt and significantly reduces computational complexity.
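The appeal of sigmoid attention under HE is easy to see in a plaintext sketch: softmax needs a row-wise exponential, sum, and division (the division being especially awkward to approximate on ciphertexts), while sigmoid is a single element-wise function. The `-log(n)` bias below is one common stabilizer from the sigmoid-attention literature; ENSI's exact formulation may differ, so treat this as an illustrative assumption.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_attention(Q, K, V):
    """Attention with softmax replaced by an element-wise sigmoid.

    No row-wise normalization (sum + division) is needed, only one
    element-wise nonlinearity -- far cheaper to approximate with a
    low-degree polynomial under homomorphic encryption.
    """
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)
    # -log(n) bias keeps attention weights from saturating as n grows
    # (an assumed choice here, not necessarily the paper's).
    weights = sigmoid(scores - np.log(n))
    return weights @ V
```

Because each weight is computed independently, the attention rows no longer need to sum to one, which is precisely what makes the mechanism a drop-in, retraining-free replacement candidate.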
Efficient Bootstrapping within RMSNorm: Bootstrapping is essential for refreshing ciphertexts but is extremely costly. ENSI cleverly embeds this operation within the ‘RMSNorm’ process, a normalization technique. By performing bootstrapping at a specific point during RMSNorm, ENSI reduces its frequency from being proportional to the embedding dimension to a constant, achieving the lowest bootstrapping frequency among existing schemes – accounting for just 1% of the total runtime, compared to over 60% in previous methods.
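The following plaintext sketch shows RMSNorm with a hook marking where a ciphertext refresh could be fused into the normalization. The exact fusion point in ENSI (and the polynomial used to approximate the inverse square root under CKKS) is not specified here; the `bootstrap` callable is a hypothetical stand-in that is a no-op on plaintext.

```python
import numpy as np

def rmsnorm_with_bootstrap_hook(x, gamma, eps=1e-6, bootstrap=None):
    """RMSNorm: y = x / sqrt(mean(x^2) + eps) * gamma.

    In an encrypted evaluation, the inverse square root is replaced by a
    polynomial approximation, and ENSI embeds the costly ciphertext
    refresh (bootstrapping) at a fixed point inside this normalization,
    so it runs a constant number of times per layer rather than scaling
    with the embedding dimension. `bootstrap` marks that point; here it
    simply passes the value through unchanged.
    """
    ms = np.mean(x * x, axis=-1, keepdims=True)
    inv_rms = 1.0 / np.sqrt(ms + eps)
    if bootstrap is not None:
        inv_rms = bootstrap(inv_rms)  # ciphertext refresh would go here
    return x * inv_rms * gamma
```

Since every transformer block already passes activations through RMSNorm, piggybacking the refresh on this single per-layer operation is what drops bootstrapping from over 60% of runtime to about 1%.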
Performance and Accuracy
Experimental evaluations demonstrate ENSI’s significant performance advantages. Besides the matrix multiplication speedups, it achieves a 2.2 to 2.6 times speedup in softmax inference on a CPU. The framework was benchmarked on a LLaMA-3-700M model with 16 layers, processing 32 inputs of 2048 tokens – representing the largest known scale for secure inference to date. Despite these performance gains, ENSI maintains inference accuracy nearly comparable to plaintext inference across various datasets like PIQA, COPA, and SST.
Looking Ahead
While ENSI marks a significant leap forward in making privacy-preserving LLM inference more efficient and practical, large-scale ciphertext matrix multiplication remains a primary bottleneck. The researchers aim to further integrate dedicated hardware acceleration, such as GPUs, with the ENSI framework to achieve fully secure large model inference at even greater speeds in the future.