Enhancing Large Language Models with Joint Embedding Predictive Architectures

TLDR: LLM-JEPA introduces a novel training objective for Large Language Models (LLMs) that adapts the successful Joint Embedding Predictive Architectures (JEPAs) from vision to language. By combining traditional generative loss with an embedding-space JEPA objective, LLM-JEPA significantly outperforms standard LLM training in finetuning and pretraining across various models and datasets, while also demonstrating robustness to overfitting and inducing structured representations.

Large Language Models (LLMs) have become central to many AI applications, but their training methods, primarily relying on input-space reconstruction and generative capabilities, differ significantly from successful approaches in computer vision. In vision, Joint Embedding Predictive Architectures (JEPAs) have shown superior performance by focusing on embedding-space training objectives. This difference has led researchers to question whether language models could benefit from vision-inspired training techniques.

A new research paper introduces LLM-JEPA, which brings JEPA-style objectives to Large Language Models. The approach applies to both finetuning and pretraining, and aims to improve representation quality without sacrificing generative ability.

The core idea behind LLM-JEPA is to combine the standard LLM generative loss (which predicts the next token) with an additional JEPA objective. This JEPA component works by ensuring that different ‘views’ of the same underlying knowledge can be predicted from each other in the embedding space. For instance, in tasks involving both natural language and code, the text description and the corresponding code can be treated as two distinct views of the same concept. By learning to predict one view’s embedding from another, LLM-JEPA encourages the model to learn more abstract and robust representations.
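To make the combined objective concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes the two terms are simply added, with a hypothetical weight `lam` on the JEPA term, and it assumes cosine distance as the embedding-space metric. The article specifies neither. The function names and toy inputs are illustrative only, and in a real model the embeddings would come from the LLM's hidden states.

```python
import math


def cross_entropy(logits, target):
    """Negative log-likelihood of index `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def cosine_distance(u, v):
    """1 - cosine similarity; an assumed choice of embedding-space metric."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def combined_loss(token_logits, token_targets, predicted_emb, target_emb, lam=1.0):
    """Generative next-token loss plus a weighted embedding-space term.

    token_logits / token_targets: per-position logits and next-token ids.
    predicted_emb: embedding of one view (e.g. text) after a predictor step.
    target_emb: embedding of the other view (e.g. code).
    lam: hypothetical weight on the JEPA term (an assumption of this sketch).
    """
    gen = sum(
        cross_entropy(logits, tgt) for logits, tgt in zip(token_logits, token_targets)
    ) / len(token_targets)
    jepa = cosine_distance(predicted_emb, target_emb)
    return gen + lam * jepa, gen, jepa


# Toy usage: two token positions, 3-word vocabulary, 2-dimensional embeddings.
total, gen, jepa = combined_loss(
    token_logits=[[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]],
    token_targets=[0, 1],
    predicted_emb=[1.0, 0.0],
    target_emb=[0.8, 0.6],
    lam=1.0,
)
print(f"generative={gen:.3f} jepa={jepa:.3f} total={total:.3f}")
```

Setting `lam=0` recovers the plain next-token objective, which is one way to see that the JEPA term is an addition to standard training rather than a replacement for it.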

The researchers empirically validated LLM-JEPA across a wide range of models, including families like Llama3, OpenELM, Gemma2, and Olmo, and numerous datasets such as NL-RX, GSM8K, Spider, and RottenTomatoes. The findings consistently show that LLM-JEPA significantly outperforms standard LLM training objectives. Beyond improved accuracy, the method also demonstrates remarkable robustness to overfitting, a common challenge in deep learning.

For example, in finetuning experiments, LLM-JEPA led to substantial accuracy gains across various models and datasets. In pretraining scenarios, it also improved the quality of learned representations, which then translated to better performance in downstream finetuning tasks. The paper also highlights that LLM-JEPA induces a more structured representation space, suggesting that it helps the model learn more meaningful and organized embeddings for text and code.

While LLM-JEPA offers significant advancements, the authors acknowledge a current limitation: the training process incurs a 3-fold increase in compute cost due to the need for multiple forward passes to obtain representations of different views. Future work aims to mitigate this by exploring methods to evaluate the LLM-JEPA loss within a single forward pass.
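One plausible way to read the 3-fold figure is as a count of forward passes per training step. The article does not break the number down, so the accounting below is an assumption: one pass for the usual generative loss, plus one extra pass per view to obtain its embedding. The function names are hypothetical.

```python
def forward_passes_per_step(num_views: int, generative_passes: int = 1) -> int:
    """Assumed accounting: one generative pass plus one extra pass per view."""
    return generative_passes + num_views


def relative_cost(num_views: int) -> float:
    """Forward-pass count relative to standard training (zero extra views)."""
    return forward_passes_per_step(num_views) / forward_passes_per_step(0)


# With two views (e.g. text and code), this accounting gives a 3x cost.
print(relative_cost(2))  # 3.0
```

Under this reading, computing the loss in a single forward pass would bring the ratio back toward 1, which is the direction the authors name for future work.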

This research marks a first step in adapting vision-based self-supervised learning techniques to language models, with the promise of more capable and robust AI systems. The full research paper is titled “LLM-JEPA: Large Language Models Meet Joint Embedding Predictive Architectures.”