
Measuring How AI Embeddings Combine Meanings: A New Evaluation Framework

TL;DR: Researchers developed a two-step evaluation framework to quantify additive compositionality in AI embeddings (word, sentence, and knowledge graph). They found that embeddings across different models and training stages exhibit significant linear compositionality, allowing them to generalize to unseen attribute combinations. The framework also identifies where linear composition breaks down, pointing to areas for future work on non-linear approaches.

Understanding how artificial intelligence models combine basic units of meaning to form complex ideas is crucial for their ability to generalize and interpret novel expressions. This concept, known as compositionality, is at the heart of how language models process information. A recent research paper, titled “Quantifying Compositionality of Classic and State-of-the-Art Embeddings,” delves into this very challenge, proposing a robust framework to measure how well different types of AI embeddings exhibit additive compositionality.

The study, authored by Zhijin Guo, Chenhao Xue, Zhaozhen Xu, Hongbo Bo, Yuxuan Ye, Janet B. Pierrehumbert, and Martha Lewis, highlights a long-standing debate in AI. Early static word embeddings like Word2vec made strong claims about their compositional nature, often demonstrated by simple analogies like “king – man + woman = queen.” However, these claims faced criticism for being overly simplistic. On the other hand, modern generative transformer models (like BERT, GPT, and Llama) offer immense flexibility but often lack clear boundaries on how context can shift meaning, potentially obscuring their compositional structure.

A Two-Step Approach to Quantifying Compositionality

To address this, the researchers formalized a two-step, generalized evaluation pipeline. This method is designed to be modality-agnostic, meaning it can be applied to various types of embeddings, including words, sentences, and knowledge graphs.

The first step focuses on **quantifying linearity**. This involves measuring the linear relationship between known attributes of entities (e.g., demographic information for users, concepts in a sentence, or morphological features of a word) and their corresponding embeddings. This is achieved using Canonical Correlation Analysis (CCA), a statistical method that identifies and quantifies shared information between two sets of variables.

The second step, **quantifying additive generalization**, assesses whether these linear components can be combined to predict embeddings for unseen attribute combinations. This is done through a “Leave-One-Out” (LOO) experiment. In this setup, the model learns to associate attributes with embeddings from a subset of data, then attempts to reconstruct the embedding for a left-out entity based on its attributes. The accuracy of this reconstruction is measured using metrics like L2 loss (reconstruction error), cosine similarity (alignment between predicted and actual embeddings), and retrieval accuracy (how well the predicted embedding identifies the correct entity).
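The LOO step and its three metrics can be sketched like this, again on synthetic data. The two one-hot attribute groups loosely mimic categorical user attributes; the least-squares composer, group sizes, dimensionality, and noise level are illustrative assumptions rather than the paper's exact setup.

```python
from itertools import product

import numpy as np

rng = np.random.default_rng(1)

# Synthetic "entities": every combination of two categorical attributes
# (e.g. 4 age bands x 6 occupations), each encoded as a pair of one-hots.
n_a, n_b, dim = 4, 6, 64
combos = list(product(range(n_a), range(n_b)))
n = len(combos)
attrs = np.zeros((n, n_a + n_b))
for i, (a, b) in enumerate(combos):
    attrs[i, a] = 1.0
    attrs[i, n_a + b] = 1.0

# Embeddings that are additive in the attributes, plus small noise.
W = rng.normal(size=(n_a + n_b, dim))
emb = attrs @ W + 0.05 * rng.normal(size=(n, dim))

l2, cos, hits = [], [], 0
for i in range(n):
    train = np.arange(n) != i
    # Fit additive components on all other entities by least squares...
    W_hat, *_ = np.linalg.lstsq(attrs[train], emb[train], rcond=None)
    # ...then predict the held-out embedding from its attributes alone.
    pred = attrs[i] @ W_hat
    l2.append(float(np.linalg.norm(pred - emb[i])))
    cos.append(float(pred @ emb[i]
                     / (np.linalg.norm(pred) * np.linalg.norm(emb[i]))))
    # Retrieval: does the prediction pick out the true entity?
    hits += int(np.argmax(emb @ pred) == i)

print(f"mean L2: {np.mean(l2):.3f}  mean cosine: {np.mean(cos):.3f}  "
      f"retrieval acc: {hits / n:.2f}")
```

Because the synthetic embeddings are additive by construction, all three metrics come out near their best values here; on real embeddings, how far they fall short of that is exactly what the framework quantifies.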

Experiments Across Diverse Data Modalities

The framework was rigorously applied to three distinct data modalities:

  • Sentence Embeddings: The study evaluated SBERT, GPT, and Llama models using sentences annotated with concepts from the Schema-Guided Dialogue (SGD) dataset. This allowed them to see if sentence meanings could be additively decomposed into their constituent concepts.
  • Knowledge Graph Embeddings: Using the MovieLens 1M dataset, user embeddings (derived from movie preferences) were analyzed against demographic attributes (gender, age, occupation) to see if these attributes composed linearly within the user embeddings.
  • Word Embeddings: Word2vec embeddings were examined for their ability to capture both semantic (using WordNet) and morphological (using MorphoLex) information, specifically looking at how roots and suffixes combine.
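The word-level setup can be illustrated with a toy root-plus-suffix check. The vectors below are fabricated stand-ins for Word2vec embeddings (a real experiment would use trained vectors and MorphoLex annotations), and the mean-offset estimate of the suffix direction is one simple way to make the additive idea concrete:

```python
import numpy as np

rng = np.random.default_rng(2)
dim = 32

# Toy vectors standing in for Word2vec embeddings: each derived word is
# its root vector plus a shared suffix offset, with noise (by construction).
roots = {w: rng.normal(size=dim) for w in ["teach", "sing", "build", "paint"]}
suffix_true = rng.normal(size=dim)  # the shared "-er" direction
derived = {w: v + suffix_true + 0.1 * rng.normal(size=dim)
           for w, v in roots.items()}

# Estimate the suffix vector as the mean offset over all pairs but one...
held_out = "paint"
offsets = [derived[w] - roots[w] for w in roots if w != held_out]
suffix_hat = np.mean(offsets, axis=0)

# ...and test additive generalization on the held-out root/derived pair.
pred = roots[held_out] + suffix_hat
target = derived[held_out]
cos = float(pred @ target / (np.linalg.norm(pred) * np.linalg.norm(target)))
print(f"cosine(root + suffix_hat, derived): {cos:.3f}")
```

A high cosine on the held-out pair is the word-level analogue of the LOO experiment: the suffix behaves as a reusable additive component rather than a word-specific idiosyncrasy.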

Key Findings and Insights

The experiments yielded several important insights:

  • Across all modalities, a significant linear correlation was found between embeddings and their semantic features, confirming the foundational assumption for compositionality.
  • Additive generalization was consistently observed. For instance, sentence embeddings could be reconstructed for unseen concept combinations, and user embeddings generalized additive relationships to new user attributes. Word2vec embeddings also showed decomposition into root and suffix combinations.
  • Compositionality signals strengthened over the course of training, both for models like MultiBERT and for knowledge graph embeddings, indicating that models learn more compositional structure over time.
  • Interestingly, in transformer-based models like SBERT, compositionality generalization increased through earlier layers, peaking around layers 4 or 5, but then showed an abrupt decline in the final layer. This suggests that later layers might specialize in task-specific representations, potentially moving away from purely additive compositional structures.

Understanding Compositional Failures and Future Directions

Beyond successful cases, the framework also quantifies instances where additive compositionality breaks down. These “failure cases” are crucial as they highlight the limitations of linear composition and point to semantic phenomena that require more complex, non-linear interactions or context-dependent meanings. For example, fluctuations in retrieval accuracy across transformer layers suggest challenges in accurately representing concepts in natural language.

The researchers emphasize that while current models retain a surprising degree of additive compositional structure, there are consistent residuals that signal unresolved semantic complexities. These findings underscore the need for future research into more expressive, non-linear approaches to compositional representation.

This work provides a unified and statistically robust diagnostic for evaluating compositionality, offering valuable opportunities for improving the interpretability of representation learning in AI. For more details, see the full paper, “Quantifying Compositionality of Classic and State-of-the-Art Embeddings.”

Ananya Rao
Ananya Rao is a tech journalist with a passion for dissecting the fast-moving world of Generative AI. With a background in computer science and a sharp editorial eye, she connects the dots between policy, innovation, and business. Ananya excels in real-time reporting and specializes in uncovering how startups and enterprises in India are navigating the GenAI boom. She brings urgency and clarity to every breaking news piece she writes. You can reach out to her at: [email protected]
