TL;DR: Autoencoding Probabilistic Circuits (APCs) introduce a new method for learning data representations. Unlike traditional neural autoencoders that struggle with missing data and complex probabilistic inference, APCs use a Probabilistic Circuit (PC) as an encoder to explicitly model data and embeddings. This allows for exact inference and native handling of missing information. Combined with a neural decoder, APCs demonstrate superior reconstruction quality and generate robust embeddings for downstream tasks, even with significant data corruption. They can also effectively distill knowledge from other models without needing original training data.
In the rapidly evolving landscape of machine learning, the ability to learn compact and expressive representations of data is fundamental. This process, known as representation learning, involves automatically discovering and encoding informative, low-dimensional features or embeddings from raw data. These embeddings are crucial for various tasks, from classification and generation to retrieval. Autoencoders, particularly those based on neural networks, have played a pivotal role in this domain, enabling the discovery of latent data features that power applications like large language models and image retrieval.
However, despite their widespread success, neural network-based autoencoders, including their probabilistic extensions like Variational Autoencoders (VAEs), face significant challenges. One major hurdle is their inherent difficulty in handling missing data. When inputs are incomplete, neural networks typically require computationally intensive imputation methods or unprincipled heuristics to fill in the gaps, which can introduce biases and affect performance. Furthermore, performing exact probabilistic inference, such as computing marginals or conditionals, remains a complex and often intractable task for these models.
Introducing Autoencoding Probabilistic Circuits (APCs)
Autoencoding Probabilistic Circuits (APCs) are a new framework that addresses these limitations. APCs leverage the unique strengths of Probabilistic Circuits (PCs), a class of computational graphs that enable exact and tractable inference. Unlike prior approaches that relied on external neural embeddings or activation-based encodings, APCs model probabilistic embeddings explicitly by extending PCs to jointly model both data and embeddings.
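To make this concrete, here is a minimal, self-contained sketch of a probabilistic circuit: a single sum node (a mixture) over two product nodes with Bernoulli leaves. All parameter values are illustrative. Because the circuit is just a feed-forward graph of sums and products, joint probabilities and marginals are exact, single-pass computations, with missing variables handled by letting their leaves evaluate to 1.

```python
# Minimal PC: p(x1, x2) = sum_k w_k * Bern(x1; p_k1) * Bern(x2; p_k2)
# (a sum node over two product nodes, each a product of Bernoulli leaves).

def bernoulli(x, p):
    """Bernoulli leaf; returning 1.0 for a missing value (x is None)
    marginalizes that variable exactly."""
    return 1.0 if x is None else (p if x == 1 else 1.0 - p)

weights = [0.5, 0.5]                     # sum-node (mixture) weights
leaf_params = [(0.8, 0.3), (0.2, 0.7)]   # (p_k1, p_k2) for each product node

def pc_prob(x1, x2):
    """One bottom-up pass yields the exact joint or marginal probability."""
    return sum(w * bernoulli(x1, p1) * bernoulli(x2, p2)
               for w, (p1, p2) in zip(weights, leaf_params))

print(pc_prob(1, 0))     # exact joint p(x1=1, x2=0) = 0.31
print(pc_prob(1, None))  # exact marginal p(x1=1) = 0.50, x2 summed out
```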
The core innovation of APCs lies in their PC encoder, which allows for native handling of arbitrary missing data. This means that even if parts of the input data are missing, the PC encoder can still infer a complete embedding without the need for external imputation. This is achieved through tractable probabilistic inference, where embeddings are obtained by sampling from the true conditional posterior distribution of the PC.
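Continuing the toy circuit above, the following hypothetical sketch shows how an embedding can be drawn from the exact conditional posterior when part of the input is missing: the missing leaf evaluates to 1, so the posterior over the sum node's latent component comes out in closed form and can be sampled directly. Real APCs use far deeper circuits and richer embeddings, but the mechanism is the same: marginalize, then sample the posterior, never impute.

```python
import random

weights = [0.5, 0.5]
leaf_params = [(0.8, 0.3), (0.2, 0.7)]

def bernoulli(x, p):
    return 1.0 if x is None else (p if x == 1 else 1.0 - p)

def sample_embedding(x1, x2):
    """Sample the sum node's latent component z ~ p(z | observed x);
    missing inputs are marginalized out, never imputed."""
    joint = [w * bernoulli(x1, p1) * bernoulli(x2, p2)
             for w, (p1, p2) in zip(weights, leaf_params)]
    total = sum(joint)
    posterior = [j / total for j in joint]
    return random.choices(range(len(weights)), weights=posterior)[0]

z = sample_embedding(1, None)  # x2 missing: still an exact-posterior sample
```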
A Hybrid Architecture for Enhanced Performance
APCs integrate this PC encoder with a neural network decoder. The hybrid architecture combines the best of both worlds: the PC’s principled, tractable probabilistic encoding and the neural network’s capacity for modeling complex, non-linear mappings with efficient decoding. The entire system is end-to-end trainable, thanks to a differentiable sampling procedure that builds on recent advances in gradient estimation, specifically a technique called SIMPLE.
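Below is a minimal PyTorch sketch of this hybrid design, assuming the shallowest possible PC encoder (one sum node over factorized Bernoulli products). The `straight_through_sample` function is a simple stand-in for the SIMPLE estimator used in the paper, and all class and parameter names are illustrative assumptions, not the authors’ code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixturePCEncoder(nn.Module):
    """Shallow PC: one sum node over K factorized Bernoulli product nodes."""
    def __init__(self, num_vars, num_components):
        super().__init__()
        self.logits_w = nn.Parameter(torch.zeros(num_components))           # sum-node weights
        self.logits_p = nn.Parameter(torch.randn(num_components, num_vars)) # leaf parameters

    def posterior_logits(self, x, mask):
        # Log Bernoulli leaf likelihoods; missing entries (mask == 0)
        # contribute log 1 = 0, i.e. exact marginalization.
        log_p, log_q = F.logsigmoid(self.logits_p), F.logsigmoid(-self.logits_p)
        ll = x.unsqueeze(1) * log_p + (1 - x.unsqueeze(1)) * log_q  # (B, K, D)
        ll = (ll * mask.unsqueeze(1)).sum(-1)                       # (B, K)
        return ll + torch.log_softmax(self.logits_w, -1)            # log p(x, z=k)

def straight_through_sample(logits):
    """One-hot sample with straight-through gradients (stand-in for SIMPLE)."""
    probs = torch.softmax(logits, -1)
    hard = F.one_hot(torch.multinomial(probs, 1).squeeze(-1), logits.shape[-1]).float()
    return hard + probs - probs.detach()  # forward: hard sample; backward: soft probs

class APC(nn.Module):
    def __init__(self, num_vars, num_components, hidden=128):
        super().__init__()
        self.encoder = MixturePCEncoder(num_vars, num_components)
        self.decoder = nn.Sequential(nn.Linear(num_components, hidden), nn.ReLU(),
                                     nn.Linear(hidden, num_vars))

    def forward(self, x, mask):
        z = straight_through_sample(self.encoder.posterior_logits(x, mask))
        return self.decoder(z), z  # reconstruction logits, sampled embedding
```

The key point of the design is that gradients flow through the sampled embedding, so the PC encoder and the neural decoder are trained jointly, end to end.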
During training, APCs optimize a loss function with three components: a reconstruction term that drives accurate data reconstruction, an embedding prior regularization term that keeps the inferred embeddings aligned with a prior distribution, and a joint data-embedding likelihood regularization term that maximizes the joint probability of the data and their inferred embeddings under the PC. Together, these objectives guide the model toward robust, informative representations.
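Continuing the PyTorch sketch, here is one plausible way these three terms could be combined. The regularizer forms, the uniform prior, and the weights `lam_prior` / `lam_joint` are assumptions for illustration, not the paper’s exact formulas.

```python
import math
import torch
import torch.nn.functional as F

def apc_loss(model, x, mask, lam_prior=0.1, lam_joint=0.1):
    recon_logits, z = model(x, mask)
    log_joint = model.encoder.posterior_logits(x, mask)  # log p(x, z=k)

    # 1) Reconstruction: per-variable BCE, scored on observed entries only.
    recon = (F.binary_cross_entropy_with_logits(
        recon_logits, x, reduction="none") * mask).sum(-1).mean()

    # 2) Embedding prior: KL from the posterior to a uniform prior over components.
    log_post = torch.log_softmax(log_joint, -1)
    kl = (log_post.exp() * log_post).sum(-1).mean() + math.log(log_joint.shape[-1])

    # 3) Joint likelihood: maximize log p(x, z) for the sampled embedding.
    joint_ll = (log_joint * z).sum(-1).mean()

    return recon + lam_prior * kl - lam_joint * joint_ll
```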
Empirical Validation and Key Advantages
Extensive empirical evaluations demonstrate that APCs outperform existing PC-based autoencoding methods and neural autoencoders in several critical areas. They show superior reconstruction quality, especially when dealing with increasing levels of missing data. While neural autoencoders quickly degrade in performance as data corruption increases, APCs maintain lower reconstruction error and preserve key structural elements, even at high levels of missingness or with structured missing patterns.
Beyond reconstruction, the explicit probabilistic embeddings learned by APCs yield high-quality, informative representations. These embeddings prove highly beneficial for downstream tasks like classification, maintaining strong performance even when inputs are incomplete. Visualizations of the embedding spaces confirm that APCs preserve clear class separation even under severe data corruption, in stark contrast to neural encoder embeddings, which tend to collapse.
Furthermore, APCs exhibit promising capabilities in data-free knowledge distillation. They can act as effective student models, learning from pre-trained generative latent variable models like VAEs without requiring access to the original training data. This not only transfers generative knowledge but also enhances the student APC’s robustness to missing data compared to the teacher model.
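As a rough illustration of the data-free setup, the hypothetical loop below trains a student APC purely on synthetic samples from a frozen teacher; `teacher.sample(batch_size)` is a placeholder for whatever sampling interface the pre-trained model exposes.

```python
import torch

def distill(student, teacher, optimizer, loss_fn, steps=10_000, batch_size=256):
    """Data-free distillation: the student never sees the original dataset."""
    teacher.eval()
    for _ in range(steps):
        with torch.no_grad():
            x = teacher.sample(batch_size)  # synthetic data from the teacher
        mask = torch.ones_like(x)           # teacher samples are fully observed
        loss = loss_fn(student, x, mask)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```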
The framework’s components, including differentiable sampling, the neural decoder, embedding regularization, and joint likelihood regularization, each contribute distinctly and complementarily to its overall superior performance and robustness. This work highlights APCs as a powerful and flexible representation learning method that effectively exploits the probabilistic inference capabilities of PCs, opening promising directions for robust inference, out-of-distribution detection, and knowledge distillation.
To delve deeper into the technical details and comprehensive experimental results, you can read the full research paper here.


