AI Synthesizes Images to Enhance Sub-Visible Particle Analysis

TLDR: A new research paper introduces a generative AI approach using diffusion models to overcome data imbalance in sub-visible particle (SvP) classification for pharmaceutical quality control. By synthesizing high-fidelity images of underrepresented particle types like silicone oil and air bubbles, the method significantly improves the performance of deep learning classifiers, making particle identification more accurate and scalable without extensive manual annotation.

Sub-visible particles (SvPs) are a major concern in protein-based therapeutics, as their presence can lead to adverse effects like immune responses and reduced drug effectiveness. Identifying and classifying these particles, such as distinguishing harmless silicone oil from potentially problematic protein particles, is crucial for pharmaceutical quality control. Flow imaging microscopy combined with deep learning has emerged as a powerful tool for this task, but it faces a significant hurdle: the scarcity of data and severe imbalance between different particle types in datasets.

Certain particle types, like silicone oil droplets and air bubbles, appear unintentionally and in much lower numbers compared to protein particles, for which large numbers of images are relatively easy to obtain. This imbalance often forces researchers to use less effective classification methods, limiting the full potential of multi-class deep neural networks.

To address this challenge, a recent research paper introduces a state-of-the-art approach using generative AI, specifically diffusion models, to synthesize high-fidelity images of these underrepresented particle types. This method aims to augment training datasets, enabling more effective training of multi-class deep neural networks for SvP classification. The researchers validate their approach by demonstrating that the generated samples closely resemble real particle images in terms of visual quality and structure.

How the AI Works: Diffusion Models

Generative diffusion models are a class of machine learning models that create new data by progressively transforming random noise into a structured, meaningful output. Imagine starting with a blurry, noisy image and gradually refining it step-by-step until a clear, realistic image emerges. This process involves two main steps: a ‘forward process’ where noise is gradually added to data, and a ‘reverse process’ where the model learns to remove this noise, effectively generating new data.
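The forward process described above can be sketched in a few lines. This is a minimal illustration that assumes the standard DDPM closed form, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise, with a linear noise schedule. The article does not say which diffusion formulation or schedule the paper uses, so the function names, the schedule values, and the four-pixel toy "image" are all illustrative assumptions.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced per-step noise variances (assumed values, not from the paper)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar_t: running product of (1 - beta_s) for all steps s up to t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def forward_noise(pixels, t, alpha_bars, rng):
    """Jump straight from the clean image x_0 to the noisy x_t.

    Returns the noisy pixels and the Gaussian noise that was mixed in.
    """
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [signal_scale * p + noise_scale * e for p, e in zip(pixels, noise)]
    return noisy, noise

rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
image = [0.5, -0.2, 0.9, 0.0]  # toy image: four pixels scaled to [-1, 1]
slightly_noisy, _ = forward_noise(image, 10, alpha_bars, rng)    # still close to the image
mostly_noise, _ = forward_noise(image, 999, alpha_bars, rng)     # almost pure noise
```

In the reverse process, a neural network is trained to predict the mixed-in noise from the noisy image and the step index, so that generation can start from pure noise and remove it step by step. That network is omitted here.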

In this study, the diffusion models were trained on a small set of real images (just 1,000 for each minority class) of silicone oil and air bubbles. The models learned to generate new, realistic images of these particles, capturing their unique morphological features like the circular, semi-translucent nature of air bubbles or the distinct contours of silicone oil droplets.

A Two-Phase Approach to Better Classification

The research outlines a two-phase approach. In the first phase, the diffusion-based generative AI model is trained to synthesize images of underrepresented classes (silicone oil and air bubbles). In the second phase, a multi-class classifier is trained using a dataset augmented with these newly generated images. This augmentation balances the class distributions, enhancing data diversity and allowing the classifiers to learn more robustly.
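As a rough sketch of the second phase, the snippet below works out how many synthetic images each class would need to match the largest class. The function name and the class counts are hypothetical. The article does not state the training-set sizes or whether the authors balance all the way up to the majority class, so treat the numbers as placeholders.

```python
def synthetic_images_needed(class_counts, target_per_class=None):
    """Number of generated images to add per class to reach a common target.

    By default the target is the size of the largest class (an assumption;
    a smaller target may work just as well in practice).
    """
    if target_per_class is None:
        target_per_class = max(class_counts.values())
    return {
        name: max(0, target_per_class - count)
        for name, count in class_counts.items()
    }

# Illustrative counts only; not taken from the paper.
real_counts = {"protein": 50000, "silicone_oil": 1000, "air_bubble": 1000}
to_generate = synthetic_images_needed(real_counts)
balanced_counts = {name: real_counts[name] + to_generate[name] for name in real_counts}
```

The diffusion model from the first phase would then be sampled `to_generate[name]` times for each minority class, and the classifier trained on the combined real and synthetic set.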

Large-scale experiments were conducted using a validation dataset of 500,000 protein particle images and 500 images each of silicone oil and air bubbles, reflecting real-world class imbalance. Classification performance was evaluated with the deep learning models ResNet-18 and ResNet-50. The results showed consistent improvements in predictive performance when diffusion-generated images were added to the training datasets. For instance, in some configurations, macro precision improved by nearly a percentage point and the Area Under the Precision-Recall Curve (AUPRC) saw gains of over 4 points, indicating better classification performance, especially on the rare classes.
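To make the two metrics concrete, here is a small sketch of how they can be computed. Macro precision averages per-class precision, so each class counts equally regardless of its size. AUPRC is estimated here with the step-wise average-precision formula, one common estimator that assumes no tied scores. The article does not say which estimator the paper uses, and the labels and scores below are made up for illustration.

```python
def macro_precision(y_true, y_pred, classes):
    """Unweighted mean of per-class precision; rare classes count as much as common ones."""
    precisions = []
    for c in classes:
        predicted = sum(1 for p in y_pred if p == c)
        correct = sum(1 for t, p in zip(y_true, y_pred) if p == c and t == c)
        # A class that is never predicted contributes 0 here (a convention, not a rule).
        precisions.append(correct / predicted if predicted else 0.0)
    return sum(precisions) / len(precisions)

def average_precision(is_positive, scores):
    """Step-wise estimate of the area under the precision-recall curve for one class."""
    ranked = sorted(zip(scores, is_positive), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, positive) in enumerate(ranked, start=1):
        if positive:
            hits += 1
            total += hits / rank  # precision at the rank of each true positive
    return total / hits if hits else 0.0

classes = ["protein", "silicone_oil", "air_bubble"]
y_true = ["protein", "protein", "silicone_oil", "air_bubble"]
y_pred = ["protein", "silicone_oil", "silicone_oil", "air_bubble"]
macro_p = macro_precision(y_true, y_pred, classes)

# One-vs-rest scores for the silicone oil class (made-up values).
oil_truth = [True, False, True, False]
oil_scores = [0.9, 0.8, 0.7, 0.1]
oil_auprc = average_precision(oil_truth, oil_scores)
```

Both metrics are less flattering than plain accuracy on imbalanced data, where a classifier that labels everything "protein" would still be right almost all the time.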

Beyond Classification: Identifying Mislabeled Data

Interestingly, the models trained with augmented data were sensitive enough to subtle morphological differences that they sometimes flagged images originally labeled as ‘protein particles’ as more likely to be silicone oil droplets or air bubbles. This suggests that the models can, in some cases, catch errors in the original manual annotations, highlighting their potential to improve data quality itself.
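One simple way to surface such candidates is a confidence threshold: flag any image labeled as protein where the classifier assigns a high probability to a different class. The article does not describe how the authors found these cases, so the sketch below is a generic heuristic, with a hypothetical function name, threshold, and example probabilities.

```python
def flag_suspect_labels(labels, probabilities, majority_label="protein", threshold=0.9):
    """Return (index, predicted_class, probability) for likely mislabeled majority samples.

    `probabilities` holds one dict per sample, mapping class name to predicted probability.
    """
    suspects = []
    for index, (label, probs) in enumerate(zip(labels, probabilities)):
        if label != majority_label:
            continue  # only audit samples annotated as the majority class
        predicted = max(probs, key=probs.get)
        if predicted != majority_label and probs[predicted] >= threshold:
            suspects.append((index, predicted, probs[predicted]))
    return suspects

# Made-up annotations and classifier outputs for four images.
labels = ["protein", "protein", "silicone_oil", "protein"]
probabilities = [
    {"protein": 0.97, "silicone_oil": 0.02, "air_bubble": 0.01},
    {"protein": 0.03, "silicone_oil": 0.95, "air_bubble": 0.02},
    {"protein": 0.10, "silicone_oil": 0.85, "air_bubble": 0.05},
    {"protein": 0.40, "silicone_oil": 0.05, "air_bubble": 0.55},
]
suspects = flag_suspect_labels(labels, probabilities)
```

Flagged images are candidates for human review, not automatic relabeling, since a confident prediction can still be wrong.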

Impact and Future Outlook

This research demonstrates that diffusion-based generative models offer an effective and scalable solution to the data imbalance problem in SvP classification. By reducing the dependence on labor-intensive manual annotation, this approach allows for the full utilization of modern multi-class classifiers. The framework is also highly adaptable and could be applied to other particle types, imaging modalities, or industrial quality control tasks where minority classes are difficult or expensive to annotate.

The authors emphasize the broader role of generative AI in pharmaceutical manufacturing and quality assurance, envisioning its potential to streamline quality control pipelines and support regulatory compliance in a scalable and data-driven manner. To promote open research and reproducibility, the diffusion models, trained classifiers, and sample datasets are publicly released. For more details, see the full paper.

Ananya Rao
https://blogs.edgentiq.com
Ananya Rao is a tech journalist with a passion for dissecting the fast-moving world of Generative AI. With a background in computer science and a sharp editorial eye, she connects the dots between policy, innovation, and business. Ananya excels in real-time reporting and specializes in uncovering how startups and enterprises in India are navigating the GenAI boom. She brings urgency and clarity to every breaking news piece she writes. You can reach her at: [email protected]
