AI-Powered System for Identifying Poor-Quality Astronomical Exposures

TLDR: A new semi-supervised machine learning method, combining a Vision Transformer (ViT) and a k-Nearest Neighbor (kNN) classifier, has been developed to automatically detect poor-quality astronomical exposures in large imaging surveys like DECaLS. This approach efficiently identifies various types of problematic images, reducing the need for manual inspection and improving data quality for scientific analysis. The system successfully identified hundreds of previously missed bad exposures in the DESI Legacy Imaging Surveys.

As astronomical imaging surveys rapidly expand, the sheer volume of data makes traditional methods, like human visual inspection for identifying poor-quality images, increasingly impractical. To address this challenge, researchers have introduced a new machine-learning-based approach designed to automatically detect problematic exposures in large imaging surveys, with a particular focus on the DECam Legacy Survey (DECaLS).

A Smart Approach to Image Quality Control

The core of this innovative system is a semi-supervised pipeline that combines a Vision Transformer (ViT) with a k-Nearest Neighbor (kNN) classifier. This method leverages self-supervised learning (SSL) for pattern recognition and embedding generation, followed by supervised learning for classification. SSL is particularly powerful because it can uncover subtle, previously unknown features in data and reduce human subjectivity in the labeling process. It also offers adaptability, allowing the model to be fine-tuned on new observations without extensive manual labeling.

The pipeline uses a pre-trained ViT, specifically the ViT-Base model from the DINOv2 framework, which has been trained on a diverse set of natural images (ImageNet). This pre-training gives the model a general-purpose representation of image structure, which is then applied to astronomical images. The ViT generates high-dimensional ‘embeddings’ – numerical representations of image features – which are then processed and fed into a kNN classifier. The kNN classifier assigns a label and a probability to each image based on its similarity to known ‘good’ and ‘bad’ exposures in the training set.
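The classification step described above can be illustrated with a minimal sketch. It assumes the embeddings have already been computed, and it uses toy 3-dimensional vectors in place of real ViT embeddings. The function name `knn_classify`, the Euclidean distance metric, the value of `k`, and the use of the neighbour fraction as the probability are all assumptions made for illustration, not details taken from the paper.

```python
import math
from collections import Counter

def knn_classify(query, train_embeddings, train_labels, k=3):
    """Return (label, probability) for `query` from its k nearest training embeddings."""
    # Euclidean distance from the query to every labeled embedding.
    distances = [
        (math.dist(query, emb), label)
        for emb, label in zip(train_embeddings, train_labels)
    ]
    # Keep the k closest neighbours and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    # Assumed definition: probability is the fraction of neighbours that agree.
    return label, count / k

# Toy "embeddings" (hypothetical); real ViT embeddings have hundreds of dimensions.
train_embeddings = [
    (0.90, 0.10, 0.00), (0.80, 0.20, 0.10), (0.85, 0.15, 0.05),  # 'good'
    (0.10, 0.90, 0.30), (0.20, 0.80, 0.40), (0.15, 0.85, 0.35),  # 'bad'
]
train_labels = ["good", "good", "good", "bad", "bad", "bad"]

label, prob = knn_classify((0.2, 0.9, 0.3), train_embeddings, train_labels, k=3)
print(label, prob)
```

Because the probability here is just a neighbour count, a larger `k` gives finer-grained probabilities at the cost of pulling in more distant, less similar examples.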

Identifying Diverse Image Problems

The system was trained and validated using a small set of labeled exposures from surveys conducted with the Dark Energy Camera (DECam). The researchers categorized bad exposures into 11 distinct types, including issues like ‘Saturated’ (overexposed), ‘Clouds Transparency’ (affected by atmospheric conditions), ‘PSF’ (Point Spread Function issues), ‘Ghost/Scatter’ (artifacts from bright sources), ‘Bad CCD’ (detector defects), ‘Noise’, ‘Telescope Moving / tracking failure’, and ‘Out of focus’. The dataset was carefully balanced to ensure effective training across these categories.

A key aspect of the pipeline is its ability to process individual CCD images within an exposure. Since a single DECam exposure can comprise 61 or 62 CCDs, the system randomly selects 20 CCD images from each exposure and uses a ‘voting consensus’ method to determine the overall quality of the exposure. This design balances efficiency with accuracy, especially for large-scale issues that might affect multiple CCDs.
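A rough sketch of this sampling-and-voting idea follows. The article does not state the exact consensus rule, so the majority threshold (`bad_fraction=0.5`), the function name `classify_exposure`, and the toy per-CCD classifier are assumptions for illustration only.

```python
import random
from collections import Counter

def classify_exposure(ccd_ids, classify_ccd, n_samples=20, bad_fraction=0.5, seed=None):
    """Label an exposure by voting over a random subset of its CCD images."""
    ccd_ids = list(ccd_ids)
    if not ccd_ids:
        raise ValueError("an exposure needs at least one CCD image")
    rng = random.Random(seed)
    # Sample without replacement; use every CCD if fewer than n_samples exist.
    sampled = rng.sample(ccd_ids, min(n_samples, len(ccd_ids)))
    votes = Counter(classify_ccd(ccd) for ccd in sampled)
    # Assumed rule: flag the exposure when more than `bad_fraction` of votes are 'bad'.
    share_bad = votes["bad"] / len(sampled)
    return ("bad" if share_bad > bad_fraction else "good"), share_bad

# Hypothetical per-CCD classifier: pretend CCDs 1-55 of 62 are affected by cloud.
def toy_ccd_classifier(ccd_id):
    return "bad" if ccd_id <= 55 else "good"

label, share = classify_exposure(range(1, 63), toy_ccd_classifier, seed=0)
print(label, round(share, 2))
```

Sampling 20 of roughly 60 CCDs cuts the per-exposure work to about a third, while a problem that covers most of the focal plane will still dominate the vote. A defect confined to one or two CCDs could be missed by the sample, which is the trade-off such a design accepts.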

Promising Results and Future Impact

The model demonstrated high classification performance, achieving over 80% accuracy for most categories of bad exposures. A clustering analysis showed that the model successfully learned to distinguish different patterns, separating ‘bad’ exposures into distinct clusters from ‘good’ ones. For instance, images with ‘Ghost/Scatter’ issues formed a clearly separated cluster, indicating the model’s ability to identify unique features associated with these problems.
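One plausible reading of "accuracy for most categories" is the fraction of exposures within each problem category whose predicted label matches the true one. The sketch below computes that figure under this assumption. The records are made up for illustration and are not results from the study, and `per_category_accuracy` is a hypothetical helper name.

```python
from collections import defaultdict

def per_category_accuracy(records):
    """records: iterable of (category, true_label, predicted_label) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for category, truth, predicted in records:
        total[category] += 1
        correct[category] += int(truth == predicted)
    # Accuracy per category = correct predictions / items in that category.
    return {category: correct[category] / total[category] for category in total}

# Made-up records for illustration only.
records = [
    ("Saturated", "bad", "bad"),
    ("Saturated", "bad", "bad"),
    ("PSF", "bad", "bad"),
    ("PSF", "bad", "good"),
    ("PSF", "bad", "bad"),
    ("Noise", "bad", "bad"),
    ("Noise", "bad", "bad"),
    ("Noise", "bad", "good"),
    ("Noise", "bad", "bad"),
    ("Noise", "bad", "bad"),
]

accuracy = per_category_accuracy(records)
for category, value in accuracy.items():
    print(f"{category}: {value:.0%}")
```

Reporting accuracy per category rather than as a single number makes it visible when a rare problem type is classified much worse than the common ones.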

When applied to new imaging data for DECaLS Data Release 11 (DR11), the pipeline identified 780 problematic exposures. Many of these were previously missed by traditional data reduction pipelines, highlighting the value of this new machine learning approach. The newly identified bad exposures will be excluded from the final DR11 dataset, improving its overall quality for scientific analysis.

While the current pipeline still benefits from human verification of identified bad exposures, it significantly reduces the manual workload. Future improvements could involve processing entire exposures at once to better capture large-scale patterns, or using more advanced hierarchical learning models. This research is a step towards automating quality control in astronomy, so that future large-scale surveys, such as those from the Rubin Observatory, can efficiently deliver high-quality data. For more details, refer to the full research paper.

Karthik Mehta
