AI-Powered System for Identifying Poor-Quality Astronomical Exposures

TLDR: A new semi-supervised machine learning method, combining a Vision Transformer (ViT) and a k-Nearest Neighbor (kNN) classifier, has been developed to automatically detect poor-quality astronomical exposures in large imaging surveys like DECaLS. This approach efficiently identifies various types of problematic images, reducing the need for manual inspection and improving data quality for scientific analysis. The system successfully identified hundreds of previously missed bad exposures in the DESI Legacy Imaging Surveys.

As astronomical imaging surveys rapidly expand, the sheer volume of data makes traditional methods, like human visual inspection for identifying poor-quality images, increasingly impractical. To address this challenge, researchers have introduced a new machine-learning-based approach designed to automatically detect problematic exposures in large imaging surveys, with a particular focus on the DECam Legacy Survey (DECaLS).

A Smart Approach to Image Quality Control

At the core of the system is a semi-supervised pipeline that combines a Vision Transformer (ViT) with a k-Nearest Neighbor (kNN) classifier. The method relies on self-supervised learning (SSL) for pattern recognition and embedding generation, followed by supervised learning for classification. SSL is attractive here because it can uncover subtle, previously unknown features in the data and reduces human subjectivity in the labeling process. It is also adaptable: the model can be fine-tuned on new observations without extensive manual labeling.

The pipeline uses a pre-trained ViT, specifically the ViT-Base model from the DINOv2 framework, which was trained on a diverse set of natural images (ImageNet). This pre-training gives the model a general understanding of image structure that is then applied to astronomical images. The ViT converts each image into a high-dimensional ‘embedding’, a numerical representation of its features. These embeddings are then processed and fed into a kNN classifier, which assigns each image a label and a probability based on its similarity to known ‘good’ and ‘bad’ exposures in the training set.
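The kNN step can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the embeddings are L2-normalized and compared by cosine similarity, that k = 5, and that the reported probability is simply the fraction of the k nearest neighbours sharing the winning label. The toy 4-dimensional vectors stand in for real ViT embeddings (768-dimensional for a ViT-Base backbone).

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so that dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def knn_predict(train_emb, train_labels, query_emb, k=5):
    """Return a (label, probability) pair for every query embedding.

    The probability is the fraction of the k most similar training
    embeddings (by cosine similarity) that share the winning label.
    Ties are broken by np.unique's sorted label order.
    """
    train = l2_normalize(np.asarray(train_emb, dtype=float))
    query = l2_normalize(np.asarray(query_emb, dtype=float))
    labels = np.asarray(train_labels)
    sims = query @ train.T                      # (n_query, n_train) cosine similarities
    nearest = np.argsort(-sims, axis=1)[:, :k]  # indices of the k most similar training images
    results = []
    for row in nearest:
        values, counts = np.unique(labels[row], return_counts=True)
        best = int(np.argmax(counts))
        results.append((str(values[best]), float(counts[best] / k)))
    return results


# Toy 4-dimensional "embeddings" standing in for real ViT outputs.
rng = np.random.default_rng(0)
good = rng.normal(loc=1.0, scale=0.1, size=(10, 4))
bad = rng.normal(loc=-1.0, scale=0.1, size=(10, 4))
train = np.vstack([good, bad])
labels = ["good"] * 10 + ["bad"] * 10
queries = np.array([[1.0, 1.0, 1.0, 1.0], [-1.0, -1.0, -1.0, -1.0]])
print(knn_predict(train, labels, queries, k=5))  # [('good', 1.0), ('bad', 1.0)]
```

A kNN classifier needs no training beyond storing labelled embeddings, which is why adding newly verified exposures to the reference set is cheap. The choice of k, the similarity metric, and any post-processing of the embeddings (the article only says they "are then processed") are assumptions here.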

Identifying Diverse Image Problems

The system was trained and validated on a small set of labeled exposures from surveys conducted with the Dark Energy Camera (DECam). The researchers sorted bad exposures into 11 distinct types, including ‘Saturated’ (overexposed), ‘Clouds Transparency’ (degraded by atmospheric conditions), ‘PSF’ (point spread function problems), ‘Ghost/Scatter’ (artifacts from bright sources), ‘Bad CCD’ (detector defects), ‘Noise’, ‘Telescope Moving / tracking failure’, and ‘Out of focus’. The dataset was balanced across these categories to ensure effective training for each of them.
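The article does not say how the balancing was done. One simple option, shown here purely as an assumed illustration, is to randomly downsample every category to the size of the smallest one. The exposure IDs and counts below are hypothetical; only the category names come from the article.

```python
import random
from collections import Counter, defaultdict


def balance_by_downsampling(samples, seed=0):
    """Downsample every category to the size of the smallest one.

    `samples` is a list of (exposure_id, label) pairs; the result keeps the
    same format, with an equal number of exposures per label.
    """
    by_label = defaultdict(list)
    for exposure_id, label in samples:
        by_label[label].append(exposure_id)
    target = min(len(ids) for ids in by_label.values())
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    balanced = []
    for label, ids in sorted(by_label.items()):
        balanced.extend((exposure_id, label) for exposure_id in rng.sample(ids, target))
    return balanced


# Hypothetical exposure IDs; the category names are taken from the article.
samples = (
    [(f"good_{i}", "Good") for i in range(50)]
    + [(f"sat_{i}", "Saturated") for i in range(8)]
    + [(f"ghost_{i}", "Ghost/Scatter") for i in range(12)]
)
balanced = balance_by_downsampling(samples)
print(Counter(label for _, label in balanced))
```

Downsampling discards data, which matters when labelled bad exposures are scarce; oversampling rare categories or weighting them during classification are common alternatives.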

A key feature of the pipeline is that it classifies individual CCD images rather than whole exposures. Because a single DECam exposure comprises 61 or 62 CCDs, the system randomly selects 20 CCD images from each exposure and combines their individual classifications through a ‘voting consensus’ to decide the quality of the exposure as a whole. Sampling a subset keeps the computational cost manageable, while voting across many CCDs makes the decision robust, especially for large-scale issues that affect multiple CCDs.
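The sampling-and-voting step might look like the sketch below. The voting rule is an assumption, since the article does not specify it: an exposure is called bad when more than half of its sampled CCDs are labelled bad, and its category is the most common bad label among them. The CCD names are hypothetical placeholders, and the per-CCD labels are made up rather than produced by a real classifier.

```python
import random
from collections import Counter


def sample_ccds(ccd_names, n=20, seed=None):
    """Randomly pick n distinct CCDs from one exposure (fewer if it has fewer)."""
    names = list(ccd_names)
    rng = random.Random(seed)
    return rng.sample(names, min(n, len(names)))


def vote_exposure(ccd_labels, bad_fraction=0.5):
    """Combine per-CCD labels into a single exposure-level verdict.

    Assumed rule: the exposure is bad if more than `bad_fraction` of the
    sampled CCDs are labelled bad, and its category is then the most common
    bad label among those CCDs. Otherwise the exposure is 'good'.
    """
    bad_counts = Counter(label for label in ccd_labels if label != "good")
    n_bad = sum(bad_counts.values())
    if n_bad / len(ccd_labels) <= bad_fraction:
        return "good"
    return bad_counts.most_common(1)[0][0]


# Hypothetical CCD names for a 62-CCD exposure.
ccds = [f"CCD{i:02d}" for i in range(1, 63)]
picked = sample_ccds(ccds, n=20, seed=1)

# Per-CCD labels would come from the kNN classifier; here they are made up.
print(vote_exposure(["Clouds Transparency"] * 14 + ["good"] * 6))  # Clouds Transparency
print(vote_exposure(["good"] * 17 + ["Noise"] * 3))               # good
```

Under this assumed majority threshold, a problem confined to a few CCDs can be outvoted, which matches the article's point that voting is best suited to issues spanning many CCDs.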


Promising Results and Future Impact

The model performed well, achieving over 80% accuracy for most categories of bad exposures. A clustering analysis showed that it had learned to distinguish different patterns, placing ‘bad’ exposures in clusters distinct from ‘good’ ones. Images with ‘Ghost/Scatter’ issues, for instance, formed a clearly separated cluster, indicating that the model picks up the features characteristic of this problem.
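"Accuracy for each category" can be read as the fraction of exposures in a given true category that the model labels correctly. The sketch below computes that quantity; it is an assumed definition (the paper's exact metric may differ), and the labels are illustrative, not the paper's validation data.

```python
from collections import defaultdict


def per_category_accuracy(true_labels, predicted_labels):
    """Fraction of exposures in each true category that were classified correctly."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for truth, pred in zip(true_labels, predicted_labels):
        totals[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return {label: correct[label] / totals[label] for label in totals}


# Illustrative labels only; these are not the paper's validation data.
true_labels = ["PSF"] * 5 + ["Noise"] * 4
predicted = ["PSF"] * 4 + ["good"] + ["Noise"] * 4
print(per_category_accuracy(true_labels, predicted))  # {'PSF': 0.8, 'Noise': 1.0}
```

Reporting the figure per category, rather than as one overall number, keeps a small category with poor performance from being hidden by larger, easier ones.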

When applied to new imaging data for DECaLS Data Release 11 (DR11), the pipeline identified 780 problematic exposures. Many of these were previously missed by traditional data reduction pipelines, highlighting the value of this new machine learning approach. The newly identified bad exposures will be excluded from the final DR11 dataset, improving its overall quality for scientific analysis.

While the current pipeline still benefits from human verification of the exposures it flags as bad, it significantly reduces the manual workload. Future improvements could involve processing entire exposures at once to better capture large-scale patterns, or using more advanced hierarchical learning models. This research is a significant step towards automating quality control in astronomy, helping future large-scale surveys, such as those from the Rubin Observatory, efficiently deliver high-quality data for scientific discovery. For more details, refer to the full research paper.