Ensuring Safety: Replicating Machine Learning Models for Airborne Systems

TLDR: This research paper introduces a methodology for safely implementing Machine Learning (ML) models in airborne systems. It defines a Machine Learning Model Description (MLMD) as an unambiguous representation of a trained model (TFM) and proposes “semantics preservation” to ensure that the implemented Target Model (TIM) accurately replicates the TFM’s properties. The approach uses ML metrics, error margins, and different semantic levels (SL0-SL3) to verify replication. Using ONNX as an intermediate format and various code generators, the method was successfully applied to industrial helicopter avionics use cases, demonstrating that semantics can be preserved, especially with higher numerical precision, paving the way for ML certification in aviation.

The integration of Machine Learning (ML) into critical airborne systems presents a significant challenge: ensuring their safe and reliable operation. As aviation authorities like the European Union Aviation Safety Agency (EASA) and standards groups like EUROCAE/SAE (ED-324) develop guidance, the need for robust methods to certify ML-based systems becomes paramount. A recent research paper, “Implementation of airborne ML models with semantics preservation”, by Nicolas Valot, Louis Fabre, Benjamin Lesage, Ammar Mechouche, and Claire Pagetti, addresses this very issue by proposing a clear framework for replicating ML models in target airborne systems while preserving their intended behavior.

Bridging the Gap: MLMD and Semantics Preservation

The paper highlights a crucial distinction between an ML model as it’s designed and trained (referred to as the Training Framework Model, or TFM) and its unambiguous description, called the Machine Learning Model Description (MLMD). The MLMD acts as a vital bridge in the development process, ensuring that the final implemented model, known as the Target Model (TIM), accurately reflects the TFM’s behavior and properties.

This process is conceptualized within a “W-shaped” development life cycle. The first “V-cycle” focuses on designing and verifying the ML model’s intended function, resulting in the TFM. The second “V-cycle” then focuses on replicating this function in the TIM, with the MLMD guiding the implementation. A core concept here is “semantics preservation,” which means demonstrating that if the TFM satisfies certain safety and performance properties, the TIM must satisfy those same properties.

Quantifying Preservation: Metrics and Error Margins

To achieve semantics preservation, the researchers propose a methodology based on established ML metrics. For each critical property (such as stability, generalization, performance, or robustness), a metric (M) scores the model’s predictions against a ground truth, and the TFM is verified to meet an acceptable bound (RM) for that metric. The key innovation is to then derive an “error margin” (εM) and a “positive budget” (gM). If the TFM meets the tighter bound RM – gM, leaving a budget of gM to spare, and the TFM and TIM predictions differ by at most εM, then the TIM is considered to have preserved the semantics.

This approach allows for a practical assessment: instead of re-verifying the TIM against the ground truth, engineers can verify that the TIM’s predictions are sufficiently close to the TFM’s predictions, within the calculated error margin. The verification evidence already gathered for the TFM can then be reused for the TIM, which streamlines the certification process.
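The check described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper’s implementation: the metric M is taken to be mean absolute error, the error margin is set equal to the budget (εM = gM), and the function names and data are hypothetical. Under those assumptions the triangle inequality gives MAE(TIM) ≤ MAE(TFM) + max|TIM – TFM| ≤ (RM – gM) + εM = RM, so the TIM meets the original bound without being re-scored against the ground truth.

```python
from typing import Sequence


def mean_absolute_error(pred: Sequence[float], truth: Sequence[float]) -> float:
    """Metric M (assumed here): mean absolute error against a reference."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)


def max_abs_deviation(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest pointwise gap between two sets of predictions."""
    return max(abs(x - y) for x, y in zip(a, b))


def semantics_preserved(tfm_pred, tim_pred, truth, r_m, g_m) -> bool:
    """Check both conditions, taking eps_M = g_M (an assumption of this sketch)."""
    eps_m = g_m
    tfm_has_budget = mean_absolute_error(tfm_pred, truth) <= r_m - g_m
    tim_is_close = max_abs_deviation(tfm_pred, tim_pred) <= eps_m
    return tfm_has_budget and tim_is_close


# Made-up numbers, for illustration only.
truth = [10.0, 12.0, 9.5, 11.0]
tfm_pred = [10.1, 11.8, 9.6, 11.1]      # outputs of the trained model (TFM)
tim_pred = [10.12, 11.79, 9.61, 11.08]  # outputs of the implemented model (TIM)

R_M, G_M = 0.2, 0.05
ok = semantics_preserved(tfm_pred, tim_pred, truth, R_M, G_M)
```

Only the second condition involves the TIM, and it needs no ground-truth labels, which is what makes the comparison cheap to run on the target.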

Levels of Detail: Semantic-Level Replication (SLx)

The paper further introduces different “semantic levels” (SLx) to describe the ML model with varying degrees of detail, influencing how precisely the TFM’s behavior must be replicated:

SL0 (Mathematical Notation): The most abstract level, defining operations purely mathematically.
SL1 (Machine Number Representation): Specifies operations using machine number types (e.g., 64-bit floating point or 8-bit integer), accounting for potential overflows.
SL2 (Operational Semantic Level): Decomposes tensor and matrix expressions into explicit scalar operations, defining their order and approximations, which is crucial for handling floating-point associativity issues.
SL3 (Execution Model Level): The most detailed level, mapping operations to specific hardware resources and defining rounding policies, considering the impact of compilers and hardware units.
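Two of the effects named in this list are easy to reproduce with the Python standard library. The sketch below is only an illustration of why the levels differ; the values are chosen for the demonstration and do not come from the paper.

```python
import ctypes
import math

# SL0 vs SL1: mathematically 127 + 1 = 128, but an 8-bit signed integer
# cannot hold 128 and wraps around.
wrapped = ctypes.c_int8(127 + 1).value

# SL2: floating-point addition is not associative, so the order of the
# scalar operations has to be part of the description.
values = [1e16, 1.0, -1e16]
left_to_right = (values[0] + values[1]) + values[2]  # 1.0 is absorbed by 1e16
reordered = (values[0] + values[2]) + values[1]      # cancellation happens first

# math.fsum returns the correctly rounded sum and serves as a reference.
reference = math.fsum(values)

print(wrapped, left_to_right, reordered, reference)
```

The two summation orders give 0.0 and 1.0 for the same three inputs, which is the kind of discrepancy an SL2 description rules out by fixing the evaluation order.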

Practical Implementation with ONNX and Code Generators

For practical implementation, the Open Neural Network eXchange (ONNX) format is chosen as the MLMD due to its widespread support in the ML community. The process involves exporting the TFM (e.g., from Keras) to an ONNX MLMD, which is then used by code generators (like onnx2c, acetone, or ANSYS Scade) to produce C code for the target embedded system. This generated code, along with its specific compilation and execution environment, forms the TIM.

Industrial Use Cases and Key Findings

The methodology was evaluated using two industrial helicopter avionics use cases: an `lstm` model for aircraft weight estimation and a `linear` model for computing aircraft loads. Both models were initially developed in Keras using FP32 (32-bit floating point) precision.

The experiments involved various configurations, testing different code generators, numerical precisions (from FP64 down to INT10), and execution environments (a Linux server and an embedded NXP T1042 PowerPC). The results showed that higher precision representations (FP64, FP32, FP16, INT16, INT14) generally preserved the TFM’s semantics within the defined error margins for both use cases. Notably, the C code generated for the embedded T1042 platform also replicated the TFM’s semantics when using sufficient precision. However, lower precision formats like BF16, INT12, and INT10 often failed to meet the stringent semantics preservation requirements.
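The effect of numerical precision on the error-margin check can be imitated on a toy scale. The sketch below is not the paper’s experiment: the weights, inputs, and margin are hypothetical, the model is a three-input linear layer, and BF16 is emulated by truncating an FP32 encoding (a simplification, since real converters usually round to nearest). Each format’s result is compared with an FP64 reference standing in for the TFM.

```python
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (FP64) to the nearest IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_fp16(x: float) -> float:
    """Round to the nearest IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_bf16(x: float) -> float:
    """Emulate bfloat16 by keeping the top 16 bits of the binary32 encoding."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


def linear(weights, bias, x, rnd):
    """Dot product plus bias, rounding after every scalar operation."""
    acc = rnd(bias)
    for w, xi in zip(weights, x):
        acc = rnd(acc + rnd(rnd(w) * rnd(xi)))
    return acc


# Hypothetical model, inputs, and error margin.
WEIGHTS = [0.1, 0.2, 0.3]
BIAS = 0.05
SAMPLES = [[1.0, 0.5, 0.25], [0.5, 1.0, 0.5]]
EPS_M = 1e-3


def max_deviation(rnd) -> float:
    """Largest gap between the FP64 reference and the reduced-precision run."""
    return max(
        abs(linear(WEIGHTS, BIAS, x, float) - linear(WEIGHTS, BIAS, x, rnd))
        for x in SAMPLES
    )


formats = [("FP32", to_fp32), ("FP16", to_fp16), ("BF16", to_bf16)]
results = {name: max_deviation(rnd) <= EPS_M for name, rnd in formats}
print(results)
```

With these made-up numbers, FP32 and FP16 stay within the margin while the truncated BF16 run does not, because BF16 keeps only 8 significand bits against 11 for FP16. The pass or fail outcome depends entirely on the chosen margin, so the sketch shows the mechanism rather than reproducing the reported results.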

This research provides a robust and practical method for demonstrating the accurate replication of ML models in safety-critical airborne systems, offering a pathway to their certification by focusing on semantics preservation rather than costly bit-accurate replication.
