Ensuring Safety: Replicating Machine Learning Models for Airborne Systems

TLDR: This research paper introduces a methodology for safely implementing Machine Learning (ML) models in airborne systems. It defines a Machine Learning Model Description (MLMD) as an unambiguous representation of the trained Training Framework Model (TFM) and proposes “semantics preservation” to show that the implemented Target Model (TIM) retains the TFM’s verified properties. The approach uses ML metrics, error margins, and four semantic levels (SL0-SL3) to verify replication. Using ONNX as an intermediate format and various code generators, the method was applied to two industrial helicopter avionics use cases. The results indicate that semantics can be preserved, especially with higher numerical precision, which supports the case for ML certification in aviation.

The integration of Machine Learning (ML) into critical airborne systems presents a significant challenge: ensuring their safe and reliable operation. As aviation authorities like the European Union Aviation Safety Agency (EASA) and standards groups like EUROCAE/SAE (ED-324) develop guidance, the need for robust methods to certify ML-based systems becomes paramount. A recent research paper, “Implementation of airborne ML models with semantics preservation”, by Nicolas Valot, Louis Fabre, Benjamin Lesage, Ammar Mechouche, and Claire Pagetti, addresses this very issue by proposing a clear framework for replicating ML models in target airborne systems while preserving their intended behavior.

Bridging the Gap: MLMD and Semantics Preservation

The paper highlights a crucial distinction between an ML model as it’s designed and trained (referred to as the Training Framework Model, or TFM) and its unambiguous description, called the Machine Learning Model Description (MLMD). The MLMD acts as a vital bridge in the development process, ensuring that the final implemented model, known as the Target Model (TIM), accurately reflects the TFM’s behavior and properties.

This process is conceptualized within a “W-shaped” development life cycle. The first “V-cycle” focuses on designing and verifying the ML model’s intended function, resulting in the TFM. The second “V-cycle” then focuses on replicating this function in the TIM, with the MLMD guiding the implementation. A core concept here is “semantics preservation,” which means demonstrating that whenever the TFM satisfies certain safety and performance properties, the TIM satisfies those same properties.

Quantifying Preservation: Metrics and Error Margins

To achieve semantics preservation, the researchers propose a methodology based on established ML metrics. For each critical property (like stability, generalization, performance, or robustness), a metric (M) is used to assess the model’s performance against a ground truth. The TFM is verified to meet an acceptable bound (R_M) for each metric. The key innovation is to then derive an “error margin” (ε_M) and a “positive budget” (g_M). If the TFM meets the tightened bound R_M - g_M, meaning it satisfies R_M with a budget of g_M to spare, and the TFM and TIM predictions differ by at most ε_M, then the TIM is considered to have preserved the semantics.

This approach allows for a practical assessment: instead of re-verifying the TIM against the ground truth, engineers can verify that the TIM’s predictions are sufficiently close to the TFM’s predictions, within the calculated error margin. The ground-truth verification is done once, on the TFM, and does not have to be repeated on the target, which streamlines the certification effort.
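The check described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's derivation: the metric is assumed to be the mean absolute error (lower is better), the margin is assumed to be no larger than the budget, and all function names and numbers are hypothetical. Under those assumptions the triangle inequality guarantees that a TIM close enough to the TFM also meets the acceptable bound.

```python
from statistics import fmean


def mean_abs_error(pred, truth):
    """Metric M (assumed here): mean absolute error, lower is better."""
    return fmean([abs(p - t) for p, t in zip(pred, truth)])


def max_abs_diff(a, b):
    """Largest pointwise gap between two prediction lists."""
    return max(abs(x - y) for x, y in zip(a, b))


def semantics_preserved(tfm_pred, tim_pred, truth, bound, budget, margin):
    """Return True when the TIM can be accepted without its own ground-truth check.

    Triangle inequality: MAE(TIM) <= MAE(TFM) + max|TFM - TIM|
                                  <= (bound - budget) + margin <= bound,
    which holds as long as margin <= budget (an assumption of this sketch).
    """
    if margin > budget:
        raise ValueError("margin must not exceed the budget for the argument to hold")
    tfm_within_tightened_bound = mean_abs_error(tfm_pred, truth) <= bound - budget
    tim_close_to_tfm = max_abs_diff(tfm_pred, tim_pred) <= margin
    return tfm_within_tightened_bound and tim_close_to_tfm


# Made-up illustrative values, not data from the paper.
truth = [10.0, 12.0, 9.5, 11.0]
tfm_pred = [10.1, 11.9, 9.6, 11.05]
tim_pred = [10.1004, 11.8997, 9.6002, 11.0501]

ok = semantics_preserved(tfm_pred, tim_pred, truth,
                         bound=0.2, budget=0.05, margin=0.001)
print(ok)
```

Note that the ground truth is only used for the TFM; the TIM is compared against the TFM alone. Other metrics would need their own argument linking the margin to the budget.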

Levels of Detail: Semantic-Level Replication (SLx)

The paper further introduces different “semantic levels” (SLx) to describe the ML model with varying degrees of detail, influencing how precisely the TFM’s behavior must be replicated:

  • SL0 (Mathematical Notation): The most abstract level, defining operations purely mathematically.
  • SL1 (Machine Number Representation): Specifies operations using machine number types (e.g., 64-bit floating point or 8-bit integer), accounting for potential overflows.
  • SL2 (Operational Semantic Level): Decomposes tensor and matrix expressions into explicit scalar operations, defining their order and approximations, which is crucial for handling floating-point associativity issues.
  • SL3 (Execution Model Level): The most detailed level, mapping operations to specific hardware resources and defining rounding policies, considering the impact of compilers and hardware units.
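A small Python experiment illustrates why the lower levels matter. It is only a sketch: the helper names are hypothetical, binary32 arithmetic is emulated by rounding through the standard `struct` module, and the numbers are chosen to make the effects visible rather than taken from the paper.

```python
import math
import struct


def to_fp32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_in_order(terms, rounder=lambda v: v):
    """Accumulate left to right, applying `rounder` after every operation."""
    acc = 0.0
    for t in terms:
        acc = rounder(acc + rounder(t))
    return acc


# SL0 view: over the real numbers, 1e16 + 1 - 1e16 equals 1.
exact = math.fsum([1e16, 1.0, -1e16])

# SL2 view: the order of scalar operations changes the binary64 result.
order_a = sum_in_order([1e16, 1.0, -1e16])   # the 1.0 is absorbed by 1e16
order_b = sum_in_order([1e16, -1e16, 1.0])   # the large terms cancel first

# SL1 view: the number type changes the result for the same order.
fp64_sum = sum_in_order([16777216.0, 1.0])           # 2**24 + 1 is exact in binary64
fp32_sum = sum_in_order([16777216.0, 1.0], to_fp32)  # not representable in binary32

print(exact, order_a, order_b, fp64_sum, fp32_sum)
# prints: 1.0 0.0 1.0 16777217.0 16777216.0
```

The same mathematical expression yields three different answers depending on number type and evaluation order, which is the kind of ambiguity a description at SL1 or SL2 is meant to remove.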

Practical Implementation with ONNX and Code Generators

For practical implementation, the Open Neural Network eXchange (ONNX) format is chosen as the MLMD due to its widespread support in the ML community. The process involves exporting the TFM (e.g., from Keras) to an ONNX MLMD, which is then used by code generators (like onnx2c, acetone, or ANSYS Scade) to produce C code for the target embedded system. This generated code, along with its specific compilation and execution environment, forms the TIM.

Industrial Use Cases and Key Findings

The methodology was evaluated using two industrial helicopter avionics use cases: an `lstm` model for aircraft weight estimation and a `linear` model for computing aircraft loads. Both models were initially developed in Keras using FP32 (32-bit floating point) precision.

The experiments involved various configurations, testing different code generators, numerical precisions (from FP64 down to INT10), and execution environments (a Linux server and an embedded NXP T1042 PowerPC). The results showed that higher-precision representations (FP64, FP32, FP16, INT16, INT14) generally preserved the TFM’s semantics within the defined error margins for both use cases. Notably, the C code generated for the embedded T1042 platform also replicated the TFM’s semantics when using sufficient precision. However, lower-precision formats like BF16, INT12, and INT10 often failed to meet the stringent semantics preservation requirements.
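The effect of integer bit width on the deviation from a reference model can be illustrated with a toy experiment. The sketch below is not the paper's setup: it assumes symmetric uniform quantization of the weights only, uses a hypothetical four-weight linear model with made-up inputs, and picks an arbitrary error margin `EPSILON`. A real fixed-point implementation would also quantize inputs and accumulators.

```python
def quantize(values, bits):
    """Symmetric uniform quantization to signed `bits`-bit levels, mapped back to floats."""
    levels = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in values) / levels
    return [round(v / scale) * scale for v in values]


def linear(weights, x):
    """Dot product of a weight vector and an input vector."""
    return sum(w * xi for w, xi in zip(weights, x))


def max_deviation(weights, inputs, bits):
    """Largest output gap between the quantized and the reference model."""
    quantized = quantize(weights, bits)
    return max(abs(linear(quantized, x) - linear(weights, x)) for x in inputs)


# Hypothetical model, inputs, and margin; none of these come from the paper.
WEIGHTS = [0.731, -0.214, 0.058, 0.402]
INPUTS = [[1.0, 2.0, -1.5, 0.5], [0.2, -0.4, 0.9, 1.0]]
EPSILON = 1e-4

deviation = {bits: max_deviation(WEIGHTS, INPUTS, bits) for bits in (16, 14, 12, 10)}
within_margin = {bits: dev <= EPSILON for bits, dev in deviation.items()}

for bits in deviation:
    print(f"INT{bits}: deviation={deviation[bits]:.2e} within margin={within_margin[bits]}")
```

With these made-up values the 16-bit variant stays inside the margin and the 10-bit variant does not. Where the threshold falls for a real model depends on the model, the data, and the margin derived for each metric.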

This research provides a robust and practical method for demonstrating the accurate replication of ML models in safety-critical airborne systems, offering a pathway to their certification by focusing on semantics preservation rather than costly bit-accurate replication.

Karthik Mehta
