TLDR: SiMa.ai has launched its next-generation Modalix platform, featuring a Machine Learning System on a Chip (MLSoC) designed to run complex generative AI models on power-constrained devices. This new hardware enables Large Language Models (LLMs) and multi-modal AI to operate at the edge while consuming less than 10 watts of power, marking a significant leap for industries like robotics, automotive, and healthcare. The platform aims to bridge the gap between powerful AI and real-world physical systems by providing a high-performance, energy-efficient solution for true on-device reasoning.
The chasm between sophisticated generative AI models and the power-constrained, real-world systems they must ultimately run on has been a significant barrier for the AI/ML community. SiMa.ai is making a bold move to bridge this gap with the launch of its next-generation Modalix platform. The company has released its Modalix Machine Learning System on a Chip (MLSoC) and an accompanying System-on-Module (SoM), engineered to execute reasoning-based Large Language Models (LLMs) and multi-modal generative AI workloads on-device, all while consuming less than 10 watts of power. For AI/ML professionals, this launch provides a new, power-efficient pathway past previous edge computing constraints, enabling the deployment of complex AI reasoning in physical systems across robotics, automotive, and healthcare.
A New Power-to-Performance Benchmark for the Edge
The core challenge for deploying advanced AI at the edge has always been the trade-off between computational performance and power consumption. High-performance AI chips have historically been power-hungry, an untenable characteristic for battery-powered drones or thermally sensitive industrial robots. SiMa.ai’s Modalix directly confronts this issue, delivering up to 50 TOPS of machine learning acceleration within a sub-10-watt power envelope. This efficiency, built on TSMC’s advanced N6 process technology, isn’t just an incremental improvement; it’s a fundamental enabler. It allows for the sustained operation of computationally intensive transformer models and LLMs, which were previously confined to the cloud. The platform’s support for mixed-precision data types like BF16 and INT8 further enhances performance, allowing models like Llama2 7B to run at speeds exceeding 10 tokens per second—a critical threshold for interactive applications.
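To put those headline numbers in context, the back-of-envelope arithmetic below works through what they imply. The figures (50 TOPS, 10 W, 7B parameters, 10 tokens/second) come from the article; everything derived from them is a rough sketch, not a measurement.

```python
# Rough arithmetic on the figures quoted above; inputs are the article's
# headline numbers, outputs are simple derived quantities.

def tops_per_watt(tops: float, watts: float) -> float:
    """Compute efficiency in TOPS per watt."""
    return tops / watts

def weight_footprint_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate model weight size in GB at a given numeric precision."""
    return params_billions * 1e9 * bytes_per_param / 1e9

# Up to 50 TOPS within a sub-10 W envelope -> at least 5 TOPS/W.
efficiency = tops_per_watt(50, 10)

# A 7B-parameter model (e.g. Llama2 7B) at different precisions:
fp32_gb = weight_footprint_gb(7, 4)  # 28 GB in FP32
bf16_gb = weight_footprint_gb(7, 2)  # 14 GB in BF16
int8_gb = weight_footprint_gb(7, 1)  # 7 GB in INT8

# 10 tokens/second implies a per-token latency budget of 100 ms.
latency_budget_ms = 1000 / 10

print(efficiency, fp32_gb, bf16_gb, int8_gb, latency_budget_ms)
```

The INT8 row makes the motivation for mixed precision concrete: quantization cuts the weight footprint by 4x relative to FP32, which matters when every byte moved from external memory costs power.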
For the AI Architect: Heterogeneous Compute and a Seamless Upgrade Path
Designing physical AI systems requires more than just a powerful ML accelerator. It demands a holistic approach to processing diverse data streams. The Modalix MLSoC is a heterogeneous compute platform, integrating a purpose-built Machine Learning Accelerator (MLA) with an 8-core Arm Cortex-A65 Application Compute Unit (ACU) for general-purpose tasks and a 4-core Synopsys Computer Vision Unit (CVU) for dedicated vision pipelines. This architecture allows AI architects to run complex, multi-modal applications—fusing vision, language, and other sensor data—on a single chip. Perhaps most strategically, the new Modalix System-on-Module (SoM) is designed to be pin-compatible with modules from leading GPU providers. This offers a simplified upgrade path for existing systems, dramatically reducing redesign costs and development time for teams looking to integrate next-generation AI capabilities into their hardware.
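The partitioning idea behind this heterogeneous design can be sketched in a few lines. The routing table below is purely hypothetical, invented for illustration; the article does not describe how SiMa.ai's compiler actually assigns operations, only that the MLA, ACU, and CVU exist as distinct engines.

```python
# Toy illustration of heterogeneous workload partitioning across the three
# compute engines named in the article. The op-to-unit mapping is a made-up
# example, not SiMa.ai's actual assignment policy.

COMPUTE_UNITS = {
    "matmul": "MLA",         # Machine Learning Accelerator: tensor workloads
    "attention": "MLA",
    "resize": "CVU",         # Computer Vision Unit: vision pre/post-processing
    "color_convert": "CVU",
    "control_flow": "ACU",   # Arm Cortex-A65 cores: general-purpose logic
}

def partition(ops: list[str]) -> dict[str, list[str]]:
    """Group a list of ops by the engine a compiler might assign them to."""
    plan: dict[str, list[str]] = {}
    for op in ops:
        unit = COMPUTE_UNITS.get(op, "ACU")  # fall back to the CPU cores
        plan.setdefault(unit, []).append(op)
    return plan

print(partition(["resize", "matmul", "attention", "control_flow"]))
```

A real compiler weighs data movement and scheduling, not just op type, but the principle is the same: keep vision preprocessing off the ML accelerator so it stays free for the transformer layers.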
For the ML Engineer: Abstracting Complexity with a Software-First Approach
Powerful hardware is only as good as its software interface. SiMa.ai emphasizes a “software-first” philosophy with its Palette platform, designed to abstract away the underlying hardware complexity. For ML engineers, this means less time spent on manual, low-level optimization and more time focusing on model development and application logic. Palette supports standard frameworks like PyTorch, TensorFlow, and ONNX, and its integrated compiler automatically partitions and maps workloads across the MLSoC’s various compute engines. Furthermore, the platform utilizes a sophisticated streaming architecture that allows it to execute large models whose parameters exceed the on-chip memory, loading layers concurrently as others are being processed. This feature is crucial for deploying the large-scale generative models that are defining the next wave of AI.
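The streaming idea described above—overlapping the load of one layer's weights with the execution of another—can be sketched as a simple prefetch pipeline. This is a toy model of the concept only; Palette's actual runtime internals are not documented in the article, and the function names here are invented for illustration.

```python
# Toy sketch of layer streaming: when model weights exceed on-chip memory,
# prefetch the next layer's weights (e.g. via DMA) while the current layer
# computes. A single-worker executor stands in for the DMA engine.

from concurrent.futures import ThreadPoolExecutor

def load_weights(layer_id: int) -> str:
    """Stand-in for transferring one layer's weights to on-chip memory."""
    return f"weights[{layer_id}]"

def run_layer(layer_id: int, weights: str, x: int) -> int:
    """Stand-in for executing one layer on the accelerator."""
    assert weights == f"weights[{layer_id}]"  # right weights are resident
    return x + 1  # trivial computation so the pipeline is checkable

def streamed_forward(num_layers: int, x: int) -> int:
    with ThreadPoolExecutor(max_workers=1) as io:
        pending = io.submit(load_weights, 0)  # prefetch the first layer
        for layer in range(num_layers):
            weights = pending.result()        # wait for this layer's weights
            if layer + 1 < num_layers:
                # Kick off the next load so it overlaps with compute below.
                pending = io.submit(load_weights, layer + 1)
            x = run_layer(layer, weights, x)
    return x

print(streamed_forward(32, 0))  # 32 layers, each adds 1 -> 32
```

The payoff is that only one or two layers' weights need to be resident at a time, which is what lets a multi-gigabyte model run on a chip with far less on-chip memory.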
The Dawn of True On-Device Reasoning
The launch of Modalix signals a pivotal shift from simple on-device inference (like object classification) to genuine on-device reasoning. By enabling LLMs and Large Multimodal Models (LMMs) to run locally, SiMa.ai empowers devices to not only perceive their environment but to understand, interact, and make decisions in real time. Imagine an in-vehicle assistant that can visually identify a landmark and hold a natural conversation about its history, all without cloud latency. This is the promise of “Physical AI”—intelligent, autonomous systems that are no longer just executing pre-programmed tasks but are actively reasoning about the world around them. With development kits now available, the tools to build this future are accessible. The next step will be for the AI/ML community to leverage this capability to create a new class of intelligent applications that were previously impossible, pushing the frontier of what can be achieved at the intelligent edge.