TLDR: Cerebras Systems has launched Qwen3-235B, an Alibaba-developed frontier AI model, on its inference cloud platform. Announced on July 8, 2025, the model boasts an unprecedented speed of 1,500 tokens per second and full 131K context support, making it ideal for complex code generation and reasoning tasks at a fraction of the cost of comparable closed-source models.
Paris, France – Cerebras Systems, a leader in accelerating generative AI, announced on July 8, 2025 the official launch of Qwen3-235B on its inference cloud platform. Unveiled at the RAISE Summit in Paris, the launch brings Alibaba’s Qwen3-235B to Cerebras as a frontier AI model optimized for speed and extensive context support, poised to redefine enterprise AI deployment.
The Qwen3-235B model, running on Cerebras’ Wafer Scale Engine (WSE), achieves an output speed of 1,500 tokens per second. This cuts typical reasoning times from minutes to roughly 0.6 seconds, making complex coding, advanced reasoning, and deep RAG (Retrieval-Augmented Generation) workflows feel nearly instantaneous. According to independent tests by Artificial Analysis, Cerebras is currently the only provider offering a frontier AI model that generates output at over 1,000 tokens per second, setting a new benchmark for real-time AI performance.
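As a quick sanity check on how the throughput figure maps to latency, the calculation below assumes a 900-token reasoning trace; that trace length is an illustrative assumption, not a figure from the announcement.

```python
# Rough latency arithmetic for the quoted throughput.
trace_tokens = 900     # assumed length of a reasoning trace (illustrative)
cerebras_tps = 1_500   # reported output tokens per second on Cerebras

print(f"{trace_tokens / cerebras_tps:.1f} s")  # 0.6 s
```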
A significant enhancement with this launch is the quadrupling of context length support from 32K to the full 131K tokens that Qwen3-235B supports. The expanded context window is critical for production-grade applications, allowing the model to process dozens of files and tens of thousands of lines of code in a single request. That capability is particularly impactful for tasks such as code refactoring, documentation, and bug detection, transforming what was once considered a ‘toy’ into a robust enterprise platform.
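To make the long-context workflow concrete, below is a minimal sketch of how a developer might pack several source files into one request. It assumes an OpenAI-compatible chat endpoint on the Cerebras inference cloud; the base URL, model identifier, and file names are illustrative placeholders rather than values confirmed by the announcement.

```python
# Minimal sketch: packing several source files into one long-context request.
# Assumes an OpenAI-compatible endpoint on the Cerebras inference cloud;
# the base_url, model name, and file names below are illustrative placeholders.
from pathlib import Path

from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",  # assumed endpoint
    api_key="YOUR_CEREBRAS_API_KEY",
)

# Concatenate a handful of project files; a 131K-token window leaves room
# for tens of thousands of lines of code plus the model's answer.
files = ["app.py", "models.py", "routes.py"]
codebase = "\n\n".join(f"### {name}\n{Path(name).read_text()}" for name in files)

response = client.chat.completions.create(
    model="qwen-3-235b",  # assumed model identifier
    messages=[
        {"role": "system", "content": "You are a senior code reviewer."},
        {"role": "user", "content": f"Refactor duplicated logic and flag bugs:\n\n{codebase}"},
    ],
)
print(response.choices[0].message.content)
```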
Cost-efficiency is another cornerstone of this release. Qwen3-235B, built on an efficient mixture-of-experts (MoE) architecture, is offered at a highly competitive $0.60 per million input tokens and $1.20 per million output tokens, a fraction of the cost of comparable closed-source models; OpenAI’s o3 reasoning model, for example, is priced at $2 per million input tokens and $8 per million output tokens.
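For a rough sense of what the price gap means in practice, the back-of-the-envelope comparison below applies both price lists to a hypothetical workload; the request counts and token sizes are illustrative assumptions.

```python
# Back-of-the-envelope cost comparison on a hypothetical workload:
# 1,000 requests, each with 10,000 input tokens and 2,000 output tokens.
requests = 1_000
input_tokens = 10_000 * requests   # 10M input tokens total
output_tokens = 2_000 * requests   # 2M output tokens total

def cost(input_price, output_price):
    """Prices are quoted per million tokens."""
    return (input_tokens / 1e6) * input_price + (output_tokens / 1e6) * output_price

qwen3_on_cerebras = cost(0.60, 1.20)  # $0.60 / $1.20 per million tokens
openai_o3 = cost(2.00, 8.00)          # $2 / $8 per million tokens

print(f"Qwen3-235B on Cerebras: ${qwen3_on_cerebras:.2f}")  # $8.40
print(f"OpenAI o3:              ${openai_o3:.2f}")          # $36.00
```

On this particular input/output mix the Cerebras pricing works out to under a quarter of the o3 bill; the exact ratio depends on how a workload splits between input and output tokens.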
Andrew Feldman, CEO and Founder of Cerebras Systems, emphasized the market demand for such capabilities. “We’re seeing huge demand from developers for frontier models with long context, especially for code generation,” Feldman stated. “Qwen3-235B on Cerebras is our first model that stands toe-to-toe with frontier models like Claude 4 and DeepSeek R1. And with full 131K context, developers can now use Cerebras on production-grade coding applications and get answers back in less than a second instead of waiting for minutes on GPUs.”
Industry analysts echo this sentiment. One analyst noted, “If you can do a large context window, which is important for coding and agentic AI, if you can do that for one-tenth the price, I think you’ve got something that’s going to make a difference.”
Beyond the model launch, Cerebras also announced strategic partnerships with key industry players including Notion, DataRobot, Docker, and Hugging Face, further expanding its ecosystem. Notion, for instance, is already leveraging Cerebras’ inference technology to power instant, enterprise-scale document search in its Notion AI for Work offering, delivering results in under 300 milliseconds to a user base of more than 100 million.


