
OpenAI and NVIDIA Unveil New Open-Weight AI Models for Global Inference Infrastructure

TLDR: OpenAI, in partnership with NVIDIA, has launched two new open-weight AI reasoning models, gpt-oss-120b and gpt-oss-20b, aimed at democratizing advanced AI development. These models, trained on NVIDIA H100 GPUs and optimized for NVIDIA’s CUDA platform and Blackwell architecture, offer high-efficiency inference, reaching 1.5 million tokens per second on GB200 NVL72 systems. This collaboration underscores a commitment to open-source innovation and making AI accessible across various industries and scales globally.

OpenAI, in a significant collaboration with NVIDIA, has introduced two groundbreaking open-weight AI reasoning models, gpt-oss-120b and gpt-oss-20b. These models are designed to extend cutting-edge AI development capabilities to a broad spectrum of users, including developers, enthusiasts, enterprises, startups, and governments worldwide, spanning every industry and scale.

NVIDIA’s involvement in the release of these open models, gpt-oss-120b and gpt-oss-20b, highlights its pivotal role in fostering community-driven innovation and expanding global access to AI technologies. The models are versatile, enabling the development of breakthrough applications in generative AI, reasoning AI, physical AI, healthcare, and manufacturing, potentially unlocking new industries as the AI-driven industrial revolution progresses.

The new flexible, open-weight text-reasoning large language models (LLMs) from OpenAI were trained using NVIDIA H100 GPUs. For optimal inference performance, they are designed to run efficiently on the hundreds of millions of GPUs powered by the NVIDIA CUDA platform globally. These models are now available as NVIDIA NIM microservices, facilitating easy deployment on any GPU-accelerated infrastructure while ensuring flexibility, data privacy, and enterprise-grade security.

Further enhancing their performance, the models feature software optimizations for the NVIDIA Blackwell platform. When deployed on NVIDIA GB200 NVL72 systems, they achieve an impressive inference rate of 1.5 million tokens per second, significantly boosting efficiency for AI inference tasks.
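A quick back-of-envelope check puts that rack-level figure in perspective. The sketch below divides NVIDIA's reported aggregate rate by the 72 Blackwell GPUs a GB200 NVL72 links together; it is illustrative arithmetic only, not a benchmark.

```python
# Back-of-envelope throughput check (illustrative; the 1.5M tokens/s
# figure is NVIDIA's reported aggregate for a GB200 NVL72 system).
AGGREGATE_TOKENS_PER_SEC = 1_500_000  # reported rack-level inference rate
GPUS_PER_NVL72 = 72                   # a GB200 NVL72 links 72 Blackwell GPUs

per_gpu = AGGREGATE_TOKENS_PER_SEC / GPUS_PER_NVL72
print(f"~{per_gpu:,.0f} tokens/s per GPU")
```

That works out to roughly 20,800 tokens per second per GPU, before accounting for batching or interconnect effects.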

Jensen Huang, founder and CEO of NVIDIA, commented on the collaboration, stating, “OpenAI showed the world what could be built on NVIDIA AI — and now they’re advancing innovation in open-source software.” He added, “The gpt-oss models let developers everywhere build on that state-of-the-art open-source foundation, strengthening U.S. technology leadership in AI — all on the world’s largest AI compute infrastructure.”

NVIDIA emphasizes that Blackwell is crucial for advanced reasoning, as the demand on compute infrastructure escalates when models like gpt-oss generate exponentially more tokens. The Blackwell architecture is purpose-built to meet this demand, offering the necessary scale, efficiency, and return on investment for high-volume inference. Its innovations include NVFP4 4-bit precision, which enables ultra-efficient, high-accuracy inference while substantially reducing power and memory requirements. This technology makes it feasible to deploy trillion-parameter LLMs in real time, potentially generating billions of dollars in value for organizations.
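The memory savings behind that 4-bit claim are easy to sketch. The rough math below compares the weight footprint of a 120-billion-parameter model at FP16 versus 4-bit precision; it deliberately ignores activations, KV cache, and per-tensor scale overhead, so treat it as an order-of-magnitude illustration only.

```python
# Rough memory math for why 4-bit precision matters (illustrative only;
# ignores activations, KV cache, and quantization scale overhead).
def weight_memory_gb(params: float, bits_per_weight: int) -> float:
    """Gigabytes needed to hold `params` weights at the given precision."""
    return params * bits_per_weight / 8 / 1e9

PARAMS_120B = 120e9  # gpt-oss-120b parameter count

fp16 = weight_memory_gb(PARAMS_120B, 16)
fp4 = weight_memory_gb(PARAMS_120B, 4)
print(f"FP16: {fp16:.0f} GB, 4-bit: {fp4:.0f} GB ({fp16 / fp4:.0f}x smaller)")
```

Weights alone drop from roughly 240 GB to 60 GB, which is what moves a model of this size from multi-node territory toward a single high-memory system.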

For open development, NVIDIA CUDA stands as the world’s most widely available computing infrastructure, allowing users to deploy and run AI models across various platforms, from NVIDIA DGX Cloud to NVIDIA GeForce RTX and NVIDIA RTX PRO-powered PCs and workstations. With over 450 million NVIDIA CUDA downloads to date, the vast community of CUDA developers now gains access to these latest models, optimized for their existing NVIDIA technology stack.

OpenAI and NVIDIA’s commitment to open-source software is further demonstrated through their collaboration with leading open framework providers. They have provided model optimizations for FlashInfer, Hugging Face, llama.cpp, Ollama, and vLLM, in addition to NVIDIA TensorRT-LLM and other libraries, offering developers flexibility in their framework choices.
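In practice, that framework flexibility often surfaces as an OpenAI-compatible HTTP API, which servers such as vLLM and Ollama can expose for locally hosted models. The sketch below shows the general shape of such a chat-completion request; the model name, endpoint URL, and prompt are illustrative assumptions, not confirmed details from the release.

```python
import json

# Hypothetical request body for a locally served gpt-oss-20b behind an
# OpenAI-compatible endpoint (e.g. one started by vLLM or Ollama).
# The model name and endpoint below are illustrative assumptions.
payload = {
    "model": "gpt-oss-20b",
    "messages": [
        {"role": "user", "content": "Summarize NVFP4 precision in one sentence."}
    ],
    "max_tokens": 128,
}

# With a server running, this payload would be POSTed to something like
# http://localhost:8000/v1/chat/completions with Content-Type: application/json.
print(json.dumps(payload, indent=2))
```

Because the request shape is shared across these servers, switching between serving stacks is largely a matter of changing the endpoint and model identifier.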


This collaboration builds on a long history, dating back to 2016 when Jensen Huang personally delivered the first NVIDIA DGX-1 AI supercomputer to OpenAI’s headquarters. By optimizing OpenAI’s gpt-oss models for NVIDIA Blackwell and RTX GPUs, alongside NVIDIA’s extensive software stack, NVIDIA is facilitating faster, more cost-effective AI advancements for its 6.5 million developers across 250 countries, who utilize over 900 NVIDIA software development kits and AI models.

Dev Sundaram
https://blogs.edgentiq.com
Dev Sundaram is an investigative tech journalist with a nose for exclusives and leaks. With stints in cybersecurity and enterprise AI reporting, Dev thrives on breaking big stories—product launches, funding rounds, regulatory shifts—and giving them context. He believes journalism should push the AI industry toward transparency and accountability, especially as generative AI becomes mainstream. You can reach him at: [email protected]
