NVIDIA and Google Cloud announced an expansion of their technology partnership aimed at accelerating agentic and physical AI workloads. Unveiled at Google Cloud Next in Las Vegas, the announcement covers new infrastructure options and software integrations meant to scale large training and inference workloads.
A5X bare-metal instances and cluster scale
The centerpiece of the announcement is the introduction of NVIDIA Vera Rubin-powered A5X bare-metal instances. According to the companies, these instances can be aggregated into multisite clusters that scale up to 960,000 NVIDIA Rubin GPUs. Within a single site, the A5X configuration, which pairs NVIDIA ConnectX-9 SuperNICs with Google Virgo networking, can scale to as many as 80,000 NVIDIA Rubin GPUs.
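The stated per-site and multisite maximums imply a straightforward relationship; as a quick sanity check on the figures (the site count is an inference from the numbers, not something the announcement states):

```python
# Illustrative arithmetic from the stated A5X scaling figures, not an
# official topology: 80,000 Rubin GPUs per site, 960,000 across a
# multisite cluster.
GPUS_PER_SITE = 80_000
GPUS_MULTISITE_MAX = 960_000

# Number of sites implied if every site is filled to its stated maximum.
implied_sites = GPUS_MULTISITE_MAX // GPUS_PER_SITE
print(implied_sites)  # 12
```

In other words, the multisite ceiling corresponds to roughly a dozen fully built-out sites, assuming every site reaches its 80,000-GPU maximum.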
The release cites performance improvements over the prior generation: up to 10 times lower inference cost per token and up to 10 times higher token throughput per megawatt.
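The two headline metrics are simple ratios, which a short sketch can make concrete. All figures and function names below are illustrative placeholders, not NVIDIA or Google Cloud numbers:

```python
import math

# Hypothetical helpers showing how the two headline metrics are defined.
# The numbers below are placeholders, not vendor figures.

def cost_per_token(total_cost_usd: float, tokens_served: float) -> float:
    """Inference cost per token: dollars spent divided by tokens produced."""
    return total_cost_usd / tokens_served

def tokens_per_megawatt(tokens_per_second: float, power_megawatts: float) -> float:
    """Token throughput per megawatt of power draw."""
    return tokens_per_second / power_megawatts

# A 10x generational gain means the new cost ratio is one tenth of the
# old one: the same spend serves ten times the tokens.
old_cost = cost_per_token(1000.0, 1_000_000_000)   # prior generation
new_cost = cost_per_token(1000.0, 10_000_000_000)  # 10x the tokens
assert math.isclose(new_cost, old_cost / 10)
```

Framing the claims as ratios makes clear that the first metric is an economic measure (dollars per token) while the second is an energy-efficiency measure (tokens per unit of power).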
Expanded Blackwell portfolio on Google Cloud
Google Cloud’s NVIDIA Blackwell portfolio was outlined in detail. It includes:
- A4 virtual machines built on NVIDIA HGX B200 systems
- Rack-scale A4X VMs that use NVIDIA GB200 NVL72 systems
- A4X Max configurations built on NVIDIA GB300 NVL72 systems
- Fractional G4 VMs powered by NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs
Customers and workloads
The companies said some large-scale inference workloads for OpenAI run on NVIDIA GB300 and GB200 NVL72 systems on Google Cloud, including support for ChatGPT. Thinking Machines Lab is scaling its Tinker API on A4X Max VMs using GB300 NVL72 systems. In addition, Google Gemini models running on NVIDIA Blackwell and Blackwell Ultra GPUs are now in preview on Google Distributed Cloud.
Another cloud offering highlighted is Confidential G4 VMs with NVIDIA RTX PRO 6000 Blackwell GPUs, which the companies describe as the first confidential computing cloud offering using NVIDIA Blackwell GPUs.
Platform and software integrations
The announcement also named several platform and software advances. NVIDIA Nemotron 3 Super is now available on the Gemini Enterprise Agent Platform. Google Cloud and NVIDIA introduced a managed reinforcement learning API built with NVIDIA NeMo RL designed to accelerate training at scale.
Cybersecurity vendor CrowdStrike is cited as using NVIDIA NeMo open libraries to create synthetic data and fine-tune Nemotron and other open large language models for security applications. CrowdStrike’s workloads run on Managed Training Clusters on the Gemini Enterprise Agent Platform with NVIDIA Blackwell GPUs.
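The pipeline pattern attributed to CrowdStrike, generating synthetic training pairs and then fine-tuning an open model on them, can be sketched generically. This is not the NVIDIA NeMo API; every function and name here is an illustrative stand-in:

```python
# Conceptual sketch of a synthetic-data -> fine-tune pipeline for security
# workloads. All functions are hypothetical stand-ins, not NeMo calls.

def generate_synthetic_examples(seed_events: list[str]) -> list[dict]:
    """Turn raw security events into instruction/response training pairs."""
    return [
        {"instruction": f"Classify this event: {event}",
         "response": "benign" if "login" in event else "suspicious"}
        for event in seed_events
    ]

def fine_tune(base_model: str, dataset: list[dict]) -> dict:
    """Stand-in for a fine-tuning call; returns a toy 'model' record."""
    return {"base": base_model, "num_examples": len(dataset)}

events = ["user login from known IP", "mass file encryption detected"]
dataset = generate_synthetic_examples(events)
model = fine_tune("open-base-model", dataset)
print(model["num_examples"])  # 2
```

The point of the pattern is that labeled security data is scarce, so synthetic instruction/response pairs derived from real event streams stand in for hand-labeled examples during fine-tuning.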
Design and engineering tools from Cadence and Siemens Digital Industries Software are listed as available on Google Cloud, accelerated on NVIDIA AI infrastructure. In addition, NVIDIA Omniverse libraries and the NVIDIA Isaac Sim robotics simulation framework are offered on Google Cloud Marketplace.
Recognition
As part of the announcement, NVIDIA was recognized by Google Cloud with Partner of the Year awards in two categories: AI Global Technology Partner and Infra Modernization Compute.
Implications
The expanded set of instance types, the high-density Rubin GPU clusters, and the software integrations are presented as an effort to lower inference operating costs and improve energy and token efficiency for large-scale AI workloads across cloud and distributed environments.