NVIDIA and Google Cloud announced an expansion of their technology partnership aimed at accelerating agentic and physical AI workloads. Unveiled at Google Cloud Next in Las Vegas, the announcement covers new infrastructure options and software integrations meant to scale large training and inference workloads.
A5X bare-metal instances and cluster scale
The centerpiece of the announcement is the introduction of NVIDIA Vera Rubin-powered A5X bare-metal instances. According to the companies, these instances can be aggregated into multisite clusters that scale up to 960,000 NVIDIA Rubin GPUs. Within a single site, the A5X configuration, which pairs NVIDIA ConnectX-9 SuperNICs with Google Virgo networking, can scale to as many as 80,000 NVIDIA Rubin GPUs.
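The stated per-site and multisite maximums imply a straightforward relationship; as a quick sanity check on the figures (the site count is an inference from the numbers, not something the announcement states):

```python
# Illustrative arithmetic from the stated A5X scaling figures, not an
# official topology: 80,000 Rubin GPUs per site, 960,000 across a
# multisite cluster.
GPUS_PER_SITE = 80_000
GPUS_MULTISITE_MAX = 960_000

# Number of sites implied if every site is filled to its stated maximum.
implied_sites = GPUS_MULTISITE_MAX // GPUS_PER_SITE
print(implied_sites)  # 12
```

In other words, the multisite ceiling corresponds to roughly a dozen fully built-out sites, assuming every site reaches its 80,000-GPU maximum.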
The release cites performance improvements over the prior generation: up to 10 times lower inference cost per token and up to 10 times higher token throughput per megawatt.
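The two headline metrics are simple ratios, which a short sketch can make concrete. All figures and function names below are illustrative placeholders, not NVIDIA or Google Cloud numbers:

```python
import math

# Hypothetical helpers showing how the two headline metrics are defined.
# The numbers below are placeholders, not vendor figures.

def cost_per_token(total_cost_usd: float, tokens_served: float) -> float:
    """Inference cost per token: dollars spent divided by tokens produced."""
    return total_cost_usd / tokens_served

def tokens_per_megawatt(tokens_per_second: float, power_megawatts: float) -> float:
    """Token throughput per megawatt of power draw."""
    return tokens_per_second / power_megawatts

# A 10x generational gain means the new cost ratio is one tenth of the
# old one: the same spend serves ten times the tokens.
old_cost = cost_per_token(1000.0, 1_000_000_000)   # prior generation
new_cost = cost_per_token(1000.0, 10_000_000_000)  # 10x the tokens
assert math.isclose(new_cost, old_cost / 10)
```

Framing the claims as ratios makes clear that the first metric is an economic measure (dollars per token) while the second is an energy-efficiency measure (tokens per unit of power).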
Expanded Blackwell portfolio on Google Cloud
Google Cloud’s NVIDIA Blackwell portfolio was outlined in detail. It includes:
- A4 virtual machines built on NVIDIA HGX B200 systems
- Rack-scale A4X VMs that use NVIDIA GB200 NVL72 systems
- A4X Max configurations built on NVIDIA GB300 NVL72 systems
- Fractional G4 VMs powered by NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs
Customers and workloads
The companies said some large-scale inference workloads for OpenAI run on NVIDIA GB300 and GB200 NVL72 systems on Google Cloud, including support for ChatGPT. Thinking Machines Lab is scaling its Tinker API on A4X Max VMs using GB300 NVL72 systems. In addition, Google Gemini models running on NVIDIA Blackwell and Blackwell Ultra GPUs are now in preview on Google Distributed Cloud.
Another cloud offering highlighted is Confidential G4 VMs with NVIDIA RTX PRO 6000 Blackwell GPUs, which the companies describe as the first confidential computing cloud offering using NVIDIA Blackwell GPUs.
Platform and software integrations
The announcement also named several platform and software advances. NVIDIA Nemotron 3 Super is now available on the Gemini Enterprise Agent Platform. Google Cloud and NVIDIA introduced a managed reinforcement learning API built with NVIDIA NeMo RL designed to accelerate training at scale.
Cybersecurity vendor CrowdStrike is cited as using NVIDIA NeMo open libraries to create synthetic data and fine-tune Nemotron and other open large language models for security applications. CrowdStrike’s workloads run on Managed Training Clusters on the Gemini Enterprise Agent Platform with NVIDIA Blackwell GPUs.
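The pipeline pattern attributed to CrowdStrike, generating synthetic training pairs and then fine-tuning an open model on them, can be sketched generically. This is not the NVIDIA NeMo API; every function and name here is an illustrative stand-in:

```python
# Conceptual sketch of a synthetic-data -> fine-tune pipeline for security
# workloads. All functions are hypothetical stand-ins, not NeMo calls.

def generate_synthetic_examples(seed_events: list[str]) -> list[dict]:
    """Turn raw security events into instruction/response training pairs."""
    return [
        {"instruction": f"Classify this event: {event}",
         "response": "benign" if "login" in event else "suspicious"}
        for event in seed_events
    ]

def fine_tune(base_model: str, dataset: list[dict]) -> dict:
    """Stand-in for a fine-tuning call; returns a toy 'model' record."""
    return {"base": base_model, "num_examples": len(dataset)}

events = ["user login from known IP", "mass file encryption detected"]
dataset = generate_synthetic_examples(events)
model = fine_tune("open-base-model", dataset)
print(model["num_examples"])  # 2
```

The point of the pattern is that labeled security data is scarce, so synthetic instruction/response pairs derived from real event streams stand in for hand-labeled examples during fine-tuning.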
Design and engineering tools from Cadence and Siemens Digital Industries Software are listed as available on Google Cloud, accelerated on NVIDIA AI infrastructure. In addition, NVIDIA Omniverse libraries and the NVIDIA Isaac Sim robotics simulation framework are offered on Google Cloud Marketplace.
Recognition
As part of the announcement, NVIDIA was recognized by Google Cloud with Partner of the Year awards in two categories: AI Global Technology Partner and Infra Modernization Compute.
Implications
The expanded set of instance types, the high-density Rubin GPU clusters, and the software integrations are presented as an effort to lower inference operating costs and improve energy and token efficiency for large-scale AI workloads across cloud and distributed environments.