How does Ironwood compare to NVIDIA H200 and B200?

Ironwood targets inference workloads at hyperscale. Its per-chip peak FP8 compute (4,614 TFLOPs) is competitive with the NVIDIA Blackwell B200 (~5,000 TFLOPs FP8) and ahead of the H200 (~3,958 TFLOPs). Its HBM capacity (192 GB) matches the B200 and exceeds the H200's 141 GB. Ironwood's biggest advantage is its pod-scale interconnect: up to 9,216 chips and 1.77 PB of shared HBM in a single pod. NVIDIA's edge is its broader software ecosystem (CUDA), wider model coverage, and availability. Many large labs in 2026 use both.

Why does Ironwood matter for Gemini 4?

Gemini 4's 10-million-token context window and native multimodal generation need a serving fabric with massive HBM and a very high-bandwidth interconnect. Ironwood gives Google ~6x the per-chip HBM of Trillium (TPU v6e), 4.5x the HBM bandwidth, and a pod fabric that lets one inference job span thousands of chips. Without Ironwood, serving Gemini 4 at scale would not be economical for Google.

Can I use Ironwood for my workloads?

Yes, via Google Cloud. Ironwood (TPU7x) was unveiled at Google Cloud Next in April 2025 and became generally available in the weeks after Google's November 2025 availability announcement. You access it through Google Kubernetes Engine (GKE), which TPU7x requires, and you write models in JAX (TensorFlow is not supported on TPU7x). Ironwood is part of the AI Hypercomputer architecture on Google Cloud. Pricing is competitive with NVIDIA H200 cloud instances for inference at scale.

Quick Answer

What Is Ironwood TPU? Google's 7th-Gen AI Chip (May 2026)

Q: What is Ironwood TPU?

Ironwood, also called TPU7x, is Google's seventh-generation Tensor Processing Unit, purpose-built for high-volume, low-latency AI inference and model serving. Each chip delivers 4,614 TFLOPs at FP8 and carries 192 GB of HBM at 7.37 TB/s. Pods scale to 9,216 chips connected by 9.6 Tb/s ICI links, sharing 1.77 petabytes of HBM. It's the silicon powering Gemini 4, the model Google unveiled at I/O 2026 on May 19.

Published: May 19, 2026

Ironwood (TPU7x) is Google’s seventh-generation Tensor Processing Unit — the silicon powering Gemini 4 and Google’s “age of inference” strategy. Here’s what it does, the headline numbers, and how it compares to NVIDIA.

Last verified: May 19, 2026

Quick facts

Property	Value
Vendor	Google
Codename	Ironwood
Official name	TPU7x (7th-generation TPU)
Unveiled	Google Cloud Next, April 2025
General availability	Weeks after the November 2025 availability announcement
Per-chip FP8 compute	4,614 TFLOPs
Per-chip HBM	192 GB
Per-chip HBM bandwidth	~7.37 TB/s
Pod scale	Up to 9,216 chips
Inter-chip Interconnect (ICI)	9.6 Tb/s
Pod-scale shared HBM	1.77 PB
Improvement vs TPU v5p	10x peak performance
Improvement vs Trillium (v6e)	4x per-chip for training + inference
Energy	~2x perf-per-watt vs Trillium; ~30x vs first Cloud TPU
Powers	Gemini 4, Gemini Intelligence, Google Search AI Mode

What Ironwood is for

Google calls this the age of inference. Earlier TPU generations were optimised for training. Ironwood is purpose-built for high-volume, low-latency inference at hyperscale: serving Gemini 4 to Search, the Gemini app, Android, Vertex AI, and other Google products.

Key design choices:

Inference-first: optimised for serving, not just training.
Native FP8: first TPU with FP8 in the Matrix Multiply Units, ~2x BF16 throughput.
Massive HBM: 192 GB per chip keeps more of a model's weights and KV cache resident on each chip, so large models need fewer shards.
Pod-scale interconnect: 9,216 chips in one pod, sharing 1.77 PB of HBM.
SparseCore enhancements: specialised accelerators for huge embeddings (recommendations, ranking).
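
To make the HBM and FP8 points concrete, here is a minimal Python sketch of the weights-only arithmetic. The 192 GB figure is from this article. The parameter counts are hypothetical examples, and the estimate ignores KV cache, activations, and runtime overhead, so real headroom is smaller than this suggests.

```python
# Weights-only estimate: do a model's weights fit in one Ironwood chip's HBM?
# HBM_GB is the article's per-chip figure; the model sizes are hypothetical.

HBM_GB = 192  # per-chip HBM, decimal gigabytes

BYTES_PER_PARAM = {"fp8": 1, "bf16": 2}


def weights_gb(params_billions: float, dtype: str) -> float:
    """Weight size in decimal GB: (billions of params) * (bytes per param)."""
    return params_billions * BYTES_PER_PARAM[dtype]


def fits_on_one_chip(params_billions: float, dtype: str) -> bool:
    """True if the weights alone fit in a single chip's HBM."""
    return weights_gb(params_billions, dtype) <= HBM_GB


for size in (70, 180, 400):  # hypothetical parameter counts, in billions
    for dtype in ("bf16", "fp8"):
        print(f"{size}B @ {dtype}: {weights_gb(size, dtype):.0f} GB, "
              f"fits on one chip: {fits_on_one_chip(size, dtype)}")
```

Under these assumptions a 180B-parameter model fits on one chip at FP8 but not at BF16, while a 400B-parameter model has to be split across chips at either precision.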

Architecture in plain terms

Component	What it is
Chip	Dual-chiplet — each chiplet has 1 TensorCore, 2 SparseCores, 96 GB HBM
TensorCore	Main matrix math; native FP8 MXUs
SparseCore	Embedding / sparse workload accelerator
HBM	192 GB per chip @ 7.37 TB/s — large models fit without partitioning
ICI	9.6 Tb/s inter-chip interconnect; 200 GBps per axis (3D torus topology)
Pod	9,216 chips fully interconnected, 1.77 PB shared HBM
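
The pod-level numbers follow from the per-chip ones. The sketch below recomputes them from the figures in the tables above. It also derives a roofline "ridge point" (peak FLOPs divided by memory bandwidth), a standard rule of thumb that is not an official Google metric.

```python
# Derive pod-level figures from the per-chip numbers quoted in the article.

CHIPS_PER_POD = 9_216
HBM_PER_CHIPLET_GB = 96
HBM_PER_CHIP_GB = 192
FP8_TFLOPS_PER_CHIP = 4_614
HBM_BW_TB_S = 7.37
ICI_TBIT_S = 9.6

# Two chiplets per chip, 96 GB each.
chip_hbm_gb = 2 * HBM_PER_CHIPLET_GB

# Aggregate HBM across the pod, in decimal petabytes (1 PB = 1e6 GB).
pod_hbm_pb = CHIPS_PER_POD * HBM_PER_CHIP_GB / 1e6

# Aggregate peak FP8 compute, in exaFLOPs (1 EFLOP = 1e6 TFLOPs).
pod_fp8_eflops = CHIPS_PER_POD * FP8_TFLOPS_PER_CHIP / 1e6

# ICI bandwidth converted from terabits to terabytes per second.
ici_tbyte_s = ICI_TBIT_S / 8

# Roofline ridge point: FP8 FLOPs a kernel must perform per byte read
# from HBM before compute, rather than bandwidth, becomes the limit.
ridge_flops_per_byte = FP8_TFLOPS_PER_CHIP / HBM_BW_TB_S

print(f"HBM per chip:   {chip_hbm_gb} GB")
print(f"Pod HBM:        {pod_hbm_pb:.2f} PB")
print(f"Pod FP8 peak:   {pod_fp8_eflops:.1f} EFLOPs")
print(f"ICI:            {ici_tbyte_s:.1f} TB/s")
print(f"Ridge point:    {ridge_flops_per_byte:.0f} FLOPs/byte")
```

The aggregate HBM comes out at about 1.77 PB, matching the quoted figure, and the same multiplication gives roughly 42.5 EFLOPs of peak FP8 compute per pod. Workloads that do far fewer FLOPs per byte than the ridge point, such as token-by-token decoding, are limited by HBM bandwidth rather than by peak compute.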

How it compares (May 2026)

	Google Ironwood (TPU7x)	NVIDIA H200	NVIDIA B200 (Blackwell)
Vendor	Google	NVIDIA	NVIDIA
FP8 peak (per chip)	4,614 TFLOPs	~3,958 TFLOPs (with sparsity)	~5,000 TFLOPs
HBM	192 GB	141 GB	192 GB
Interconnect	9.6 Tb/s ICI; pod-scale	NVLink (8-GPU NVL)	NVLink 5
Pod / cluster	9,216 chips / 1.77 PB HBM	8 GPU NVLink islands + InfiniBand	72-GPU NVL72 rack + Quantum-X800
Software	JAX (no TF); GKE-only	CUDA (universal)	CUDA
Availability	Google Cloud (rentable)	Broadly available	Shipping, allocated
Strength	Pod-scale inference, HBM, perf/watt	Mature software, ubiquity	Highest per-chip FP8

The honest picture: Ironwood is genuinely competitive at the chip level and has the advantage at pod scale. NVIDIA’s moat is software (CUDA), market share, and the fact that most third-party models target it first.
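
The chip-level versus pod-level distinction is easier to see as ratios. The sketch below uses only the figures from the comparison table. The `Accelerator` class and its `domain_chips` field are illustrative names, and "domain" here means the largest tightly coupled group listed in the table (a pod, an 8-GPU NVLink island, or an NVL72 rack).

```python
# Per-chip and per-domain ratios, using only figures from the table above.
from dataclasses import dataclass


@dataclass(frozen=True)
class Accelerator:
    name: str
    fp8_tflops: float  # peak per chip, as quoted in the table
    hbm_gb: float      # HBM per chip
    domain_chips: int  # chips in one tightly coupled interconnect domain

    @property
    def domain_hbm_tb(self) -> float:
        """Aggregate HBM across one domain, in decimal terabytes."""
        return self.domain_chips * self.hbm_gb / 1e3


ACCELERATORS = [
    Accelerator("Ironwood (TPU7x)", 4_614, 192, 9_216),
    Accelerator("H200", 3_958, 141, 8),
    Accelerator("B200 (NVL72)", 5_000, 192, 72),
]

baseline = ACCELERATORS[0]
for acc in ACCELERATORS:
    fp8_ratio = acc.fp8_tflops / baseline.fp8_tflops
    hbm_ratio = baseline.domain_hbm_tb / acc.domain_hbm_tb
    print(f"{acc.name}: {fp8_ratio:.2f}x Ironwood FP8 per chip, "
          f"{acc.domain_hbm_tb:,.1f} TB HBM per domain "
          f"(Ironwood pod holds {hbm_ratio:,.0f}x)")
```

Per chip the three parts sit within roughly 15% of Ironwood on these quoted peaks, while an Ironwood pod aggregates 128x the HBM of an NVL72 rack. This compares capacity only. It says nothing about access latency or how evenly a real workload can use memory spread across a pod.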

Why this matters for Gemini 4

Gemini 4 has a 10-million-token context window and native multimodal generation (video, audio, spatial). That requires:

Huge HBM per chip to keep weights and the long-context KV cache resident → Ironwood’s 192 GB.
Very high HBM bandwidth for long-context attention → 7.37 TB/s.
Pod-scale fabric so one large request can span thousands of chips → 9,216-chip pods.
Energy efficiency because Google has to serve Gemini 4 to billions of devices economically → ~2x perf-per-watt vs Trillium.
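
A rough sizing exercise shows why a 10-million-token context stresses both capacity and bandwidth. Gemini 4's architecture is not public, so the layer count, KV head count, and head dimension below are hypothetical placeholders. Only the HBM capacity, bandwidth, and context length come from this article. The sketch assumes an FP8 KV cache, full attention over the whole context, and perfect sharding, and it ignores the space taken by weights.

```python
# KV-cache sizing for a long-context request. The model shape is HYPOTHETICAL;
# only HBM capacity, HBM bandwidth, and context length come from the article.
import math

HBM_PER_CHIP_GB = 192
HBM_BW_GB_S = 7_370           # 7.37 TB/s per chip
CONTEXT_TOKENS = 10_000_000

# Hypothetical transformer shape (not Gemini 4's real configuration).
NUM_LAYERS = 64
NUM_KV_HEADS = 8
HEAD_DIM = 128
BYTES_PER_ELEM = 1            # FP8 cache entries


def kv_cache_gb(tokens: int) -> float:
    """KV-cache size in decimal GB: keys and values for every layer and token."""
    bytes_per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_ELEM
    return tokens * bytes_per_token / 1e9


cache_gb = kv_cache_gb(CONTEXT_TOKENS)

# Fewest chips whose combined HBM can hold the cache alone.
min_chips = math.ceil(cache_gb / HBM_PER_CHIP_GB)

# Time to stream the whole cache once (one decoded token with full attention),
# with the cache spread evenly over min_chips chips reading in parallel.
read_ms = cache_gb / (min_chips * HBM_BW_GB_S) * 1e3

print(f"KV cache for {CONTEXT_TOKENS:,} tokens: {cache_gb:,.0f} GB")
print(f"Chips needed for the cache alone:    {min_chips}")
print(f"Time to read the cache once:         {read_ms:.1f} ms")
```

Even with this modest hypothetical shape, a single full-length request needs more HBM than one chip has, and each decoded token is bounded by how fast the cache can be read. Larger models, higher-precision caches, or many concurrent requests push the chip count up quickly.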

Without Ironwood, serving Gemini 4 at consumer scale isn’t economical. Ironwood is the silicon that lets Google ship Gemini 4 as the default model on Android, Search AI Mode, and the Gemini app.

How to use Ironwood

Cloud: Ironwood TPUs are part of Google’s AI Hypercomputer architecture on Google Cloud.
Orchestration: TPU7x requires Google Kubernetes Engine (GKE) — no direct VM access.
Frameworks: JAX is the supported framework. TensorFlow is not supported on TPU7x.
Workloads: best for inference at scale; also good for large training jobs that can use JAX.
Cost: competitive with NVIDIA H200 cloud instances for inference workloads at scale (subject to negotiation for large reserves).
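
Whether pricing is "competitive" depends on throughput as much as on the hourly rate, so the useful comparison is cost per token. The helper below is a minimal sketch of that calculation. The function name is illustrative, and the prices and throughputs in the usage example are made-up placeholders, not quotes for any real instance type.

```python
# Cost-per-token comparison. All prices and throughputs below are PLACEHOLDERS
# for illustration; substitute your own quotes and measured throughput.


def cost_per_million_tokens(hourly_price_usd: float,
                            tokens_per_second: float) -> float:
    """USD per one million generated tokens at a sustained throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return hourly_price_usd / tokens_per_hour * 1_000_000


# Hypothetical (hourly price in USD, sustained tokens/s) for two options.
candidates = {
    "accelerator A": (10.0, 5_000.0),
    "accelerator B": (8.0, 3_500.0),
}

for name, (price, throughput) in candidates.items():
    cost = cost_per_million_tokens(price, throughput)
    print(f"{name}: ${cost:.3f} per million tokens")
```

In this placeholder example the option with the higher hourly price is cheaper per token because its throughput is proportionally higher, which is why measured throughput on your own model matters more than list price.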

Strengths

Pod-scale inference: no single NVIDIA NVLink domain offers 1.77 PB of shared HBM today.
HBM per chip: 192 GB holds up to roughly 190 billion parameters of FP8 weights on one chip, before KV cache.
Perf-per-watt: ~2x Trillium by Google’s figures, which matters most for production serving.
First-party integration: Gemini 4 + Ironwood + Vertex AI is a coherent stack.

Weaknesses

JAX-first: TensorFlow isn’t supported on TPU7x. PyTorch users have to use PyTorch/XLA (rough edges) or move to NVIDIA.
GKE-only: no direct VM access; you’re committing to Kubernetes.
Vendor lock-in: available only via Google Cloud.
Software ecosystem: CUDA still dominates third-party model availability.

TL;DR

Ironwood (TPU7x) is the silicon behind Gemini 4 — 4,614 TFLOPs FP8 per chip, 192 GB HBM, 9,216-chip pods sharing 1.77 PB of memory. It’s Google’s strongest hardware response to NVIDIA Blackwell at hyperscale inference, and it’s the reason Gemini 4 can ship 10M-token context to billions of users economically.