What Is Ironwood TPU? Google's 7th-Gen AI Chip (May 2026)
Ironwood (TPU7x) is Google’s seventh-generation Tensor Processing Unit — the silicon powering Gemini 4 and Google’s “age of inference” strategy. Here’s what it does, the headline numbers, and how it compares to NVIDIA.
Last verified: May 19, 2026
Quick facts
| Property | Value |
|---|---|
| Vendor | Google |
| Codename | Ironwood |
| Official name | TPU7x (7th-generation TPU) |
| Unveiled | Google Cloud Next, April 2025 |
| General availability | Announced November 2025, with availability in the following weeks |
| Per-chip FP8 compute | 4,614 TFLOPs |
| Per-chip HBM | 192 GB |
| Per-chip HBM bandwidth | ~7.37 TB/s |
| Pod scale | Up to 9,216 chips |
| Inter-chip Interconnect (ICI) | 9.6 Tb/s |
| Pod-scale shared HBM | 1.77 PB |
| Improvement vs TPU v5p | 10x peak performance |
| Improvement vs Trillium (v6e) | 4x per-chip for training + inference |
| Energy | ~2x perf-per-watt vs Trillium; ~30x vs first Cloud TPU |
| Powers | Gemini 4, Gemini Intelligence, Google Search AI Mode |
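The pod-level figures in the table follow from the per-chip ones by simple multiplication. A minimal Python sketch of that arithmetic, assuming decimal units (1 PB = 1,000,000 GB) and ideal scaling with no overhead; the variable names are illustrative only:

```python
# Per-chip and pod-size figures taken from the quick-facts table above.
CHIPS_PER_POD = 9_216
HBM_PER_CHIP_GB = 192
FP8_PER_CHIP_TFLOPS = 4_614

# Assumption: decimal units (1 PB = 1e6 GB, 1 EFLOPS = 1e6 TFLOPS).
pod_hbm_pb = CHIPS_PER_POD * HBM_PER_CHIP_GB / 1e6
pod_fp8_eflops = CHIPS_PER_POD * FP8_PER_CHIP_TFLOPS / 1e6

print(f"Pod HBM: {pod_hbm_pb:.2f} PB")
print(f"Pod peak FP8: {pod_fp8_eflops:.1f} EFLOPS")
```

The first result reproduces the 1.77 PB figure. The second is a theoretical peak obtained by summing per-chip peaks, not a measured throughput.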
What Ironwood is for
Google calls this the age of inference. Earlier TPU generations were optimised for training. Ironwood is purpose-built for high-volume, low-latency inference at hyperscale — serving Gemini 4 to Search, the Gemini app, Android, Vertex AI, and everything else.
Key design choices:
- Inference-first — optimised for serving, not just training.
- Native FP8 — first TPU with FP8 in the Matrix Multiply Units, ~2x BF16 throughput.
- Massive HBM — 192 GB per chip lets large models stay resident without sharding.
- Pod-scale interconnect — 9,216 chips in one pod, sharing 1.77 PB of HBM.
- SparseCore enhancements — specialized accelerators for huge embeddings (recommendations, ranking).
Architecture in plain terms
| Component | What it is |
|---|---|
| Chip | Dual-chiplet — each chiplet has 1 TensorCore, 2 SparseCores, 96 GB HBM |
| TensorCore | Main matrix math; native FP8 MXUs |
| SparseCore | Embedding / sparse workload accelerator |
| HBM | 192 GB per chip @ 7.37 TB/s — large models fit without partitioning |
| ICI | 9.6 Tb/s (1.2 TB/s) inter-chip interconnect; 200 GB/s per axis (3D torus topology) |
| Pod | 9,216 chips connected over ICI in a 3D torus, 1.77 PB shared HBM |
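One way to read the compute and bandwidth figures together is a simple roofline model. The sketch below is a rough illustration, assuming the peak numbers from the tables are achievable at the same time (real workloads rarely reach them). It derives the arithmetic intensity at which a chip stops being memory-bound, the minimum time to read all of HBM once, and the ICI figure in bytes:

```python
# Figures from the tables above; these are peaks, not measured throughput.
FP8_PEAK_FLOPS = 4_614e12      # 4,614 TFLOPs per chip
HBM_BYTES = 192e9              # 192 GB per chip
HBM_BW_BYTES_PER_S = 7.37e12   # ~7.37 TB/s per chip
ICI_BITS_PER_S = 9.6e12        # 9.6 Tb/s

# Ridge point of a simple roofline model: FLOPs a kernel must perform per
# byte read from HBM before it becomes compute-bound instead of memory-bound.
ridge_flops_per_byte = FP8_PEAK_FLOPS / HBM_BW_BYTES_PER_S

# Lower bound on the time to stream the full HBM contents once, for example
# reading every resident weight during one memory-bound decode step.
full_hbm_sweep_ms = HBM_BYTES / HBM_BW_BYTES_PER_S * 1e3

# Convert the ICI figure from bits to bytes for comparison with HBM bandwidth.
ici_tb_per_s = ICI_BITS_PER_S / 8 / 1e12

print(f"Ridge point: ~{ridge_flops_per_byte:.0f} FLOPs/byte")
print(f"Full HBM sweep: ~{full_hbm_sweep_ms:.1f} ms")
print(f"ICI: {ici_tb_per_s:.1f} TB/s")
```

Under these assumptions, kernels that do fewer than roughly 626 FP8 operations per byte fetched are limited by HBM bandwidth rather than by the MXUs, which is typical of token-by-token decoding.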
How it compares (May 2026)
| | Google Ironwood (TPU7x) | NVIDIA H200 | NVIDIA B200 (Blackwell) |
|---|---|---|---|
| Vendor | Google | NVIDIA | NVIDIA |
| FP8 peak (per chip) | 4,614 TFLOPs | ~3,958 TFLOPs | ~5,000 TFLOPs |
| HBM | 192 GB | 141 GB | 192 GB |
| Interconnect | 9.6 Tb/s ICI; pod-scale | NVLink (8-GPU NVL) | NVLink 5 |
| Pod / cluster | 9,216 chips / 1.77 PB HBM | 8 GPU NVLink islands + InfiniBand | 72-GPU NVL72 rack + Quantum-X800 |
| Software | JAX (no TF); GKE-only | CUDA (universal) | CUDA |
| Availability | Google Cloud (rentable) | Broadly available | Shipping, allocated |
| Strength | Pod-scale inference, HBM, perf/watt | Mature software, ubiquity | Highest per-chip FP8 |
The honest picture: Ironwood is genuinely competitive at the chip level and advantaged at pod scale. NVIDIA’s moat is software (CUDA), market share, and the fact that most third-party models target it first.
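The chip-level and pod-level claims can be checked against the table with a few ratios. This is a rough sketch only: vendor peak figures are not always quoted on the same basis (for example, with or without sparsity), and the NVL72 memory total assumes 192 GB per GPU, as the table lists for B200.

```python
# Peak figures copied from the comparison table above (vendor peaks,
# approximate for the NVIDIA parts).
chips = {
    "Ironwood (TPU7x)": {"fp8_tflops": 4_614, "hbm_gb": 192},
    "NVIDIA H200": {"fp8_tflops": 3_958, "hbm_gb": 141},
    "NVIDIA B200": {"fp8_tflops": 5_000, "hbm_gb": 192},
}

# Express Ironwood's per-chip figures as a multiple of each chip's figures.
base = chips["Ironwood (TPU7x)"]
ratios = {
    name: (base["fp8_tflops"] / spec["fp8_tflops"],
           base["hbm_gb"] / spec["hbm_gb"])
    for name, spec in chips.items()
}
for name, (fp8_ratio, hbm_ratio) in ratios.items():
    print(f"Ironwood vs {name}: {fp8_ratio:.2f}x FP8, {hbm_ratio:.2f}x HBM")

# Largest single interconnect domain: a 9,216-chip pod vs a 72-GPU NVL72 rack.
# Assumption: 192 GB per GPU in the NVL72 rack, per the table's B200 column.
pod_hbm_tb = 9_216 * 192 / 1e3
nvl72_hbm_tb = 72 * 192 / 1e3
print(f"Pod HBM {pod_hbm_tb:.0f} TB vs NVL72 {nvl72_hbm_tb:.1f} TB "
      f"({pod_hbm_tb / nvl72_hbm_tb:.0f}x)")
```

On these numbers the per-chip FP8 gap is within about 20% in either direction, while the memory reachable inside one interconnect domain differs by two orders of magnitude.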
Why this matters for Gemini 4
Gemini 4 has a 10-million-token context window and native multimodal generation (video, audio, spatial). That requires:
- Huge HBM per chip to keep activations resident → Ironwood’s 192 GB.
- Very high HBM bandwidth for long-context attention → 7.37 TB/s.
- Pod-scale fabric so one large request can span thousands of chips → 9,216-chip pods.
- Energy efficiency because Google has to serve Gemini 4 to billions of devices economically → ~2x perf-per-watt vs Trillium.
Without Ironwood, Gemini 4 at consumer scale isn’t economic. Ironwood is the silicon that lets Google ship Gemini 4 as the default model on Android, Search AI Mode, and the Gemini app.
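To see why a long context pushes a single request across several chips, the requirements listed above can be turned into a back-of-the-envelope KV-cache estimate. Gemini 4's architecture is not public, so the layer count, head count, head size, and FP8 cache format below are hypothetical placeholders chosen only to illustrate the arithmetic, not a description of the real model:

```python
import math

# HYPOTHETICAL model shape: placeholder values, not Gemini 4's real design.
NUM_LAYERS = 64
NUM_KV_HEADS = 8
HEAD_DIM = 128
BYTES_PER_VALUE = 1          # assumed FP8 cache entries
CONTEXT_TOKENS = 10_000_000  # 10M-token window from the text above

# Per-chip figures from the quick-facts table.
HBM_PER_CHIP_BYTES = 192e9
HBM_BW_BYTES_PER_S = 7.37e12

# Standard KV-cache size: one key and one value vector per layer,
# per KV head, per token.
kv_bytes_per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
kv_bytes_total = kv_bytes_per_token * CONTEXT_TOKENS

# Chips needed just to hold the cache (weights and activations excluded).
chips_for_cache = math.ceil(kv_bytes_total / HBM_PER_CHIP_BYTES)

# Lower bound on reading the whole cache once, with the read split evenly
# across those chips at peak HBM bandwidth.
sweep_ms = kv_bytes_total / (chips_for_cache * HBM_BW_BYTES_PER_S) * 1e3

print(f"KV cache: {kv_bytes_total / 1e12:.2f} TB for one request")
print(f"Chips to hold it: {chips_for_cache}")
print(f"Minimum sweep time per decoded token: ~{sweep_ms:.1f} ms")
```

With these placeholder values, one full-length request needs about 1.31 TB of cache, more than any single chip holds, so both HBM capacity and the pod fabric come into play. Different model shapes would change the numbers but not the structure of the calculation.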
How to use Ironwood
- Cloud: Ironwood TPUs are part of Google’s AI Hypercomputer architecture on Google Cloud.
- Orchestration: TPU7x requires Google Kubernetes Engine (GKE) — no direct VM access.
- Frameworks: JAX is the supported framework. TensorFlow is not supported on TPU7x.
- Workloads: best for inference at scale; also good for large training jobs that can use JAX.
- Cost: competitive with NVIDIA H200 cloud instances for inference at scale; pricing for large reservations is negotiable.
Strengths
- Pod-scale inference — no NVIDIA cluster gives you 1.77 PB shared HBM today.
- HBM per chip — 192 GB at FP8 fits very large dense models.
- Perf-per-watt — best in class for production serving.
- First-party integration — Gemini 4 + Ironwood + Vertex AI is a coherent stack.
Weaknesses
- JAX-only — TensorFlow on TPU7x isn’t supported. PyTorch users have to use PyTorch/XLA (rough edges) or move to NVIDIA.
- GKE-only — no direct VM access; you’re committing to Kubernetes.
- Vendor lock-in — only via Google Cloud.
- Software ecosystem — CUDA still rules third-party model availability.
TL;DR
Ironwood (TPU7x) is the silicon behind Gemini 4 — 4,614 TFLOPs FP8 per chip, 192 GB HBM, 9,216-chip pods sharing 1.77 PB of memory. It’s Google’s strongest hardware response to NVIDIA Blackwell at hyperscale inference, and it’s the reason Gemini 4 can ship 10M-token context to billions of users economically.