What Is Ironwood TPU? Google's 7th-Gen AI Chip (May 2026)

Ironwood (TPU7x) is Google’s seventh-generation Tensor Processing Unit — the silicon powering Gemini 4 and Google’s “age of inference” strategy. Here’s what it does, the headline numbers, and how it compares to NVIDIA.

Last verified: May 19, 2026

Quick facts

| Property | Value |
| --- | --- |
| Vendor | Google |
| Codename | Ironwood |
| Official name | TPU7x (7th-generation TPU) |
| Unveiled | Google Cloud Next, April 2025 |
| General availability | Announced November 2025, rolling out in the following weeks |
| Per-chip FP8 compute | 4,614 TFLOPs |
| Per-chip HBM | 192 GB |
| Per-chip HBM bandwidth | ~7.37 TB/s |
| Pod scale | Up to 9,216 chips |
| Inter-chip Interconnect (ICI) | 9.6 Tb/s (1.2 TB/s) |
| Pod-scale shared HBM | 1.77 PB |
| Improvement vs TPU v5p | 10x peak performance |
| Improvement vs Trillium (v6e) | 4x per-chip for training + inference |
| Energy | ~2x perf-per-watt vs Trillium; ~30x vs first Cloud TPU |
| Powers | Gemini 4, Gemini Intelligence, Google Search AI Mode |
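
The pod-level numbers follow directly from the per-chip ones. As a quick sanity check, here is a minimal sketch of that arithmetic; the function names are illustrative, and decimal units (1 PB = 1,000,000 GB) are assumed.

```python
# Back-of-the-envelope pod totals derived from the per-chip quick facts.
# Decimal units are assumed throughout (1 PB = 1e6 GB, 1 EFLOP = 1e6 TFLOPs).

CHIPS_PER_POD = 9_216
HBM_PER_CHIP_GB = 192
FP8_PER_CHIP_TFLOPS = 4_614
ICI_TBIT_PER_S = 9.6


def pod_hbm_pb(chips=CHIPS_PER_POD, hbm_gb=HBM_PER_CHIP_GB):
    """Aggregate HBM across a pod, in petabytes."""
    return chips * hbm_gb / 1e6


def pod_fp8_exaflops(chips=CHIPS_PER_POD, tflops=FP8_PER_CHIP_TFLOPS):
    """Peak FP8 compute across a pod, in exaFLOPs."""
    return chips * tflops / 1e6


def ici_tbyte_per_s(tbit_per_s=ICI_TBIT_PER_S):
    """Convert the ICI figure from terabits to terabytes per second."""
    return tbit_per_s / 8


print(f"Pod HBM:  {pod_hbm_pb():.2f} PB")            # 1.77 PB
print(f"Pod FP8:  {pod_fp8_exaflops():.1f} EFLOPs")  # 42.5 EFLOPs
print(f"ICI:      {ici_tbyte_per_s():.1f} TB/s")     # 1.2 TB/s
```

These are peak, theoretical totals; sustained throughput on a real workload will be lower.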

What Ironwood is for

Google calls this the “age of inference.” Earlier TPU generations were optimised largely for training; Ironwood is purpose-built for high-volume, low-latency inference at hyperscale, serving Gemini 4 to Search, the Gemini app, Android, Vertex AI, and other Google services.

Key design choices:

  • Inference-first: optimised for serving, not just training.
  • Native FP8: the first TPU with FP8 support in the Matrix Multiply Units (MXUs), at roughly 2x BF16 throughput.
  • Massive HBM: 192 GB per chip lets a model whose weights and working state fit in that budget stay resident on one chip without sharding.
  • Pod-scale interconnect: 9,216 chips in one pod, sharing 1.77 PB of HBM.
  • SparseCore enhancements: specialised accelerators for huge embeddings (recommendations, ranking).
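
To make the FP8 and HBM points concrete, here is a rough sketch of how a model's weight footprint scales with precision. The parameter counts and the 20% headroom reserved for KV cache and activations are illustrative assumptions, not figures from Google.

```python
# Rough weight-footprint estimate: parameters x bytes per parameter.
# HEADROOM_FRACTION is an assumed reserve for KV cache and activations.

BYTES_PER_PARAM = {"fp8": 1, "bf16": 2}
HBM_PER_CHIP_GB = 192
HEADROOM_FRACTION = 0.2  # assumption, tune for your workload


def weight_footprint_gb(params_billions, dtype):
    """Weights only: 1 billion params at 1 byte each is 1 GB."""
    return params_billions * BYTES_PER_PARAM[dtype]


def fits_on_one_chip(params_billions, dtype,
                     hbm_gb=HBM_PER_CHIP_GB,
                     headroom=HEADROOM_FRACTION):
    """True if the weights fit in the HBM left after the reserved headroom."""
    usable_gb = hbm_gb * (1 - headroom)
    return weight_footprint_gb(params_billions, dtype) <= usable_gb


for size in (70, 120):  # hypothetical model sizes, in billions of parameters
    for dtype in ("bf16", "fp8"):
        gb = weight_footprint_gb(size, dtype)
        print(f"{size}B @ {dtype}: {gb} GB, fits={fits_on_one_chip(size, dtype)}")
```

Under these assumptions a 120B-parameter model fits on one chip at FP8 but not at BF16, which is the practical payoff of halving the bytes per parameter.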

Architecture in plain terms

| Component | What it is |
| --- | --- |
| Chip | Dual-chiplet; each chiplet has 1 TensorCore, 2 SparseCores, and 96 GB HBM |
| TensorCore | Main matrix math; native FP8 MXUs |
| SparseCore | Embedding / sparse workload accelerator |
| HBM | 192 GB per chip at 7.37 TB/s; large models can fit on one chip without partitioning |
| ICI | 9.6 Tb/s inter-chip interconnect; 200 GBps per axis (3D torus topology) |
| Pod | 9,216 chips connected over ICI in a 3D torus, 1.77 PB shared HBM |
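
In a 3D torus, each chip links directly to its neighbours along three axes, with the edges wrapping around. The sketch below shows the idea on a small 4x4x4 grid; that shape is purely hypothetical and is not the actual pod layout.

```python
# Neighbour lookup and hop distance in a 3D torus.
# The 4x4x4 shape used here is a toy example, not Ironwood's real dimensions.


def torus_neighbors(coord, shape):
    """Return the six direct neighbours of `coord`, wrapping at the edges."""
    neighbors = []
    for axis, size in enumerate(shape):
        for step in (-1, 1):
            nxt = list(coord)
            nxt[axis] = (coord[axis] + step) % size
            neighbors.append(tuple(nxt))
    return neighbors


def torus_hops(a, b, shape):
    """Minimum number of links between two chips, using wraparound."""
    return sum(
        min(abs(x - y), size - abs(x - y))
        for x, y, size in zip(a, b, shape)
    )


shape = (4, 4, 4)
print(torus_neighbors((0, 0, 0), shape))
print(torus_hops((0, 0, 0), (3, 3, 3), shape))  # 3: one wraparound hop per axis
print(torus_hops((0, 0, 0), (2, 2, 2), shape))  # 6: the farthest point in this grid
```

The wraparound links are what keep worst-case hop counts low: without them, the corner-to-corner distance in the same grid would be 9 hops instead of 3.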

How it compares (May 2026)

| | Google Ironwood (TPU7x) | NVIDIA H200 | NVIDIA B200 (Blackwell) |
| --- | --- | --- | --- |
| Vendor | Google | NVIDIA | NVIDIA |
| FP8 peak (per chip) | 4,614 TFLOPs | ~3,958 TFLOPs (with sparsity; ~1,979 dense) | ~5,000 TFLOPs |
| HBM | 192 GB | 141 GB | 192 GB |
| Interconnect | 9.6 Tb/s ICI; pod-scale | NVLink (8-GPU NVL) | NVLink 5 |
| Pod / cluster | 9,216 chips / 1.77 PB HBM | 8 GPU NVLink islands + InfiniBand | 72-GPU NVL72 rack + Quantum-X800 |
| Software | JAX (no TF); GKE-only | CUDA (universal) | CUDA |
| Availability | Google Cloud (rentable) | Broadly available | Shipping, allocated |
| Strength | Pod-scale inference, HBM, perf/watt | Mature software, ubiquity | Highest per-chip FP8 |

The honest picture: Ironwood is genuinely competitive at the chip level and advantaged at pod scale. NVIDIA’s moat is software (CUDA), market share, and the fact that most third-party models target it first.
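
To put the pod-scale advantage in numbers, this sketch totals the HBM inside one tightly coupled interconnect domain for each platform, using only the chip counts and per-chip HBM from the comparison above. It measures capacity alone and says nothing about delivered performance.

```python
# Aggregate HBM within a single scale-up domain, from the comparison figures.
# Each entry is (chips in the domain, HBM per chip in GB).

DOMAINS = {
    "Ironwood pod": (9_216, 192),
    "B200 NVL72 rack": (72, 192),
    "H200 8-GPU NVLink island": (8, 141),
}


def domain_hbm_tb(chips, hbm_gb):
    """Total HBM in the domain, in decimal terabytes."""
    return chips * hbm_gb / 1000


def hbm_ratio(name_a, name_b, domains=DOMAINS):
    """How many times more HBM domain A holds than domain B."""
    return domain_hbm_tb(*domains[name_a]) / domain_hbm_tb(*domains[name_b])


for name, (chips, hbm_gb) in DOMAINS.items():
    print(f"{name}: {domain_hbm_tb(chips, hbm_gb):,.1f} TB")

print(hbm_ratio("Ironwood pod", "B200 NVL72 rack"))  # 128.0
```

NVIDIA clusters scale past a single rack over InfiniBand or Ethernet, so the gap applies to the scale-up domain, not to the total cluster size.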

Why this matters for Gemini 4

Gemini 4 has a 10-million-token context window and native multimodal generation (video, audio, spatial). That requires:

  • Huge HBM per chip to keep activations resident → Ironwood’s 192 GB.
  • Very high HBM bandwidth for long-context attention → 7.37 TB/s.
  • Pod-scale fabric so one large request can span thousands of chips → 9,216-chip pods.
  • Energy efficiency because Google has to serve Gemini 4 to billions of devices economically → ~2x perf-per-watt vs Trillium.
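
Why bandwidth matters so much for long-context serving can be seen from a simple roofline-style estimate. The sketch below uses only the per-chip peak figures quoted in this article; the example intensity value is an illustrative assumption, and real kernels will not reach these peaks.

```python
# Roofline-style estimate from the per-chip peak figures.
# Peak numbers are theoretical; sustained values will be lower.

PEAK_FP8_TFLOPS = 4_614
HBM_BANDWIDTH_TB_PER_S = 7.37


def ridge_point_flops_per_byte(peak_tflops=PEAK_FP8_TFLOPS,
                               bw_tb_per_s=HBM_BANDWIDTH_TB_PER_S):
    """Arithmetic intensity above which a kernel becomes compute-bound."""
    return peak_tflops / bw_tb_per_s  # the 1e12 factors cancel


def is_memory_bound(flops_per_byte):
    """True if a kernel at this intensity is limited by HBM bandwidth."""
    return flops_per_byte < ridge_point_flops_per_byte()


def min_stream_time_ms(gigabytes, bw_tb_per_s=HBM_BANDWIDTH_TB_PER_S):
    """Lower bound on the time to read `gigabytes` from HBM once."""
    seconds = gigabytes / (bw_tb_per_s * 1000)  # TB/s -> GB/s
    return seconds * 1000


print(f"Ridge point: {ridge_point_flops_per_byte():.0f} FLOPs/byte")
print(f"One full pass over 192 GB: {min_stream_time_ms(192):.1f} ms")
print(is_memory_bound(2))  # an assumed low-intensity decode step: True
```

Token-by-token decoding typically performs few operations per byte read, so it sits well below the ridge point and its speed tracks HBM bandwidth rather than peak FLOPs.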

Without Ironwood, serving Gemini 4 at consumer scale isn’t economical. Ironwood is the silicon that lets Google ship Gemini 4 as the default model on Android, Search AI Mode, and the Gemini app.

How to use Ironwood

  • Cloud: Ironwood TPUs are part of Google’s AI Hypercomputer architecture on Google Cloud.
  • Orchestration: TPU7x requires Google Kubernetes Engine (GKE) — no direct VM access.
  • Frameworks: JAX is the supported framework. TensorFlow is not supported on TPU7x.
  • Workloads: best for inference at scale; also good for large training jobs that can use JAX.
  • Cost: competitive with NVIDIA H200 cloud instances for inference workloads at scale (pricing for large reservations is subject to negotiation).

Strengths

  • Pod-scale inference: no single NVIDIA NVLink domain offers 1.77 PB of aggregate HBM today.
  • HBM per chip: 192 GB holds up to roughly 190 billion parameters of FP8 weights on one chip, before accounting for KV cache and activations.
  • Perf-per-watt: roughly 2x Trillium, a strong position for production serving.
  • First-party integration: Gemini 4 + Ironwood + Vertex AI is a coherent stack.

Weaknesses

  • JAX-first: TensorFlow isn’t supported on TPU7x. PyTorch users have to go through PyTorch/XLA (which has rough edges) or move to NVIDIA.
  • GKE-only: no direct VM access; you’re committing to Kubernetes.
  • Vendor lock-in: available only through Google Cloud.
  • Software ecosystem: most third-party models still target CUDA first.

TL;DR

Ironwood (TPU7x) is the silicon behind Gemini 4 — 4,614 TFLOPs FP8 per chip, 192 GB HBM, 9,216-chip pods sharing 1.77 PB of memory. It’s Google’s strongest hardware response to NVIDIA Blackwell at hyperscale inference, and it’s the reason Gemini 4 can ship 10M-token context to billions of users economically.