What is the OpenAI Jalapeño chip?

Jalapeño is OpenAI's first custom-designed AI accelerator, built in partnership with Broadcom and unveiled on June 24, 2026. It is a large reticle-sized ASIC purpose-built for large language model (LLM) inference — the compute-heavy process of running models like GPT-5.5 and the upcoming GPT-5.6 to serve user queries in ChatGPT, Codex, and the OpenAI API. Unlike Nvidia's general-purpose GPUs, Jalapeño is a 'blank-slate' design optimized end-to-end for the specific kernels, memory movement patterns, networking, and serving requirements of OpenAI's frontier models. The chip went from initial design to manufacturing tape-out in roughly nine months — an unusually fast cycle that OpenAI says was accelerated by using its own models in the chip-design loop. Initial data-center deployment is targeted for late 2026.

How is Jalapeño different from Nvidia Blackwell GPUs?

Three core differences. (1) Purpose: Jalapeño is inference-only, while Nvidia Blackwell handles both training and inference. (2) Specialization: Jalapeño is co-designed with OpenAI's specific model architectures and serving stack, so it can make optimizations that a general-purpose GPU can't (such as transformer-only paths, OpenAI-specific quantization, and serving patterns that match ChatGPT's load shape). Blackwell stays general-purpose to serve every model lab on Earth. (3) Economics: OpenAI claims 'substantially better' performance per watt, and Bloomberg reported the chip could cut OpenAI's inference costs by roughly 50%. The trade-off is that Jalapeño only helps OpenAI: it doesn't run anyone else's models efficiently, and OpenAI is not selling it as a product. It is internal infrastructure.

When will Jalapeño actually ship?

Initial data-center deployment is slated for late 2026. The chip taped out in roughly nine months from design start, which is fast for a reticle-sized ASIC, but production ramp, software stack maturation, and data-center integration still take time. OpenAI has said a more detailed technical report on Jalapeño's performance characteristics is coming in the next several months. Practically, this means GPT-5.6 and the public ChatGPT product will continue running primarily on Nvidia GPUs (plus Google TPUs and AMD MI-series) through Q3 2026, with Jalapeño-served traffic starting to take a measurable share in Q4 2026 and 2027.

Does Jalapeño mean OpenAI is leaving Nvidia?

No. OpenAI is the largest single customer for AI accelerators globally, with multi-billion-dollar Nvidia GPU commitments, a Google Cloud TPU deal, an AWS Trainium partnership, and the Microsoft Azure / SpaceX Colossus footprint. Jalapeño is additive — it lets OpenAI serve a meaningful share of its own inference workload on silicon it controls economically and architecturally, but it does not replace the Nvidia, TPU, or Trainium capacity. The strategic point is supplier diversification and unit-economics control on a workload (inference) that scales linearly with daily active users. OpenAI is following the same playbook Google did with TPU, Amazon with Trainium/Inferentia, and Meta with MTIA.

Quick Answer

What Is OpenAI's Jalapeño Chip? (June 24, 2026)

Published: June 25, 2026

On Wednesday June 24, 2026, OpenAI and Broadcom unveiled “Jalapeño” — OpenAI’s first custom-built AI inference chip. It is a large reticle-sized ASIC purpose-built for serving large language model (LLM) inference, designed in roughly nine months, and slated for initial data-center deployment in late 2026. This is OpenAI’s answer to Google’s TPU and Amazon’s Trainium — silicon that OpenAI controls economically and architecturally for its own workloads.

Last verified: June 25, 2026.

TL;DR

Announced: June 24, 2026 (joint OpenAI + Broadcom unveiling)
What it is: Custom-designed AI inference ASIC, “blank-slate” purpose-built for LLM inference
Designed by: OpenAI + Broadcom co-engineering; OpenAI used its own models to accelerate parts of the chip-design loop
What it runs: OpenAI’s own models in ChatGPT, Codex, the API, and future agentic workloads
Performance claim: “Substantially better performance per watt” than current state-of-the-art; Bloomberg reported ~50% inference-cost reduction
First deployment: Late 2026 in OpenAI data centers
Not for sale: Internal infrastructure only — not a competitive product against Nvidia GPUs
Strategic point: Supplier diversification and unit-economics control on inference

What Jalapeño is

Jalapeño is OpenAI’s first piece of custom silicon. It is a large reticle-sized ASIC (Application-Specific Integrated Circuit). “Reticle-sized” means the die pushes against the maximum area that current EUV lithography can pattern in a single exposure, which makes it similar in physical footprint to Nvidia’s largest Blackwell parts.

The “blank-slate” framing matters. Nvidia GPUs are general-purpose — they need to run training, inference, scientific computing, image generation, and every model architecture from every lab. Jalapeño does not. It is designed for one customer (OpenAI), one workload type (LLM inference), and one serving stack (OpenAI’s). That lets the design drop a lot of generality and optimize hard for the specific patterns OpenAI sees in production.

What does inference-specific optimization actually mean?

Transformer-first datapaths — Nvidia has to support diffusion models, RNNs, convnets, scientific kernels. Jalapeño doesn’t.
OpenAI-specific quantization — the chip can hard-code the precision schemes OpenAI uses in production rather than carry general FP4/FP8/BF16/FP16/FP32 support.
Serving-shape memory — ChatGPT’s load is bursty and KV-cache-heavy. The memory hierarchy can be tuned to that shape.
Networking topology — the chip-to-chip interconnect can be matched to OpenAI’s specific cluster sizes rather than the generic NVLink topology.
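
To make the quantization and serving-shape memory points concrete, here is a rough sketch of how per-request KV-cache size scales with numeric precision. The formula is the standard one for transformer attention caches. The model dimensions (`layers`, `kv_heads`, `head_dim`) and the 32k-token context are hypothetical placeholders, not published figures for any OpenAI model or for Jalapeño.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    """Hypothetical transformer dimensions (assumptions, not OpenAI figures)."""
    layers: int
    kv_heads: int
    head_dim: int


def kv_cache_bytes(shape: ModelShape, context_tokens: int, bytes_per_value: float) -> float:
    """Bytes of KV cache held for one request at a given context length."""
    # Each layer stores one key vector and one value vector per KV head per token.
    values_per_token = 2 * shape.layers * shape.kv_heads * shape.head_dim
    return values_per_token * context_tokens * bytes_per_value


shape = ModelShape(layers=80, kv_heads=8, head_dim=128)  # illustrative only

for label, width in [("FP16", 2.0), ("FP8", 1.0), ("FP4", 0.5)]:
    gib = kv_cache_bytes(shape, context_tokens=32_768, bytes_per_value=width) / 2**30
    print(f"{label}: {gib:.2f} GiB per 32k-token request")
# FP16: 10.00 GiB per 32k-token request
# FP8: 5.00 GiB per 32k-token request
# FP4: 2.50 GiB per 32k-token request
```

Under these assumed dimensions, halving the storage width halves the memory each long-context request pins. That is the kind of lever a chip built around a fixed set of production precision formats could exploit.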

The nine-month design-to-tape-out cycle is notable. Reticle-sized ASICs typically take 18-24 months. OpenAI says it used its own models in the design loop — for RTL generation, verification, and floorplan optimization. This is an under-discussed second-order effect of frontier-model improvement: model labs that own the inference economics can compress hardware design cycles in ways their competitors can’t.

Why OpenAI is doing this

The unit economics of inference are the most important number in OpenAI’s business. Every ChatGPT message, every Codex tool call, and every API request costs OpenAI a small amount of compute, and the volume is enormous (and growing). At ~700M weekly active ChatGPT users plus enterprise API and Codex traffic, even small per-query cost improvements compound into hundreds of millions of dollars per year.
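
A back-of-the-envelope sketch of that compounding, in Python. Only the 700M weekly-user figure comes from the text above. The queries-per-user rate and the per-query saving are illustrative assumptions, chosen to show the order of magnitude rather than to estimate OpenAI's actual costs.

```python
WEEKS_PER_YEAR = 52


def annual_savings_usd(weekly_users: int,
                       queries_per_user_per_week: float,
                       saving_per_query_usd: float) -> float:
    """Yearly savings from shaving a fixed amount off every query."""
    queries_per_year = weekly_users * queries_per_user_per_week * WEEKS_PER_YEAR
    return queries_per_year * saving_per_query_usd


savings = annual_savings_usd(
    weekly_users=700_000_000,       # figure quoted in the article
    queries_per_user_per_week=20,   # assumption for illustration
    saving_per_query_usd=0.0005,    # assumption: one twentieth of a cent
)
print(f"${savings / 1e6:,.0f}M per year")
# $364M per year
```

With these placeholder inputs, a saving of a twentieth of a cent per query already lands in the hundreds of millions of dollars annually, before counting API or Codex traffic.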

OpenAI’s strategic problem is that Nvidia captures a large share of the gross margin on every inference dollar. That’s fine when there’s no alternative. It becomes uncomfortable when you can build your own and capture that margin yourself.

The Bloomberg-reported ~50% inference-cost reduction, even if optimistic, is huge. Cutting the dominant variable cost in your business in half changes pricing flexibility, free-tier generosity, and the unit economics of agentic workloads (which by their nature use a lot more inference per user than chat).
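
How much of that shows up in OpenAI's total bill depends on how much traffic actually moves to the new chip. The sketch below assumes, purely for illustration, that the reported ~50% reduction applies per unit of Jalapeño-served traffic and that the remaining traffic costs the same as before. The function name and the traffic shares are hypothetical.

```python
def blended_cost_factor(jalapeno_share: float, per_unit_reduction: float = 0.5) -> float:
    """Total inference cost relative to today, given the share of traffic moved.

    jalapeno_share: fraction of inference traffic served on Jalapeño (0 to 1).
    per_unit_reduction: assumed cost cut on that traffic (0.5 = the reported ~50%).
    """
    if not 0.0 <= jalapeno_share <= 1.0:
        raise ValueError("jalapeno_share must be between 0 and 1")
    return 1.0 - jalapeno_share * per_unit_reduction


for share in (0.10, 0.25, 0.50, 1.00):
    factor = blended_cost_factor(share)
    print(f"{share:.0%} of traffic on Jalapeño -> total inference cost x{factor:.3f}")
# 10% of traffic on Jalapeño -> total inference cost x0.950
# 25% of traffic on Jalapeño -> total inference cost x0.875
# 50% of traffic on Jalapeño -> total inference cost x0.750
# 100% of traffic on Jalapeño -> total inference cost x0.500
```

Under this simple linear model, the headline 50% is only reached if all inference moves to the new silicon. A partial rollout yields a proportionally smaller, though still material, saving.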

The supplier-diversification point is also real. OpenAI has watched what happens to companies that depend on a single critical supplier (Apple on Qualcomm, car makers on Bosch, AI labs and clouds on Nvidia). Owning your own inference silicon, even for just a fraction of the total workload, gives you negotiating leverage with the suppliers you still depend on.

Jalapeño vs Nvidia Blackwell vs Google Trillium

This is the most-asked question. The short version:

Dimension	Jalapeño	Nvidia Blackwell (B200/B300)	Google TPU Trillium (v6e)
Workload	Inference only	Training + inference	Training + inference
Customers	OpenAI only	Everyone	Google + Google Cloud customers
Design philosophy	LLM-inference-specific ASIC	General-purpose GPU	Tensor-specialized accelerator
Software stack	OpenAI internal	CUDA + TensorRT-LLM (mature, dominant)	JAX/XLA + vLLM (growing)
Availability	Late 2026, internal only	Shipping now	Generally available since Dec 2024
Performance claim	“Substantially better perf/watt” for LLM inference	Best raw per-device compute, broad workload coverage	Best $-per-token at cloud scale
Scale-out	OpenAI cluster-specific	NVLink + InfiniBand	Optical Circuit Switching + Jupiter fabric

For comparison-shopping users (everyone except OpenAI): Jalapeño doesn’t change anything. You’ll still choose between Nvidia GPUs and Google TPUs based on your workload and ecosystem. For OpenAI: Jalapeño potentially shifts a meaningful chunk of its inference economics over the next 18-36 months.

What Jalapeño doesn’t do

Jalapeño is not a product. Broadcom co-engineered the chip with OpenAI and is its manufacturing partner, but OpenAI consumes 100% of the output internally. There is no announced plan to make it available to other customers, no model-portability story, and no compete-with-Nvidia-on-revenue narrative. The strategic intent is OpenAI’s own cost structure, not market entry.

Jalapeño also does not handle training. Frontier-model training continues to run on Nvidia GPUs (and AMD MI-series, and Google TPUs in the Anthropic/Google partnership). Custom training silicon is a tougher proposition: training workloads change with each model architecture, and the design cost is harder to amortize over a single lab’s training runs than over years of stable inference traffic.

What changes for the AI infrastructure market

Three follow-on effects to watch:

1. Pressure on Nvidia inference margins

Inference is the largest piece of Nvidia’s addressable market. If OpenAI, Google, Amazon, and Meta all run meaningful shares of their inference on custom silicon, Nvidia’s inference revenue will grow more slowly than its training revenue over the next few years. Nvidia is not going away, since training demand is exploding, but the inference mix matters for the stock.

2. Validation of Broadcom’s custom-ASIC business

Broadcom now publicly executes for OpenAI (Jalapeño), Google (TPU), and Meta (MTIA). The Broadcom custom-ASIC story is now the most concrete competitor narrative to Nvidia’s general-purpose hegemony. Expect AMD and Marvell to chase the same workload.

3. The economics of agentic AI shift

Agentic workloads consume far more inference per user than chat. If Jalapeño cuts OpenAI’s per-token inference cost meaningfully, OpenAI can be more aggressive on agentic-pricing experiments — running deeper search, longer tool-call chains, and more parallel rollouts inside ChatGPT and Codex without burning subscription margin.

How to verify the announcement

The primary sources for this story, all published on or after June 24, 2026:

OpenAI’s announcement: openai.com/index/openai-broadcom-jalapeno-inference-chip/
Tom’s Hardware: Technical writeup describing reticle-sized ASIC, nine-month design cycle
TechCrunch: First-custom-chip framing, manufacturing partnership with Broadcom
CNBC: Inference-only positioning, ChatGPT/Codex/API deployment context
VentureBeat: Detail on OpenAI’s models accelerating chip design

Bottom line

Jalapeño is a strategic move, not a product. It validates that the largest AI labs all eventually move toward custom inference silicon for the same reason hyperscalers did a decade ago: unit economics on scaled-out, predictable workloads dominate everything else. The chip ships in late 2026, OpenAI keeps it internal, and the most visible second-order effect is competitive pressure on Nvidia’s inference revenue mix over the next 18-36 months.

Watch for the technical report OpenAI promised “in the coming months” — that’s where the real numbers (perf/watt, cost-per-token, throughput) will appear, and where the analysis can move past announcement-grade claims.
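
For readers who want to sanity-check those numbers when they arrive, here is a minimal sketch of how perf/watt and throughput feed into the energy portion of cost-per-token. Electricity is only one component of cost-per-token (capital cost, networking, and utilization matter too), and every input value below is a hypothetical placeholder rather than a Jalapeño or Nvidia specification.

```python
JOULES_PER_KWH = 3_600_000


def energy_usd_per_million_tokens(power_watts: float,
                                  tokens_per_second: float,
                                  usd_per_kwh: float) -> float:
    """Electricity cost of generating one million tokens on one device."""
    joules_per_token = power_watts / tokens_per_second
    kwh_per_million_tokens = joules_per_token * 1_000_000 / JOULES_PER_KWH
    return kwh_per_million_tokens * usd_per_kwh


# Hypothetical device: 1 kW, 5,000 tokens/s, electricity at $0.08/kWh.
baseline = energy_usd_per_million_tokens(1000.0, 5000.0, 0.08)
# Same power draw with twice the throughput, i.e. 2x performance per watt.
doubled = energy_usd_per_million_tokens(1000.0, 10000.0, 0.08)

print(f"baseline: ${baseline:.4f} per 1M tokens")
print(f"2x perf/watt: ${doubled:.4f} per 1M tokens")
```

In this model, energy cost per token is inversely proportional to performance per watt, so a “substantially better perf/watt” claim translates directly into the energy line of the bill once throughput and power figures are published.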