What is Baseten and what does it do?

Baseten is an AI inference platform that lets companies customize, deploy, and operate their own AI models on managed infrastructure. The product handles model serving (running models at production scale), autoscaling (handling traffic spikes), routing (sending requests to the right model variant), observability (logs, metrics, traces), and the operational layer around inference (CI/CD for models, A/B testing, canary deployments). Baseten currently processes over 1 billion inference calls per day across 87 global clusters. The platform is positioned for teams that want to run their own models — open-weight LLMs, custom fine-tunes, multimodal models, image generation, audio — without building the inference infrastructure stack themselves. Customers range from AI-native startups to Fortune 500 enterprises deploying internal AI applications.

Why did Baseten raise $1.5 billion at a $13 billion valuation?

Three reasons converged in mid-2026. (1) AI inference is the largest and fastest-growing infrastructure category in software: training spend, the dominant line in 2022-2024, is being surpassed by inference spend as models hit production at scale. (2) Baseten's metrics are strong: 1B+ daily inference calls, 87 global clusters, and reported customer retention and expansion well above typical infrastructure benchmarks. (3) The competitive set (Together AI, Fireworks AI, Anyscale, Modal, Replicate, plus the hyperscaler offerings from AWS, GCP, and Azure) is crowded, but no clear winner has emerged in the 'multi-model inference for everyone, including open weights' segment. Investors accepted a valuation of up to $13B because they believe Baseten is the most likely to consolidate that segment. The June 2026 Series F was led by Altimeter Capital, Conviction, and Spark Capital, with Sands Capital and Wellington Management as co-leads.

How does Baseten compare to Together AI, Fireworks AI, and Modal?

All four target AI inference but with different emphases. Baseten leads on the 'bring your own model + we operate it' workflow with enterprise-grade tooling. Together AI is strongest on hosted open-weight models with a public API (you call their endpoint, no deployment required) and competitive batch inference pricing. Fireworks AI focuses on extremely fast inference with FP8/FP4 quantization for the open-weight models people actually use in production. Modal targets serverless compute including but not limited to inference — strong developer experience, less inference-specific. Choice depends on the workload: managed deployment of custom models → Baseten; API-style access to open-weight LLMs → Together or Fireworks; serverless compute with inference as one use case → Modal. Hyperscalers (AWS Bedrock, GCP Vertex, Azure Foundry) compete on the proprietary-models-plus-some-open-weight axis with deeper cloud integration.

Should I use Baseten or self-host my AI models?

Self-hosting is the right answer when you have predictable high-volume workloads, deep ML-ops capability in-house, and either compliance constraints that require on-prem deployment or a scale at which managed inference costs more than running your own stack. Baseten (or a similar managed inference platform) is the right answer when you don't want to build and operate inference infrastructure, your workload is bursty or unpredictable, you need global low-latency routing, you want enterprise-grade observability and security without building them, or you're moving fast and managed inference is good enough until economics or compliance force you to vertically integrate. For most teams in mid-2026, managed inference (Baseten, Together, Fireworks, Modal, or hyperscaler equivalents) is the right starting point. The break-even point for vertical integration is generally $50K-$200K/month in inference spend, depending on workload shape and team capability.

What Is Baseten? $13B Valuation, Series F Explained (June 2026)

Published: June 27, 2026

On June 25, 2026, AI inference platform Baseten announced a $1.5 billion Series F at a valuation of up to $13 billion. The round was led by Altimeter Capital, Conviction, and Spark Capital, with Sands Capital and Wellington Management as co-leads, and it puts Baseten in the top tier of AI infrastructure companies alongside Together AI, Fireworks AI, and other well-funded inference-stack players. This page explains what Baseten does, why investors valued it at up to $13B, how it compares to alternatives, and how to decide whether to use it.

Last verified: June 27, 2026.

TL;DR

Baseten: AI inference platform — lets you deploy, serve, and operate your own AI models on managed infrastructure
Series F: $1.5B raised June 2026 at up to $13B valuation
Scale: 1B+ daily inference calls across 87 global clusters
Lead investors: Altimeter Capital, Conviction, Spark Capital (co-leads: Sands Capital, Wellington Management)
Use it for: managed deployment of custom or open-weight models with enterprise-grade tooling
Alternatives: Together AI (API for open-weight LLMs), Fireworks AI (fast quantized inference), Modal (serverless compute), AWS Bedrock / GCP Vertex / Azure Foundry (hyperscaler)
Self-host instead when: predictable high-volume + ML-ops capability + compliance or unit economics demand

What Baseten actually does

Baseten is the operational layer between “I have a model” and “this model serves production traffic reliably, with scaling, routing, monitoring, and release tooling in place.” The product handles:

Model serving. Run any model — open-weight LLM (Llama 4, Qwen 3.5, DeepSeek V3.5), custom fine-tune, multimodal model, image generation, audio model, embedding model — on managed GPU infrastructure. Baseten handles container builds, GPU allocation, scheduler integration, and the inference runtime.

Autoscaling. Scale up for traffic spikes, scale down to zero when idle, with cold-start optimization that keeps user-perceived latency manageable. This is the hard problem of inference economics — most workloads don’t have predictable steady-state demand.

Routing. Send requests to the right model variant based on input characteristics, A/B test configurations, canary deployments, or geographic location. The 87 global clusters are the footprint that geographic routing runs on.

Observability. Logs, metrics, distributed traces, prompt-level visibility, cost attribution per request — the operational visibility that’s hard to build in-house.

Operational tooling. CI/CD for models (deploy from git), A/B testing of model variants, traffic-shifted canary releases, rollback, model registry.

Enterprise features. SSO, audit logs, VPC peering, on-prem deployment, compliance certifications (SOC 2, HIPAA, etc.).
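To make the autoscaling and routing items above more concrete, here is a minimal Python sketch of two ideas they describe: concurrency-based scaling with scale-to-zero, and a deterministic canary traffic split. It is an illustration under stated assumptions, not Baseten's implementation or API. The function names, the idle-window rule, and the 100-bucket hashing scheme are all hypothetical choices made for this example.

```python
import hashlib
import math


def desired_replicas(in_flight: int, target_concurrency: int,
                     idle_seconds: float, scale_to_zero_after: float,
                     max_replicas: int) -> int:
    """Hypothetical policy: size the fleet from in-flight requests.

    With no traffic, keep one warm replica until the idle window
    expires, then scale to zero. Otherwise run enough replicas that
    each handles at most `target_concurrency` requests, capped at
    `max_replicas`.
    """
    if in_flight == 0:
        return 0 if idle_seconds >= scale_to_zero_after else 1
    needed = math.ceil(in_flight / target_concurrency)
    return min(max(needed, 1), max_replicas)


def pick_variant(request_id: str, canary_percent: int) -> str:
    """Hypothetical canary split: hash the request ID into 100 buckets.

    The same request_id always lands in the same bucket, so retries of
    one request keep hitting the same model variant.
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # bucket in 0..99
    return "canary" if bucket < canary_percent else "stable"


if __name__ == "__main__":
    # 9 in-flight requests at 4 per replica needs 3 replicas.
    print(desired_replicas(9, 4, idle_seconds=0,
                           scale_to_zero_after=300, max_replicas=10))
    # Route a sample request with 10% of traffic sent to the canary.
    print(pick_variant("req-12345", canary_percent=10))
```

Hashing the request ID rather than drawing a random number is a deliberate choice in this sketch: it keeps routing stable across retries, which makes canary results easier to compare. A production system would also need to account for cold-start time when deciding how long to keep the last replica warm.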

Why $13 billion

Investor analysis of the round points to three converging dynamics behind the valuation:

1. Inference is now the largest AI infrastructure category. Through 2023-2024, model training dominated AI infrastructure spend. As models hit production at enterprise scale through 2025-2026, inference spend has overtaken training. Industry estimates put global AI inference spend at >$100B annual run-rate by end of 2026, growing 100%+ year-over-year. Baseten is positioned in the fastest-growing infrastructure category in software.

2. Strong customer metrics. 1B+ daily inference calls across 87 global clusters is a meaningful production-scale number. Coverage of the round also reported high net dollar retention and rapid customer expansion. The customer base spans AI-native startups (Baseten’s original wedge) and Fortune 500 enterprises (its expansion market).

3. No clear winner in the multi-model managed inference segment. Baseten competes with Together AI, Fireworks AI, Modal, Replicate, Anyscale, and the hyperscaler inference products (AWS Bedrock, GCP Vertex, Azure Foundry). None of these has consolidated the “you bring or pick a model, we operate it for you” segment with strong enterprise tooling. Investors are betting that segment will consolidate and that Baseten is the most likely consolidator.

The $13B valuation implies investors are paying ~13x forward revenue at a roughly $1B annual run-rate assumption, which is rich but consistent with other 2026 AI infrastructure rounds (Together’s previous round, Anyscale’s previous round) and with the broader AI infrastructure premium.
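The multiple quoted above is simple division, and it is sensitive to the revenue assumption. The short Python sketch below shows how the implied multiple moves as that assumption changes. Only the $13B valuation and the roughly $1B run-rate come from this article. The other two run-rate scenarios are hypothetical values added for illustration, not reported figures.

```python
def implied_multiple(valuation_usd: float, forward_revenue_usd: float) -> float:
    """Valuation divided by the assumed forward annual revenue."""
    return valuation_usd / forward_revenue_usd


if __name__ == "__main__":
    valuation = 13e9  # upper end of the reported valuation
    # 1.0e9 is the article's assumption; 0.75e9 and 1.5e9 are
    # hypothetical scenarios for sensitivity only.
    for run_rate in (0.75e9, 1.0e9, 1.5e9):
        multiple = implied_multiple(valuation, run_rate)
        print(f"${run_rate / 1e9:.2f}B run-rate -> {multiple:.1f}x")
```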

How Baseten compares to alternatives

Platform	Best for	Pricing model	Notable
Baseten	Managed deployment of custom or open-weight models with enterprise tooling	Per-GPU-hour + per-cluster	1B+ daily inferences, 87 clusters
Together AI	API access to hosted open-weight LLMs; batch inference	Per-token (API)	Public model catalog, competitive batch pricing
Fireworks AI	Extreme low-latency inference; FP8/FP4 quantization	Per-token (API)	Performance-optimized for open-weight LLMs
Modal	Serverless compute including inference	Per-second compute	Strong developer experience, broader than inference
Replicate	Hobbyist + indie deployments of community models	Per-second compute	Public model catalog, simple API
Anyscale	Ray-based distributed inference at scale	Cluster-based	Strong for ML engineers who want Ray
AWS Bedrock / GCP Vertex / Azure Foundry	Cloud-integrated inference for proprietary + some open-weight	Per-token + infrastructure	Deep hyperscaler integration

When to use Baseten

Baseten is the right choice when:

You need managed deployment of custom models or fine-tunes (Baseten’s wedge)
You want enterprise-grade tooling without building it (observability, RBAC, SSO, audit, compliance)
You have bursty or unpredictable traffic and need autoscaling done well
You operate globally and need low-latency routing across 87 clusters
Your team is small enough that ML-ops in-house isn’t the priority
You’re running multiple models in production and want a single operational pane

Baseten is less suited when:

You just want an API endpoint for a popular open-weight LLM — Together AI or Fireworks AI are more direct
You want serverless compute including non-inference workloads — Modal is broader
You’re at very large scale (>$5M/month inference spend) where vertical integration unit economics dominate
You’re a hobbyist or indie developer where Replicate’s simpler model fits better
You’re heavily cloud-integrated and want hyperscaler-native inference

When to self-host instead

Self-hosting is the right answer when:

Predictable, high-volume workloads. If you’re running 100M+ inferences/day on a stable model with stable demand, you can build it cheaper than you can rent it.
Deep ML-ops capability in-house. If you have the team to operate inference infrastructure properly, the savings can be material.
Compliance constraints. Some regulated workloads can’t run on multi-tenant managed infrastructure even with VPC peering. On-prem or dedicated single-tenant is required.
Unit economics demand vertical integration. Above roughly $200K/month in managed inference spend on a single model, vertical integration starts to pay back the engineering investment.

The break-even point between managed inference and self-hosted moves over time as both managed platforms get cheaper (competition + scale) and self-host tooling gets better (vLLM, TensorRT-LLM, SGLang, Triton). In mid-2026, most teams should start managed and graduate to self-host only when forced by economics or compliance.
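The managed-versus-self-host decision can be framed as a rough monthly cost comparison. The Python sketch below is one way to set it up, under stated assumptions. The cost categories, the 20% minimum-savings threshold, and every dollar figure in the example are hypothetical placeholders, not Baseten pricing or benchmark data. Substitute your own numbers.

```python
from dataclasses import dataclass


@dataclass
class SelfHostCosts:
    """Hypothetical monthly cost model for running your own inference stack."""
    gpu_monthly_usd: float             # reserved or owned GPU capacity
    engineers: float                   # full-time engineers operating the stack
    engineer_monthly_usd: float        # fully loaded cost per engineer
    overhead_monthly_usd: float = 0.0  # networking, monitoring, on-call, etc.

    def total(self) -> float:
        return (self.gpu_monthly_usd
                + self.engineers * self.engineer_monthly_usd
                + self.overhead_monthly_usd)


def monthly_savings(managed_monthly_usd: float, costs: SelfHostCosts) -> float:
    """Positive when self-hosting is cheaper than the managed bill."""
    return managed_monthly_usd - costs.total()


def should_self_host(managed_monthly_usd: float, costs: SelfHostCosts,
                     min_savings_ratio: float = 0.2) -> bool:
    """Recommend self-hosting only if savings clear a safety margin.

    The margin (an assumed 20% of the managed bill by default) stands in
    for migration risk and the engineering time the move would consume.
    """
    savings = monthly_savings(managed_monthly_usd, costs)
    return savings >= min_savings_ratio * managed_monthly_usd


if __name__ == "__main__":
    # Illustrative figures only.
    costs = SelfHostCosts(gpu_monthly_usd=90_000, engineers=2,
                          engineer_monthly_usd=25_000,
                          overhead_monthly_usd=10_000)
    for managed in (50_000, 200_000):
        print(managed, should_self_host(managed, costs))
```

With these placeholder inputs, the fixed engineering cost dominates at the lower spend level, which is why the comparison tends to favor managed inference until monthly spend is well into six figures.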

What the round signals about the inference market

The Baseten Series F is the largest AI inference round of 2026 so far and one of the largest AI infrastructure rounds of any kind. It signals:

Inference infrastructure is in the consolidation phase. Investors are picking winners, not spreading bets.
The “bring your own model” segment is the contested frontier. API-first companies (Together, Fireworks) and hyperscaler integration both have momentum, but managed-deployment-of-custom-models is the higher-margin opportunity.
Open-weight model adoption is real and durable. Investors wouldn’t value a managed inference platform at up to $13B if proprietary closed-API models (GPT, Claude, Gemini) were going to capture all production AI workloads.
Expect M&A. At $13B, Baseten is now in acquisition-target range for the largest software companies (Salesforce, ServiceNow, Snowflake, Databricks) and a plausible bidder for smaller inference players.
The Q2 2026 funding environment for AI infrastructure remains strong. Baseten’s round is one of several large AI infrastructure rounds in June 2026 (Groq $650M, AppsFlyer $1B+, General Intuition $320M).
