What Is Baseten? $13B Valuation, Series F Explained (June 2026)
On June 25, 2026, AI inference platform Baseten announced a $1.5 billion Series F at a valuation of up to $13 billion. The round was led by Altimeter Capital, Conviction, and Spark Capital, with Sands Capital and Wellington Management as co-leads. That places Baseten among the most highly valued AI infrastructure companies, alongside Together AI, Fireworks AI, and other well-funded inference providers. This page explains what Baseten does, why investors valued it at up to $13B, how it compares to alternatives, and how to decide whether to use it.
Last verified: June 27, 2026.
TL;DR
- Baseten: AI inference platform — lets you deploy, serve, and operate your own AI models on managed infrastructure
- Series F: $1.5B raised June 2026 at up to $13B valuation
- Scale: 1B+ daily inference calls across 87 global clusters
- Lead investors: Altimeter Capital, Conviction, Spark Capital (co-leads: Sands Capital, Wellington Management)
- Use it for: managed deployment of custom or open-weight models with enterprise-grade tooling
- Alternatives: Together AI (API for open-weight LLMs), Fireworks AI (fast quantized inference), Modal (serverless compute), AWS Bedrock / GCP Vertex / Azure Foundry (hyperscaler)
- Self-host instead when: you have predictable high-volume workloads and in-house ML-ops capability, or when compliance or unit economics demand it
What Baseten actually does
Baseten is the operational layer between “I have a model” and “this model serves production traffic with scaling, routing, monitoring, and access controls in place.” The product handles:
Model serving. Run any model — open-weight LLM (Llama 4, Qwen 3.5, DeepSeek V3.5), custom fine-tune, multimodal model, image generation, audio model, embedding model — on managed GPU infrastructure. Baseten handles container builds, GPU allocation, scheduler integration, and the inference runtime.
Autoscaling. Scale up for traffic spikes, scale down to zero when idle, with cold-start optimization that keeps user-perceived latency manageable. This is the hard problem of inference economics — most workloads don’t have predictable steady-state demand.
Routing. Send requests to the right model variant based on input characteristics, A/B test configurations, canary deployments, or geographic location. The 87 global clusters are the infrastructure behind geographic routing.
Observability. Logs, metrics, distributed traces, prompt-level visibility, cost attribution per request — the operational visibility that’s hard to build in-house.
Operational tooling. CI/CD for models (deploy from git), A/B testing of model variants, traffic-shifted canary releases, rollback, model registry.
Enterprise features. SSO, audit logs, VPC peering, on-prem deployment, compliance certifications (SOC 2, HIPAA, etc.).
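The autoscaling and canary-routing behaviors described above can be illustrated with a minimal sketch. This is not Baseten's implementation or API: the function names, the concurrency-based scaling rule, and all parameters are hypothetical, chosen only to show the shape of the decisions involved.

```python
import math
import random


def desired_replicas(in_flight: int, per_replica_concurrency: int,
                     max_replicas: int) -> int:
    """Hypothetical scaling rule: size the fleet to current load.

    Returns 0 when idle (scale to zero), otherwise enough replicas to
    cover the in-flight requests, capped at max_replicas.
    """
    if in_flight <= 0:
        return 0
    needed = math.ceil(in_flight / per_replica_concurrency)
    return min(needed, max_replicas)


def pick_variant(weights: dict[str, float], rng: random.Random) -> str:
    """Hypothetical canary router: pick a variant in proportion to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Assumed numbers: each replica handles 8 concurrent requests, cap of 10.
print(desired_replicas(0, 8, 10))    # 0  (idle: scale to zero)
print(desired_replicas(20, 8, 10))   # 3  (ceil(20 / 8))
print(desired_replicas(500, 8, 10))  # 10 (capped)

# Send roughly 5% of traffic to a canary variant.
rng = random.Random(0)
print(pick_variant({"stable": 0.95, "canary": 0.05}, rng))
```

A production autoscaler would also have to account for cold-start time and smooth out short spikes, which is where most of the difficulty lies. Shifting the canary weight gradually toward 1.0, or back to 0.0 for a rollback, is the traffic-shifting pattern the tooling list refers to.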
Why $13 billion
Investor analysis points to three converging dynamics behind the valuation:
1. Inference is now the largest AI infrastructure category. Through 2023-2024, model training dominated AI infrastructure spend. As models hit production at enterprise scale through 2025-2026, inference spend has overtaken training. Industry estimates put global AI inference spend at >$100B annual run-rate by end of 2026, growing 100%+ year-over-year. Baseten is positioned in the fastest-growing infrastructure category in software.
2. Strong customer metrics. 1B+ daily inference calls across 87 global clusters is a meaningful production-scale number. Coverage of the round also reported high net dollar retention and rapid customer expansion. The customer base spans AI-native startups (the original wedge) and Fortune 500 enterprises (the expansion).
3. No clear winner in the multi-model managed inference segment. Baseten competes with Together AI, Fireworks AI, Modal, Replicate, Anyscale, and the hyperscaler inference products (AWS Bedrock, GCP Vertex, Azure Foundry). None of these has consolidated the “you bring or pick a model, we operate it for you” segment with strong enterprise tooling. Investors are betting that segment will consolidate and that Baseten is the most likely consolidator.
The $13B valuation implies investors are paying ~13x forward revenue at a roughly $1B annual run-rate assumption, which is rich but consistent with other 2026 AI infrastructure rounds (Together’s previous round, Anyscale’s previous round) and with the broader AI infrastructure premium.
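The multiple quoted above is simple division, and it is sensitive to the revenue assumption. The sketch below reproduces the ~13x figure from the article's own inputs ($13B valuation, roughly $1B run-rate); the function name is illustrative, and the alternative revenue figures are hypothetical, included only to show how the multiple moves.

```python
def revenue_multiple(valuation_usd: float, forward_revenue_usd: float) -> float:
    """Valuation divided by assumed forward revenue."""
    return valuation_usd / forward_revenue_usd


VALUATION = 13e9  # "up to $13B" from the announcement

# Article's assumption: roughly $1B annual run-rate.
print(revenue_multiple(VALUATION, 1e9))  # 13.0

# Hypothetical sensitivity check: the same valuation at other revenue levels.
for revenue in (0.5e9, 0.75e9, 1.5e9):
    print(f"${revenue / 1e9:.2f}B revenue -> {revenue_multiple(VALUATION, revenue):.1f}x")
```

If actual forward revenue falls short of the assumed run-rate, the effective multiple rises accordingly, which is why the ~13x figure should be read as an estimate rather than a reported metric.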
How Baseten compares to alternatives
| Platform | Best for | Pricing model | Notable |
|---|---|---|---|
| Baseten | Managed deployment of custom or open-weight models with enterprise tooling | Per-GPU-hour + per-cluster | 1B+ daily inferences, 87 clusters |
| Together AI | API access to hosted open-weight LLMs; batch inference | Per-token (API) | Public model catalog, competitive batch pricing |
| Fireworks AI | Extreme low-latency inference; FP8/FP4 quantization | Per-token (API) | Performance-optimized for open-weight LLMs |
| Modal | Serverless compute including inference | Per-second compute | Strong developer experience, broader than inference |
| Replicate | Hobbyist + indie deployments of community models | Per-second compute | Public model catalog, simple API |
| Anyscale | Ray-based distributed inference at scale | Cluster-based | Strong for ML engineers who want Ray |
| AWS Bedrock / GCP Vertex / Azure Foundry | Cloud-integrated inference for proprietary and some open-weight models | Per-token + infrastructure | Deep hyperscaler integration |
When to use Baseten
Baseten is the right choice when:
- You need managed deployment of custom models or fine-tunes (Baseten’s wedge)
- You want enterprise-grade tooling without building it (observability, RBAC, SSO, audit, compliance)
- You have bursty or unpredictable traffic and need autoscaling done well
- You operate globally and need low-latency routing across 87 clusters
- Your team is small enough that building ML-ops in-house isn’t a priority
- You’re running multiple models in production and want a single operational pane
Baseten is less suited when:
- You just want an API endpoint for a popular open-weight LLM — Together AI or Fireworks AI are more direct
- You want serverless compute including non-inference workloads — Modal is broader
- You’re at very large scale (>$5M/month inference spend), where the unit economics of vertical integration dominate
- You’re a hobbyist or indie developer where Replicate’s simpler model fits better
- You’re heavily cloud-integrated and want hyperscaler-native inference
When to self-host instead
Self-hosting is the right answer when:
- Predictable, high-volume workloads. If you’re running 100M+ inferences/day on a stable model with stable demand, you can build it cheaper than you can rent it.
- Deep ML-ops capability in-house. If you have the team to operate inference infrastructure properly, the savings can be material.
- Compliance constraints. Some regulated workloads can’t run on multi-tenant managed infrastructure even with VPC peering. On-prem or dedicated single-tenant is required.
- Unit economics demand vertical integration. Above roughly $200K/month in managed inference spend on a single model, vertical integration starts to pay back the engineering investment.
The break-even point between managed inference and self-hosted moves over time as both managed platforms get cheaper (competition + scale) and self-host tooling gets better (vLLM, TensorRT-LLM, SGLang, Triton). In mid-2026, most teams should start managed and graduate to self-host only when forced by economics or compliance.
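The break-even reasoning above can be made concrete with a rough payback calculation. This is a minimal sketch, not a pricing model: the $200K/month managed spend comes from the threshold mentioned above, while the self-hosted monthly cost and the upfront engineering investment are hypothetical placeholders that each team would need to replace with its own estimates.

```python
from typing import Optional


def payback_months(managed_monthly: float, selfhost_monthly: float,
                   upfront_cost: float) -> Optional[float]:
    """Months until self-hosting savings repay the upfront investment.

    Returns None when self-hosting is not cheaper per month, in which
    case the investment never pays back.
    """
    monthly_savings = managed_monthly - selfhost_monthly
    if monthly_savings <= 0:
        return None
    return upfront_cost / monthly_savings


# Managed spend at the article's rough threshold; other inputs are assumed.
print(payback_months(
    managed_monthly=200_000,   # managed inference bill
    selfhost_monthly=120_000,  # hypothetical: GPUs, hosting, on-call staff
    upfront_cost=960_000,      # hypothetical: engineering build-out
))  # 12.0

# Below the threshold, self-hosting may never pay back.
print(payback_months(50_000, 120_000, 960_000))  # None
```

Because managed prices and self-host tooling both keep improving, the inputs should be revisited periodically rather than treated as a one-time decision.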
What the round signals about the inference market
The Baseten Series F is the largest AI inference round of 2026 so far and one of the largest AI infrastructure rounds of any kind. It signals:
- Inference infrastructure is in the consolidation phase. Investors are picking winners, not spreading bets.
- The “bring your own model” segment is the contested frontier. API-first companies (Together, Fireworks) and hyperscaler integration both have momentum, but managed-deployment-of-custom-models is the higher-margin opportunity.
- Open-weight model adoption is real and durable. Investors wouldn’t pay $13B for managed inference if proprietary closed-API models (GPT, Claude, Gemini) were going to capture all production AI workloads.
- Expect M&A. At $13B, Baseten is now in acquisition-target range for the largest software companies (Salesforce, ServiceNow, Snowflake, Databricks) and a plausible bidder for smaller inference players.
- The Q2 2026 funding environment for AI infrastructure remains strong. Baseten’s round is one of several large AI infrastructure rounds in June 2026 (Groq $650M, AppsFlyer $1B+, General Intuition $320M).