# How to Run Kimi K2.5 on Cloudflare Workers AI
Moonshot AI’s Kimi K2.5 is now available on Cloudflare Workers AI, letting you run a frontier-level open-source model on edge infrastructure without managing GPU servers. This guide walks you through setup.
Last verified: March 2026
## Prerequisites
- Cloudflare account with Workers AI enabled
- Node.js 18+ and the Wrangler CLI installed
- Basic familiarity with Cloudflare Workers
## Step 1: Set Up Your Worker Project

```sh
npm create cloudflare@latest kimi-k25-worker
cd kimi-k25-worker
```

Select the “Hello World” template when prompted.
## Step 2: Configure `wrangler.toml`

```toml
name = "kimi-k25-worker"
main = "src/index.ts"
compatibility_date = "2026-03-01"

[ai]
binding = "AI"
```
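Newer `create-cloudflare` templates scaffold a `wrangler.jsonc` instead of `wrangler.toml`. If that is what you have, the equivalent configuration (shown here as an assumption, so compare it against your generated file) looks like:

```jsonc
{
  "name": "kimi-k25-worker",
  "main": "src/index.ts",
  "compatibility_date": "2026-03-01",
  "ai": { "binding": "AI" }
}
```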
## Step 3: Write the Worker

```ts
// src/index.ts
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { prompt } = await request.json() as { prompt: string };

    const response = await env.AI.run(
      "@moonshot-ai/kimi-k2.5",
      {
        messages: [
          { role: "system", content: "You are a helpful assistant." },
          { role: "user", content: prompt }
        ],
        max_tokens: 2048,
      }
    );

    return Response.json(response);
  },
};
```
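The worker above trusts the request body as-is. A small validation helper keeps malformed requests from reaching the model; this is a sketch, and `parsePrompt` is a hypothetical helper, not part of the Workers API:

```ts
// Sketch: validate the parsed JSON body before calling env.AI.run.
// parsePrompt is a hypothetical helper, not part of the Workers AI API.
function parsePrompt(body: unknown): string | null {
  if (typeof body !== "object" || body === null) return null;
  const prompt = (body as Record<string, unknown>)["prompt"];
  // Require a non-empty string; anything else is rejected.
  return typeof prompt === "string" && prompt.length > 0 ? prompt : null;
}
```

In the fetch handler you would return a `400` response when `parsePrompt` yields `null`, instead of invoking the model.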
## Step 4: Deploy

```sh
npx wrangler deploy
```
Your Kimi K2.5 endpoint is now live on Cloudflare’s global edge network.
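You can exercise the deployed endpoint with a plain POST. The URL below is a placeholder; substitute your own `workers.dev` subdomain:

```sh
curl -X POST "https://kimi-k25-worker.<your-subdomain>.workers.dev" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Explain edge computing in one sentence."}'
```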
## Using Vision Capabilities
Kimi K2.5 includes the MoonViT vision encoder. To use it on Workers AI:
```ts
const response = await env.AI.run(
  "@moonshot-ai/kimi-k2.5",
  {
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "What's in this image?" },
          { type: "image_url", image_url: { url: imageBase64 } }
        ]
      }
    ],
  }
);
```
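Multimodal inputs of this shape typically accept either an `https` URL or a base64 `data:` URL. Assuming the latter works here (worth verifying against the current model documentation), building one from raw image bytes can be sketched as:

```ts
// Sketch: build a data: URL from raw image bytes so it can be passed as
// image_url above. Whether the model accepts data: URLs is an assumption.
function toDataUrl(bytes: Uint8Array, mime = "image/png"): string {
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b);
  // btoa is available in the Workers runtime (and as a global in Node 16+).
  return `data:${mime};base64,${btoa(binary)}`;
}
```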
## Performance Characteristics
| Metric | Cloudflare Workers AI | Self-Hosted (4x A100) |
|---|---|---|
| Latency (TTFT) | ~200ms (edge) | ~150ms (direct) |
| Throughput | Auto-scaled | Fixed by hardware |
| Setup time | 5 minutes | Hours to days |
| Cost at 1M tokens/day | ~$40/day | ~$100/day (hardware) |
| Maintenance | Zero | Significant |
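As a back-of-envelope check on the table above, daily cost scales linearly with token volume. The blended per-token rate below is an illustrative assumption consistent with the ~$40/day figure, not published pricing:

```ts
// Back-of-envelope cost check for the table above. The blended per-token
// rate is an illustrative assumption, not published Workers AI pricing.
const pricePerToken = 0.00004;   // USD per token (assumed blended rate)
const tokensPerDay = 1_000_000;  // 1M tokens/day, as in the table
const dailyCost = pricePerToken * tokensPerDay;
console.log(dailyCost.toFixed(2)); // ~40 USD/day
```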
## Limitations
- **No Agent Swarm** — standard inference only; Agent Swarm requires self-hosted orchestration
- **Context window** — Workers AI may limit context below the full 256K tokens
- **Rate limits** — subject to Cloudflare Workers AI rate limits on your plan
- **Cold starts** — the first request may have higher latency
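Rate limits and cold starts are transient, so they can be smoothed over with retries. A minimal backoff sketch (where `runWithRetry` is a hypothetical helper you would wrap around `env.AI.run`, not a Workers API):

```ts
// Minimal retry-with-exponential-backoff sketch for transient failures
// (rate limits, cold starts). runWithRetry is a hypothetical helper.
async function runWithRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 250,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Back off between attempts: 250ms, 500ms, 1000ms, ...
      if (i < attempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastError;
}
```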
## When to Use This vs. Alternatives
| Use case | Best option |
|---|---|
| Quick API prototyping | Cloudflare Workers AI ✅ |
| Production at scale | Moonshot AI API or self-hosted |
| Agent Swarm workflows | Self-hosted or Moonshot API |
| Edge/low-latency needs | Cloudflare Workers AI ✅ |
| Full 256K context | Self-hosted |
Cloudflare Workers AI is the fastest way to get Kimi K2.5 running without infrastructure overhead.