# How to Run Kimi K2.5 on Cloudflare Workers AI
Moonshot AI’s Kimi K2.5 is now available on Cloudflare Workers AI, letting you run a frontier-level open-source model on edge infrastructure without managing GPU servers. This guide walks you through setup.
Last verified: March 2026
## Prerequisites
- Cloudflare account with Workers AI enabled
- Node.js 18+ and the Wrangler CLI installed
- Basic familiarity with Cloudflare Workers
## Step 1: Set Up Your Worker Project

```sh
npm create cloudflare@latest kimi-k25-worker
cd kimi-k25-worker
```

Select the “Hello World” template when prompted.
## Step 2: Configure `wrangler.toml`

```toml
name = "kimi-k25-worker"
main = "src/index.ts"
compatibility_date = "2026-03-01"

[ai]
binding = "AI"
```
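Newer `create-cloudflare` templates scaffold a `wrangler.jsonc` instead of `wrangler.toml`. If that is what you have, the equivalent configuration (shown here as an assumption, so compare it against your generated file) looks like:

```jsonc
{
  "name": "kimi-k25-worker",
  "main": "src/index.ts",
  "compatibility_date": "2026-03-01",
  "ai": { "binding": "AI" }
}
```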
## Step 3: Write the Worker

```ts
// src/index.ts
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { prompt } = await request.json() as { prompt: string };

    const response = await env.AI.run(
      "@moonshot-ai/kimi-k2.5",
      {
        messages: [
          { role: "system", content: "You are a helpful assistant." },
          { role: "user", content: prompt }
        ],
        max_tokens: 2048,
      }
    );

    return Response.json(response);
  },
};
```
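The worker above trusts the request body as-is. A small validation helper keeps malformed requests from reaching the model; this is a sketch, and `parsePrompt` is a hypothetical helper, not part of the Workers API:

```ts
// Sketch: validate the parsed JSON body before calling env.AI.run.
// parsePrompt is a hypothetical helper, not part of the Workers AI API.
function parsePrompt(body: unknown): string | null {
  if (typeof body !== "object" || body === null) return null;
  const prompt = (body as Record<string, unknown>)["prompt"];
  // Require a non-empty string; anything else is rejected.
  return typeof prompt === "string" && prompt.length > 0 ? prompt : null;
}
```

In the fetch handler you would return a `400` response when `parsePrompt` yields `null`, instead of invoking the model.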
## Step 4: Deploy

```sh
npx wrangler deploy
```
Your Kimi K2.5 endpoint is now live on Cloudflare’s global edge network.
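You can exercise the deployed endpoint with a plain POST. The URL below is a placeholder; substitute your own `workers.dev` subdomain:

```sh
curl -X POST "https://kimi-k25-worker.<your-subdomain>.workers.dev" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Explain edge computing in one sentence."}'
```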
## Using Vision Capabilities
Kimi K2.5 includes the MoonViT vision encoder. To use it on Workers AI:
```ts
const response = await env.AI.run(
  "@moonshot-ai/kimi-k2.5",
  {
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "What's in this image?" },
          { type: "image_url", image_url: { url: imageBase64 } }
        ]
      }
    ],
  }
);
```
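Multimodal inputs of this shape typically accept either an `https` URL or a base64 `data:` URL. Assuming the latter works here (worth verifying against the current model documentation), building one from raw image bytes can be sketched as:

```ts
// Sketch: build a data: URL from raw image bytes so it can be passed as
// image_url above. Whether the model accepts data: URLs is an assumption.
function toDataUrl(bytes: Uint8Array, mime = "image/png"): string {
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b);
  // btoa is available in the Workers runtime (and as a global in Node 16+).
  return `data:${mime};base64,${btoa(binary)}`;
}
```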
## Performance Characteristics
| Metric | Cloudflare Workers AI | Self-Hosted (4x A100) |
|---|---|---|
| Latency (TTFT) | ~200ms (edge) | ~150ms (direct) |
| Throughput | Auto-scaled | Fixed by hardware |
| Setup time | 5 minutes | Hours to days |
| Cost at 1M tokens/day | ~$40/day | ~$100/day (hardware) |
| Maintenance | Zero | Significant |
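As a back-of-envelope check on the table above, daily cost scales linearly with token volume. The blended per-token rate below is an illustrative assumption consistent with the ~$40/day figure, not published pricing:

```ts
// Back-of-envelope cost check for the table above. The blended per-token
// rate is an illustrative assumption, not published Workers AI pricing.
const pricePerToken = 0.00004;   // USD per token (assumed blended rate)
const tokensPerDay = 1_000_000;  // 1M tokens/day, as in the table
const dailyCost = pricePerToken * tokensPerDay;
console.log(dailyCost.toFixed(2)); // ~40 USD/day
```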
## Limitations
- **No Agent Swarm** — standard inference only; Agent Swarm requires self-hosted orchestration
- **Context window** — Workers AI may limit context below the full 256K tokens
- **Rate limits** — subject to Cloudflare Workers AI rate limits on your plan
- **Cold starts** — the first request may have higher latency
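Rate limits and cold starts are transient, so they can be smoothed over with retries. A minimal backoff sketch (where `runWithRetry` is a hypothetical helper you would wrap around `env.AI.run`, not a Workers API):

```ts
// Minimal retry-with-exponential-backoff sketch for transient failures
// (rate limits, cold starts). runWithRetry is a hypothetical helper.
async function runWithRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 250,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Back off between attempts: 250ms, 500ms, 1000ms, ...
      if (i < attempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastError;
}
```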
## When to Use This vs. Alternatives
| Use case | Best option |
|---|---|
| Quick API prototyping | Cloudflare Workers AI ✅ |
| Production at scale | Moonshot AI API or self-hosted |
| Agent Swarm workflows | Self-hosted or Moonshot API |
| Edge/low-latency needs | Cloudflare Workers AI ✅ |
| Full 256K context | Self-hosted |
Cloudflare Workers AI is the fastest way to get Kimi K2.5 running without infrastructure overhead.