How to Run Kimi K2.5 on Cloudflare Workers AI

Moonshot AI’s Kimi K2.5 is now available on Cloudflare Workers AI, letting you run a frontier-level open-source model on edge infrastructure without managing GPU servers. This guide walks you through setup.

Last verified: March 2026

Prerequisites

  • Cloudflare account with Workers AI enabled
  • Node.js 18+ or Wrangler CLI installed
  • Basic familiarity with Cloudflare Workers

Step 1: Set Up Your Worker Project

npm create cloudflare@latest kimi-k25-worker
cd kimi-k25-worker

Select the “Hello World” template when prompted.

Step 2: Configure wrangler.toml

name = "kimi-k25-worker"
main = "src/index.ts"
compatibility_date = "2026-03-01"

[ai]
binding = "AI"

Step 3: Write the Worker

// src/index.ts
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Only accept POST requests with a JSON body.
    if (request.method !== "POST") {
      return new Response('POST a JSON body like {"prompt": "..."}', { status: 405 });
    }

    const { prompt } = await request.json() as { prompt: string };
    if (!prompt) {
      return new Response('Missing "prompt"', { status: 400 });
    }

    // Run the model through the AI binding configured in wrangler.toml.
    const response = await env.AI.run(
      "@moonshot-ai/kimi-k2.5",
      {
        messages: [
          { role: "system", content: "You are a helpful assistant." },
          { role: "user", content: prompt }
        ],
        max_tokens: 2048,
      }
    );

    return Response.json(response);
  },
};
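If you want tokens as they arrive rather than a single JSON blob, Workers AI's text-generation models generally accept a `stream: true` option and return a ReadableStream of server-sent events. A sketch of a streaming variant of the Worker above, assuming the option is supported for this model:

```typescript
// Streaming variant (sketch). Assumes `stream: true` is supported for this
// model, in which case env.AI.run resolves to a ReadableStream of SSE data
// instead of a finished JSON object.
interface StreamEnv {
  AI: { run(model: string, options: Record<string, unknown>): Promise<ReadableStream> };
}

const handler = {
  async fetch(request: Request, env: StreamEnv): Promise<Response> {
    const { prompt } = await request.json() as { prompt: string };

    const stream = await env.AI.run("@moonshot-ai/kimi-k2.5", {
      messages: [{ role: "user", content: prompt }],
      stream: true,
    });

    // Pass the event stream straight through to the client.
    return new Response(stream, {
      headers: { "content-type": "text/event-stream" },
    });
  },
};

export default handler;
```

Streaming noticeably improves perceived latency for long completions, since the client can render tokens as they are generated.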

Step 4: Deploy

npx wrangler deploy

Your Kimi K2.5 endpoint is now live on Cloudflare’s global edge network.
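You can exercise the endpoint with curl. The workers.dev hostname below is a placeholder; substitute the URL that `wrangler deploy` prints:

```shell
# Placeholder URL — replace with the workers.dev address from `wrangler deploy`.
URL="https://kimi-k25-worker.example.workers.dev"

# JSON body matching the { prompt } shape the Worker expects.
PAYLOAD='{"prompt": "Explain edge inference in one paragraph."}'

# POST the prompt; the Worker responds with the model's JSON output.
curl -s -X POST "$URL" \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD" || echo "request failed (check the URL)"
```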

Using Vision Capabilities

Kimi K2.5 includes the MoonViT vision encoder. To use it on Workers AI:

const response = await env.AI.run(
  "@moonshot-ai/kimi-k2.5",
  {
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "What's in this image?" },
          // imageBase64 is a data: URL string, e.g. "data:image/png;base64,..."
          { type: "image_url", image_url: { url: imageBase64 } }
        ]
      }
    ],
  }
);
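The `imageBase64` value needs to be a data: URL. A small helper, sketched here (it is not part of any Cloudflare API), converts raw image bytes into that shape:

```typescript
// Hypothetical helper (not a Workers AI API): wrap raw image bytes in a
// data: URL suitable for the image_url content part above.
// btoa is available globally in both the Workers runtime and Node 16+.
function toDataUrl(bytes: Uint8Array, mime: string): string {
  let binary = "";
  const chunk = 0x8000; // encode in 32 KB chunks to avoid call-stack limits
  for (let i = 0; i < bytes.length; i += chunk) {
    binary += String.fromCharCode(...bytes.subarray(i, i + chunk));
  }
  return `data:${mime};base64,${btoa(binary)}`;
}
```

For an uploaded `File`, something like `toDataUrl(new Uint8Array(await file.arrayBuffer()), file.type)` produces the value to pass as `imageBase64`.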

Performance Characteristics

| Metric | Cloudflare Workers AI | Self-Hosted (4x A100) |
| --- | --- | --- |
| Latency (TTFT) | ~200ms (edge) | ~150ms (direct) |
| Throughput | Auto-scaled | Fixed by hardware |
| Setup time | 5 minutes | Hours to days |
| Cost at 1M tokens/day | ~$40/day | ~$100/day (hardware) |
| Maintenance | Zero | Significant |
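The cost figures are easy to scale to your own traffic. A throwaway helper, assuming the table's flat ~$40 per million tokens per day figure (an estimate from this comparison, not Cloudflare's published pricing):

```typescript
// Back-of-envelope estimate only; the rate is taken from the comparison
// table above, not from Cloudflare's official price list.
const ASSUMED_RATE_USD_PER_MILLION = 40;

function dailyCostUsd(tokensPerDay: number): number {
  return (tokensPerDay / 1_000_000) * ASSUMED_RATE_USD_PER_MILLION;
}
```

At 250K tokens/day this works out to roughly $10/day, which is the scale where managed inference tends to beat dedicated hardware.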

Limitations

  • No Agent Swarm — Standard inference only; Agent Swarm requires self-hosted orchestration
  • Context window — Workers AI may limit context below the full 256K tokens
  • Rate limits — Subject to Cloudflare Workers AI rate limits on your plan
  • Cold starts — First request may have higher latency

When to Use This vs. Alternatives

| Use case | Best option |
| --- | --- |
| Quick API prototyping | Cloudflare Workers AI ✅ |
| Production at scale | Moonshot AI API or self-hosted |
| Agent Swarm workflows | Self-hosted or Moonshot API |
| Edge/low-latency needs | Cloudflare Workers AI ✅ |
| Full 256K context | Self-hosted |

Cloudflare Workers AI is the fastest way to get Kimi K2.5 running without infrastructure overhead.