Flash-MoE: Running a 397B Parameter Model on a MacBook
Flash-MoE runs Qwen3.5-397B on a 48GB MacBook Pro at 4.4 tokens/sec using pure C and Metal. Built in 24 hours with Claude. Here's how it works.
Flash-MoE streams a 397B parameter Mixture-of-Experts model from SSD using pure C and Metal shaders — hitting 4.4 tokens/sec on 48GB RAM.