## TL;DR
LiteRT-LM is Google’s open-source framework for running LLMs on edge devices — phones, tablets, browsers, wearables, and IoT. Key facts:
- Powers production Google products: Chrome, Chromebook Plus, Pixel Watch
- Just added Gemma 4 support — Google’s most capable on-device model (Apache 2.0)
- Cross-platform: Android, iOS (coming), Web, Desktop, Raspberry Pi
- Hardware acceleration: GPU and NPU via platform-specific backends
- Multi-modal: Vision and audio inputs, not just text
- Function calling: Tool use support for agentic workflows on-device
- Models: Gemma 4, Gemma 3n, Llama, Phi-4, Qwen, and more
- 3,157 GitHub stars | Apache 2.0 license | C++ core with C++, Kotlin, and Python APIs
- One command to try: `uv tool install litert-lm && litert-lm run --from-huggingface-repo=...`
This isn’t a research project — it’s what actually runs AI in Google’s shipping products.
## Why LiteRT-LM Matters
The AI industry has a cloud problem. Every API call to GPT, Claude, or Gemini costs money, adds latency, and sends user data to external servers. LiteRT-LM is Google’s answer: run the model directly on the user’s device.
What makes it different from Ollama or llama.cpp:
| Feature | LiteRT-LM | Ollama | llama.cpp |
|---|---|---|---|
| Target | Mobile/edge/IoT | Desktop/server | Desktop/server |
| Platforms | Android, iOS, Web, Pi | macOS, Linux, Windows | macOS, Linux, Windows |
| Optimization | GPU + NPU acceleration | CPU + GPU | CPU + Metal/CUDA |
| Production | Powers Chrome, Pixel Watch | Developer tool | Developer tool |
| Model format | .litertlm (optimized) | GGUF | GGUF |
| Function calling | Built-in | No | No |
| Multi-modal | Vision + audio | Text only | Text + vision |
The key differentiator: LiteRT-LM is specifically optimized for constrained devices. It memory-maps embedding layers (keeping them on disk until needed), uses NPU acceleration where available, and is designed for the 2-8GB RAM reality of phones and wearables.
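The memory-mapping idea can be illustrated with a small, self-contained sketch. This is plain Python `mmap` over an invented file layout, not the LiteRT-LM implementation or its API: a table of fixed-size float32 rows lives on disk, and a lookup touches only the bytes for the row it needs, so the OS pages in just those pages instead of the whole table.

```python
import mmap
import struct

VOCAB, DIM = 1000, 64  # toy sizes; real embedding tables are far larger

# Write a toy embedding table to disk: row i holds DIM float32 values equal to i.
with open("embeddings.bin", "wb") as f:
    for token_id in range(VOCAB):
        f.write(struct.pack(f"{DIM}f", *([float(token_id)] * DIM)))

def lookup(mm: mmap.mmap, token_id: int) -> list[float]:
    """Read one embedding row straight from the mapped file.

    Only the pages this slice touches are faulted into RAM; the rest
    of the table stays on disk, which is the point of memory-mapping."""
    row_bytes = DIM * 4  # 4 bytes per float32
    offset = token_id * row_bytes
    return list(struct.unpack(f"{DIM}f", mm[offset:offset + row_bytes]))

with open("embeddings.bin", "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    vec = lookup(mm, 42)
    print(vec[0])  # 42.0
    mm.close()
```

On a phone with 2 GB of RAM, the difference between mapping a multi-hundred-megabyte embedding table and loading it eagerly is often the difference between running and crashing.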
## What's New: Gemma 4 on Edge
Google just released Gemma 4 — their most capable open model — with day-one LiteRT-LM support:
- Gemma 4 E2B (2B effective params): Runs on phones with ~1.5GB working memory
- Gemma 4 E4B (4B effective params): Better quality, ~3GB working memory
- Agentic capabilities: Function calling, tool use, multi-step reasoning — all on-device
- Apache 2.0: Commercially permissive, no restrictions
- Offline: Zero latency, zero cost, full privacy
From Google’s blog: “In collaboration with Pixel, Qualcomm, and MediaTek, these models run completely offline with near-zero latency across phones, Raspberry Pi, and NVIDIA Jetson Orin Nano.”
## Quick Start
### No-Code Trial (CLI)

```shell
# Install
uv tool install litert-lm

# Run Gemma 4 E2B
litert-lm run \
  --from-huggingface-repo=litert-community/gemma-4-E2B-it-litert-lm \
  gemma-4-E2B-it.litertlm \
  --prompt="What is the capital of France?"
```
Works on Linux, macOS, Windows (WSL), and Raspberry Pi.
### Android (Kotlin)

```kotlin
val model = LiteRtLm.load(context, "gemma-4-E2B-it.litertlm")
val response = model.generateResponse("What's the weather like?")
```
### Python

```python
import litert_lm

model = litert_lm.load("gemma-4-E2B-it.litertlm")
response = model.generate("Explain this code:", max_tokens=512)
```
### Google AI Edge Gallery App
Download the AI Edge Gallery app and run models on your phone — no code required.
## Supported Models
| Model | Effective Size | Memory | Best For |
|---|---|---|---|
| Gemma 4 E2B | 2B | ~1.5GB | Phones, quick tasks |
| Gemma 4 E4B | 4B | ~3GB | Quality on phones/tablets |
| Gemma 3n E2B | 2B | ~1.2GB | Ultra-lightweight |
| Llama 3.2 3B | 3B | ~2GB | General purpose |
| Phi-4 Mini | 3.8B | ~2.5GB | Reasoning tasks |
| Qwen 2.5 3B | 3B | ~2GB | Multilingual |
## Practical Use Cases
### 1. Offline AI Assistant
Build a personal assistant that works without internet. Gemma 4’s agentic capabilities support function calling on-device.
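The function-calling loop behind such an assistant is simple in outline: the model emits a structured tool call, the app executes it, and the result goes back to the model. Here is a minimal dispatcher sketch; the model is stubbed out, and the tool names and the JSON call format are invented for illustration (LiteRT-LM's actual tool-call schema may differ):

```python
import json

# Hypothetical on-device tools; names and signatures are illustrative.
def get_battery_level() -> int:
    return 87  # stubbed device reading

def set_alarm(time: str) -> str:
    return f"alarm set for {time}"

TOOLS = {"get_battery_level": get_battery_level, "set_alarm": set_alarm}

def fake_model(prompt: str) -> str:
    """Stand-in for the on-device model: emits a JSON tool call."""
    if "battery" in prompt:
        return json.dumps({"tool": "get_battery_level", "args": {}})
    return json.dumps({"tool": "set_alarm", "args": {"time": "07:00"}})

def run_turn(prompt: str) -> str:
    """One agentic turn: model output -> parse tool call -> execute -> result."""
    call = json.loads(fake_model(prompt))
    result = TOOLS[call["tool"]](**call["args"])
    return f"{call['tool']} -> {result}"

print(run_turn("what's my battery?"))  # get_battery_level -> 87
print(run_turn("wake me at 7"))        # set_alarm -> alarm set for 07:00
```

Because every step runs locally, the loop works on a plane or in a dead zone; the only change for a real integration is swapping `fake_model` for a call into the on-device model.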
### 2. Privacy-First Applications
Medical apps, legal tools, financial advisors — anything where data cannot leave the device.
### 3. IoT and Embedded
Run AI on Raspberry Pi for smart home automation, industrial monitoring, or edge analytics.
### 4. Browser-Based AI
LiteRT-LM powers on-device AI in Chrome. No server costs for AI features.
### 5. Wearables
Powers AI features on Pixel Watch — demonstrating extreme optimization capabilities.
## Honest Limitations
- Small models only — targets 1-8B parameters. Don’t expect GPT-5 quality.
- iOS Swift API still in development — Android is first-class.
- Model conversion required — can’t use GGUF files directly.
- Limited model library — ~20 models vs Ollama’s hundreds.
- C++ complexity — building from source is non-trivial.
## LiteRT-LM vs Alternatives
| Feature | LiteRT-LM | Ollama | llama.cpp | MLX |
|---|---|---|---|---|
| Mobile | Native | No | Partial | No |
| Browser | Yes | No | Via WASM | No |
| IoT/Pi | Yes | Yes | Yes | No |
| NPU accel | Yes | No | No | No |
| Function calling | Built-in | No | No | No |
| Production use | Chrome, Pixel | Dev tool | Dev tool | Dev tool |
| Model library | ~20 | 100+ | 100+ | 50+ |
Choose LiteRT-LM if you're building for mobile, browser, or IoT and need hardware acceleration. Choose Ollama if you want the simplest local LLM setup on desktop/server with a wide model selection.
## FAQ
### What is LiteRT-LM?
LiteRT-LM is Google’s open-source inference framework for running LLMs on edge devices. It powers Chrome, Chromebook Plus, and Pixel Watch. Supports Gemma 4, Llama, Phi-4, Qwen across Android, iOS, Web, Desktop, and Raspberry Pi. Apache 2.0.
### How is LiteRT-LM different from Ollama?
LiteRT-LM targets mobile/edge with GPU+NPU acceleration and memory-mapped embeddings. Ollama targets desktop/server. LiteRT-LM powers production Google products; Ollama is a developer tool.
### Can I run Gemma 4 on my phone?
Yes. Gemma 4 E2B needs ~1.5GB working memory. Download the Google AI Edge Gallery app or use the Kotlin SDK for integration.
### Does LiteRT-LM work offline?
Yes, completely. Models run entirely on-device with zero network calls — private, offline, zero-latency inference.
### What models does LiteRT-LM support?
Gemma 4 (E2B, E4B), Gemma 3n, Llama 3.2, Phi-4 Mini, Qwen 2.5, and more in .litertlm format from HuggingFace.
GitHub: github.com/google-ai-edge/LiteRT-LM | Product Site: ai.google.dev/edge/litert-lm | License: Apache 2.0 | Stars: 3,157 | Language: C++