How to Run LLMs Locally: Complete Guide for 2026
Quick Answer
The easiest way to run LLMs locally is with Ollama: install it with one command, then run ollama run llama3.3. You need a computer with at least 8GB of RAM for small models, 16GB+ for larger, more capable ones.
Running LLMs locally in 2026 is surprisingly easy. Here’s the fastest path:
# macOS/Linux
curl -fsSL https://ollama.ai/install.sh | sh
ollama run llama3.3
# Windows
# Download from ollama.com, then:
ollama run llama3.3
That’s it. You’re now running a capable LLM locally with zero API costs and complete privacy.
Step-by-Step Guide
Step 1: Check Your Hardware
| Component | Minimum | Recommended |
|---|---|---|
| RAM | 8GB | 16GB+ |
| Storage | 10GB free | 50GB+ SSD |
| GPU | Not required | NVIDIA/AMD/Apple Silicon |
Good news: Modern Macs with Apple Silicon (M1/M2/M3/M4) are excellent for local LLMs. The unified memory architecture lets the GPU use most of your system RAM for models, so larger models fit than on a typical discrete GPU.
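If you'd rather script the check than look it up, here is a minimal stdlib-only Python sketch (the helper names are illustrative; the sysconf path covers Linux and macOS, with a sysctl fallback):

```python
import os
import shutil
import subprocess

def total_ram_gb() -> float:
    """Return total physical RAM in GB (POSIX sysconf, sysctl fallback on macOS)."""
    try:
        # Page size * number of physical pages = total RAM in bytes
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
    except (ValueError, OSError):
        out = subprocess.check_output(["sysctl", "-n", "hw.memsize"])
        return int(out) / 1024**3

def free_disk_gb(path: str = ".") -> float:
    """Return free disk space at `path` in GB."""
    return shutil.disk_usage(path).free / 1024**3

print(f"RAM: {total_ram_gb():.1f} GB, free disk: {free_disk_gb():.1f} GB")
```

Compare the printed numbers against the table above before picking a model.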
Step 2: Install Ollama
macOS/Linux:
curl -fsSL https://ollama.ai/install.sh | sh
Windows: Download from ollama.com and run the installer.
Docker:
docker run -d -v ollama:/root/.ollama -p 11434:11434 ollama/ollama
Step 3: Download and Run a Model
# Best general-purpose model (8GB RAM needed)
ollama run llama3.3
# Smaller model for limited hardware (4GB RAM)
ollama run phi3
# Coding-focused model
ollama run codellama
# Larger, more capable (16GB+ RAM)
ollama run llama3.3:70b
First run downloads the model (several GB), then you’re chatting instantly.
Step 4: Use the API (Optional)
Ollama runs an OpenAI-compatible API on port 11434:
curl http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "llama3.3",
"messages": [{"role": "user", "content": "Hello!"}]
}'
Use this with any tool that supports OpenAI’s API.
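The same request also works from Python with only the standard library. A minimal sketch (build_request and chat are illustrative names; it assumes Ollama is serving on the default port):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for the local Ollama API."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )

def chat(model: str, prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=120) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]

# Example (requires Ollama to be running):
# print(chat("llama3.3", "Hello!"))
```

Because the endpoint is OpenAI-compatible, the official OpenAI client libraries also work if you point their base URL at http://localhost:11434/v1.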
Alternative Tools
LM Studio (GUI-based)
- Download from lmstudio.ai
- Visual interface for browsing and downloading models
- Great for exploration, less suited to development workflows
llama.cpp (Power users)
- C++ implementation, maximum performance
- Compile for your exact hardware
- Most efficient inference
LocalAI (API server)
- Full OpenAI API compatibility
- Multiple models at once
- Production-ready
Best Models to Start With
| Model | Size | RAM Needed | Best For |
|---|---|---|---|
| Phi-3 | 3.8B | 4GB | Low-end hardware |
| Llama 3.3 8B | 8B | 8GB | General use |
| Mistral 7B | 7B | 8GB | Fast responses |
| Llama 3.3 70B | 70B | 48GB | Maximum quality |
| CodeLlama | 7-34B | 8-24GB | Coding tasks |
| DeepSeek Coder | 6.7-33B | 8-24GB | Coding tasks |
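The RAM column follows a rule of thumb: a model's weights take roughly its parameter count times the bytes per parameter, plus headroom for the KV cache, the runtime, and your OS. A hedged sketch of that arithmetic (the 1.2x overhead factor is an assumption, not an Ollama constant, and the table's figures are total system RAM, not just the model footprint):

```python
def estimate_ram_gb(params_billions: float,
                    bits_per_param: int = 4,
                    overhead: float = 1.2) -> float:
    """Rough memory footprint of a quantized model.

    params_billions: model size (e.g. 8 for an 8B model)
    bits_per_param:  4 for Q4 quantization (a common default), 16 for fp16
    overhead:        multiplier for KV cache and runtime (assumed ~1.2x)
    """
    weight_gb = params_billions * 1e9 * (bits_per_param / 8) / 1024**3
    return weight_gb * overhead

for size in (3.8, 8, 70):
    print(f"{size:>5}B model at Q4: ~{estimate_ram_gb(size):.1f} GB")
```

An 8B model at Q4 comes out around 4.5 GB, which is why 8GB of system RAM is a workable floor; the same model at fp16 needs roughly four times that.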
Add a Chat Interface
Ollama alone is terminal-only. For a ChatGPT-like experience:
# Open WebUI (best option)
docker run -d -p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data \
--name open-webui \
ghcr.io/open-webui/open-webui:main
Visit http://localhost:3000 and connect it to your Ollama instance.
Troubleshooting
Model won’t load: Not enough RAM. Try a smaller model.
Slow responses: GPU not detected. Check ollama ps for GPU usage.
API errors: Ensure Ollama is running (ollama serve).
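The API check can be scripted. A small sketch that probes /api/tags (Ollama's model-listing endpoint; ollama_status is an illustrative helper name) and reports whether the server is up:

```python
import json
import urllib.error
import urllib.request

def ollama_status(base_url: str = "http://localhost:11434") -> str:
    """Return 'reachable' plus the installed model count, or 'unreachable'."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=3) as resp:
            models = json.load(resp).get("models", [])
        return f"reachable ({len(models)} models installed)"
    except (urllib.error.URLError, OSError):
        return "unreachable - start the server with `ollama serve`"

print(ollama_status())
```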
Related Questions
- Ollama vs LM Studio: Which should you use?
- Best self-hosted LLM solutions?
- What is Ollama?
Last verified: 2026-03-02