# What is Ollama?

## Quick Answer

Ollama is a free, open-source tool that lets you run large language models (LLMs) locally on your computer with a single command. It handles model downloading and optimization and provides a simple API, making local AI accessible to anyone.
Think of Ollama as “Docker for LLMs.” Instead of paying for API access or sending data to the cloud, you run AI models directly on your Mac, Windows, or Linux machine. Your data stays private, there are no usage costs, and it works offline.
## How It Works

```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Run a model
ollama run llama3.2

# That's it - you're chatting with AI locally
```
## Key Features
| Feature | Description |
|---|---|
| One-line install | Works on Mac, Windows, Linux |
| 100+ models | Llama, Mistral, Gemma, Phi, CodeLlama, etc. |
| Automatic optimization | Detects your GPU, optimizes memory |
| REST API | Compatible with OpenAI API format |
| Modelfile | Customize models, create variants |
| Offline capable | No internet needed after download |
## Supported Models (March 2026)
| Model | Parameters | Best For |
|---|---|---|
| Llama 4 | 8B-405B | General purpose |
| DeepSeek V3.2 | 67B-671B | Coding, reasoning |
| Qwen3-Coder | 7B-72B | Code generation |
| Mistral Large 3 | 123B | Reasoning |
| Gemma 3 | 2B-27B | Efficient tasks |
| Phi-4 | 14B | Efficient reasoning |
| CodeLlama | 7B-70B | Programming |
## Hardware Requirements
| Model Size | Minimum RAM | Recommended |
|---|---|---|
| 7B models | 8GB | 16GB |
| 13B models | 16GB | 32GB |
| 34B models | 32GB | 64GB |
| 70B models | 64GB | 128GB |
**GPU acceleration:** Supported on NVIDIA (CUDA), AMD (ROCm), and Apple Silicon (Metal).
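The RAM figures above follow a common rule of thumb (not an official Ollama specification): a quantized model needs roughly `parameters × bits-per-weight / 8` bytes for its weights, plus headroom for the KV cache and runtime buffers. A small sketch of that estimate:

```python
def approx_ram_gb(params_billion: float, bits_per_weight: int = 4,
                  overhead: float = 1.2) -> float:
    """Rough memory estimate for a quantized model: weight bytes
    plus ~20% headroom for KV cache and buffers (a rule of thumb)."""
    bytes_per_weight = bits_per_weight / 8
    return params_billion * bytes_per_weight * overhead

# A 4-bit 7B model needs roughly 4 GB, which is why 8 GB of
# system RAM is the practical floor for that size class.
print(f"{approx_ram_gb(7):.1f} GB")   # ~4.2 GB
print(f"{approx_ram_gb(70):.1f} GB")  # ~42.0 GB
```

The recommended column in the table leaves room for the OS, longer contexts, and running anything else alongside the model.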
## Common Commands

```bash
# List available models
ollama list

# Pull a model without running it
ollama pull mistral

# Run with a specific prompt
ollama run llama3.2 "Explain quantum computing"

# Start the API server
ollama serve

# Create a custom model from a Modelfile
ollama create my-model -f Modelfile

# Remove a model
ollama rm llama3.2
```
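The `ollama create` command builds a variant from a Modelfile. A minimal sketch (the base model, temperature, and system prompt here are illustrative):

```
FROM llama3.2
PARAMETER temperature 0.3
SYSTEM "You are a concise technical assistant."
```

After `ollama create my-model -f Modelfile`, the variant runs like any other model: `ollama run my-model`.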
## API Usage

Ollama serves a REST API on `localhost:11434`, including an OpenAI-compatible endpoint. You can call the native API directly:
```python
import requests

response = requests.post('http://localhost:11434/api/generate', json={
    'model': 'llama3.2',
    'prompt': 'What is machine learning?',
    'stream': False
})
print(response.json()['response'])
```
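Without `'stream': False`, the `/api/generate` endpoint streams its reply as one JSON object per line, each carrying a `response` fragment until `done` is true. A small helper (hypothetical, for illustration) can reassemble those fragments:

```python
import json

def join_stream_chunks(ndjson_lines):
    """Concatenate 'response' fragments from streamed NDJSON lines,
    stopping at the chunk whose 'done' flag is true."""
    text = []
    for line in ndjson_lines:
        chunk = json.loads(line)
        text.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(text)

# Sample chunks shaped like the streaming endpoint's output:
chunks = [
    '{"response": "Hello", "done": false}',
    '{"response": " world", "done": true}',
]
print(join_stream_chunks(chunks))  # Hello world
```

In a real client you would feed it `response.iter_lines()` from a `requests.post(..., stream=True)` call instead of a hard-coded list.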
Or use the OpenAI Python SDK:

```python
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='ollama'  # Required by the SDK but not checked by Ollama
)
response = client.chat.completions.create(
    model='llama3.2',
    messages=[{'role': 'user', 'content': 'Hello!'}]
)
print(response.choices[0].message.content)
```
## Why Use Ollama?

### Pros
- ✅ Free forever - No API costs
- ✅ Private - Data never leaves your machine
- ✅ Offline - Works without internet
- ✅ Simple - One command to start
- ✅ Fast - Native GPU acceleration
- ✅ Customizable - Create custom models
### Cons
- ❌ Requires decent hardware
- ❌ Local models are smaller than frontier APIs
- ❌ No cloud sync/collaboration
- ❌ You manage updates
## Ollama vs Alternatives
| Feature | Ollama | LM Studio | vLLM |
|---|---|---|---|
| Ease of use | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| GUI | ❌ (CLI) | ✅ | ❌ |
| Performance | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Model variety | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| API compatibility | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
## Getting Started

- Install: `curl -fsSL https://ollama.com/install.sh | sh`
- Run your first model: `ollama run llama3.2`
- Chat: start asking questions
- Explore: try different models from ollama.com/library
Last verified: 2026-03-06