What is Ollama?

Ollama is a free, open-source tool that lets you run large language models (LLMs) locally on your computer with a single command. It handles model downloading, optimization, and provides a simple API—making local AI accessible to anyone.

Quick Answer

Think of Ollama as “Docker for LLMs.” Instead of paying for API access or sending data to the cloud, you run AI models directly on your Mac, Windows, or Linux machine. Your data stays private, there are no usage costs, and it works offline.

How It Works

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Run a model
ollama run llama3.2

# That's it - you're chatting with AI locally

Key Features

| Feature | Description |
| --- | --- |
| One-line install | Works on Mac, Windows, Linux |
| 100+ models | Llama, Mistral, Gemma, Phi, CodeLlama, etc. |
| Automatic optimization | Detects your GPU, optimizes memory |
| REST API | Compatible with OpenAI API format |
| Modelfile | Customize models, create variants |
| Offline capable | No internet needed after download |

Supported Models (March 2026)

| Model | Parameters | Best For |
| --- | --- | --- |
| Llama 4 | 8B-405B | General purpose |
| DeepSeek V3.2 | 67B-671B | Coding, reasoning |
| Qwen3-Coder | 7B-72B | Code generation |
| Mistral Large 3 | 123B | Reasoning |
| Gemma 3 | 2B-27B | Efficient tasks |
| Phi-4 | 14B | Efficient reasoning |
| CodeLlama | 7B-70B | Programming |

Hardware Requirements

| Model Size | Minimum RAM | Recommended |
| --- | --- | --- |
| 7B models | 8GB | 16GB |
| 13B models | 16GB | 32GB |
| 34B models | 32GB | 64GB |
| 70B models | 64GB | 128GB |
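As a rough rule of thumb (an assumption for illustration, not an official Ollama formula), a 4-bit quantized model needs about half a gigabyte of RAM per billion parameters for the weights, plus fixed overhead for the runtime and context cache. The table's minimums add headroom for the OS on top of that:

```python
def estimate_ram_gb(params_billion: float,
                    bytes_per_param: float = 0.5,  # ~4-bit quantization
                    overhead_gb: float = 1.0) -> float:
    """Back-of-envelope RAM estimate for a quantized model's weights.

    A rule of thumb only: real usage also depends on context length
    and quantization format.
    """
    return params_billion * bytes_per_param + overhead_gb

print(estimate_ram_gb(7))   # 4.5 -- a 7B model fits comfortably in 8GB
print(estimate_ram_gb(70))  # 36.0 -- a 70B model needs a large machine
```

This is why the minimum-RAM column is roughly model size times one, not times two: quantization shrinks each parameter well below its original 16-bit footprint.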

GPU acceleration: Supported for NVIDIA (CUDA), AMD (ROCm), and Apple Silicon (Metal).

Common Commands

# List available models
ollama list

# Pull a model without running
ollama pull mistral

# Run with a specific prompt
ollama run llama3.2 "Explain quantum computing"

# Start API server
ollama serve

# Create custom model
ollama create my-model -f Modelfile

# Remove a model
ollama rm llama3.2
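The `ollama create` command above reads a Modelfile. A minimal example (the base model and system prompt here are just placeholders):

```
# Modelfile
FROM llama3.2

# Sampling parameters
PARAMETER temperature 0.7
PARAMETER num_ctx 4096

# System prompt baked into the new variant
SYSTEM You are a concise technical assistant.
```

Save this as `Modelfile`, run `ollama create my-model -f Modelfile`, and `ollama run my-model` starts the customized variant.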

API Usage

Ollama serves a REST API on localhost port 11434. Its native generate endpoint:

import requests

response = requests.post('http://localhost:11434/api/generate', json={
    'model': 'llama3.2',
    'prompt': 'What is machine learning?',
    'stream': False
})

print(response.json()['response'])
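With `'stream': False` you get one JSON object back, as above. By default the endpoint streams newline-delimited JSON, where each chunk carries a `response` text fragment and the last chunk has `"done": true`. A sketch of reassembling such a stream (the helper name is mine; real chunks would come from `iter_lines()` on a streaming request):

```python
import json

def join_stream_chunks(ndjson_lines):
    """Concatenate 'response' fragments from a stream of NDJSON chunks."""
    parts = []
    for line in ndjson_lines:
        if not line.strip():
            continue  # skip keep-alive blank lines
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # final chunk also carries timing stats
    return "".join(parts)

# Simulated stream for illustration
sample = [
    '{"response": "Machine ", "done": false}',
    '{"response": "learning.", "done": true}',
]
print(join_stream_chunks(sample))  # Machine learning.
```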

Ollama also exposes an OpenAI-compatible endpoint at /v1, so you can use the OpenAI Python SDK:

from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='ollama'  # Required but not used
)

response = client.chat.completions.create(
    model='llama3.2',
    messages=[{'role': 'user', 'content': 'Hello!'}]
)

print(response.choices[0].message.content)
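Because the endpoint is OpenAI-compatible, many existing tools built on the OpenAI SDK can be pointed at Ollama without code changes, by overriding two environment variables the SDK reads (values assume the default local port):

```shell
# Redirect OpenAI-SDK-based tools to the local Ollama server
export OPENAI_BASE_URL="http://localhost:11434/v1"
export OPENAI_API_KEY="ollama"   # must be set, but Ollama ignores the value
```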

Why Use Ollama?

Pros

  • ✅ Free forever - No API costs
  • ✅ Private - Data never leaves your machine
  • ✅ Offline - Works without internet
  • ✅ Simple - One command to start
  • ✅ Fast - Native GPU acceleration
  • ✅ Customizable - Create custom models

Cons

  • ❌ Requires decent hardware
  • ❌ Local models are smaller than frontier APIs
  • ❌ No cloud sync/collaboration
  • ❌ You manage updates

Ollama vs Alternatives

| Feature | Ollama | LM Studio | vLLM |
| --- | --- | --- | --- |
| Ease of use | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐ |
| GUI | ❌ (CLI) | ✅ | ❌ |
| Performance | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Model variety | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| API compatibility | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |

Getting Started

  1. Install: curl -fsSL https://ollama.com/install.sh | sh
  2. Run first model: ollama run llama3.2
  3. Chat: Start asking questions
  4. Explore: Try different models from ollama.com/library

Last verified: 2026-03-06