
How to Run LLMs Locally: Complete Guide for 2026

The easiest way to run LLMs locally is with Ollama: install it with one command, then run ollama run llama3.3. You need a computer with at least 8GB RAM for small models, 16GB+ for better models.

Quick Answer

Running LLMs locally in 2026 is surprisingly easy. Here’s the fastest path:

# macOS/Linux
curl -fsSL https://ollama.ai/install.sh | sh
ollama run llama3.3

# Windows
# Download from ollama.com, then:
ollama run llama3.3

That’s it. You’re now running a capable LLM locally with zero API costs and complete privacy.

Step-by-Step Guide

Step 1: Check Your Hardware

| Component | Minimum      | Recommended              |
|-----------|--------------|--------------------------|
| RAM       | 8GB          | 16GB+                    |
| Storage   | 10GB free    | 50GB+ SSD                |
| GPU       | Not required | NVIDIA/AMD/Apple Silicon |

Good news: Modern Macs with Apple Silicon (M1/M2/M3/M4) are excellent for local LLMs—the unified memory architecture means your whole RAM is available for models.
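Not sure what your machine has? A quick sketch for checking RAM, free disk space, and CPU architecture from a terminal (the Linux and macOS paths differ, as noted in the comments):

```shell
# Report total RAM, free disk space, and CPU architecture.
if [ -r /proc/meminfo ]; then
    # Linux: MemTotal is reported in kB
    awk '/MemTotal/ {printf "RAM: %.0fGB\n", $2 / 1048576}' /proc/meminfo
else
    # macOS: hw.memsize is reported in bytes
    sysctl -n hw.memsize | awk '{printf "RAM: %.0fGB\n", $1 / 1073741824}'
fi
df -h . | awk 'NR == 2 {print "Free disk:", $4}'
uname -m    # arm64 = Apple Silicon, x86_64 = most PCs
```

Compare the numbers against the table above before picking a model.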

Step 2: Install Ollama

macOS/Linux:

curl -fsSL https://ollama.ai/install.sh | sh

Windows: Download from ollama.com and run the installer.

Docker:

docker run -d -v ollama:/root/.ollama -p 11434:11434 ollama/ollama

Step 3: Download and Run a Model

# Best general-purpose model (8GB RAM needed)
ollama run llama3.3

# Smaller model for limited hardware (4GB RAM)
ollama run phi3

# Coding-focused model
ollama run codellama

# Larger, more capable (48GB+ RAM)
ollama run llama3.3:70b

The first run downloads the model (several gigabytes); after that, it starts instantly.

Step 4: Use the API (Optional)

Ollama runs an OpenAI-compatible API on port 11434:

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.3",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Use this with any tool that supports OpenAI’s API.
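For scripting, it helps to disable streaming and extract just the reply text. A sketch that assumes Ollama is running on the default port and uses `python3` for JSON parsing (so it works without extra tools installed):

```shell
# Build the request payload. "stream": false returns one JSON object
# instead of a stream of chunks, which is much easier to parse in scripts.
PAYLOAD='{"model": "llama3.3", "stream": false,
  "messages": [{"role": "user", "content": "Hello!"}]}'

# POST to the local endpoint; print only the assistant reply, or a note
# if no server answered.
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD" \
  | python3 -c 'import json, sys
try:
    print(json.load(sys.stdin)["choices"][0]["message"]["content"])
except Exception:
    print("no response (is Ollama running?)")'
```

The same payload works against any OpenAI-compatible endpoint, so scripts written this way can later be pointed at a different server by changing only the URL.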

Alternative Tools

LM Studio (GUI-based)

  • Download from lmstudio.ai
  • Visual interface for browsing and downloading models
  • Great for exploration, less suited to development workflows

llama.cpp (Power users)

  • C++ implementation, maximum performance
  • Compile for your exact hardware
  • Most efficient inference

LocalAI (API server)

  • Full OpenAI API compatibility
  • Multiple models at once
  • Production-ready

Best Models to Start With

| Model          | Size    | RAM Needed | Best For         |
|----------------|---------|------------|------------------|
| Phi-3          | 3.8B    | 4GB        | Low-end hardware |
| Llama 3.3 8B   | 8B      | 8GB        | General use      |
| Mistral 7B     | 7B      | 8GB        | Fast responses   |
| Llama 3.3 70B  | 70B     | 48GB       | Maximum quality  |
| CodeLlama      | 7-34B   | 8-24GB     | Coding tasks     |
| DeepSeek Coder | 6.7-33B | 8-24GB     | Coding tasks     |
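The RAM column tracks the size of the model's weights plus a few GB of overhead. A sketch of the standard approximation (weights in GB ≈ parameters in billions × bits per weight / 8); `model_size_gb` is an illustrative helper, not an Ollama command:

```shell
# Approximate size of a model's weights in GB:
#   size ≈ parameters (billions) * bits per weight / 8
model_size_gb() {
    params_b=$1; bits=$2
    echo $(( params_b * bits / 8 ))
}

model_size_gb 8 4      # 8B model, 4-bit quantized  -> 4
model_size_gb 70 4     # 70B model, 4-bit quantized -> 35
model_size_gb 7 16     # 7B model, unquantized fp16 -> 14
```

Plan for a few GB of RAM beyond the weight size to hold the context window and runtime, which is why an 8B model (roughly 4-5GB at the 4-bit quantization most Ollama default tags use) lands in the 8GB tier and a 70B model needs around 48GB.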

Add a Chat Interface

Ollama alone is terminal-only. For a ChatGPT-like experience:

# Open WebUI (best option)
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main

Visit localhost:3000 and connect to your Ollama instance.

Troubleshooting

  • Model won’t load: not enough RAM. Try a smaller model.
  • Slow responses: GPU not detected. Run ollama ps to check whether the model is on GPU or CPU.
  • API errors: make sure Ollama is running (ollama serve).


Last verified: 2026-03-02