Ollama
Run Local LLMs with Ollama
The easiest way to run LLMs locally. One command to install, one command to run any model.
macOS · Linux · Windows
Install Ollama
Ollama packages everything you need into a single binary. It manages model downloads, GPU detection, and serves an OpenAI-compatible API automatically.
curl -fsSL https://ollama.com/install.sh | sh

Pull and Run a Model
Download any model with a single command. Ollama automatically uses your GPU if available and falls back to CPU.
# Pull a model (downloads once, cached locally)
ollama pull llama3.1:8b
# Run interactively
ollama run llama3.1:8b
# Or pull and run in one step
ollama run mistral:7b

Start the API Server
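Pulled models can also be listed programmatically via the server's native /api/tags endpoint (the equivalent of `ollama list`). A minimal Python sketch of parsing that response — the sample payload below is illustrative, not real output:

```python
import json

def model_names(tags_json: str) -> list[str]:
    """Extract model names from an /api/tags response body."""
    return [m["name"] for m in json.loads(tags_json).get("models", [])]

# Illustrative sample of the response shape (values are made up)
sample = '{"models": [{"name": "llama3.1:8b"}, {"name": "mistral:7b"}]}'
print(model_names(sample))  # → ['llama3.1:8b', 'mistral:7b']
```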
Ollama exposes an OpenAI-compatible API at http://localhost:11434/v1. You can use this with any tool that supports the OpenAI API format, including OpenClaw.
# Start the server (runs in background)
ollama serve
# Test the API
curl http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "llama3.1:8b",
"messages": [{"role": "user", "content": "Hello!"}]
}'

GPU Requirements
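The same endpoint can be called from code. A minimal Python sketch using only the standard library, assuming the server is running locally and llama3.1:8b has already been pulled:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

def build_payload(model: str, prompt: str) -> dict:
    """Assemble an OpenAI-style chat completion request body."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def chat(model: str, prompt: str) -> str:
    """POST the request and return the assistant's reply text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat("llama3.1:8b", "Hello!"))
```

Because the API follows the OpenAI format, official OpenAI client libraries also work if pointed at the local base URL.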
Ollama auto-detects NVIDIA, AMD, and Apple Silicon GPUs. For 7B models, you need ~5 GB VRAM. For 13B+ models, 10+ GB VRAM is recommended. CPU-only mode works but is significantly slower.
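These figures follow from weight size: a 4-bit quantized model needs roughly params × 0.5 bytes for its weights, plus room for the KV cache and runtime overhead. A back-of-the-envelope estimator (the flat 1.5 GB overhead constant is an assumption for illustration, not an official Ollama figure):

```python
def est_vram_gb(params_billions: float, bits: int = 4, overhead_gb: float = 1.5) -> float:
    """Rough VRAM estimate: quantized weight size plus a fixed overhead allowance."""
    weights_gb = params_billions * bits / 8  # e.g. 7B params at 4-bit ≈ 3.5 GB
    return weights_gb + overhead_gb

print(est_vram_gb(7))   # ≈ 5.0 GB, matching the ~5 GB guidance for 7B models
print(est_vram_gb(13))  # ≈ 8.0 GB; longer contexts push 13B+ models toward 10+ GB
```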