Ollama
Run Local LLMs with Ollama
The easiest way to run LLMs locally. One command to install, one command to run any model.
macOS · Linux · Windows
Install Ollama
Ollama packages everything you need into a single binary. It manages model downloads, GPU detection, and serves an OpenAI-compatible API automatically.
curl -fsSL https://ollama.com/install.sh | sh

Pull and Run a Model
Download any model with a single command. Ollama automatically uses your GPU if available and falls back to CPU.
# Pull a model (downloads once, cached locally)
ollama pull llama3.1:8b
# Run interactively
ollama run llama3.1:8b
# Or pull and run in one step
ollama run mistral:7b

Start the API Server
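Pulled models can also be listed programmatically via the server's native /api/tags endpoint (the equivalent of `ollama list`). A minimal Python sketch of parsing that response — the sample payload below is illustrative, not real output:

```python
import json

def model_names(tags_json: str) -> list[str]:
    """Extract model names from an /api/tags response body."""
    return [m["name"] for m in json.loads(tags_json).get("models", [])]

# Illustrative sample of the response shape (values are made up)
sample = '{"models": [{"name": "llama3.1:8b"}, {"name": "mistral:7b"}]}'
print(model_names(sample))  # → ['llama3.1:8b', 'mistral:7b']
```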
Ollama exposes an OpenAI-compatible API at http://localhost:11434/v1. You can use this with any tool that supports the OpenAI API format, including OpenClaw.
# Start the server (runs in background)
ollama serve
# Test the API
curl http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "llama3.1:8b",
"messages": [{"role": "user", "content": "Hello!"}]
}'

GPU Requirements
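The same endpoint can be called from code. A minimal Python sketch using only the standard library, assuming the server is running locally and llama3.1:8b has already been pulled:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

def build_payload(model: str, prompt: str) -> dict:
    """Assemble an OpenAI-style chat completion request body."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def chat(model: str, prompt: str) -> str:
    """POST the request and return the assistant's reply text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat("llama3.1:8b", "Hello!"))
```

Because the API follows the OpenAI format, official OpenAI client libraries also work if pointed at the local base URL.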
Ollama auto-detects NVIDIA, AMD, and Apple Silicon GPUs. For 7B models, you need ~5 GB VRAM. For 13B+ models, 10+ GB VRAM is recommended. CPU-only mode works but is significantly slower.
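These figures follow from weight size: a 4-bit quantized model needs roughly params × 0.5 bytes for its weights, plus room for the KV cache and runtime overhead. A back-of-the-envelope estimator (the flat 1.5 GB overhead constant is an assumption for illustration, not an official Ollama figure):

```python
def est_vram_gb(params_billions: float, bits: int = 4, overhead_gb: float = 1.5) -> float:
    """Rough VRAM estimate: quantized weight size plus a fixed overhead allowance."""
    weights_gb = params_billions * bits / 8  # e.g. 7B params at 4-bit ≈ 3.5 GB
    return weights_gb + overhead_gb

print(est_vram_gb(7))   # ≈ 5.0 GB, matching the ~5 GB guidance for 7B models
print(est_vram_gb(13))  # ≈ 8.0 GB; longer contexts push 13B+ models toward 10+ GB
```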