

ollama/ollama · Go

One-command local runner for open-weight LLMs, with auto GPU offload and an OpenAI-compatible API.

GitHub stats

Stars: 170,328
Forks: 15,878
Watchers: 959
Open issues: 3,105

Meta

License: MIT
Primary language: Go
Last commit: 2026-04-29
Stats fetched at: 2026-04-29

Ollama wraps llama.cpp (and now its own engine) in a Docker-style CLI: `ollama pull qwen3`, then `ollama run qwen3`, and you're chatting. It handles GGUF model downloads, GPU/CPU offload, and quantization variants, and exposes a local REST + OpenAI-compatible API on port 11434. Modelfiles let you bake system prompts, parameters, and adapters into a reusable tag, as sketched below. Works on macOS (Metal), Linux, and Windows.
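
A minimal sketch of that Modelfile baking; the base tag `qwen3` matches the pull example above, and the tag name `my-assistant` is made up for illustration:

```
# Modelfile: bake a system prompt and a sampling parameter into a reusable tag
FROM qwen3
PARAMETER temperature 0.7
SYSTEM You are a concise coding assistant.
```

Build and chat with it: `ollama create my-assistant -f Modelfile`, then `ollama run my-assistant`.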
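
And a sketch of hitting the OpenAI-compatible endpoint from Go (the repo's own language), assuming a server is already listening on the default port 11434 and `qwen3` has been pulled:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Assumes the Ollama server is running on its default port 11434
	// and that `ollama pull qwen3` has already been run.
	payload := []byte(`{"model": "qwen3", "messages": [{"role": "user", "content": "Say hello in one sentence."}]}`)

	resp, err := http.Post("http://localhost:11434/v1/chat/completions", "application/json", bytes.NewReader(payload))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Print the raw JSON chat completion; the assistant message sits in an
	// OpenAI-style `choices` array.
	out, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Any OpenAI client SDK pointed at `http://localhost:11434/v1` talks to the same endpoint.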

Editor's verdict

Pick Ollama when you want the fastest path from "I have a Mac/PC" to "I have a working local LLM endpoint" — it's the default for local dev, demos, and Cursor/Continue/Open WebUI backends. Don't use it for production multi-tenant serving: throughput and batching lose badly to vLLM or SGLang, and you pay a tax for the GGUF-only ecosystem. If you need raw tokens/sec on an H100, go vLLM; if you need portability and zero-config, stay here.

Last updated: 2026-04-29

