A vector database is a system optimized for storing and searching high-dimensional vectors — the embeddings produced by AI models. It's the storage layer of any RAG, semantic search, or recommendation system. The hype around "you need a vector DB" is mostly justified, but the choice between a dedicated vector DB and just using Postgres with pgvector is more nuanced than the marketing implies.
What a vector DB actually does
The core job: given millions of vectors, find the K nearest to a query vector — fast.
Brute force is O(n) — compare the query to every stored vector. Fine at 10,000 records (milliseconds). Painful at 1,000,000 (seconds). Impractical at 100,000,000.
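Brute force really is just one dot product per stored vector. A minimal sketch with numpy (random vectors standing in for real embeddings; the `brute_force_top_k` helper is illustrative, not any library's API):

```python
import numpy as np

def brute_force_top_k(query: np.ndarray, vectors: np.ndarray, k: int = 5) -> np.ndarray:
    """Exact K-nearest-neighbor search by cosine similarity: O(n) per query."""
    # Normalize so a dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = v @ q                            # one comparison per stored vector
    top = np.argpartition(-scores, k)[:k]     # O(n) partial selection of the K best
    return top[np.argsort(-scores[top])]      # order those K by score

rng = np.random.default_rng(0)
vectors = rng.standard_normal((10_000, 384)).astype(np.float32)
query = vectors[42] + 0.01 * rng.standard_normal(384).astype(np.float32)
print(brute_force_top_k(query, vectors, k=5))  # index 42 ranks first
```

At 10,000 vectors this runs in milliseconds; the cost grows linearly with the number of rows, which is exactly why it stops being viable at tens of millions.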
Vector DBs use approximate nearest neighbor (ANN) indexes — typically HNSW (Hierarchical Navigable Small World) or IVF (Inverted File Index) — to get sub-linear search. Trade some accuracy (you might miss the truly nearest neighbor 1-5% of the time) for massive speedup. For semantic search, that tradeoff is almost always worth it.
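The IVF idea fits in a toy sketch: partition vectors into buckets around centroids, then at query time scan only the few buckets nearest the query. This is a simplified illustration (plain numpy, centroids sampled at random rather than trained with k-means, as real IVF implementations do) that shows the tradeoff — a fraction of the data gets scanned, and a neighbor sitting in an unprobed bucket is missed:

```python
import numpy as np

class ToyIVF:
    """Minimal IVF-style index: bucket vectors by nearest centroid, probe few buckets."""

    def __init__(self, vectors: np.ndarray, n_lists: int = 32, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Real IVF trains centroids with k-means; random samples keep the sketch short.
        self.centroids = vectors[rng.choice(len(vectors), n_lists, replace=False)]
        assignments = np.argmin(
            np.linalg.norm(vectors[:, None] - self.centroids[None], axis=2), axis=1
        )
        self.buckets = [np.flatnonzero(assignments == i) for i in range(n_lists)]
        self.vectors = vectors

    def search(self, query: np.ndarray, k: int = 5, n_probe: int = 2) -> np.ndarray:
        # Probe only the n_probe buckets whose centroids are closest to the query.
        near = np.argsort(np.linalg.norm(self.centroids - query, axis=1))[:n_probe]
        cand = np.concatenate([self.buckets[i] for i in near])
        dists = np.linalg.norm(self.vectors[cand] - query, axis=1)
        return cand[np.argsort(dists)[:k]]

rng = np.random.default_rng(1)
data = rng.standard_normal((5_000, 64)).astype(np.float32)
index = ToyIVF(data)
hits = index.search(data[7] + 0.01, k=3)  # near-duplicate of row 7 is found
```

Raising `n_probe` trades speed back for recall — the same knob (under different names) exists in Faiss, pgvector's `ivfflat`, and most managed vector DBs.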
On top of search, modern vector DBs offer:
- Metadata filtering — "find similar vectors where category = 'docs' and updated_at > 2025-01-01"
- Hybrid search — combine vector similarity with keyword (BM25) scores
- Multi-tenancy — isolate vectors by user/team/workspace
- Reranking integration — feed top-K to a reranker model for better order
- Persistence and backup — durable storage, replication
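Hybrid search, in particular, is often implemented as score fusion: run vector and keyword retrieval separately, then merge the two ranked lists. A common recipe is reciprocal rank fusion (RRF), which only needs ranks, not comparable scores. A minimal sketch (the `k = 60` constant is the conventional default from the RRF literature, not specific to any vendor):

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked ID lists: each hit contributes 1 / (k + rank) to its ID's score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc3", "doc1", "doc7"]   # from ANN search
keyword_hits = ["doc1", "doc9", "doc3"]  # from BM25
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
# → ['doc1', 'doc3', 'doc9', 'doc7']
```

doc1 wins because it ranks well in both lists — the behavior hybrid search is after.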
The 2026 landscape
The options break into three groups:
Postgres extensions:
- pgvector — the default. Open-source extension, runs in any Postgres (including Supabase, Neon, RDS). Up to ~10M rows comfortably. Same DB as your app data, no separate ops.
- pgvectorscale — Timescale's extension on top of pgvector, adds StreamingDiskANN index for billion-scale.
Dedicated managed:
- Pinecone — original cloud vector DB, mature, scales to billions, $70/mo starter.
- Weaviate Cloud — open-source core with managed offering, modular schemas.
- Qdrant Cloud — Rust-based, fast, generous free tier.
- Milvus / Zilliz Cloud — large-scale, more complex.
Embedded / lightweight:
- Chroma — embedded Python library, dev-friendly. Good for prototypes, less so for production.
- DuckDB / SQLite vector extensions — embedded, single-file, great for offline or edge.
- LanceDB — columnar vector storage, fast scanning, growing in popularity.
How to actually choose
A decision tree that works for most people:
- Are you starting out, with < 10M vectors, and already using Postgres? Use pgvector. You keep one DB, ship faster, and most apps never outgrow it.
- Do you need 50M+ vectors with strict latency SLAs and zero ops budget? Use Pinecone or Qdrant Cloud.
- Do you self-host everything and want open-source? Qdrant or Weaviate.
- Are you on the edge, mobile, or local-first? SQLite vector extension or LanceDB.
- Are you embedding everything in a Python app for prototyping? Chroma is the lowest-friction choice.
The over-engineering trap: founders building their first RAG app pick Pinecone because it's the "AI-native" choice, then discover their app would have been simpler with pgvector inside the same Postgres they already had. Most people regret this in the second month.
Real-world performance numbers
To set expectations (2026 generation hardware):
- pgvector with HNSW on a Supabase Pro tier: 100K vectors at 1,536 dims → 5-20ms p95 query latency, $25/mo (the Supabase plan, not the vector DB itself).
- pgvector on the same hardware at 10M vectors: 30-100ms p95.
- Pinecone serverless at 10M vectors: 20-50ms p95, but ~$70-200/mo depending on usage.
- Qdrant self-hosted on a $20/mo VPS: 100K vectors → 2-10ms.
The pattern: pgvector is fast enough for ~95% of apps, and the gap to dedicated DBs only matters at scale or under tight latency SLAs.
The metadata problem
A common pitfall: you embed documents, store them in a vector DB, then realize you need to filter by tenant_id, created_at, category. Some vector DBs handle this elegantly; others don't.
- pgvector inherits Postgres's full SQL: any WHERE clause works perfectly. Best metadata story.
- Pinecone, Qdrant, Weaviate: each has its own metadata filter syntax, generally more limited than full SQL.
- Chroma: simple metadata filtering, works for basic cases.
If your retrieval needs structured filters more than "top-K vectors with no constraints," pgvector tends to win.
When NOT to use a vector DB at all
- You have < 1,000 documents. Skip the vector store entirely. Embed at request time and brute-force-compare in memory, or just put all docs in the LLM's context window.
- Your queries are mostly exact match. Use Postgres FTS (full-text search) or BM25; vector search is overkill.
- Your data fits in RAM and stays small. A Python dict and numpy.argmax is sometimes the right answer.
- You're not at the retrieval-quality wall yet. If your bottleneck is bad chunking or the wrong embedding model, a fancier DB doesn't fix it.
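For the small in-memory case, the whole "vector store" really is a dict plus argmax. A toy sketch (3-dim hand-written vectors stand in for real embeddings, which a model would produce at 384+ dims; the doc IDs are made up):

```python
import numpy as np

# Toy store: doc id -> (pretend) embedding. Real embeddings come from a model.
docs = {
    "refund-policy": np.array([0.9, 0.1, 0.0]),
    "shipping-faq":  np.array([0.1, 0.9, 0.1]),
    "api-guide":     np.array([0.0, 0.2, 0.9]),
}

def best_match(query_vec: np.ndarray) -> str:
    ids = list(docs)
    mat = np.stack([docs[i] for i in ids])
    # Cosine similarity, then argmax: exact search, no index, no database.
    sims = (mat @ query_vec) / (
        np.linalg.norm(mat, axis=1) * np.linalg.norm(query_vec)
    )
    return ids[int(np.argmax(sims))]

print(best_match(np.array([0.8, 0.2, 0.0])))  # → refund-policy
```

Below roughly a thousand documents, this stays well under a millisecond per query and needs zero infrastructure.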
Migration cost
A practical concern: switching vector DBs is annoying but not catastrophic. Re-embedding is usually unnecessary — you export vectors + metadata, import to the new DB. Most teams do this once, when scale demands it. Plan to start with pgvector and migrate if/when needed; don't pre-optimize.
Further reading
- What is an embedding
- What is RAG (Retrieval-Augmented Generation)
- How to pick a vector database (Pinecone vs pgvector vs Qdrant)
- Hybrid search (BM25 + vector) for RAG systems
- How to build your first RAG stack (a 2026 pick guide)