

What is a vector database, and do you actually need one?

A vector database stores embeddings and finds similar ones fast. For most RAG apps that are just getting started, your existing Postgres with pgvector is enough — and a dedicated vector DB might be over-engineering.

A vector database is a system optimized for storing and searching high-dimensional vectors — the embeddings produced by AI models. It's the storage layer of any RAG, semantic search, or recommendation system. The hype around "you need a vector DB" is only partly justified: the choice between a dedicated vector DB and plain Postgres with pgvector is more nuanced than the marketing implies.

What a vector DB actually does

The core job: given millions of vectors, find the K nearest to a query vector — fast.

Brute force is O(n) — compare the query to every stored vector. Fine at 10,000 records (milliseconds). Painful at 1,000,000 (seconds). Impossible at 100,000,000.
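The brute-force baseline is worth seeing in code. A minimal pure-Python sketch, assuming cosine similarity as the metric (the toy 2-D vectors are illustrative; real embeddings have hundreds or thousands of dimensions):

```python
import heapq
import math

def cosine(a, b):
    # Cosine similarity: dot product over the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query, vectors, k=3):
    # Brute force: score the query against every stored vector, O(n).
    scored = ((cosine(query, v), i) for i, v in enumerate(vectors))
    return heapq.nlargest(k, scored)  # [(score, index), ...], best first

vectors = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0], [-1.0, 0.0]]
print(top_k([1.0, 0.1], vectors, k=2))  # index 0 ranks first
```

This is exactly what a vector DB replaces: an ANN index answers the same top-K question without touching every vector.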

Vector DBs use approximate nearest neighbor (ANN) indexes — typically HNSW (Hierarchical Navigable Small World) or IVF (Inverted File Index) — to get sub-linear search. Trade some accuracy (you might miss the truly nearest neighbor 1-5% of the time) for massive speedup. For semantic search, that tradeoff is almost always worth it.

On top of search, modern vector DBs offer:

  • Metadata filtering — "find similar vectors where category = 'docs' and updated_at > 2025-01-01"
  • Hybrid search — combine vector similarity with keyword (BM25) scores
  • Multi-tenancy — isolate vectors by user/team/workspace
  • Reranking integration — feed top-K to a reranker model for better order
  • Persistence and backup — durable storage, replication
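Hybrid search from the list above is commonly implemented with reciprocal rank fusion (RRF), which merges the keyword and vector rankings without having to normalize their incompatible scores. A sketch — the doc IDs are made up, and k=60 is the conventional damping constant:

```python
def rrf(rankings, k=60):
    # Reciprocal rank fusion: score(doc) = sum over lists of 1 / (k + rank).
    scores = {}
    for ranked_ids in rankings:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits   = ["doc_a", "doc_c", "doc_b"]  # keyword (BM25) ranking
vector_hits = ["doc_b", "doc_a", "doc_d"]  # vector-similarity ranking
print(rrf([bm25_hits, vector_hits]))  # doc_a first: ranked 1st and 2nd
```

Documents that rank well in either list float to the top, which is why hybrid search often beats pure vector similarity on queries with rare keywords.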

The 2026 landscape

The options break into three groups:

Postgres extensions:

  • pgvector — the default. Open-source extension, runs in any Postgres (including Supabase, Neon, RDS). Up to ~10M rows comfortably. Same DB as your app data, no separate ops.
  • pgvectorscale — Timescale's extension on top of pgvector, adds StreamingDiskANN index for billion-scale.
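For a taste of the pgvector workflow, a minimal sketch — table and column names are illustrative, and the shortened vector literal is a placeholder; `<=>` is pgvector's cosine-distance operator:

```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
  id        bigserial PRIMARY KEY,
  content   text,
  embedding vector(1536)   -- dimension must match your embedding model
);

-- ANN index; without it, ORDER BY does an exact sequential scan
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

-- top-5 nearest neighbors by cosine distance
SELECT id, content
FROM documents
ORDER BY embedding <=> '[0.01, -0.02, ...]'::vector
LIMIT 5;
```

Everything here is ordinary Postgres — same connection, same transactions, same backups as the rest of your app data.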

Dedicated managed:

  • Pinecone — original cloud vector DB, mature, scales to billions, $70/mo starter.
  • Weaviate Cloud — open-source core with managed offering, modular schemas.
  • Qdrant Cloud — Rust-based, fast, generous free tier.
  • Milvus / Zilliz Cloud — large-scale, more complex.

Embedded / lightweight:

  • Chroma — embedded Python library, dev-friendly. Good for prototypes, less so for production.
  • DuckDB / SQLite vector extensions — embedded, single-file, great for offline or edge.
  • LanceDB — columnar vector storage, fast scanning, growing in popularity.

How to actually choose

A decision tree that works for most people:

  1. Are you starting out, with < 10M vectors, and already using Postgres? Use pgvector. You keep one DB, ship faster, and most apps never outgrow it.
  2. Do you need 50M+ vectors with strict latency SLAs and zero ops budget? Use Pinecone or Qdrant Cloud.
  3. Do you self-host everything and want open-source? Qdrant or Weaviate.
  4. Are you on the edge, mobile, or local-first? SQLite vector extension or LanceDB.
  5. Are you embedding everything in a Python app for prototyping? Chroma is the lowest-friction choice.
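The tree above can be sketched as a function — a toy encoding for illustration, not a substitute for judgment; the question order mirrors the list, so the first match wins:

```python
def pick_vector_store(n_vectors, uses_postgres, strict_latency_sla,
                      self_host, edge_or_local, prototyping):
    # Toy encoding of the decision tree; returns a suggestion string.
    if uses_postgres and n_vectors < 10_000_000:
        return "pgvector"
    if n_vectors >= 50_000_000 and strict_latency_sla:
        return "Pinecone or Qdrant Cloud"
    if self_host:
        return "Qdrant or Weaviate"
    if edge_or_local:
        return "SQLite vector extension or LanceDB"
    if prototyping:
        return "Chroma"
    return "pgvector (default; migrate only when scale demands it)"

print(pick_vector_store(100_000, True, False, False, False, False))
```

Note that the default branch is still pgvector — that asymmetry is the point of the next paragraph.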

The over-engineering trap: founders building their first RAG app pick Pinecone because it's the "AI-native" choice, then discover the app would have been simpler with pgvector inside the Postgres they already run. The regret typically arrives within a month or two, once the second database starts demanding its own ops attention.

Real-world performance numbers

To set expectations (2026 generation hardware):

  • pgvector with HNSW on a Supabase Pro tier: 100K vectors at 1,536 dims → 5-20ms p95 query latency, $25/mo (the Supabase plan, not the vector DB itself).
  • pgvector on the same hardware at 10M vectors: 30-100ms p95.
  • Pinecone serverless at 10M vectors: 20-50ms p95, but ~$70-200/mo depending on usage.
  • Qdrant self-hosted on a $20/mo VPS: 100K vectors → 2-10ms.

The pattern: pgvector is fast enough for ~95% of apps, and the gap to dedicated DBs only matters at scale or under tight latency SLAs.

What about the metadata problem?

A common pitfall: you embed documents, store them in a vector DB, then realize you need to filter by tenant_id, created_at, category. Some vector DBs handle this elegantly; others don't.

  • pgvector inherits Postgres's full SQL: any WHERE clause works perfectly. Best metadata story.
  • Pinecone, Qdrant, Weaviate: have their own metadata filter syntaxes, often slightly limited.
  • Chroma: simple metadata filtering, works for basic cases.

If your retrieval needs structured filters more than "top-K vectors with no constraints," pgvector tends to win.
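With pgvector, that combined filter-plus-similarity query is ordinary SQL — column names here are illustrative, and `$1` stands for the query embedding bound as a parameter:

```sql
SELECT id, content
FROM documents
WHERE tenant_id = 42
  AND category = 'docs'
  AND updated_at > '2025-01-01'
ORDER BY embedding <=> $1
LIMIT 10;
```

Dedicated vector DBs can express the same filter, but through their own JSON-style filter DSLs rather than the full relational toolbox of joins, subqueries, and indexes.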

When NOT to use a vector DB at all

  • You have < 1,000 documents. Skip the vector store entirely. Embed at request time and brute-force-compare in memory, or just put all docs in the LLM's context window.
  • Your queries are mostly exact match. Use Postgres FTS (full-text search) or BM25; vector search is overkill.
  • Your data fits in RAM and stays small. A Python dict and numpy.argmax is sometimes the right answer.
  • You're not at the retrieval-quality wall yet. If your bottleneck is bad chunking or wrong embedding model, a fancier DB doesn't fix it.

Migration cost

A practical concern: switching vector DBs is annoying but not catastrophic. Re-embedding is usually unnecessary — you export vectors + metadata, import to the new DB. Most teams do this once, when scale demands it. Plan to start with pgvector and migrate if/when needed; don't pre-optimize.

Further reading

  • What is an embedding
  • What is RAG (Retrieval-Augmented Generation)
  • How to pick a vector database (Pinecone vs pgvector vs Qdrant)
  • Hybrid search (BM25 + vector) for RAG systems
  • How to build your first RAG stack (a 2026 pick guide)

Last updated: 2026-04-29
