Most note-taking systems get worse as they get bigger. You write 2000 notes in Notion or Obsidian or Apple Notes and finding anything turns into archaeology. A personal RAG fixes this — ask a question in natural language, get an answer pulled from your notes with the source quote attached.
This is a small project, not an enterprise system. The goal is something useful by tomorrow morning, not a production-grade tool. Skip half the things bigger RAG tutorials tell you to do.
What you'll build
- A script that ingests your notes (markdown files, exported Notion / Obsidian / Apple Notes)
- An embedding step that vectorizes each chunk
- A vector store (SQLite + sqlite-vec, the simplest option that works)
- A query CLI: ask "when did I write about that meeting with the contractor?" → answer with sources
- Optional: a tiny web UI on top
Full stack: Python or TypeScript, ~150 lines of code, free to run locally except for embedding API costs (roughly $0.10 for 2000 notes).
Step 1: get your notes into one folder
The simplest format is markdown. Export workflows for the major apps:
- Notion — export as Markdown & CSV, unzip the result
- Obsidian — already markdown
- Apple Notes — Apple's own export is limited; importing into Bear and using Bear's Markdown export works better
- Google Docs — File → Download → Markdown
- Plain text journal — already done
Dump everything in ~/notes/ (or wherever). One file per note. Use the file path as a unique ID later.
Don't preprocess. Don't deduplicate. Don't try to extract "only useful notes." Let the retrieval do filtering at query time.
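The whole input side of ingestion fits in a few lines. A sketch, assuming notes live under one folder as above; `load_notes` and the extension list are my own illustrative choices, not from any library:

```python
from pathlib import Path

def load_notes(root):
    """Yield (relative_path, text) for every note under root.

    The relative path doubles as the note's unique ID later.
    """
    root = Path(root).expanduser()
    for path in sorted(root.rglob("*")):
        if path.suffix.lower() in {".md", ".markdown", ".txt"}:
            yield str(path.relative_to(root)), path.read_text(
                encoding="utf-8", errors="replace"
            )
```

`errors="replace"` matters in practice: exported notes reliably contain at least one file with broken encoding, and one bad file shouldn't kill the run.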
Step 2: chunk and embed
Use a recursive chunker with ~500 tokens per chunk and 50-token overlap as a starting point. Then try going smaller: chunks of 200-300 tokens actually work better for personal notes than the bigger chunks tutorials suggest, because notes tend to contain self-contained thoughts.
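The windowing core of the chunker is a dozen lines. A sketch only: it approximates tokens with whitespace-separated words (a real tokenizer like tiktoken counts differently, but for notes the approximation is fine), and a real recursive chunker would split on headings and paragraphs before falling back to word windows. `chunk_words` is a hypothetical name:

```python
def chunk_words(text, size=500, overlap=50):
    """Split text into overlapping windows of roughly `size` words.

    Words stand in for tokens here; good enough for sizing chunks of notes.
    """
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, max(len(words), 1), step):
        piece = words[start:start + size]
        if piece:
            chunks.append(" ".join(piece))
        if start + size >= len(words):  # last window already covered the tail
            break
    return chunks
```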
Embedding model: OpenAI text-embedding-3-small is the fastest path. Cost for 2000 average-length notes is about $0.05-0.20 total. Alternatively run BGE M3 locally with sentence-transformers if you want zero API spend.
Store each embedding in SQLite using sqlite-vec. The schema looks like:
```sql
CREATE TABLE chunks (
  id INTEGER PRIMARY KEY,
  file_path TEXT,
  chunk_text TEXT,
  chunk_index INTEGER
);

CREATE VIRTUAL TABLE chunk_vectors USING vec0(
  embedding FLOAT[1536]
);
```
Run ingestion once. Re-run it when you add a meaningful number of new notes (or set up a cron / git hook).
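Ingestion itself is just a double loop over notes and chunks. A self-contained sketch under two assumptions: the demo uses plain tables and a stub embedder in place of the `vec0` virtual table and a real embedding call, and `serialize_f32` mirrors what I understand to be sqlite-vec's embedding format (packed little-endian float32 blobs) so the sketch runs without the extension installed:

```python
import sqlite3
import struct

def serialize_f32(vec):
    # sqlite-vec reads embeddings as packed little-endian float32 blobs;
    # this stands in for its serialize_float32 helper.
    return struct.pack(f"<{len(vec)}f", *vec)

def ingest(db, notes, embed):
    """notes: iterable of (file_path, [chunk_text, ...]); embed: text -> list[float]."""
    for path, chunks in notes:
        for i, text in enumerate(chunks):
            cur = db.execute(
                "INSERT INTO chunks (file_path, chunk_text, chunk_index) VALUES (?, ?, ?)",
                (path, text, i),
            )
            # Share the rowid with the chunks table so query-time joins are trivial.
            db.execute(
                "INSERT INTO chunk_vectors (rowid, embedding) VALUES (?, ?)",
                (cur.lastrowid, serialize_f32(embed(text))),
            )
    db.commit()

# Demo setup: plain tables stand in for the vec0 virtual table,
# and fake_embed stands in for the embedding API.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, file_path TEXT, chunk_text TEXT, chunk_index INTEGER)")
db.execute("CREATE TABLE chunk_vectors (rowid INTEGER PRIMARY KEY, embedding BLOB)")
fake_embed = lambda text: [0.0] * 8
ingest(db, [("journal/2024-01-05.md", ["first chunk", "second chunk"])], fake_embed)
```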
Step 3: query
The query flow:
- User asks a question in natural language
- Embed the question with the same model
- Find the top-10 closest chunks via cosine similarity
- Optionally rerank with a small cross-encoder (Cohere Rerank or BGE reranker)
- Pass the top-5 chunks plus the question to an LLM
- Return the answer plus citations (file paths)
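The nearest-neighbor step maps to a single query. A sketch against the schema above, assuming the vector rows share rowids with the chunks table, and using what I believe is sqlite-vec's KNN syntax (`MATCH` on the embedding column plus a `k` constraint, with `distance` exposed on the result):

```sql
SELECT chunks.file_path, chunks.chunk_text, distance
FROM chunk_vectors
JOIN chunks ON chunks.id = chunk_vectors.rowid
WHERE embedding MATCH :query_embedding
  AND k = 10
ORDER BY distance;
```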
For the LLM, Claude 4.5 Haiku or GPT-5 mini are both fine; cost per query is fractions of a cent.
The system prompt: "You are a helpful assistant answering questions based only on the user's notes provided in the context. Always cite the file path of each source. If the notes don't contain the answer, say so clearly. Do not invent information."
Step 4: a tiny CLI
```python
# pseudo-code
def ask(question):
    q_vec = embed(question)
    chunks = sqlite_vec_query(q_vec, top_k=10)
    reranked = rerank(question, chunks)[:5]
    answer = claude.complete(
        system=SYSTEM_PROMPT,
        user=f"Question: {question}\n\nNotes:\n{format_chunks(reranked)}",
    )
    print(answer)
```
That's the whole product. Ship it as a CLI for a week. If you find yourself reaching for it daily, then bother with a UI.
What to skip
- Don't add a vector DB beyond SQLite for personal scale. SQLite + sqlite-vec handles 100k chunks with < 100ms query time. You don't need Qdrant or Pinecone.
- Don't fine-tune anything. Embeddings out-of-the-box are fine for personal notes.
- Don't build a complex evaluation framework. You'll know if it's good by using it.
- Don't deploy this to a server. Run it locally. Your notes are private.
- Don't add a chat history feature in V1. One-shot Q&A is more useful than you'd guess.
- Don't rely on agents. A direct retrieve → generate flow is better for note-search than an agentic loop.
What to add later if you actually use it
- Hybrid search — add full-text search via SQLite FTS5 alongside vector search. Hybrid retrieval substantially improves recall, especially for specific names and dates.
- Date / metadata filtering — "what did I write about X in November 2025?" requires extracting dates from filenames or note headers.
- Cross-app sources — feed in Slack messages, emails, calendar events. Diminishing returns; notes are usually 80% of value.
- A web UI — if you genuinely use this daily, a small web app makes it nicer than CLI. But CLI is enough to start.
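If you do add FTS5 alongside vectors, the fiddly part is merging the two ranked result lists. Reciprocal rank fusion is a common, parameter-light way to do it; a sketch, where `rrf` is my own name and `k=60` is just the conventional constant from the RRF literature:

```python
def rrf(rankings, k=60):
    """Merge ranked lists of doc IDs via reciprocal rank fusion.

    Each list contributes 1 / (k + rank + 1) per document; documents that
    rank well in both vector and full-text search float to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

Usage: pass it the chunk IDs from the vector query and from the FTS5 query, take the top N of the fused list, and feed those to the LLM as before.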
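For the date-filtering upgrade, a regex over filenames and first lines covers the common personal-note conventions (ISO dates in filenames, `date:` front matter). An illustrative sketch; `note_date` is a made-up helper, and notes that match neither pattern just go unfiltered:

```python
import re

DATE_RE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")

def note_date(filename, first_line=""):
    """Best-effort ISO date for a note: from the filename, else the first line."""
    m = DATE_RE.search(filename) or DATE_RE.search(first_line)
    return m.group(0) if m else None
```

Store the result in a `note_date` column on the chunks table and the "in November 2025" questions become a WHERE clause before the vector search.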
When NOT to build a personal RAG
If you have under ~200 notes total, just use search-by-filename and grep. RAG adds complexity that pays off only at meaningful scale.
If your notes are short snippets (< 50 words each), embedding-based retrieval doesn't add much over keyword search. Just use ripgrep.
If your notes are highly structured (a database of book quotes, all in the same format), use proper SQL filtering. The natural language layer adds nothing.
If you don't trust LLMs around your private notes, don't build this. The notes go to whichever model API you use, and even with privacy guarantees it's data leaving your machine. Run BGE M3 + Llama locally if this matters; otherwise reconsider.
What this changes
What surprised me when I started using a personal RAG was the kind of question I'd ask. Not "find me that note" — that's just search. Real questions: "what did past me think about whether to start a company?" "What were the recurring patterns in my journal during the breakup?" "How has my opinion on X evolved over the last three years?"
These are questions you couldn't ask before because the answer was distributed across 50 notes you'd never re-read in sequence. Personal RAG makes them tractable.
Next steps
- Read about chunk size for retrieval — 500 tokens is a default, your notes might want different
- Look at sqlite-vec specifically — the simplest production-ready embedded vector store
- Try a hybrid search upgrade once basic version works
- Read about evaluation if you start using this for important questions