
Vector DB vs PageIndex: Which One Should You Actually Use for RAG?
Every retrieval system I've shipped started the same way: spin up a vector database, embed everything, ship a chatbot. It's the default move in 2026. Reach for Pinecone or pgvector, chunk the docs, cosine-similarity your way to "RAG." It works often enough that nobody questions it.
Then a different camp got loud. PageIndex and the "vectorless RAG" crowd are arguing that embeddings are quietly the wrong tool for a large class of problems — and they've got a benchmark number that makes people stop scrolling. So this is the honest comparison I wish I'd had: vector databases vs PageIndex, what each one actually does, the top services in both camps, and a decision framework for which one to reach for. No hype tax. There's a clear verdict at the end.
The two philosophies in one sentence
A vector database turns your text into numbers (embeddings) and finds chunks whose numbers are geometrically close to your query's numbers. It's search by resemblance.
PageIndex skips embeddings entirely. It builds a hierarchical table-of-contents tree of your document and lets an LLM reason its way to the right section — the way you'd flip to a chapter, scan the headings, and turn to page 47. It's search by navigation.
That's the whole fight: similarity vs reasoning. Everything below is the texture.
How vector databases actually work
The pipeline is mechanical and, honestly, a little dumb in a good way:
- Chunk your documents into pieces (say 200–1,000 tokens).
- Embed each chunk with a model (OpenAI text-embedding-3, Voyage, Cohere, or an open model) into a vector: a list of a few hundred to a few thousand floats.
- Store those vectors in a database with an approximate-nearest-neighbor (ANN) index, usually HNSW or IVF.
- At query time, embed the query, find the top-k closest chunks, stuff them into the prompt.
The magic is that "closeness" in embedding space roughly tracks meaning. "How do I reset my password?" lands near a chunk about credential recovery even if it never says "reset." That semantic recall is genuinely useful, and at scale it's cheap: you pay the embedding cost once at ingestion, and lookups are milliseconds.
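The four steps above fit in a few dozen lines if you are willing to cheat on the hard parts. This is a toy sketch, not a production setup: `toy_embed` is a hypothetical stand-in for a real embedding model (it hashes words into buckets, so it only captures word overlap, not the semantic closeness a real model gives you), and the brute-force scan stands in for an ANN index such as HNSW. All names here are made up for illustration.

```python
import hashlib
import math

DIM = 1024  # toy dimensionality; real embedding models emit hundreds to thousands of floats


def chunk(text: str, size: int = 40) -> list[str]:
    """Step 1: split text into fixed-size windows (words stand in for tokens)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def toy_embed(text: str) -> list[float]:
    """Step 2 (hypothetical stand-in): hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for raw in text.lower().split():
        word = raw.strip(".,?!\"'")
        if word:
            bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % DIM
            vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class ToyVectorStore:
    """Step 3: keep (chunk, vector) rows. A real store adds an ANN index."""

    def __init__(self) -> None:
        self.rows: list[tuple[str, list[float]]] = []

    def add(self, document: str) -> None:
        for piece in chunk(document):
            self.rows.append((piece, toy_embed(piece)))

    def search(self, query: str, k: int = 3) -> list[tuple[float, str]]:
        """Step 4: embed the query and return the top-k chunks by cosine."""
        q = toy_embed(query)
        # Vectors are unit length, so the dot product equals cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, vec)), piece) for piece, vec in self.rows]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


store = ToyVectorStore()
store.add("To reset your password, open account settings and choose reset password.")
store.add("Invoices are issued on the first day of each month and sent by email.")
top_hits = store.search("how do I reset my password", k=1)
```

Swap `toy_embed` for a real model and the list scan for a real index and the shape of the pipeline stays the same, which is a big part of why this became the default.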
The top vector database services
The landscape is crowded. Here's the honest map of who's who and when each earns its place.
SERVICE             TYPE            BEST FOR
-----------------------------------------------------------------------
Pinecone            Managed (SaaS)  "Just works." Serverless, zero ops.
Qdrant              OSS + Cloud     Fast (Rust), great filtering. My default OSS pick.
Weaviate            OSS + Cloud     Built-in hybrid search + modules.
Milvus/Zilliz       OSS + Cloud     Billion-scale. Zilliz = managed Milvus.
Chroma              OSS + Cloud     Best DX for prototyping. Embedded, dead simple.
pgvector            Postgres ext.   You already run Postgres. Don't add infra.
Turbopuffer         Managed         Object-storage backed. Absurdly cheap at scale.
LanceDB             Embedded/OSS    On-disk, multimodal, no server to run.
Redis               OSS + Cloud     You already run Redis; want low-latency vectors.
MongoDB Atlas       Managed         Vectors next to your operational documents.
Elastic/OpenSearch  OSS + Cloud     Mature BM25 + vectors = best-in-class hybrid.
Vespa               OSS + Cloud     Serious ranking/hybrid at scale. Steep curve.
A few opinions, since you asked for them:
- Just want it to work and don't care about cost? Pinecone. Serverless, generous free tier, you never think about an index again.
- Already on Postgres? Use pgvector (via Supabase, Neon, or Timescale's pgvectorscale) before you add a new system to your stack. The best database is the one you're already running.
- Want open-source with the best performance-per-dollar? Qdrant. It's written in Rust, the filtering is excellent, and the cloud free tier is enough to ship a real thing.
- Going to billions of vectors? Milvus / Zilliz or Vespa.
- Prototyping on your laptop tonight? Chroma or LanceDB. You'll have retrieval working in ten lines.
- Drowning in scale costs? Look at Turbopuffer — backing vectors with object storage instead of RAM changes the economics dramatically.
On pricing: most of these have a real free tier and then charge for either serverless usage (storage + reads/writes) or provisioned capacity (pods/clusters/RAM). Prices move constantly, so I won't quote figures that'll be wrong by the time you read this — check the pricing page. The bigger cost lever is usually your embedding bill and the RAM your index needs, not the per-query price.
Where vector databases quietly fail
This is the part the tutorials skip. I've been bitten by all of these.
Chunking mutilates meaning. You picked 512 tokens because a blog post told you to. Now a definition is split from its example, a table is cut in half, and "the policy described above" points at a chunk that didn't get retrieved. Garbage boundaries in, garbage context out.
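Here is a small illustration of the boundary problem. The policy text is made up, and words stand in for tokens, but the failure is the same one you get at 512 tokens: the splitter has no idea where a thought ends.

```python
def fixed_size_chunks(text: str, size: int) -> list[str]:
    """Naive chunker: cut every `size` words, ignoring sentence boundaries."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


# Made-up example document: a definition followed by an example that
# refers back to it ("the policy described above").
doc = (
    "Definition: a grace period is the time after the due date during which "
    "no late fee applies. "
    "Example: for the policy described above, a payment due on the 1st and "
    "received on the 5th incurs no fee."
)

chunks = fixed_size_chunks(doc, size=12)
for number, piece in enumerate(chunks):
    print(number, repr(piece))
```

The first chunk ends mid-definition, and the second one opens with the tail of that sentence before running into the example. Retrieve only the second chunk and "the policy described above" points at nothing.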
Similarity is not relevance. This is the deep one, and it's PageIndex's whole thesis. Your query expresses intent; the chunk contains content. "What were the risk factors that worsened year over year?" is semantically near every paragraph that says "risk," but relevant to almost none of them. Embeddings give you topical neighbors, not answers.
Cross-references die. "See Appendix G" has zero embedding similarity to Appendix G's actual content. A human follows the pointer; cosine similarity can't.
Lost in the middle. Even when you retrieve the right chunks, stuffing ten of them into a prompt buries the important one in position six, where models reliably pay it the least attention.
It's a system to operate. Re-embedding when the model changes, tuning HNSW parameters, managing the vector store, keeping it fresh as docs update. It's not free; it's deferred.
None of this means vector search is bad. It means it's a blunt instrument — fantastic for "find me roughly relevant stuff across a huge pile," weak for "answer this precisely from this structured document."
Enter PageIndex: reasoning instead of similarity
PageIndex, built by UK startup Vectify AI, is the most prominent "vectorless, reasoning-based RAG" project. Their tagline is blunt: no embeddings, no chunking, no vector DBs. The open-source repo is MIT-licensed and has picked up north of 30,000 GitHub stars, so this is not a toy. (Credit where due — the framing and the benchmark work below are theirs, not mine; I'm just stress-testing the claims.)
Here's the mechanism, in two phases:
1. Tree generation (ingestion). PageIndex parses a document into its natural hierarchy — sections, subsections, page ranges — and produces a JSON tree. Each node carries a title, an ID, a page range, and an LLM-generated summary, with child nodes nested underneath. No fixed-size chunks. The structure mirrors the document's actual table of contents.
0006 Financial Stability (pages 41–58)
├── 0007 Monitoring Vulnerabilities
└── 0008 Domestic & International Cooperation
2. Reasoning-based retrieval (query time). Instead of embedding the query, PageIndex hands the tree to an LLM as an in-context index. The model reads the table of contents, decides which node looks most relevant, pulls that section's content, checks whether it's enough to answer, and, if not, navigates somewhere else. It's a tree search driven by reasoning, not a similarity lookup. Vectify likens it to AlphaGo-style search; in practice it's closer to very structured prompting, and it's worth being clear-eyed about that.
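To make the two phases concrete, here is a minimal sketch of the idea. It is not PageIndex's actual API or schema: the field names, the node summaries, and the `pick_child` function are all assumptions for illustration. In particular, `pick_child` uses plain keyword overlap where a real system would ask an LLM to choose, and the sketch skips the "is this enough to answer?" check and any backtracking.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    title: str
    summary: str = ""                      # LLM-generated in a real system
    pages: tuple[int, int] | None = None   # page range, when known
    children: list[Node] = field(default_factory=list)


def overlap(query: str, node: Node) -> int:
    """Count query words that appear in the node's title or summary."""
    query_words = set(query.lower().split())
    node_words = set(f"{node.title} {node.summary}".lower().split())
    return len(query_words & node_words)


def pick_child(query: str, node: Node) -> Node:
    """Hypothetical stand-in for the LLM's decision about where to go next."""
    return max(node.children, key=lambda child: overlap(query, child))


def navigate(query: str, root: Node) -> list[Node]:
    """Walk from the root to a leaf, recording the path taken."""
    path = [root]
    while path[-1].children:
        path.append(pick_child(query, path[-1]))
    return path


# Titles and IDs come from the tree above; the summaries are made up.
tree = Node("0006", "Financial Stability", pages=(41, 58), children=[
    Node("0007", "Monitoring Vulnerabilities",
         summary="risks and vulnerabilities tracked across the financial system"),
    Node("0008", "Domestic & International Cooperation",
         summary="coordination with other regulators"),
])

path = navigate("which vulnerabilities were being monitored", tree)
trace = " -> ".join(f"{node.node_id} {node.title}" for node in path)
```

The useful part is `trace`: whatever does the choosing, the retrieval result is a readable path through the document rather than an opaque chunk ID.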
The payoff is real: answers trace back to an explicit path — "pages 45–52, Risk Management → Interest Rate Risk." That explainability is the single most compelling thing about the approach. When a vector DB returns chunk #8,412, you can't tell your compliance team why.
The benchmark everyone quotes — read the fine print
You've probably seen "98.7% on FinanceBench." Here's what's actually true, because the details matter:
- The 98.7% belongs to Mafin 2.5, a financial-document model built on top of PageIndex — not the vanilla open-source library. Don't conflate them.
- It was measured on the full FinanceBench benchmark (100% coverage), and Vectify open-sourced the eval code. That's more rigor than most vendor numbers.
- Their comparison table puts Mafin 2.5 at 98.7% against the likes of Fintool (98%, partial coverage), ChatGPT-4o + Search (31%), and Perplexity (45%).
Now the caveats, straight from Vectify's own repo:
- The benchmark "primarily focuses on simple retrieval tasks based on a single document." It does not prove multi-document reasoning or corpus-scale performance.
- The public eval slice is small — on the order of 150 questions over SEC filings.
- The "vector RAG baseline" figure people cite (anywhere from ~30% to ~80% depending on who's talking) is inconsistent across sources, and Vectify's own intro post names no specific baseline. Treat it as marketing-adjacent, not gospel.
So the honest read: PageIndex is genuinely, measurably excellent at precise Q&A over a single long, structured document. The number is real for that task. It is not evidence that you should rip out your vector DB across the board.
PageIndex as a product
- Open source (MIT, Python): builds trees from PDF and Markdown. Complex/scanned PDFs need better OCR than the OSS path provides.
- Hosted cloud: a dashboard, a chat UI, plus a tree-generation API and a Chat & Retrieval API (beta), with Python and JS SDKs.
- MCP server: Claude, Cursor, and agent frameworks can reason over your docs through the official pageindex-mcp integration; a natural fit if you already run an MCP setup.
- Pricing (verified at time of writing): free trial (200 credits / 200 pages), then Standard at $30/mo (1,000 credits, 10k active pages), Pro at $50/mo, and Max at $100/mo (500k pages). Indexing costs 1 credit per page, one-time. Bring your own LLM key and the model calls are on you.
That last line is the catch, and it leads straight into the trade-offs.
Where PageIndex breaks down
I want this to be fair, so here's where the reasoning approach hurts:
Latency and cost per query. Every step is an LLM call. Indexing a ~100-page PDF can be 50–200 model calls; retrieval is multi-step, so you're measuring response times in seconds, not milliseconds, and paying tokens each time. A vector DB pays once at embedding time and then retrieval is effectively free and instant. For a high-QPS chatbot, that difference is the ballgame.
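A back-of-the-envelope count shows why query volume matters more than indexing cost. The 50 and 200 figures are the indexing range quoted above for a ~100-page PDF; the query volume and the three retrieval steps per query are made-up values for illustration, so plug in your own.

```python
def retrieval_llm_calls(n_queries: int, steps_per_query: int, index_calls: int) -> int:
    """Total LLM calls: one-time tree building plus multi-step retrieval per query."""
    return index_calls + n_queries * steps_per_query


# index_calls: range taken from the text above.
# n_queries and steps_per_query: hypothetical values, not measurements.
low = retrieval_llm_calls(n_queries=10_000, steps_per_query=3, index_calls=50)
high = retrieval_llm_calls(n_queries=10_000, steps_per_query=3, index_calls=200)
print(low, high)  # 30050 30200
```

Under these assumptions the one-time indexing is a rounding error next to the 30,000 retrieval calls, while an embedding lookup spends no LLM calls on retrieval at all.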
Corpus scale. This is the loudest, most legitimate criticism (it came up hard in the Hacker News threads, and Vectify conceded the point). PageIndex is built to navigate within a document. It has no native answer for "which of my 4 million documents should I even open?" Put a few hundred docs in front of it and it sings; point it at a giant heterogeneous corpus and it has no document-selection mechanism — that's exactly the job a vector index is good at.
It needs structure. Clean headings and a real hierarchy are fuel. Feed it Slack exports, email threads, or support tickets — messy, flat, unstructured — and the table-of-contents premise falls apart.
Full LLM dependency. The OSS path ships your pages to an external model, and retrieval quality rides on LLM-generated summaries being correct. For HIPAA/SOC2 environments, you're looking at the enterprise VPC/on-prem tier, not the free repo.
Don't forget the boring option: keyword search
Lost in the vector-vs-reasoning debate is the tool that predates both and still wins constantly: classic full-text / keyword search (BM25). Elasticsearch / OpenSearch, Postgres full-text, Typesense, and Meilisearch all do it.
When someone searches an exact error code, a SKU, a function name, a person's name, or a legal clause number, embeddings actively hurt — they'll fuzz a precise query into vaguely-related neighbors. Keyword search nails known-item and exact-match lookups, it's cheap, and it's transparent. If your "RAG" problem is really a search problem, you may not need vectors at all.
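For a sense of how little machinery this takes, here is a minimal BM25 scorer in plain Python. It is a sketch under stated assumptions: whitespace tokenization, the commonly used defaults `k1=1.5` and `b=0.75`, and the smoothed IDF variant that never goes negative. The example documents and error codes are invented.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    return text.lower().split()


class BM25:
    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75) -> None:
        self.k1, self.b = k1, b
        self.docs = [tokenize(doc) for doc in docs]
        self.avg_len = sum(len(doc) for doc in self.docs) / len(self.docs)
        self.term_freqs = [Counter(doc) for doc in self.docs]
        # Document frequency: in how many documents each term appears.
        doc_freq = Counter(term for doc in self.docs for term in set(doc))
        n = len(self.docs)
        self.idf = {
            term: math.log(1 + (n - count + 0.5) / (count + 0.5))
            for term, count in doc_freq.items()
        }

    def score(self, query: str, index: int) -> float:
        doc_len = len(self.docs[index])
        total = 0.0
        for term in tokenize(query):
            freq = self.term_freqs[index].get(term, 0)
            if freq == 0:
                continue
            # Length normalization: long documents need more hits to score high.
            denom = freq + self.k1 * (1 - self.b + self.b * doc_len / self.avg_len)
            total += self.idf[term] * freq * (self.k1 + 1) / denom
        return total

    def search(self, query: str, k: int = 3) -> list[int]:
        order = sorted(range(len(self.docs)), key=lambda i: self.score(query, i), reverse=True)
        return order[:k]


index = BM25([
    "error e1042 means the license key has expired",
    "error e2077 means the disk is full",
    "renew a license key from the billing page",
])
best = index.search("E1042", k=1)
```

An exact token either matches or it doesn't, which is why `E1042` lands on the right document with no risk of drifting toward the neighboring error code.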
The middle ground nobody puts on the slide
The real production systems I respect don't pick one religion. They blend:
- Hybrid search — dense vectors plus sparse/BM25 keyword scores, fused. You get semantic recall and exact-match precision. Weaviate, Qdrant, Elastic, and Vespa all do this well; it's the single highest-ROI upgrade to a naive vector setup.
- Rerankers — retrieve 50 candidates cheaply, then have a cross-encoder (Cohere Rerank, Voyage) re-score the top results. This fixes a lot of "similarity ≠ relevance" pain for one extra hop.
- Parent-document / hierarchical retrieval (LlamaIndex, LangChain) — embed small chunks for precision, but return the parent section for context. A pragmatic middle between chunking and PageIndex's whole-section idea.
- Contextual retrieval (Anthropic's technique) — prepend an LLM-generated context blurb to each chunk before embedding, so chunks stop losing their place in the document. Cheap, and it measurably cuts retrieval failures.
- GraphRAG (Microsoft) — build a knowledge graph for questions that need synthesis across many documents.
- Agentic retrieval — let the model issue searches, read, and search again. This is the same instinct as PageIndex, and notably it's roughly how Claude Code searches a codebase: it greps and reads rather than embedding your repo. That's a real industry signal that reasoning-over-structure beats similarity for some domains.
PageIndex itself is best understood as a member of this family, not a replacement for it.
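The "fused" part of hybrid search is less exotic than it sounds. One common way to do it is reciprocal rank fusion (RRF), which ignores raw scores and combines the rank positions from each retriever. That choice is mine for illustration, since the engines above each offer their own fusion options. The document IDs are placeholders, and `k=60` is the constant conventionally used with RRF.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into one ranking."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); higher ranks contribute more.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


dense_hits = ["a", "b", "c"]    # placeholder output of a vector search
sparse_hits = ["c", "a", "d"]   # placeholder output of a BM25 search
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)  # ['a', 'c', 'b', 'd']
```

Documents that both retrievers like ("a" and "c") float to the top, and because only ranks are used you never have to reconcile cosine scores with BM25 scores.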
The decision framework
Here's how I'd actually choose, given what I'm building:
IF the corpus is huge / heterogeneous (>100k docs, mixed formats)
  → Vector DB (+ hybrid + reranker). Nothing else scales like this.
IF queries are exact-match (codes, names, IDs, clauses)
  → Keyword/BM25 first. Add vectors only if recall is poor.
IF the job is precise Q&A over long, STRUCTURED docs
   (financial filings, contracts, manuals, research papers)
  → PageIndex / reasoning retrieval. This is its home turf.
IF you need to explain WHY an answer was retrieved (audit/compliance)
  → PageIndex's traceable paths beat opaque chunk IDs.
IF you need millisecond latency / high QPS
  → Vector DB. LLM-per-query retrieval can't keep up.
IF content is messy/unstructured (tickets, chat, emails)
  → Vector DB + hybrid. Trees need structure PageIndex won't find.
IF you have BOTH a big corpus AND long structured docs
  → Route it: vector/BM25 to pick the document,
    then PageIndex to extract precisely within it. Best of both.
That last pattern is the one I'd bet on for serious document AI: a coarse retriever to find the haystack, then reasoning-based navigation to find the needle inside it. Even the PageIndex community lands here.
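If you want the framework as something you can argue with in code review, here is one way to write it down. Treat it as a sketch: the `Workload` fields are hypothetical names, and the order in which the rules fire when several apply at once is my assumption, since the list above doesn't rank them. The 100k threshold comes from the first rule.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    doc_count: int
    exact_match_queries: bool = False    # codes, names, IDs, clauses
    long_structured_docs: bool = False   # filings, contracts, manuals, papers
    needs_audit_trail: bool = False      # must explain why something was retrieved
    needs_low_latency: bool = False      # millisecond latency / high QPS
    messy_content: bool = False          # tickets, chat, emails


def choose_retrieval(w: Workload) -> str:
    big_corpus = w.doc_count > 100_000
    if big_corpus and w.long_structured_docs:
        return "route: vector/BM25 picks the document, reasoning retrieval reads it"
    if w.exact_match_queries:
        return "keyword/BM25 first"
    if w.needs_low_latency or big_corpus or w.messy_content:
        return "vector DB + hybrid + reranker"
    if w.long_structured_docs or w.needs_audit_trail:
        return "PageIndex-style reasoning retrieval"
    # Fallback when no rule fires: the plain default for most RAG workloads.
    return "vector DB + hybrid + reranker"


contracts = Workload(doc_count=200, long_structured_docs=True, needs_audit_trail=True)
helpdesk = Workload(doc_count=50_000, messy_content=True, needs_low_latency=True)
archive = Workload(doc_count=4_000_000, long_structured_docs=True)

for name, workload in [("contracts", contracts), ("helpdesk", helpdesk), ("archive", archive)]:
    print(name, "->", choose_retrieval(workload))
```

The point of writing it this way is that the precedence becomes explicit. If your latency requirement should outrank everything else, move that check to the top and the trade-off is visible in the diff.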
The verdict
The vector database is not dead, and PageIndex is not snake oil. They solve different shapes of problem, and the "vs" framing oversells the conflict.
- Vector DBs are the backbone. For scale, speed, and heterogeneous piles of content, nothing competes. They should still be your default for most RAG, and you should upgrade them with hybrid search and a reranker before you do anything fancier.
- PageIndex is a scalpel. For precise, explainable Q&A over long, well-structured documents — the financial/legal/technical-manual world — it's genuinely better than embedding search, and the FinanceBench result (with its caveats) backs that up. Reach for it when accuracy and auditability outrank latency and scale.
- Keyword search is the underrated baseline. Try it before you assume you need either.
If I were building a document-AI product today, I wouldn't choose. I'd put a cheap, scalable retriever in front to pick the right documents, and a reasoning-based navigator behind it to read them properly — and I'd measure relentlessly, because the only benchmark that matters is yours.
Pick the tool that fits the shape of your problem, not the one that's trending on your timeline.
Sources & further reading
- PageIndex — product, intro / mechanism, OSS repo, MCP server, docs & pricing
- Mafin 2.5 + FinanceBench numbers and eval code — Vectify's eval repo
- The community debate (worth reading both sides) — Hacker News Show HN and this critical deep-dive
- Vector DB vendors — Pinecone, Qdrant, Weaviate, Milvus/Zilliz, Chroma, Turbopuffer, pgvector