Proxy-Pointer RAG: Achieving Vectorless Accuracy at Vector RAG Scale and Cost

by admin
April 5, 2026
in Artificial Intelligence


The recent launch of PageIndex is part of a broader shift in AI architecture toward "Vectorless RAG" or "Reasoning-Based Retrieval." Instead of the standard technique of splitting documents into arbitrary chunks and searching by mathematical similarity, PageIndex builds a "Smart Table of Contents" (a hierarchical tree) that allows LLMs to navigate documents the way a human expert would. Numerous blogs (including this one from Microsoft) outline the working principles (no vector database, no chunking, enhanced explainability) along with the 98.7% accuracy achieved on a financial benchmark. However, they are also careful to note that Vectorless RAG is best suited to deep-dive queries on complex structured or semi-structured documents (such as financial statements), rather than searching across many independent documents, such as customer support knowledge bases, where we should continue to use Vector RAG.

Why is that?

If Vectorless RAG using PageIndex delivers better (or at least nearly as good) results on almost any query, why not use it for a large collection of documents? The primary reason is that PageIndex's tree-based approach cannot practically scale to multi-document scenarios. The hierarchical tree index, a prerequisite ingestion step, is slow and expensive to build using an LLM. Furthermore, retrieval is a two-step process: use an LLM to walk the tree and locate the most relevant nodes, then use the content of those nodes as context for the response-synthesis step, again using the LLM.

In comparison, building a vector index is fast and cheap, and the retrieval step uses an LLM only once, during synthesis. Also, ingestion using an embedding model costs much less than summarization of the full document by an LLM.

What if you could get the excellent structure-aware reasoning accuracy of Vectorless RAG, together with the low latency and cost of Vector RAG, in a way that scales across an enterprise document base? In this article, I'll walk through a real use case on a large, complex document to build Proxy-Pointer RAG, an ingestion and retrieval pipeline that achieves this through a set of novel engineering steps. Along the way, we'll explore and demonstrate the following:

  • Why exactly is PageIndex so accurate, and why is it difficult to practically scale the concept to multi-document knowledge bases?
  • A quick comparison of Vectorless RAG using PageIndex vs flat Vector RAG to establish a baseline.
  • How can we incorporate the principles of PageIndex into a vector index with none of the associated latency and cost?
  • A comparison across a wide variety of queries using PageIndex and Proxy-Pointer to test the quality of the retrievals.

Use Case Setup

We'll use a World Bank report named South Asia Development Update, April 2024: Jobs for Resilience (License: CC BY 3.0 IGO). It is a 131-page report comprising several chapters, complex charts, tables, content in boxes, etc., making it a good candidate for PageIndex to demonstrate its capability. I used gemini-3-flash as the LLM to build the PageIndex tree and gemini-3.1-flash-lite for retrievals. I extracted the report PDF to a markdown file using the Adobe PDF Extract API, but any other method that preserves the integrity of the tables, charts, etc. (such as using a VLM) would work just as well. For the vector database, FAISS is used.

How does PageIndex work?

Instead of the "chunk your document, embed the chunks, retrieve the top-K, feed them to an LLM" pipeline of Vector RAG, PageIndex takes a radically different approach to document retrieval. Rather than treating a document as a flat sequence of chunks, it builds a semantic skeleton tree, a hierarchical map of every section, sub-section, and content block in the document, and then uses an LLM to navigate that tree at query time.

Phase 1: Indexing (once per document)

PageIndex parses the document's heading structure (Markdown headers, PDF outlines, etc.) into a nested tree. Each node gets:

  • A title (extracted from the heading)
  • A node ID (a unique identifier like 0012)
  • Line boundaries (start and end line in the source document)
  • A summary (generated by an LLM; this is the expensive and time-consuming part)

The result is a JSON that looks like this:

{
  "node_id": "0011",
  "title": "Chapter 1. Deceptive Strength",
  "summary": "Covers South Asia's growth outlook, inflation trends, financial vulnerabilities, climate risks, and policy challenges...",
  "line_num": 621,
  "nodes": [
    {
      "node_id": "0012",
      "title": "Introduction",
      "summary": "Summarizes the chapter's key themes including regional growth driven by India...",
      "line_num": 625
    },
    ...
  ]
}

Phase 2: Retrieval (per query)

When a user asks a question, PageIndex hands the full tree of summaries to an LLM and asks, "Which nodes contain the answer?" This is unlike Vector RAG, which relies on mathematical similarity between query and chunk embeddings to build the relevant context.

The LLM reads the summaries, not the full text, and returns a short list of node IDs. PageIndex then uses the line boundaries to slice the exact, contiguous, full section from the original markdown file and passes it to the synthesis LLM.
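This two-step loop can be sketched in a few lines. This is an illustrative reconstruction, not PageIndex's actual API: the `select_nodes` callback stands in for the LLM call, and I assume each node carries explicit `start_line`/`end_line` boundaries (the JSON above stores only `line_num`, with the end implied by the next sibling).

```python
def flatten(node):
    """Yield every node in the tree, depth-first."""
    yield node
    for child in node.get("nodes", []):
        yield from flatten(child)

def retrieve(tree, question, select_nodes, markdown_lines):
    # Step 1: the LLM sees only titles and summaries, never the full text
    catalog = [{"node_id": n["node_id"], "title": n["title"],
                "summary": n.get("summary", "")} for n in flatten(tree)]
    chosen_ids = select_nodes(question, catalog)      # e.g. ["0012"]

    # Step 2: line boundaries slice exact, contiguous sections from the source
    by_id = {n["node_id"]: n for n in flatten(tree)}
    return ["\n".join(markdown_lines[by_id[i]["start_line"]:by_id[i]["end_line"]])
            for i in chosen_ids]
```

Note that the context handed to the synthesis LLM is sliced from the source file itself; the tree only stores coordinates.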

Why does this work so well?

PageIndex excels due to three architectural advantages:

1. Structural Navigation, Not Pattern Matching

When we ask "What are the main messages of Chapter 1?", PageIndex doesn't search for chunks containing those words. It reads the summary of node 0011 ("Chapter 1. Deceptive Strength"), which says "Covers growth outlook, inflation, financial vulnerabilities, climate risks, and policy challenges," and immediately knows this is the right node. It reasons about relevance, not semantic or lexical similarity.

2. Contiguous Context Extraction

Once the right nodes are identified, PageIndex extracts the full, unbroken section that the node represents from the original Markdown: headers, sub-headers, bullet points, figure references, and all. The synthesis LLM receives context that reads like a properly authored document section, not a fragmented chunk with arbitrary boundaries.

3. Zero Chunk Boundary Artifacts

There are no overlapping chunks, no split sentences, no context windows that start mid-paragraph. Every piece of context has a natural beginning (the section header) and a natural end (the next section's start). This dramatically reduces hallucination from ambiguous context.

However, this strength comes at a cost, and this is where the approach starts to break down.

Why is this difficult to scale?

The reasons are cost and latency (at ingestion as well as retrieval time). For our 131-page report, we get 137 structural nodes in the tree, so PageIndex needs 137 LLM calls just for the indexing phase: one summary per node. For a corpus of 50 documents, that's ~7,000 LLM calls before we have used it to answer a single question.
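The back-of-the-envelope arithmetic, using only the figures already stated above:

```python
# Indexing-cost arithmetic from the figures above (illustrative only)
nodes_per_doc = 137        # structural nodes observed in the 131-page report
docs_in_corpus = 50
indexing_calls = nodes_per_doc * docs_in_corpus
print(indexing_calls)      # 6850, i.e. ~7,000 LLM calls before the first query
```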

Here is a comparative cost of PageIndex vs a vector index:

| Metric | PageIndex | Standard Vector RAG |
| --- | --- | --- |
| Indexing LLM calls | ~137 per document | 0 |
| Indexing time | 5-10 minutes per document | 30 seconds per document |
| Indexing cost | $$$ (LLM API fees) | $ (embedding API only) |
| Multi-document support | Requires per-document tree traversal during retrieval | Single unified index, fast and efficient |
| Retrieval quality | ★★★★★ | ★★★☆☆ |

For a single critical document such as a financial report, this cost is easily justified. For a 500-document enterprise knowledge base of customer-service SOPs that updates weekly? It's a non-starter.

This is where PageIndex becomes impractical for real systems at scale.

Before proposing an alternative, let's establish a baseline: how much better is PageIndex compared to standard Vector RAG?

Comparison of Vectorless vs Flat Vector RAG

To establish a baseline, I compared the results of a diverse range of queries using Vectorless and flat Vector RAG. As expected, the quality of Vectorless RAG was better in all cases. Here are two examples:

Query about table content

Here is the response from Vectorless RAG:

=== VECTORLESS RAG INTERACTION ===
Question: what does annexure table a2.1.1 say
Nodes Retrieved: 0125, 0131
Response:
Annex Table A2.1.1, titled "Studies included in the literature review," provides a comprehensive list of the academic and policy papers used to estimate the growth elasticity of employment (Annex Figure A2.1.1; Annex 2.1).

The table catalogs 25 specific studies, detailing the following information for each:
*   **Paper:** The author(s) and publication source.
*   **Period of principal sample:** The timeframe covered by the study (ranging from the 1960s through 2019).
*   **Sample coverage:** The geographical or organizational scope (e.g., International, Europe, South Asia, Africa, Asia).
*   **Region:** The specific region analyzed.
*   **Economies:** The specific countries included in the study sample.

According to the provided text, the articles listed in this table were selected based on specific criteria: they focus on growth elasticity of employment using country-level estimates. The selection excludes articles that present exclusively subnational, gender-specific, or industry/sector-specific estimates, as well as those focused solely on Okun's law and unemployment elasticity of growth (Annex 2.1).

And the same using Vector RAG, which completely misses the content:

=== VECTOR RAG INTERACTION ===
Question: what does annexure table a2.1.1 say
Retrieved 5 chunks.
Response:
Based on the provided context, there is no information available describing the contents of a table labeled "A2.1.1."

The documents list "A2.1.1" in the table of contents as **"Studies included in the literature review"** (page 93). However, the detailed content provided in your search results pertains to other tables, specifically:

*   **Annex 2.1:** Describes a summary of the literature on the growth elasticity of employment, noting that "an extensive literature search was carried out of studies on the elasticity of employment with respect to output growth."
*   **Annex Figure A2.1.1 (referenced as A2..1.11.1 in the text):** Titled "Estimates of output elasticity of employment," which states, "The literature provides a wide range of estimates on the growth elasticity of total employment."
*   **Annex Table A2.3.3:** Provides specific regression data regarding the robustness checks for the female employment ratio.

There is no specific data or content provided for Annex Table A2.1.1 beyond its title in the table of contents.

Query requiring structural knowledge of the document

The query is what are the questions answered by chapter 2? Here is the response from Vectorless RAG. It precisely located Chapter 2 and the "Questions" section within it and printed the questions verbatim, together with grounding information for explainability.

=== VECTORLESS RAG INTERACTION ===
Question: what are the questions answered by chapter 2
Nodes Retrieved: 0098
Response:
Based on the provided excerpts, Chapter 2 addresses the following questions:

*   **How does South Asia’s employment compare with that of other emerging market and developing economies (EMDEs)?**
*   **What are the correlates of long-run employment ratios among EMDEs?**
*   **How can South Asia create the jobs needed to absorb its rising working-age population, while also boosting productivity?**

These questions are explicitly listed in the section titled **"Questions."**

The chunks in Vector RAG don't maintain the structural hierarchy and section integrity, so while it locates the "Questions" section, it is unable to find the specific questions within it.

=== VECTOR RAG INTERACTION ===
Question: what are the questions answered by chapter 2
Retrieved 5 chunks.
Response:
Based on the provided document, Chapter 2, titled "Jobless Development," is structured to address specific inquiries regarding labor markets and economic growth.

The chapter explicitly identifies a "Questions" section (page 65) to frame its analysis. While the document does not list the specific text of those questions, it indicates that the chapter covers the following thematic areas and analytical components:

*   **Trends and Productivity:** The chapter examines "Trends in employment and labor productivity" (page 67) and addresses "Labor productivity growth" (page 73).
*   **Employment Ratios:** A significant portion of the chapter is devoted to "Long-run employment ratios" (page 75) and their various "Correlates," including:
    *   **Policy and Structural Factors:** Tax regimes, policy uncertainty (page 77), and trade openness (page 79).
    *   **Market and Financial Factors:** Access to finance (page 81), as well as product, labor, and land markets (page 82).
    *   **Human Capital and Firm Characteristics:** Education (page 83) and establishment size (page 78).
*   **Growth Dynamics:** The chapter explores the "Engines of output growth in EMDEs" (page 73) and the "Sectoral employment composition" (page 72).
*   **Policy Implications:** The analysis concludes by addressing "Policy options to boost employment" (page 85).

So the question becomes: can we retain these advantages without paying the cost?

Engineering a Better Retriever: Proxy-Pointer RAG

Let's build Proxy-Pointer RAG to answer that question. The core insight is intuitive:

You don't need LLM summaries to give a vector database structural awareness. You just need to encode the structure into the embeddings themselves.

The system uses the same structural tree as PageIndex, but with the expensive summarization flag turned off. Building this skeletal tree requires no LLM calls during indexing; it is built purely from regex-based heading detection, which runs in milliseconds.

Then, instead of asking an LLM to navigate the tree, we let FAISS do the retrieval, engineering the chunks so that FAISS "understands" where each chunk lives in the document's hierarchy.

Here is a view of the ingestion pipeline:

Ingestion Pipeline

Build a Skeleton Tree

PageIndex's tree parser doesn't really need an LLM to build the structural hierarchy. The heading detection is regex-based: it finds Markdown headers (#, ##, ###) and builds the nesting from indentation levels. The LLM is only used to summarize each node.

We call the LLM-free version a Skeleton Tree: same structure, same node IDs, same line boundaries, but no summaries.

# Build skeleton tree: no LLM, runs in milliseconds
pageindex = PageIndex(doc_path, enable_ai=False)
tree = pageindex.build_structure()  # Pure regex parsing

The skeleton tree and the summarized tree produced for the earlier Vectorless RAG have identical structures: the same 137 nodes, the same nesting depths, the same line numbers, the same titles. The only difference is the missing summary field.

Cost: $0. Time: < 1 second.

Structural Metadata Pointers (The Core Differentiator)

This is the heart of why PageIndex works so well, and the trick we'll adopt.

In standard vector RAG, the retrieved chunk is the context. Whatever 500 words FAISS returns is what the LLM sees. If the chunk starts mid-sentence or ends before the key data point, the response misses the intent of the query entirely (as illustrated in the earlier section comparing Vectorless and Vector RAG).

PageIndex does something fundamentally different: the chunk isn't the context. Each node in the tree knows its exact position in the original document: its title, its node ID, and, crucially, the start and end line numbers of the entire section it represents. When retrieval selects a node, PageIndex goes back to the original Markdown file and slices out the full, contiguous section between those line boundaries.

We replicate this exactly. Every chunk we embed into the vector index carries rich structural metadata from its tree node:

metadata = {
    "doc_id": "SADU",           # Which document
    "node_id": "0012",          # Which structural node
    "title": "Introduction",    # Section heading
    "start_line": 624,          # Where the section starts in the original file
    "end_line": 672             # Where the section ends
}

At retrieval time, we don't feed the matched chunks to the LLM. Instead, we:

  1. Use the chunks as proxies: they exist only to identify which nodes are relevant. Remove duplicate (doc_id, node_id) combinations to get the unique top-k.
  2. Follow the metadata pointers: open the original Markdown and slice the node's lines, e.g., 624 to 672.
  3. Send the full sections: the LLM receives the complete, pristine, structurally intact text.
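The three steps can be sketched as follows. The chunk dictionaries mirror the metadata shape shown above; the function name and the assumption that `matched_chunks` arrive ranked by similarity are illustrative, not the exact implementation.

```python
def proxy_pointer_retrieve(matched_chunks, source_files, top_k=3):
    # 1. Chunks are proxies: deduplicate on (doc_id, node_id), keeping rank order
    seen, unique_nodes = set(), []
    for chunk in matched_chunks:                  # already ranked by similarity
        key = (chunk["doc_id"], chunk["node_id"])
        if key not in seen:
            seen.add(key)
            unique_nodes.append(chunk)
        if len(unique_nodes) == top_k:
            break

    # 2. Follow the metadata pointers back into the original markdown
    contexts = []
    for node in unique_nodes:
        lines = source_files[node["doc_id"]].splitlines()
        section = "\n".join(lines[node["start_line"]:node["end_line"]])
        # 3. The synthesis LLM receives the full, structurally intact section
        contexts.append(f"[{node['title']}]\n{section}")
    return contexts
```

Two chunks that matched different sentences of the same section collapse into a single full-section context, so the top-k budget is spent on distinct sections rather than near-duplicate fragments.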

Here’s a view of the retrieval pipeline:

Retrieval Pipeline

This means that even if a chunk matched on only a single sentence deep inside a section, the synthesis LLM gets the full section: its header, its context, its figures, its conclusions. The chunk was disposable; the pointer is what matters.

This is why I call it Proxy-Pointer RAG: the vectors are proxies for location, and the metadata are pointers to the real content.

Cost: $0. Impact: transforms context quality from fragmented chunks to complete document sections.

Breadcrumb Injection (Structural Context)

This is key to answering queries tied to a specific section of the document (such as Chapter 2). Standard vector RAG embeds raw text:

"While private investment growth has slowed in both South Asia and other EMDEs..."

FAISS has no idea this chunk comes from Chapter 1, under Economic Activity, inside Box 1.1. So when a user asks for the "main messages of Chapter 1," this chunk won't rank highly: it contains neither the phrase "Chapter 1" nor "main messages."

Breadcrumb injection prepends the full ancestry path from the Skeleton Tree to every chunk before embedding:

"[Chapter 1. Deceptive Strength > Economic activity > Regional developments > BOX 1.1 Accelerating Private Investment]
While private investment growth has slowed in both South Asia and other EMDEs..."

Now the embedding vector encodes both the content AND its structural location. When someone asks about "Chapter 1," FAISS knows which chunks belong to Chapter 1, because the words "Chapter 1. Deceptive Strength" are present in the embedding.

# Build the breadcrumb from the node's ancestry
current_crumb = f"{parent_breadcrumb} > {node_title}"

# Prepend it to the chunk text before embedding
enriched_text = f"[{current_crumb}]\n{section_text}"
chunks = text_splitter.split_text(enriched_text)

This is a zero-cost encoding of the tree structure into the vector space. We're using the same embeddings API, the same FAISS index, the same retrieval code. The only difference is what we feed into the embedder.

Cost: $0 extra. Impact: transforms retrieval quality for structural queries.

Structure-Guided Chunking (No Blind Sliding Windows)

Standard vector RAG applies a sliding window across the full document: a 2000-character window that moves forward with some overlap, completely oblivious to the document's structure. A chunk might start mid-paragraph in the Introduction and end mid-sentence in a figure caption. The boundaries are arbitrary, and every chunk is an island unto itself, with no knowledge of its place in the overall document structure.

Proxy-Pointer does something fundamentally different: we walk the tree, not the text.

For each node in the skeleton tree, we extract only its own section text, from start_line to end_line, and then apply the text splitter to that isolated section. If a section is short enough, it becomes a single chunk. If it's longer, the splitter divides it, but strictly within that section's boundaries.

Standard RAG:  Blind sliding window across the entire document
[====chunk1====][====chunk2====][====chunk3====]...
    ↑ might start in the Introduction, end in a figure caption

Proxy-Pointer: Chunk within each node's boundaries
Introduction (lines 624-672)      → [chunk A] [chunk B]
Economic Activity (lines 672-676) → [chunk C]
BOX 1.1 (lines 746-749)           → skipped (< 100 chars)
Inflation (lines 938-941)         → [chunk D]

This ensures three things:

  1. Chunks never cross section boundaries: a chunk from the Introduction will never overlap with Economic Activity.
  2. Each chunk belongs to exactly one node, so the node_id metadata is always precise.
  3. Breadcrumbs are accurate per chunk: they reflect the exact structural container, not a guess.

Importantly, when a node is skipped (because its text is too short, e.g., a "BOX 1.1" heading with no body content), the tree walk still recurses into its children. The actual content lives in child nodes like "Introduction," "Features," and "Figures," all of which get embedded with the parent's title in their breadcrumb (e.g., BOX 1.1 Accelerating Private Investment > Introduction, BOX 1.1 > Features of...). No content is ever lost; only empty structural headers are excluded.
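The walk above can be sketched as a recursive function. This is a simplified illustration under one stated assumption: each node's line range covers only its own text, while children carry their own ranges. The splitter is passed in as a plain callable.

```python
MIN_SECTION_CHARS = 100   # threshold used in this article's example

def chunk_tree(node, lines, splitter, breadcrumb="", doc_id="SADU"):
    crumb = f"{breadcrumb} > {node['title']}" if breadcrumb else node["title"]
    section = "\n".join(lines[node["start_line"]:node["end_line"]])

    chunks = []
    if len(section) >= MIN_SECTION_CHARS:        # skip empty structural headers
        for piece in splitter(f"[{crumb}]\n{section}"):
            chunks.append({"text": piece,
                           "metadata": {"doc_id": doc_id,
                                        "node_id": node["node_id"],
                                        "title": node["title"],
                                        "start_line": node["start_line"],
                                        "end_line": node["end_line"]}})
    # Recurse even when the node was skipped: content under a bare
    # heading lives in its children and is never lost
    for child in node.get("nodes", []):
        chunks.extend(chunk_tree(child, lines, splitter, crumb, doc_id))
    return chunks
```

Because the splitter only ever sees one section at a time, no chunk can straddle a section boundary, and every chunk inherits exactly one node's pointer metadata.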

Cost: $0. Impact: every chunk is structurally traceable, enabling precise metadata pointers.

Noise Filtering (Removing Distractors)

Some sections of any document are irrelevant for retrieval and contaminate the context: they contain keywords that match almost every query but provide no useful content.

  • Table of Contents: mentions every chapter title (matches "Chapter 1," "Chapter 2," everything)
  • Executive Summary: paraphrases every key finding (matches every topic query)
  • Abbreviations: lists every acronym used in the document
  • Acknowledgments: mentions organizations, countries, and themes

These sections act as distractors in vector space. They sit near the centroid of the document's embedding distribution, pulling retrieval toward shallow overview text instead of deep analytical content.

We apply a simple title-based filter during indexing:

NOISE_TITLES = {
    "contents", "summary of contents", "table of contents",
    "abbreviations", "acknowledgments", "foreword",
    "executive summary", "references"
}

if title.strip().lower() in NOISE_TITLES:
    continue  # Skip this node entirely

This filter removed 7 nodes from our test document. The impact was immediate: for the query "What are the main messages of Chapter 1?", the system stopped retrieving the Executive Summary and Table of Contents and started retrieving the actual Introduction section.

Cost: $0. Impact: eliminates the most common retrieval failure mode.

That's it. Proxy-Pointer RAG is now ready to test.

Test the Proxy-Pointer Pipeline

While the detailed comparison appears in the next section, here are the results of the same two queries cited in the "Vectorless vs Flat Vector RAG" section. As you can see, the answers are as accurate and detailed as those from Vectorless RAG using PageIndex.

Query about table content

=== PROXY-POINTER RAG INTERACTION ===
Question: what does annexure table a2.1.1 say
Unique Context Nodes Extracted:
  - [SADU] Node 0131: 'ANNEX TABLE A2.1.1 Studies included in the literature review' | Lines: 3952-4157
  - [SADU] Node 0092: 'ANNEX TABLE SL.B1.1.1 Marginal probability of studies documenting above-average effect for poor-average households' | Lines: 2537-2745
  - [SADU] Node 0052: 'TABLE B1.1.1 Correlates of the probability of private investment accelerations starting' | Lines: 1663-1899

Response:
Based on the provided documents, **Annex Table A2.1.1** is titled **"Studies included in the literature review."**

This table provides a summary of various research studies, detailing the following information for each:
*   **Paper:** The author(s) and year of the study (e.g., Kapsos (2005), Saget (2000), Papola and Sahu (2012), etc.).
*   **Period of principal sample:** The timeframe covered by the study (ranging from the 1960s to 2019).
*   **Sample coverage:** The scale or number of entities analyzed (e.g., "160 countries," "90 developing countries," "47 African countries").
*   **Region:** The geographic scope of the research (e.g., International, South Asia, Africa, Europe, Middle East, Asia).
*   **Economies:** The specific countries or economic groupings included in the research (e.g., "India," "Bulgaria, Czech Republic, Estonia...", "G20 countries").

The table documents a diverse set of research spanning from 1960 through 2019, covering regions and economies across the globe, including specific data on transition economies, various OECD countries, and specific regions such as South Asia and Africa.

Query requiring structural knowledge

=== PROXY-POINTER RAG INTERACTION ===
Question: what are the questions answered by chapter 2
Unique Context Nodes Extracted:
  - [SADU] Node 0099: 'FIGURE 2.3 Growth models' | Lines: 3127-3155
  - [SADU] Node 0098: 'Questions' | Lines: 3115-3127
  - [SADU] Node 0124: 'ANNEX 2.1 Summary of literature on the growth elasticity of employment' | Lines: 3807-3812

Response:
Based on the provided text, Chapter 2 addresses the following questions regarding South Asia’s economic growth and labor market:

*   **How does South Asia’s employment compare with that of other emerging market and developing economies (EMDEs)?** (Source: *Chapter 2, Questions section*)
*   **What are the correlates of long-run employment ratios among EMDEs?** (Source: *Chapter 2, Questions section*)
*   **How can South Asia create the jobs needed to absorb its rising working-age population, while also boosting productivity?** (Source: *Chapter 2, Questions section*)

Additionally, the chapter explores these issues with the goal of understanding how the region can harness its working-age population to accelerate convergence toward the income levels of advanced economies.

While we have demonstrated the cost advantage so far, does the architecture overcome the scalability challenge of PageIndex?

Is Proxy-Pointer Scalable?

Here is why the architecture is scalable across an enterprise knowledge base. PageIndex pays a scalability penalty at both ends: ~137 LLM calls per document during indexing, plus an extra LLM reasoning step per query for tree navigation. Proxy-Pointer eliminates both.

  • No LLM at indexing. The skeleton tree is regex-built in milliseconds. The only API calls are to the embedding model, just as in standard vector RAG.
  • No tree navigation at retrieval. Queries go straight to the vector index. No LLM reading summaries, no per-document traversal.

Proxy-Pointer is standard vector RAG with intelligent metadata baked in. The structural awareness lives inside the embeddings (via breadcrumbs) and the chunk metadata (via node pointers), not in an LLM reasoning loop. It inherits all of vector RAG's scalability: unified multi-document indexes, sub-linear search, incremental updates, and zero per-query LLM overhead beyond the final synthesis.

Fail-safe for unstructured documents: If a document has no headings, or the skeleton tree produces only a single root node, the system detects this during chunking and falls back to a standard sliding window. Those chunks are flagged with an empty node_id and no line boundaries. At retrieval time, flagged chunks are used directly as LLM context instead of following pointers back to the source. The system gracefully degrades to standard vector RAG: no errors, no special handling required.
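A sketch of that fail-safe, with an assumed sliding-window splitter and chunk shape (the function names here are illustrative):

```python
def sliding_window(text, size=2000, overlap=200):
    """Plain character window with overlap, oblivious to structure."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

def ingest(doc_text, tree, structured_chunker=None):
    # A usable tree has child nodes; a bare root means no headings were found
    has_structure = tree is not None and bool(tree.get("nodes"))
    if has_structure and structured_chunker:
        return structured_chunker(tree, doc_text)      # structure-guided path
    # Fallback: flag chunks so retrieval uses them directly, no pointers
    return [{"text": piece,
             "metadata": {"node_id": "", "start_line": None, "end_line": None}}
            for piece in sliding_window(doc_text)]

def build_context(chunk, source_lines):
    meta = chunk["metadata"]
    if not meta["node_id"]:              # flagged chunk: already the context
        return chunk["text"]
    return "\n".join(source_lines[meta["start_line"]:meta["end_line"]])
```

The flag lives in the same metadata field the structured path populates, so the retrieval code needs only one branch to handle both kinds of chunk.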

Let's compare Vectorless RAG and Proxy-Pointer head-to-head.

Vectorless vs Proxy-Pointer RAG

I ran a variety of queries (broad structural, cross-reference, specific factual, figure-specific, etc.) and let Claude judge the responses for a comprehensive comparison. You can find the detailed responses from Vectorless and Proxy-Pointer along with the full Quality Comparison report here.

The following table encapsulates the verdict. The final score: PageIndex 2, Proxy-Pointer 4, Ties 4. In other words, Proxy-Pointer matches or beats PageIndex on 8 out of 10 queries, all at the scalability and cost of flat Vector RAG.

Here is the summary verdict:

| # | Query Type | Winner |
| --- | --- | --- |
| 1 | Broad structural (Ch. 1 messages) | 🔴 PageIndex |
| 2 | Broad structural (Ch. 2 messages) | 🔴 PageIndex (narrow) |
| 3 | Specific factual (Box 1.1 features) | 🟡 Tie |
| 4 | Cross-reference (inflation tables) | 🟢 Proxy-Pointer |
| 5 | Comparative (India vs region) | 🟢 Proxy-Pointer |
| 6 | Figure-specific (B1.1.1 trends) | 🟢 Proxy-Pointer |
| 7 | Direct lookup (Annexure A2.1.1) | 🟡 Tie |
| 8 | Entity-specific (currency crisis countries) | 🟡 Tie |
| 9 | Navigational (Ch. 2 questions) | 🟡 Tie |
| 10 | Inferential/policy (government vs shocks) | 🟢 Proxy-Pointer |

And here is the cost comparison:

| Metric | PageIndex | Proxy-Pointer | Standard Vector RAG |
| --- | --- | --- | --- |
| Indexing LLM calls | ~137 per doc | 0 | 0 |
| Indexing time | 5-10 min/doc | < 30 sec/doc | < 30 sec/doc |
| Retrieval quality | ★★★★★ | ★★★★★ (8/10 vs PageIndex) | ★★★☆☆ |
| Multi-doc scalability | Poor (per-doc tree navigation) | Excellent (unified vector index) | Excellent |
| Structural awareness | Full (LLM-navigated) | High (breadcrumb-encoded) | None |
| Index rebuild on update | Expensive (re-summarize) | Cheap (re-embed affected nodes) | Cheap |
| Explainability | High (section titles + doc IDs) | High (section titles + doc IDs) | Low (opaque chunks) |

Key Takeaways

  1. Structure is the missing ingredient in RAG. The quality gap between naive vector RAG and PageIndex isn't about better embeddings; it's about preserving hierarchy.
  2. You don't need an LLM to encode structure. Breadcrumb injection and structural metadata give the vector index structural awareness at no cost.
  3. Noise filtering beats better embeddings. Removing 7 low-value nodes from the index had more impact on retrieval quality than any model swap could.
  4. Pointers beat chunks. Chunks act as proxies for the full section, which is what the synthesizer LLM sees.

Conclusion

Proxy-Pointer RAG proves a simple thesis: you don't need an expensive LLM to make a retriever structurally aware; you just have to be clever about what you embed.

Five zero-cost engineering techniques (skeleton trees, metadata pointers, breadcrumbs, structure-guided chunking, and noise filtering) close the quality gap with a full LLM-navigated system, while keeping the speed and scalability of standard vector RAG. On our 10-query benchmark, Proxy-Pointer matched or beat PageIndex on 8 out of 10 queries, at the cost of a standard Vector RAG.

The next time you're building RAG for your structured (or unstructured) document repository, don't reach for a bigger model. Reach for a Proxy-Pointer index.

Connect with me and share your comments at www.linkedin.com/in/partha-sarkar-lets-talk-AI

Reference

World Bank. 2024. South Asia Development Update, April 2024: Jobs for Resilience. License: CC BY 3.0 IGO.

Images used in this article were generated using Google Gemini. Code created by me.
