VectifyAI Launches Mafin 2.5 and PageIndex: 98.7% Financial RAG Accuracy with Open-Source, Vectorless Tree Indexing

Building a Retrieval-Augmented Generation (RAG) pipeline is easy; building one that doesn’t hallucinate during a 10-K audit is far harder. For devs in the financial sector, the ‘standard’ vector-based RAG approach (chunking text and hoping for the best) often produces a ‘text soup’ that loses the structural context of tables and balance sheets.

VectifyAI is attempting to close this gap with the launch of Mafin 2.5, a multimodal financial agent, and PageIndex, an open-source framework that shifts the industry toward ‘Vectorless RAG.’

The Problem: Why Vector RAG Fails Finance

Traditional RAG relies on semantic similarity. If you ask about ‘Net Income,’ a vector database returns the chunks whose embeddings sit closest to the query, that is, text that sounds like net income, whether or not it contains the figure you need. Financial documents, however, are layout-dependent. A number in a cell is meaningless without its row and column headers, and those headers are often stripped away during traditional PDF-to-text conversion.

This is the ‘garbage in, garbage out’ trap: even the smartest LLM cannot reason correctly if the input data has lost its hierarchical structure.
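To make the header problem concrete, here is a minimal Python sketch. It is not taken from VectifyAI's code: the table values, the 40-character chunk size, and the helper names (flatten, chunk, chunks_with_headers) are all assumptions chosen for illustration.

```python
# Toy income-statement table; all figures are made up for illustration.
table = [
    ["Line item", "FY2023", "FY2022"],
    ["Revenue", "1200", "1100"],
    ["Cost of revenue", "700", "650"],
    ["Net income", "150", "120"],
]


def flatten(rows):
    """Mimic naive PDF-to-text extraction: cells joined by spaces, rows by newlines."""
    return "\n".join(" ".join(row) for row in rows)


def chunk(text, size):
    """Fixed-size character chunking, as in a very basic vector RAG pipeline."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def chunks_with_headers(rows):
    """Structure-aware alternative: every value is emitted with its row and column header."""
    header, *body = rows
    return [
        "; ".join(f"{row[0]} / {col}: {val}" for col, val in zip(header[1:], row[1:]))
        for row in body
    ]


flat_chunks = chunk(flatten(table), 40)
net_income_chunk = next(c for c in flat_chunks if "Net income" in c)

print(repr(net_income_chunk))          # the fiscal-year headers are gone
print(chunks_with_headers(table)[-1])  # headers travel with each value
```

With these toy values, the flat chunk that mentions Net income no longer contains either fiscal-year header, and the FY2022 figure is cut off into the next chunk. The header-aware version keeps each number attached to its row and column labels.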

Mafin 2.5: Accuracy at Scale

Mafin 2.5 isn’t just a fine-tuned model; it’s a reasoning engine that achieved 98.7% accuracy on FinanceBench, well ahead of the roughly 31% and 45% reported for GPT-4o and Perplexity, respectively, on the same benchmark.

What sets it apart for devs is its native integration with high-fidelity data sources:

Comprehensive SEC Access: Direct indexing of 10-K, 10-Q, and 8-K filings.
Earnings Intel: Real-time and historical earnings call transcripts.
Market Data: Live tickers across the Russell 3000 and Nasdaq.

PageIndex: The Move to ‘Vectorless’ RAG

The ‘secret sauce’ behind Mafin 2.5’s precision is PageIndex. PageIndex replaces traditional flat embeddings with a hierarchical tree index.

Instead of searching through random chunks, PageIndex allows an LLM to ‘reason’ through a document’s structure. It builds a semantic tree—essentially an intelligent map of the document—enabling the agent to identify the exact section, page, and line item required.
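The idea of walking a document tree instead of ranking flat chunks can be sketched in a few lines of Python. This is an illustration only, not PageIndex's actual API: the Node class, the navigate function, the section titles, and the page ranges are hypothetical, and a simple keyword-overlap score stands in for the LLM's judgment at each step.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """One section of the document tree (hypothetical structure)."""
    title: str
    summary: str
    pages: tuple[int, int]
    children: list["Node"] = field(default_factory=list)


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens, punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_score(question: str, node: Node) -> int:
    """Stand-in for the LLM: count words shared by the question and the node."""
    return len(tokens(question) & tokens(node.title + " " + node.summary))


def navigate(root: Node, question: str) -> list[Node]:
    """Descend from the root, choosing the best-scoring child at each level."""
    path = [root]
    node = root
    while node.children:
        node = max(node.children, key=lambda c: keyword_score(question, c))
        path.append(node)
    return path


# Made-up outline of a 10-K; titles and page ranges are illustrative.
root = Node("Form 10-K", "annual report", (1, 120), [
    Node("Item 1. Business",
         "overview of products, customers and competition", (3, 20)),
    Node("Item 7. Management's Discussion and Analysis",
         "year-over-year trends, liquidity and capital resources", (40, 60)),
    Node("Item 8. Financial Statements",
         "audited statements: income, balance sheet, cash flows", (61, 110), [
             Node("Consolidated Statements of Operations",
                  "revenue, expenses and net income", (63, 64)),
             Node("Consolidated Balance Sheets",
                  "assets, liabilities and equity", (65, 66)),
         ]),
])

path = navigate(root, "What was net income in the statements of operations?")
print(" > ".join(n.title for n in path), path[-1].pages)
```

In a real system the scoring step would be a model call that reads the child titles and summaries and picks where to look next; the keyword overlap here only keeps the sketch runnable without any external service.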

Key technical features include:

Vision-Native Support: PageIndex supports Vision-based RAG, allowing models to ‘see’ the global layout of a page (charts, complex grids) rather than relying solely on OCR text.
Hierarchical Navigation: It transforms PDFs into a navigable tree structure, ensuring the relationship between headers and data remains intact.
Traceability: Unlike the ‘black box’ of vector similarity, every answer has a clear path through the document tree, providing a much-needed audit trail for regulated financial environments.
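The hierarchical navigation and traceability points above can be illustrated together. The sketch below assumes a PDF's headings have already been extracted as (level, title, page) tuples with levels starting at 1; it rebuilds the tree with a stack and prints a breadcrumb-style trail for each leaf section. The Section class, build_tree, trails, and the sample headings are hypothetical and do not reflect how PageIndex itself is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """A heading in the document outline (hypothetical structure)."""
    title: str
    page: int
    level: int
    children: list["Section"] = field(default_factory=list)


def build_tree(headings):
    """Build a tree from (level, title, page) tuples given in document order.

    Assumes heading levels start at 1; the synthetic root sits at level 0.
    """
    root = Section("Document", 1, 0)
    stack = [root]
    for level, title, page in headings:
        node = Section(title, page, level)
        # Pop back up until the top of the stack is a shallower heading.
        while stack[-1].level >= level:
            stack.pop()
        stack[-1].children.append(node)
        stack.append(node)
    return root


def trails(node, prefix=()):
    """Yield an audit-trail string for every leaf section."""
    here = prefix + (node.title,)
    if not node.children:
        yield " > ".join(here) + f" (p. {node.page})"
    for child in node.children:
        yield from trails(child, here)


# Made-up headings; titles and page numbers are illustrative only.
headings = [
    (1, "Item 7. MD&A", 40),
    (2, "Liquidity", 52),
    (1, "Item 8. Financial Statements", 61),
    (2, "Consolidated Balance Sheets", 65),
    (3, "Notes on Goodwill", 70),
]

for trail in trails(build_tree(headings)):
    print(trail)
```

Storing a trail like this alongside each answer is one plausible way to give auditors the section and page behind a figure, which a list of nearest-neighbor chunk IDs does not provide on its own.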

Key Takeaways

Unprecedented Financial Accuracy (98.7%): Mafin 2.5 set a new state-of-the-art result on the FinanceBench benchmark. Its 98.7% accuracy is far ahead of general-purpose systems such as GPT-4o (~31%) and Perplexity (~45%), a gap attributed to its focus on specialized financial reasoning rather than general retrieval.
The Shift to ‘Vectorless RAG’: Moving away from the ‘vibe-based’ search of traditional vector databases, PageIndex introduces Reasoning-based RAG. It uses an LLM to ‘reason’ its way through a document’s structure, mimicking how a human analyst navigates a report to find specific data points.
Hierarchical ‘Tree’ Indexing vs. Chunking: Instead of chopping documents into arbitrary, contextless text chunks, PageIndex organizes PDFs into a semantic tree structure (an intelligent Table of Contents). This preserves the critical relationship between headers, nested tables, and footnotes that traditional RAG often destroys.
Vision-Native & OCR-Free Workflows: The framework supports Vision-based Vectorless RAG, allowing the AI to ‘see’ and retrieve information directly from page images. This matters for financial documents, where the visual layout of a balance sheet or complex grid is as important as the numbers themselves.
Enterprise-Grade Traceability: Unlike the ‘black box’ of vector similarity, PageIndex provides a fully auditable reasoning path. Every response is linked to specific nodes, pages, and sections, providing the transparency required for high-stakes financial audits and compliance.

Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.
