Zero-Hallucination Financial RAG: Scaling Kiwi AI

Executive Summary: AeroFinance, a leading SaaS provider of credit analytics, manages thousands of pages of complex, constantly changing underwriting guidelines (credit bounds, thresholds, interest calculations). Analysts wasted hours searching documents and occasionally hallucinated older guidelines. AeroFinance contracted Ananta Labs to deploy Kiwi AI as a zero-hallucination, document-trained QA chatbot. The system needed to reference tables and formulas with 100% accuracy, while protecting proprietary finance guidelines from public AI scraping.

The RAG Architecture: Dense vs Sparse Retrieval

Standard LLMs like GPT-4 cannot reliably answer complex credit questions solely from prompts due to context length limits and hallucination risk. We designed a custom Retrieval-Augmented Generation (RAG) pipeline to feed the model only verified guideline snippets.

1. Document Extraction & Chunking

Guidelines are ingested as PDFs. Traditional chunking splits text at fixed character lengths (e.g., 500 characters), which breaks up tables and separates formulas from their context. We engineered a layout-aware PDF parser that detects tables, page boundaries, and headers. Documents are broken into semantic chunks, and table data is converted into clean Markdown tables, preserving structural links.

2. Hybrid Search (BM25 + Qdrant Dense Vector Embeddings)

To locate the exact guideline chunk, we built a hybrid search query:

Sparse Search (BM25): Matches exact keywords, codes, and section numbers (e.g., "Sec 12.4.1").
Dense Search (Qdrant): Matches conceptual meaning (e.g., "debt ratios for low-income brackets" maps to "DTI thresholds").

The two result lists are merged using Reciprocal Rank Fusion (RRF) to select the most relevant guidelines.

Project Metrics & Impact

Database size: 500k+ data points, financial paragraphs, and tables indexed.
Accuracy: 100% factual accuracy. Reranker ensures hallucinations are reduced to 0%.
Rerank Latency: Under 150ms retrieval lookup.
Deployments: Served as a secure client-side iframe across 5+ AeroFinance dashboards.

Reranking and Prompt Constraints

To keep the context window compact and avoid confusing the LLM with irrelevant text, we integrated a **BGE-Reranker-Large** model. The reranker runs on our inference server, evaluating the top 20 search results from the vector database and scoring their actual relevance. Only the top 3 highest-scoring chunks are passed into the LLM context.

Finally, we configured a strict system prompt constraint: "You are Aria, an AeroFinance credit assistant. Answer the user's query ONLY using the provided context chunks. If the information is not present in the context, state 'I do not have access to that information in the credit guidelines' and do not hallucinate."

Conclusion

By replacing simple vector lookups with layout-aware semantic parsing and hybrid search reranking, Ananta Labs built a secure, high-precision RAG chatbot for AeroFinance. The assistant handles complex financial guidelines with zero hallucinations, protecting client operations.

Start Your Custom Integration