20 May 2026

Why I'm building FinDocs RAG

I spent enough time around financial documents during my Finance MSc to know how much of the work is just finding the right paragraph. Filings are long, the language is dense, and the answer you need is usually buried three documents deep. That search is slow, and it is the kind of slow that compounds across a whole team.

FinDocs RAG is my attempt to make that search conversational. The idea is simple to state: let someone ask a plain-English question and get an answer that is grounded in the actual filings, with the source attached so it can be checked. Combining that with live Companies House data means the answer reflects what is true now, not just what was in a static document set.

I chose retrieval-augmented generation over fine-tuning deliberately. Financial data changes, and a fine-tuned model only knows what it saw at training time; in this domain, an answer you cannot trace back to a source is close to useless. RAG helps keep the model honest: it is asked to answer only from what it retrieved, and the retrieval is something you can inspect.
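To make "something you can inspect" concrete, here is a minimal sketch of the retrieve-then-prompt step. It is not the project's code: the vector store is still undecided, so plain word overlap stands in for real embedding search, and every name here (`Passage`, `retrieve`, `build_prompt`, the sample sources and their text) is a hypothetical placeholder.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    source: str  # hypothetical identifier: a file name plus a page marker
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, passages: list[Passage], k: int = 2) -> list[Passage]:
    """Return up to k passages sharing at least one word with the question.

    Word overlap is a stand-in (an assumption) for embedding similarity.
    """
    q = tokenize(question)
    scored = [(len(q & tokenize(p.text)), p) for p in passages]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [p for score, p in ranked if score > 0][:k]


def build_prompt(question: str, retrieved: list[Passage]) -> str:
    """Build a prompt that carries each passage together with its source tag."""
    context = "\n".join(f"[{p.source}] {p.text}" for p in retrieved)
    return (
        "Answer using only the passages below and cite the source in brackets.\n"
        "If the passages do not contain the answer, say so.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


# Made-up placeholder passages, not taken from any real filing.
PASSAGES = [
    Passage("example-annual-report.pdf#p12",
            "Revenue for the year increased due to higher subscription sales."),
    Passage("example-annual-report.pdf#p40",
            "The directors consider the going concern basis appropriate."),
    Passage("example-confirmation-statement.pdf#p1",
            "The registered office address is unchanged."),
]
```

Calling `retrieve("Why did revenue increase?", PASSAGES)` returns a plain list of passages that can be printed and checked before any model sees them, and `build_prompt` keeps the source tag next to each passage so a citation in the answer can be traced back. The model call itself is left out on purpose, since nothing about it is settled yet.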

This is very much in progress. The vector store choice is still open, the evaluation set does not exist yet, and I am not going to put numbers on this page until I have actually measured them. When there is something real to show, it will go here and on the project page.