We at Docugami love the recent energy around formal evaluation of Retrieval-Augmented Generation (RAG) systems. We especially appreciated the release of Llama Datasets and LangChain Benchmarks, which are amazing community efforts to objectively measure different RAG techniques on standardized metrics.
However, we noticed that existing eval datasets did not adequately reflect the RAG use cases we see in Docugami's production deployments. Specifically, they focus on Q&A over a single document (or just a few), when in reality customers often need to run RAG over larger sets of documents.
To address these concerns, we are releasing Docugami KG-RAG Datasets, an MIT-licensed repository of documents and annotated question-answer pairs that reflect real-life customer usage. These datasets are published simultaneously to GitHub and LlamaHub.
Specifically, these datasets improve upon existing datasets in the following ways:
What did Microsoft report as its net cash from operating activities in the Q3 2022 10-Q?
For Amazon's Q1 2023, how does the share repurchase information in the financial statements correlate with the equity section in the management discussion?
How has Apple's revenue from iPhone sales fluctuated across quarters?
We have been running standard LangSmith evaluations on this dataset using the Docugami KG-RAG Template for LangChain, and comparing our results to other systems like OpenAI’s GPTs and Assistants API.
| | OpenAI Assistants | Docugami KG-RAG |
|---|---|---|
| Answer Correctness - SEC 10-Q Dataset | 33% | 48% |
It is important to note that these are unassisted results. Docugami is designed so business users can provide point-and-click feedback to the model. With just a few minutes of feedback, Docugami's results approach 100% accuracy. We will provide examples of this with more detail in the next few days.
We have shared the results of the evaluation run here in LangSmith: SEC 10Q Filings 2023-12-11.
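For readers who want to reproduce a headline number like the answer-correctness percentages above from their own run, here is a minimal sketch of the aggregation step. It assumes each question ends up with a binary correct/incorrect grade (for example, from an LLM-based grader), and that the reported score is simply the fraction graded correct. The `GradedAnswer` type, the `answer_correctness` helper, and the grades in the example are hypothetical illustrations, not part of LangSmith or the Docugami template, and not real evaluation results.

```python
from dataclasses import dataclass


@dataclass
class GradedAnswer:
    """One evaluated question with a binary grade (hypothetical structure)."""
    question: str
    correct: bool


def answer_correctness(graded: list[GradedAnswer]) -> float:
    """Return the fraction of answers graded correct (0.0 to 1.0)."""
    if not graded:
        raise ValueError("need at least one graded answer")
    return sum(1 for g in graded if g.correct) / len(graded)


# Made-up grades for illustration only; these are NOT real results.
example_run = [
    GradedAnswer(
        "What did Microsoft report as its net cash from operating "
        "activities in the Q3 2022 10-Q?",
        correct=True,
    ),
    GradedAnswer(
        "How has Apple's revenue from iPhone sales fluctuated across quarters?",
        correct=False,
    ),
]

score = answer_correctness(example_run)
print(f"Answer correctness: {score:.0%}")
```

A plain mean over binary grades keeps the metric easy to compare across systems, but it weights every question equally regardless of difficulty, so per-question results remain worth inspecting.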
We invite community feedback and contributions to these datasets, and will be updating them over time.
Tag us @docugami to share your results, or just reach out at https://www.docugami.com/contact-us.