Announcing Docugami Knowledge Graph Retrieval Augmented Generation (KG-RAG) Datasets in the LlamaHub

Written by Taqi Jaffri | December 13, 2023 at 10:19 PM

We at Docugami love the recent energy around formally evaluating Retrieval Augmented Generation (RAG) systems. We especially appreciated the release of Llama Datasets and LangChain Benchmarks, both amazing community efforts to objectively measure different RAG techniques on standardized metrics.

However, we noticed that existing eval datasets did not adequately reflect the RAG use cases we see in Docugami's production deployments. Specifically, they focused on Q&A over a single document (or just a few), when in reality customers often need to run RAG over larger sets of documents.

To address these concerns, we are releasing Docugami KG-RAG Datasets, an MIT licensed repository of documents and annotated question-answer pairs which reflect real-life customer usage. These datasets are simultaneously published to GitHub and the LlamaHub.

These datasets improve on existing ones in the following ways:

- Q&A over many documents, not just one or a few
- More realistic long-form documents, similar to those customers actually use, rather than only standard academic examples
- Questions of varying difficulty, including:
1. Single-Doc, Single-Chunk RAG: Questions where the answer can be found in a contiguous region (text or table chunk) of a single doc. To correctly answer, the RAG system needs to retrieve the correct chunk and pass it to the LLM context. For example: What did Microsoft report as its net cash from operating activities in the Q3 2022 10-Q?
2. Single-Doc, Multi-Chunk RAG: Questions where the answer can be found in multiple non-contiguous regions (text or table chunks) of a single doc. To correctly answer, the RAG system needs to retrieve multiple correct chunks from a single doc, which can be challenging for certain types of questions. For example: For Amazon's Q1 2023, how does the share repurchase information in the financial statements correlate with the equity section in the management discussion?
3. Multi-Doc RAG: Questions where the answer can be found in multiple non-contiguous regions (text or table chunks) across multiple docs. To correctly answer, the RAG system needs to retrieve multiple correct chunks from multiple docs. For example: How has Apple's revenue from iPhone sales fluctuated across quarters?
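The three categories differ in how many gold chunks, and from how many documents, a retriever has to surface. The Python below is a minimal sketch, assuming each annotated question is labeled with the (doc_id, chunk_id) pairs that contain its answer. That schema, the class and function names, and the document IDs in the usage lines are all hypothetical; the released datasets may not store chunk-level labels this way. It shows how a question could be bucketed into one of the three categories and how retrieval coverage could be checked.

```python
from dataclasses import dataclass
from enum import Enum


class QuestionType(Enum):
    SINGLE_DOC_SINGLE_CHUNK = "single-doc, single-chunk"
    SINGLE_DOC_MULTI_CHUNK = "single-doc, multi-chunk"
    MULTI_DOC = "multi-doc"


# Hypothetical: a chunk is identified by (document id, chunk index within that document).
ChunkRef = tuple[str, int]


@dataclass(frozen=True)
class AnnotatedQuestion:
    question: str
    gold_chunks: frozenset[ChunkRef]  # every chunk needed to answer the question


def classify(q: AnnotatedQuestion) -> QuestionType:
    """Bucket a question by how many chunks and documents its answer spans."""
    if not q.gold_chunks:
        raise ValueError("question has no gold chunks")
    docs = {doc_id for doc_id, _ in q.gold_chunks}
    if len(docs) > 1:
        return QuestionType.MULTI_DOC
    if len(q.gold_chunks) > 1:
        return QuestionType.SINGLE_DOC_MULTI_CHUNK
    return QuestionType.SINGLE_DOC_SINGLE_CHUNK


def retrieval_recall(q: AnnotatedQuestion, retrieved: list[ChunkRef]) -> float:
    """Fraction of the gold chunks that appear among the retrieved chunks."""
    hits = q.gold_chunks.intersection(retrieved)
    return len(hits) / len(q.gold_chunks)


if __name__ == "__main__":
    # Hypothetical document IDs, one per quarterly filing.
    q = AnnotatedQuestion(
        "How has Apple's revenue from iPhone sales fluctuated across quarters?",
        frozenset({("apple-q1", 7), ("apple-q2", 5), ("apple-q3", 6)}),
    )
    print(classify(q).value)  # multi-doc
    # Only 2 of the 3 gold chunks were retrieved, so recall is 2/3.
    print(retrieval_recall(q, [("apple-q1", 7), ("apple-q2", 5), ("msft-q3", 2)]))
```

A multi-doc question only scores full recall when the retriever pulls the right chunk from every filing, which is why this category is the hardest of the three.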

We have been running standard LangSmith evaluations on this dataset using the Docugami KG-RAG Template for LangChain, and comparing our results to other systems such as OpenAI’s GPTs and the Assistants API.

| Metric | OpenAI Assistants | Docugami KG-RAG |
|---|---|---|
| Answer Correctness (SEC 10-Q Dataset) | 33% | 48% |
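The Python below is a hedged illustration of how an answer-correctness score aggregates and how the two reported numbers compare. It assumes each answer receives a binary correct/incorrect grade, as in a typical LangSmith QA correctness evaluator; the helper names are hypothetical. The only real inputs are the 33% and 48% reported in the table.

```python
def answer_correctness(grades: list[bool]) -> float:
    """Share of questions graded correct, as a percentage (assumes binary grades)."""
    if not grades:
        raise ValueError("no graded questions")
    return 100.0 * sum(grades) / len(grades)


def compare(baseline_pct: float, candidate_pct: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = candidate_pct - baseline_pct
    relative = 100.0 * absolute / baseline_pct
    return absolute, relative


if __name__ == "__main__":
    # Reported figures from the results table (rounded percentages).
    abs_gain, rel_gain = compare(baseline_pct=33.0, candidate_pct=48.0)
    print(f"+{abs_gain:.0f} points, {rel_gain:.0f}% relative")  # +15 points, 45% relative
```

Reporting both numbers avoids ambiguity: a 15-point absolute gain over a 33% baseline is roughly a 45% relative improvement.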

It is important to note that these are unassisted results. Docugami is designed so business users can provide point-and-click feedback to the model. With just a few minutes of feedback, Docugami's results approach 100% accuracy. We will provide examples of this with more detail in the next few days.

We have shared the results of the evaluation run here in LangSmith: SEC 10Q Filings 2023-12-11.

We invite community feedback and contributions to these datasets, and will be updating them over time.

Tag us @docugami to share your results, or just reach out at https://www.docugami.com/contact-us.
