Updated 12/12/2023 with a new evaluation run using LangSmith (SEC 10Q Filings 2023-12-11) and a new dataset that reflects real-life customer usage patterns.
At Docugami, we have built the world’s most advanced Foundation Model to convert long-form business documents (Scanned PDFs, Digital PDFs, DOCX, DOC) into semantic XML Knowledge Graphs. Real-world documents are more than just flat text. Docugami is built to handle multi-page, long-form documents, including complex tables and multi-column flows, and produces an XML Knowledge Graph that faithfully represents each document in its entirety, both semantically and structurally.
Retrieval Augmented Generation (RAG) has recently gained traction as a popular use case that allows Large Language Models (LLMs) to reason over business-critical data that is often private to enterprises.
RAG over simple text is a start, but RAG over semantic XML Knowledge Graphs (KG-RAG) is a game changer. Today, we are shipping the Docugami KG-RAG Template for LangChain, which allows customers to send their Docugami XML Knowledge Graph as input to LLMs. Our preliminary results indicate that Docugami KG-RAG significantly outperforms the retrieval over documents built into OpenAI’s GPTs and Assistants API.
| | OpenAI Assistants | Docugami KG-RAG |
|---|---|---|
| Answer Correctness - SEC 10-Q Dataset | 33% | 48% |
It is important to note that these are unassisted results. Docugami is designed so business users can provide point-and-click feedback to the model. With just a few minutes of feedback, Docugami's results approach 100% accuracy. We will provide examples of this with more detail in the next few days.
We are sharing the results of the evaluation run here in LangSmith: SEC 10Q Filings 2023-12-11.
As a reminder, over the past year, Docugami has shipped integrations with open-source frameworks like LangChain and LlamaIndex that allow developers to build their own RAG solutions, and OpenAI has also recently announced built-in support for retrieval over documents in GPTs and the Assistants API.
We love all this momentum around RAG, but feel that there are still some gaps that need to be addressed:
We recently released a cookbook that allows any developer to do RAG over XML Knowledge Graphs (KG-RAG). Today, we are excited to go beyond the simple cookbook and release a new end-to-end LangChain template for Docugami KG-RAG. With this template, you can quickly get up and running with KG-RAG in your own applications. You can check out the template here: https://github.com/docugami/langchain-template-docugami-kg-rag.
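As a rough illustration of why retrieval over semantically labeled XML differs from retrieval over flat text, here is a minimal sketch that uses only the Python standard library. It is not the template's implementation. The XML fragment, its tag names, and the helpers `semantic_chunks` and `retrieve` are all hypothetical and do not reflect Docugami's actual schema or API.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: these tag names are invented for illustration and
# are NOT Docugami's actual schema.
SAMPLE_XML = """
<Filing>
  <CoverPage>
    <CompanyName>Example Corp</CompanyName>
    <FiscalPeriodEnd>September 30, 2023</FiscalPeriodEnd>
  </CoverPage>
  <IncomeStatement>
    <NetRevenue>1,200</NetRevenue>
    <NetIncome>150</NetIncome>
  </IncomeStatement>
</Filing>
"""


def semantic_chunks(xml_text):
    """Yield (tag path, text) pairs for every non-empty leaf element."""
    root = ET.fromstring(xml_text.strip())

    def walk(node, path):
        current = path + [node.tag]
        children = list(node)
        if not children:
            text = (node.text or "").strip()
            if text:
                yield "/".join(current), text
        for child in children:
            yield from walk(child, current)

    yield from walk(root, [])


def retrieve(chunks, label):
    """Toy lookup: keep chunks whose last path segment matches the label."""
    wanted = label.lower()
    return [(p, t) for p, t in chunks if p.split("/")[-1].lower() == wanted]


chunks = list(semantic_chunks(SAMPLE_XML))
for path, text in retrieve(chunks, "NetIncome"):
    # Each chunk carries its semantic path, which could be passed to an LLM
    # alongside the text instead of the bare value alone.
    print(f"{path}: {text}")
```

In this sketch each value keeps the path that says what it means, so a retriever can tell a net income figure from a revenue figure even when the raw text is just a number. A real pipeline would typically use embeddings rather than exact label matching.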
We have done some preliminary comparisons of Docugami KG-RAG using this new template against OpenAI GPTs under the following conditions:
As mentioned, our preliminary results indicate that OpenAI Assistants are correct approximately 33% of the time, while Docugami KG-RAG is correct approximately 48% of the time. We are sharing the results of the evaluation run here in LangSmith: SEC 10Q Filings 2023-12-11.
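To put the two figures side by side, the short sketch below computes the absolute and relative difference between them. It assumes answer correctness is simply the fraction of questions graded correct. The post does not spell out the grading procedure, so `answer_correctness` and `compare` are illustrative helpers rather than the evaluation code used for the run.

```python
def answer_correctness(grades):
    """Fraction of questions graded correct; `grades` is a list of booleans.

    Assumption: one pass/fail grade per question.
    """
    if not grades:
        raise ValueError("no graded answers")
    return sum(grades) / len(grades)


def compare(baseline, candidate):
    """Return (absolute gain in percentage points, relative gain as a fraction)."""
    absolute_points = (candidate - baseline) * 100
    relative = (candidate - baseline) / baseline
    return absolute_points, relative


# Figures reported above: 33% for OpenAI Assistants, 48% for Docugami KG-RAG.
points, relative = compare(0.33, 0.48)
print(f"{points:.0f} percentage points, {relative:.0%} relative improvement")
```

On the reported numbers this works out to a gain of 15 percentage points, or roughly 45% relative to the OpenAI Assistants baseline.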
We invite community feedback on these preliminary results. We will be updating this template over time to add other capabilities, for example, multi-modal RAG that includes figures and images inside documents. We will also be adding more test documents and questions, and will be asking the community to contribute.
Tag us @docugami to share your results and experience using our new Docugami KG-RAG template on your documents, or just reach out at https://www.docugami.com/contact-us.