
A brief analysis on RAG with Pinecone Serverless and Unstructured.io



Welcome to the latest installment of the series on building an AI chat assistant from scratch. This time around, however, I’d like to change the format a bit. Instead of guiding you through the process step by step with code snippets, I will give a high-level overview of the process and the tools used, followed by a brief analysis of the RAG model built with Pinecone Serverless and Unstructured.io.

For reference, here are the previous articles in the series:

The model is very much a prototype and may contain bugs, so please let me know if you find any breaking issues. And as always, if you’d prefer to skip the article and get the code yourself, you can find it on GitHub.

Overview

RAG (Retrieval-Augmented Generation) is a technique commonly used in Generative AI to improve the quality of an AI response by providing context to an LLM (Large Language Model) such as GPT-4. The context is retrieved from a Vector Database using Semantic Search and is added to the user message before it is sent. As a result, the LLM can generate responses that are more accurate and relevant, and the infamous “hallucination” phenomenon that is so common in LLMs these days is mitigated.
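To make the retrieval idea concrete, here is a minimal sketch of semantic search by cosine similarity, followed by the augmentation of a user message. It is an illustration only: the three-dimensional vectors are hand-written stand-ins for real embeddings, the sample sentences are invented, and `Chunk`, `cosineSimilarity`, `topK`, and `augmentMessage` are hypothetical names rather than anything taken from the project.

```typescript
// Toy illustration of the retrieval step in RAG. The vectors are hand-written
// stand-ins for real embeddings, and every name here is hypothetical.

interface Chunk {
  text: string;
  vector: number[];
}

// Cosine similarity: dot product divided by the product of the vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((sum, value, i) => sum + value * (b[i] ?? 0), 0);
  const norm = (v: number[]): number =>
    Math.sqrt(v.reduce((sum, value) => sum + value * value, 0));
  return dot / (norm(a) * norm(b));
}

// Return the k chunks whose vectors are most similar to the query vector.
function topK(query: number[], candidates: Chunk[], k: number): Chunk[] {
  return [...candidates]
    .sort(
      (x, y) =>
        cosineSimilarity(query, y.vector) - cosineSimilarity(query, x.vector)
    )
    .slice(0, k);
}

// Prepend the retrieved text to the user message before it goes to the LLM.
function augmentMessage(message: string, context: Chunk[]): string {
  const contextText = context.map((c) => c.text).join("\n");
  return `Context:\n${contextText}\n\nQuestion: ${message}`;
}

// Invented sample chunks with made-up vectors.
const chunks: Chunk[] = [
  { text: "The offer includes 25 days of paid leave.", vector: [0.9, 0.1, 0.0] },
  { text: "The office opens at nine.", vector: [0.1, 0.9, 0.0] },
  { text: "Salary is paid on the last working day.", vector: [0.7, 0.0, 0.7] },
];

// Pretend this is the embedding of the question below.
const queryVector = [1.0, 0.0, 0.1];
const retrieved = topK(queryVector, chunks, 2);
const augmented = augmentMessage("How many vacation days do I get?", retrieved);
console.log(augmented);
```

In a real setup the vectors come from an embedding model and the ranking is done by the Vector Database, but the principle is the same: the chunks closest to the query vector become the context.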

Although I’ve been building RAG-based open source applications for a while now, I’ve mostly stuck to basic techniques and tools that provided average results and performance. However, I recently came across two new tools that have completely changed the game for me: Pinecone Serverless and Unstructured.io.

Motivations

Pinecone Serverless is the newly released serverless option of the Pinecone Vector Database, used in Generative AI applications that rely on concepts such as RAG, semantic search, and classification. I’ve used Pinecone before with decent results, but with the recent release of a serverless option, I was intrigued to try again.

In past efforts, the primary challenge in utilizing a Vector Database wasn’t the database itself, the embeddings, or even the earlier versions of GPT-4 models, which were often reluctant to incorporate provided context. The real difficulty lay in parsing unstructured data effectively and devising a solid chunking strategy. This was especially problematic for complex, unstructured documents containing images and tables, and it led to decreased precision during the retrieval phase. While this might have been acceptable for small-scale applications or personal projects, it certainly would not suffice for a production-grade application.

This is when I started looking for document parsers and came across Unstructured.io, a tool for extracting and analyzing information from unstructured data, including tables and images. I was eager to see how it could be integrated into a RAG-based model and what kind of precision and performance it could provide.

These two tools also came with a free tier, or free credits, which was a huge plus for me. So I thought, why not build a new RAG model using these two tools and see how it performs?

The Process

I implemented the project using my existing Next.js-based template named Titanium, which already incorporates several advanced Generative AI features.

The RAG process follows these steps:

  1. A document uploaded by the user is parsed using Unstructured.io.
  2. The parsed chunks of the document are embedded with one of OpenAI’s text-embedding-3 models.
  3. Pinecone Serverless indexes the embedded data within a user-specific namespace, including any additional metadata.
  4. When the user sends a message to the AI Assistant, the most relevant chunks are retrieved from the index by semantic search and added to the message as context, which is then processed by the gpt-4-0125-preview model.
  5. GPT-4 generates a response using the enriched message, ensuring relevance and accuracy.
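
The first three steps, plus the retrieval part of step 4, can be sketched with in-memory stand-ins. This is a rough illustration under stated assumptions, not the Titanium implementation: nothing below calls Unstructured.io, OpenAI, or Pinecone, and `parseDocument`, `embed`, `InMemoryIndex`, and `ingest` are hypothetical names. Parsing is replaced by splitting on blank lines, embedding by counting a few keywords, and the index by a map with one entry per user namespace. The real services are asynchronous network calls and their client APIs differ.

```typescript
// In-memory stand-ins for the ingestion and retrieval steps.
// All names are hypothetical; no external service is called.

interface VectorRecord {
  id: string;
  values: number[];
  metadata: { text: string; filename: string };
}

// Step 1 stand-in: split on blank lines instead of real document parsing.
function parseDocument(raw: string): string[] {
  return raw
    .split(/\n\s*\n/)
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
}

// Step 2 stand-in: count a few keywords instead of calling an embedding model.
const VOCABULARY = ["salary", "leave", "office"];
function embed(text: string): number[] {
  const words = text.toLowerCase().split(/\W+/);
  return VOCABULARY.map((term) => words.filter((w) => w === term).length);
}

// Step 3 stand-in: one list of records per namespace, keyed by user id.
class InMemoryIndex {
  private namespaces = new Map<string, VectorRecord[]>();

  // Insert new records and overwrite existing ones that share an id.
  upsert(namespace: string, records: VectorRecord[]): void {
    const existing = this.namespaces.get(namespace) ?? [];
    const incomingIds = new Set(records.map((r) => r.id));
    const kept = existing.filter((r) => !incomingIds.has(r.id));
    this.namespaces.set(namespace, [...kept, ...records]);
  }

  // Rank records by dot product, looking only inside the given namespace.
  query(namespace: string, vector: number[], limit: number): VectorRecord[] {
    const score = (r: VectorRecord): number =>
      r.values.reduce((sum, v, i) => sum + v * (vector[i] ?? 0), 0);
    return [...(this.namespaces.get(namespace) ?? [])]
      .sort((a, b) => score(b) - score(a))
      .slice(0, limit);
  }
}

// Steps 1 to 3 chained: parse, embed, and upsert into the user's namespace.
function ingest(
  target: InMemoryIndex,
  userId: string,
  filename: string,
  raw: string
): number {
  const records: VectorRecord[] = parseDocument(raw).map((text, i) => ({
    id: `${filename}#${i}`,
    values: embed(text),
    metadata: { text, filename },
  }));
  target.upsert(userId, records);
  return records.length;
}

// Invented sample document with three paragraphs.
const sampleLetter =
  "Your salary is paid monthly.\n\nYou get 25 days of annual leave.\n\nThe office opens at nine.";
const vectorIndex = new InMemoryIndex();
ingest(vectorIndex, "user-a", "offer-letter.pdf", sampleLetter);

// Retrieval for step 4: only user-a's namespace is searched.
const userMatches = vectorIndex.query("user-a", embed("How much leave do I get?"), 1);
console.log(userMatches[0]?.metadata.text);

// A different user sees nothing, because namespaces are isolated.
console.log(vectorIndex.query("user-b", embed("leave"), 1).length);
```

Storing the chunk text in the metadata means the retrieved context can be read straight from the query result, and keeping one namespace per user keeps one user’s documents out of another user’s answers.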

Analysis

I tested this RAG model using two types of documents in PDF format: a 15-page sample offer letter and an 88-page book.

Some of the parameters that I adjusted and tested included:

Metrics

Parsing

Embedding

Indexing/Upserting

Retrieval

Deletion

Streaming Chat Experience

Costs

Conclusion

Overall, I’ve had a blast building this RAG model using Pinecone Serverless and Unstructured.io. I’m quite pleased with the performance and, of course, with being able to develop this model for free. I’m looking forward to further testing and optimizing the model, and I’m excited to see how these tools will evolve in the future.

Unstructured.io is perhaps on its way to becoming a major player in the field of unstructured data parsing. The processing time of its hi_res parsing strategy appears to scale linearly with document size, yet the accuracy it delivers is notably high. By combining it with a fast and efficient Vector Database like Pinecone Serverless and a powerful LLM like GPT-4, we can confidently build a production-grade RAG model that handles complex, unstructured documents at a really affordable cost.

So, that’s it for now!

I hope this brief analysis has been helpful to you. If you have any questions or suggestions, feel free to reach out to me on GitHub, LinkedIn, or via email.

Oh, and I’m not in any way affiliated with Pinecone or Unstructured.io. I just really like their products and, most importantly, their free tiers. Wink, wink!

See ya around and happy coding!

