GraphRAG: When Vectors Meet Relationships

Ever get frustrated with RAG systems that miss obvious connections between facts? I know I have. Vanilla RAG is great at retrieving relevant chunks of text, but it often misses how the entities in your data relate to each other. I've been experimenting with GraphRAG lately and came across an article from Qdrant on the topic, which gave me most of what I needed to get started: Build a GraphRAG Agent with Neo4j and Qdrant

What I wanted to try was turning the project into a fully open-source solution. Since Qdrant and Neo4j already offer that option, all I needed to do was replace the OpenAI API with Qwen2.5:3B and Nomic text embeddings. After a few hours, I think it turned out alright!

What's GraphRAG All About?

GraphRAG is a knowledge graph-based Retrieval-Augmented Generation system that takes a different approach to the typical RAG workflow. Instead of just chunking documents and embedding them, it actually extracts entities and relationships from your text, creating a rich knowledge graph alongside vector embeddings.
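To make the extraction step a bit more concrete, here's a minimal sketch of turning an LLM's extraction output into node and relationship records. The JSON shape (`graph`, `node`, `target_node`, `relationship`) and the function name `parse_graph_components` are assumptions for illustration, not necessarily what the repo uses.

```python
import json
import uuid


def parse_graph_components(raw_json):
    """Turn (assumed) LLM extraction output into node and relationship records."""
    data = json.loads(raw_json)
    nodes = {}          # entity name -> generated node id
    relationships = []  # dicts with source id, target id, and relationship type

    for item in data["graph"]:
        source, target = item["node"], item["target_node"]
        for name in (source, target):
            # Reuse the id when the same entity shows up again
            if name not in nodes:
                nodes[name] = str(uuid.uuid4())
        relationships.append({
            "source": nodes[source],
            "target": nodes[target],
            "type": item["relationship"],
        })
    return nodes, relationships


# Hand-written stand-in for what an extraction prompt might return
sample = json.dumps({"graph": [
    {"node": "Olivia", "target_node": "Wildlife Guardians", "relationship": "works at"},
    {"node": "Olivia", "target_node": "Ethan", "relationship": "co-led survey with"},
]})

nodes, relationships = parse_graph_components(sample)
print(len(nodes), "nodes,", len(relationships), "relationships")
```

The node ids are what would tie the two stores together: the same id can be written to the graph node and to the payload of the matching vector.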

The cool thing is how it combines these approaches at query time:

It uses vector search to find semantically relevant passages
It identifies key entities from those passages
It pulls in the connected relationships from the knowledge graph
It packages everything up as rich context for the LLM

The result? Answers that understand both the content AND the connections between entities. Pretty neat, right?
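The four query-time steps can be sketched with plain Python and toy in-memory data. This is only an illustration under loud assumptions: the two-dimensional vectors, the `PASSAGES` and `EDGES` structures, and the relationship names are all made up, and in the real project Qdrant and Neo4j do the searching.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy stand-in for the vector store (hypothetical data and 2-d vectors)
PASSAGES = [
    {"text": "Olivia is a wildlife biologist at Wildlife Guardians.",
     "vector": [1.0, 0.0], "entities": ["Olivia", "Wildlife Guardians"]},
    {"text": "Ethan co-led a biodiversity survey in the Amazon.",
     "vector": [0.0, 1.0], "entities": ["Ethan"]},
]

# Toy stand-in for the knowledge graph: (source, relationship, target)
EDGES = [
    ("Olivia", "WORKS_AT", "Wildlife Guardians"),
    ("Olivia", "MENTORED_BY", "Mia"),
    ("Ethan", "SURVEYED", "Amazon"),
]


def retrieve_context(query_vector, top_k=1):
    # 1. Vector search: rank passages by similarity to the query
    ranked = sorted(PASSAGES, key=lambda p: cosine(query_vector, p["vector"]), reverse=True)
    hits = ranked[:top_k]
    # 2. Collect the entities mentioned in the top passages
    entities = {name for p in hits for name in p["entities"]}
    # 3. Pull in every relationship that touches one of those entities
    related = [e for e in EDGES if e[0] in entities or e[2] in entities]
    # 4. Package passages and relationships as text context for the LLM
    lines = [p["text"] for p in hits] + [f"{s} -[{r}]-> {t}" for s, r, t in related]
    return "\n".join(lines)


context = retrieve_context([0.9, 0.1])
print(context)
```

Note that the relationship to Mia ends up in the context even though the retrieved passage never mentions her. That's the part plain vector search would miss.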

What Makes It Special?

What's in it:

Provider Agnostic: Works with both OpenAI and Ollama, so you can use commercial APIs or run everything locally
Neo4j + Qdrant: Uses Neo4j for the graph database and Qdrant for vector search
Parallel Processing: Breaks down large texts and processes them concurrently for speed
Simple Console Interface: Easy-to-use interface for ingesting data and asking questions

But the real magic happens in how it combines these components. When you ask a question, GraphRAG doesn't just return the most similar text chunks - it actually constructs a subgraph of related entities and their relationships, giving the LLM a much richer context to work with.
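Going back to the parallel processing point above, here's a rough sketch of the general idea: split the text into chunks and hand them to a thread pool. The chunk size, the function names, and the placeholder `extract_from_chunk` (which stands in for the LLM extraction call) are all assumptions, not the repo's actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(text, chunk_size=200):
    """Split text into chunks of at most chunk_size characters on word boundaries.

    A single word longer than chunk_size still becomes its own chunk.
    """
    chunks, current = [], ""
    for word in text.split():
        if current and len(current) + 1 + len(word) > chunk_size:
            chunks.append(current)
            current = word
        else:
            current = f"{current} {word}" if current else word
    if current:
        chunks.append(current)
    return chunks


def extract_from_chunk(chunk):
    # Placeholder for the LLM extraction call; here it only counts words
    return {"chunk": chunk, "word_count": len(chunk.split())}


def process_text(text, max_workers=4):
    chunks = split_into_chunks(text)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input chunks
        return list(pool.map(extract_from_chunk, chunks))


results = process_text("Olivia documented endangered species. " * 20)
print(len(results), "chunks processed")
```

Threads are a reasonable fit here because the real work is waiting on network calls to the LLM, not CPU-bound computation.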

Setting It Up

Want to try it yourself? Here's the quick setup:

Clone the repo:

git clone git@github.com:athrael-soju/Qdrant-Neo4j-Ollama-Graph-Rag.git
cd Qdrant-Neo4j-Ollama-Graph-Rag

Create and activate a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Start Neo4j and Qdrant with Docker:

docker-compose up -d

Configure your environment:

cp .env.example .env

Then edit the .env file with your preferred settings (OpenAI API key or Ollama connection).

Run the interactive console:

python main.py

And that's it! You can now ingest your data and start asking questions.

Exploring Your Data

One of the coolest aspects of this setup is that you can directly visualize both your vector store and knowledge graph through their respective web interfaces:

Accessing the Neo4j Browser

Neo4j's browser gives you a powerful visual interface to explore your knowledge graph:

Open your browser and navigate to http://localhost:7474
Log in with the default credentials:
- Username: neo4j
- Password: morpheus4j (or whatever you set in your docker-compose)
Once logged in, you can run Cypher queries to explore your graph:

MATCH (n)-[r]-(m) RETURN n, r, m LIMIT 25

This will show you a visualization of entities and their relationships. You can click on nodes to expand them, drag them around, and get a feel for how your knowledge is interconnected.
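The query above grabs any 25 relationships. If you'd rather look at the neighborhood of specific entities, a parameterized version is a small step away. Here's a hedged sketch of a helper that builds one; the `id` node property and the function name `build_subgraph_query` are assumptions, so adjust them to whatever your ingestion step actually writes.

```python
def build_subgraph_query(entity_ids, limit=25):
    """Build a Cypher query and parameters for the relationships around given nodes.

    Assumes each node stores its identifier in an `id` property (hypothetical).
    """
    query = (
        "MATCH (n)-[r]-(m) "
        "WHERE n.id IN $entity_ids "
        "RETURN n, r, m LIMIT $limit"
    )
    # Values go in as parameters instead of being pasted into the query string
    params = {"entity_ids": list(entity_ids), "limit": int(limit)}
    return query, params


query, params = build_subgraph_query(["a1", "b2"], limit=10)
print(query)
print(params)
```

The resulting pair can be passed to the official `neo4j` Python driver with `session.run(query, params)`, and the query text also works in the Neo4j Browser once you set the parameters there.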

Exploring the Qdrant Dashboard

Qdrant provides a clean interface for examining your vector store:

Open your browser and navigate to http://localhost:6333/dashboard
No login required - you'll immediately see the dashboard
From here, you can:
- View your collections
- Explore points and their payloads
- Try out search queries
- Monitor collection metrics

This is particularly useful for debugging and understanding how your semantic search is working, as well as visualizing the vector space.

By exploring both dashboards, you get a complete picture of how GraphRAG stores and retrieves information - seeing both the semantic relationships (vectors) and explicit relationships (graph) that power your answers.

See It In Action

Let me show you a quick example. I fed the system some made-up data about wildlife conservation efforts (you can find it in sample_data.txt). After ingestion, I asked about one of the characters:

Enter your choice (1-5): 3

Ask a Question
Enter your question: What projects is Olivia involved in?

Starting retriever search...
Extracting entity IDs...
Fetching related graph...
Formatting graph context...
Running GraphRAG...

Answer: Based on the knowledge graph, Olivia is involved in several projects:

1. She's a wildlife biologist at Wildlife Guardians Nairobi field office
2. She documented endangered species in the Maasai Mara
3. She led an expedition to document migratory patterns in East Africa
4. She co-led a biodiversity survey in the Amazon with Ethan
5. She's developed conservation techniques that have helped reduce human-wildlife conflicts

Olivia also collaborates with various individuals:
- She previously collaborated with Lucas at Green Earth Initiative
- She is mentored by Mia on conservation strategies
- She visits Nairobi monthly for strategic planning meetings

Her research findings have been published in several scientific journals, and her dedicated fieldwork has earned her international recognition.

Query processing time: 3.21 seconds

Notice how the answer combines information from across the dataset, understanding connections between Olivia and other people/projects. That's the graph magic at work!

What's Next?

I'm currently experimenting with more advanced NLP libraries and models to make the entity extraction phase more consistent and accurate. I'm also looking into ways to optimize graph search and retrieval, and exploring more sophisticated ways to combine vector and graph information for richer context.

That's it!

Hacking this app has been a fun exploration of how we can push beyond standard RAG architectures. The combination of vector search and graph databases seems particularly powerful for domains with complex entity relationships - think enterprise knowledge bases, scientific research, or financial analysis.

Is it perfect? Not at all. There's still plenty of room for improvement, but the code is open-source, so feel free to check it out, suggest improvements, or contribute if you're interested!

Want to chat about GraphRAG or have ideas for improvement? Reach out to me on GitHub or drop me an email. I'd love to hear from you!

And of course, huge credit to Qdrant, whose article inspired this project, and to the amazing folks behind Neo4j and Ollama for their fantastic tools.

Until next time, happy coding!