Hey everyone, I was recently studying IT law and realized that standard vector-DB RAG setups completely lose context on complex legal documents. They fetch similar text but miss logical conditions like "A violation of Article 5 triggers Article 18."

To solve this, I built an end-to-end GraphRAG pipeline. Instead of just chunking and embedding, I use Llama-3 (via Groq, for speed) to extract entities and relationships (e.g., Clause -> CONFLICTS_WITH -> Clause) and store them in Neo4j.

**The Stack:** FastAPI + Neo4j + Llama-3 + Next.js (Dockerized on a VPS)

**My issue/question:**

> Legal text is dense. Currently, I'm doing semantic chunking before passing it to the LLM for relationship extraction. Has anyone found a better chunking strategy specifically for feeding legal/dense data into a knowledge graph?

*(For context on how the queries work, I open-sourced the whole thing here:* [`github.com/leventtcaan/graphrag-contract-ai`](http://github.com/leventtcaan/graphrag-contract-ai)*. There's also a live demo in my LinkedIn post:* [*https://www.linkedin.com/in/leventcanceylan/*](https://www.linkedin.com/in/leventcanceylan/)*. I'd be happy to connect :))*
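To make the question concrete, here is a minimal sketch of one alternative to pure semantic chunking: structure-aware chunking that splits at article/section headings, so each chunk keeps a whole clause, then greedily merges short clauses up to a character budget so cross-references like "triggers Article 18" tend to stay co-located. Everything here (function names, the heading regex, the budget) is my own illustration, not code from the repo.

```python
import re

# Zero-width split at lines that begin a clause heading, so the heading
# stays attached to its own clause body. Heading vocabulary is assumed;
# real statutes/contracts would need a richer pattern.
HEADING = re.compile(r"(?m)^(?=(?:Article|Section|Clause)\s+\d+)")


def chunk_by_clause(text: str, max_chars: int = 2000) -> list[str]:
    """Split legal text at clause boundaries, merging short clauses.

    Unlike fixed-size or purely semantic chunking, no chunk ever cuts a
    clause in half, so downstream relationship extraction sees complete
    logical units.
    """
    parts = [p.strip() for p in HEADING.split(text) if p.strip()]
    chunks: list[str] = []
    for part in parts:
        # Greedy merge: pack adjacent clauses into one chunk while it
        # stays under the character budget.
        if chunks and len(chunks[-1]) + len(part) + 1 <= max_chars:
            chunks[-1] += "\n" + part
        else:
            chunks.append(part)
    return chunks


if __name__ == "__main__":
    text = (
        "Article 1\nDefinitions apply.\n"
        "Article 2\nA violation of Article 1 triggers Article 3.\n"
        "Article 3\nPenalties."
    )
    for chunk in chunk_by_clause(text, max_chars=60):
        print("---\n" + chunk)
```

Each resulting chunk would then go to the LLM extraction step; because chunk boundaries align with clause boundaries, the prompt never has to reason about a half-clause. A hierarchical variant (chunk = clause, parent node = article) might fit the Neo4j model even better.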