Hello. I have the following use case: I would like to generate a Knowledge Graph, but starting from existing embedded chunks.

I have been reading about how the indexing process is structured. Essentially, for my use case, I want to skip the step where we start from text Documents and create TextUnits.

Does the config allow for populating TextUnits directly, or how would one adapt the code to achieve this? Thank you in advance for all the help you can give me.

Replies: 3 comments · 3 replies

This would be really neat, as you could combine it with existing vector databases.


Please see the response here: #396

@gianpycea
Hey, thank you so much for your reply, and yes, that is essentially the same question. To be honest, though, I still don't see in practice how one would write the script to bring in my own chunks (for the moment I'll assume I don't need to bring in the embeddings of the chunk content and am happy to use whatever GraphRAG uses).

I have been following the documentation, and I can use GraphRAG in its "basic way", where you run the indexing code starting from a txt document.

If I want to skip the chunking phase and start from my own chunks, what is the best way to do this? Do I need to create my own workflow, or is there another way?

I think a deeper dive into the code base, or a minimal script showing how one would use GraphRAG for this use case, would be helpful; it's unclear to me how to achieve this from the docs alone.

Thanks in advance!

@natoverse
If you want to skip our chunking and start from your own chunks, replace the starting txt document with individual txt documents, one per chunk. As long as each of those documents is shorter than your configured GraphRAG chunk_size, our chunking will be skipped and your chunks will be used directly. The actual process/config for running GraphRAG in this case does not change at all; you have just supplied a different set of input documents.
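As a sketch of that approach (the `ragtest/input` path, file naming, and chunk list here are placeholder assumptions; point it at whatever input folder your settings use):

```python
from pathlib import Path

# Hypothetical pre-made chunks; replace with your own chunk strings.
chunks = [
    "First chunk of text.",
    "Second chunk of text.",
]

# Assumed input folder; use whatever your GraphRAG config points at.
input_dir = Path("ragtest/input")
input_dir.mkdir(parents=True, exist_ok=True)

# One .txt file per chunk: GraphRAG reads each file as a document, and any
# document shorter than chunk_size passes through chunking unchanged.
for i, chunk in enumerate(chunks):
    (input_dir / f"chunk_{i:05d}.txt").write_text(chunk, encoding="utf-8")
```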

If you need to create your chunks, I would suggest using tiktoken, which has encode/decode methods to match the encoding of your model. So your script would encode the document into a list of tokens, iterate through the tokens to subdivide them into sublists that are shorter than your chunk_size, and then decode those token lists back to text that you can write to a file.

@SS8816
Hey, could we pass .faiss and .pkl embedded versions of our .txt files in the input folder? Will that work, or does the input need to be .txt, .json, etc.?


I have a similar use case - but with a lexical graph in a Neo4j graph database which has been created from multiple documents.

The input step in GraphRAG only takes in plain text strings (at least in the documentation I have found, including the CSV example). It would be cool to learn how to run GraphRAG on an existing lexical property graph.

In my case, my graph contains both section and table-row-data nodes with properties such as an id and text. To run the index engine on each node and let it extract text units, entities, communities, etc. would be MEGA. For small-token nodes (like table-row-data nodes) the relation between a node and a text unit would be 1:1, whereas with a larger section node (containing larger pieces of text) it would be 1:many.
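One way to bridge that today, following the file-per-chunk suggestion above, is to export each node's text property to its own file and run GraphRAG on those. A minimal sketch, assuming a hypothetical `Section` label with `id` and `text` properties (adjust the Cypher to your schema):

```python
from pathlib import Path


def fetch_node_texts(uri: str, user: str, password: str) -> list[tuple[str, str]]:
    """Pull (id, text) pairs from Neo4j. Label and property names are assumptions."""
    from neo4j import GraphDatabase  # pip install neo4j; imported lazily here

    driver = GraphDatabase.driver(uri, auth=(user, password))
    with driver.session() as session:
        result = session.run("MATCH (n:Section) RETURN n.id AS id, n.text AS text")
        return [(record["id"], record["text"]) for record in result]


def write_chunks(rows: list[tuple[str, str]], out_dir: str) -> None:
    """Write each node's text to its own .txt file for GraphRAG's input folder."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for node_id, text in rows:
        (out / f"{node_id}.txt").write_text(text, encoding="utf-8")
```

Small table-row-data nodes would then map 1:1 to text units automatically; larger section nodes would still need splitting under chunk_size first (or be re-chunked by GraphRAG).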
