
The indexing process takes too long. I used 19 articles, and it still hadn't completed after 10 hours using Alibaba Cloud's API (deepseek-r1).


Replies: 4 comments · 7 replies


I've noticed the same thing. It is taking more than 72 hours for a 1.6MB text file.

For reference, this is my config.yaml

The prompts are the default ones, with the entity prompt tweaked a little; the token settings remain the same.

### This config file contains required core defaults that must be set, along with a handful of common optional settings.
### For a full list of available settings, see https://microsoft.github.io/graphrag/config/yaml/

### LLM settings ###
## There are a number of settings to tune the threading and token limits for LLM calls - check the docs.

models:
  default_chat_model:
    type: openai_chat # or azure_openai_chat
    # api_base: https://<instance>.openai.azure.com
    # api_version: 2024-05-01-preview
    auth_type: api_key # or azure_managed_identity
    api_base: ${OPENAI_BASE_URL}
    api_key: ${GRAPHRAG_API_KEY} # set this in the generated .env file
    # audience: "https://cognitiveservices.azure.com/.default"
    # organization: <organization_id>
    model: ${GRAPHRAG_CHAT_MODEL}
    # deployment_name: <azure_model_deployment_name>
    # encoding_model: cl100k_base # automatically set by tiktoken if left undefined
    model_supports_json: false # recommended if this is available for your model.
    concurrent_requests: 25 # max number of simultaneous LLM requests allowed
    async_mode: threaded # or asyncio
    retry_strategy: native
    max_retries: 10
    tokens_per_minute: auto              # set to null to disable rate limiting
    requests_per_minute: auto            # set to null to disable rate limiting
  default_embedding_model:
    type: openai_embedding # or azure_openai_embedding
    # api_base: https://<instance>.openai.azure.com
    # api_version: 2024-05-01-preview
    auth_type: api_key # or azure_managed_identity
    api_base: ${OPENAI_BASE_URL}
    api_key: ${GRAPHRAG_API_KEY}
    # audience: "https://cognitiveservices.azure.com/.default"
    # organization: <organization_id>
    model: ${GRAPHRAG_EMBEDDINGS_MODEL}
    # deployment_name: <azure_model_deployment_name>
    # encoding_model: cl100k_base # automatically set by tiktoken if left undefined
    model_supports_json: false # recommended if this is available for your model.
    concurrent_requests: 25 # max number of simultaneous LLM requests allowed
    async_mode: threaded # or asyncio
    retry_strategy: native
    max_retries: 10
    tokens_per_minute: auto              # set to null to disable rate limiting
    requests_per_minute: auto            # set to null to disable rate limiting

### Input settings ###

input:
  type: file # or blob
  file_type: text # [csv, text, json]
  base_dir: "input"

chunks:
  size: 1200
  overlap: 500
  group_by_columns: [id]

### Output/storage settings ###
## If blob storage is specified in the following four sections,
## connection_string and container_name must be provided

output:
  type: file # [file, blob, cosmosdb]
  base_dir: "output"
    
cache:
  type: file # [file, blob, cosmosdb]
  base_dir: "cache"

reporting:
  type: file # [file, blob, cosmosdb]
  base_dir: "logs"

vector_store:
  default_vector_store:
    type: lancedb
    db_uri: output/lancedb
    container_name: default
    overwrite: True

### Workflow settings ###

embed_text:
  model_id: default_embedding_model
  vector_store_id: default_vector_store

extract_graph:
  model_id: default_chat_model
  prompt: "prompts/extract_graph_v3.txt"
  entity_types: [COMPANY, SUBSIDIARY, CURRENCY, FISCAL_YEAR, FISCAL_QUARTER, METRIC_CATEGORY, METRIC_VALUE, PRODUCT/SERVICE, RISK, GEOGRAPHY, PERSON, TITLES_EXTENDED, CORPORATE_ACTIONS, COMPANY_EVENTS]
  max_gleanings: 1

summarize_descriptions:
  model_id: default_chat_model
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

extract_graph_nlp:
  text_analyzer:
    extractor_type: regex_english # [regex_english, syntactic_parser, cfg]

cluster_graph:
  max_cluster_size: 10

extract_claims:
  enabled: false
  model_id: default_chat_model
  prompt: "prompts/extract_claims.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 1

community_reports:
  model_id: default_chat_model
  graph_prompt: "prompts/community_report_graph.txt"
  text_prompt: "prompts/community_report_text.txt"
  max_length: 2000
  max_input_length: 8000

embed_graph:
  enabled: true # if true, will generate node2vec embeddings for nodes

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes (embed_graph must also be enabled)

snapshots:
  graphml: true
  embeddings: true

### Query settings ###
## The prompt locations are required here, but each search method has a number of optional knobs that can be tuned.
## See the config docs: https://microsoft.github.io/graphrag/config/yaml/#query

local_search:
  chat_model_id: default_chat_model
  embedding_model_id: default_embedding_model
  prompt: "prompts/local_search_system_prompt.txt"

global_search:
  chat_model_id: default_chat_model
  map_prompt: "prompts/global_search_map_system_prompt.txt"
  reduce_prompt: "prompts/global_search_reduce_system_prompt.txt"
  knowledge_prompt: "prompts/global_search_knowledge_system_prompt.txt"

drift_search:
  chat_model_id: default_chat_model
  embedding_model_id: default_embedding_model
  prompt: "prompts/drift_search_system_prompt.txt"
  reduce_prompt: "prompts/drift_search_reduce_prompt.txt"

basic_search:
  chat_model_id: default_chat_model
  embedding_model_id: default_embedding_model
  prompt: "prompts/basic_search_system_prompt.txt"
0 replies

Hi @fenglex,

Thanks for raising this — 10+ hours for indexing 19 articles definitely seems excessive.

A few things to consider:

DeepSeek-R1 via Alibaba Cloud might have rate limits, token quotas, or cold start latency depending on your plan.

If you're using a retrieval-based pipeline (like RAG), the slowness might come from embedding generation or vector indexing (e.g., FAISS, Milvus).

🔧 Suggestions:
Try checking the response time of each individual API call (see the sketch below): are the embeddings themselves taking a long time to generate?

See if you can batch the requests (if the API supports it).

Consider running the process with logging enabled to identify the exact bottleneck (embedding, indexing, or disk I/O).

Also, check if the content of the articles is unusually large or contains formatting that might slow tokenization.
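
For the first suggestion, here is a minimal sketch that times a single embedding call with the openai Python client; the endpoint, API key, and model name are placeholders, not values taken from this thread:

import time

from openai import OpenAI

# Placeholder endpoint, key, and model -- substitute your provider's actual values.
client = OpenAI(base_url="https://your-endpoint/v1", api_key="YOUR_KEY")

start = time.perf_counter()
resp = client.embeddings.create(
    model="your-embedding-model",
    input=["a sample paragraph to embed"],
)
elapsed = time.perf_counter() - start
print(f"Embedding call took {elapsed:.2f}s and returned {len(resp.data)} vector(s)")

If a single call is already slow, the bottleneck is likely the provider or the network rather than the indexing pipeline itself.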

If you can share your code snippet or indexing setup, I’d be happy to take a closer look!

Best,
Vishnu Vardhan

5 replies
@The-DarkMatter

Hey, I tried to index the same thing with graphrag 1.2.0 and it took 2 hours, so the API doesn't seem to be the issue for me. In the logs I can see that the gap between two steps is much larger on the newer version; I don't understand why the same steps are quicker in 1.2.0.

I tried other versions in 2.x, but the same slowness is observed.

@agunay-munichre

Hiya, we have a 200 MB file and it takes 13 hours to index. We use Azure Blob Storage to store the files and the Azure OpenAI service for LLM integrations.

@The-DarkMatter

Hey, thanks for setting the benchmark. In our case we are using this on AWS with APIs from Bedrock, converting the API output to OpenAI format with a local LiteLLM proxy, and running these steps on an AWS SageMaker instance. I don't think the infra is the problem on our end. I can share the indexing logs here if you want.
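
For context, a LiteLLM proxy config that exposes a Bedrock model behind an OpenAI-compatible endpoint looks roughly like the sketch below. This is illustrative only, not the commenter's actual setup; the file name, model alias, Bedrock model ID, and region are placeholders.

litellm-config.yaml

model_list:
  - model_name: bedrock-chat                                  # alias that GraphRAG requests as its "model"
    litellm_params:
      model: bedrock/anthropic.claude-3-sonnet-20240229-v1:0  # placeholder Bedrock model ID
      aws_region_name: us-east-1                              # placeholder region

Running litellm --config litellm-config.yaml starts an OpenAI-compatible server (port 4000 by default), and GraphRAG's api_base is then pointed at that proxy.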

Also, please note that indexing a 1.5 MB file completes in a couple of hours on version 1.2.0, but the same job hadn't completed even after 4 days on version 2.3.0.

Any help is much appreciated.

@timb26

I'm seeing the same behaviour with 2.3.0: a 116 KB text file takes more than 48 hours to index. I'm using Llama 3 hosted on-prem and nomic-text hosted on-prem. The thing is, with 1.* it ran in about 1-2 hours. I also tried running 1.* locally on my MBP with Ollama, and that was also only a couple of hours... Something in 2+ is causing weird issues.

@gona-sreelatha

If it helps anyone, it's primarily because of requests_per_minute: auto. Versions before the 2.6 release use fnllm, and auto is hardcoded to 1 RPM in that library. Change it to 200 or 300 and you will see it is much faster; it currently takes only half an hour for 1 GB worth of indexing. Alternatively, disable client-side rate limiting entirely:

 tokens_per_minute: null
 requests_per_minute: null
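
In settings.yaml these keys sit under each model entry, for example (illustrative values; pick numbers that match your provider's actual quota):

models:
  default_chat_model:
    # ...other model settings as shown earlier...
    tokens_per_minute: null        # disable client-side throttling, or set an explicit budget
    requests_per_minute: 300       # an explicit value instead of "auto"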

Hi everyone, I’ve implemented several changes that have improved indexing performance by approximately 50%.

  1. Disabled Cache for Per-File Ingestion
    Since my process generates a separate graph for each uploaded file—rather than aggregating all uploads into a single graph—there is no benefit to maintaining a shared cache. Disabling the cache eliminates unnecessary calls to the blob service for cache validation, allowing ingestion to start immediately.

settings.yaml

cache:
  type: none # Supported: file, blob, cosmosdb
  2. Direct File Retrieval Instead of Regex Search
    In the original Graphrag repository, file discovery relies on iterating through all files and applying a regex filter, which is computationally expensive. As I already know the exact file to ingest, I replaced the regex search with a direct reference. This removes the overhead of scanning unrelated files.

/graphrag/index/input/util.py

# Original implementation:
# files = list(storage.find(re.compile(config.file_pattern), progress=progress, file_filter=config.file_filter))
# Example: config.file_pattern = oid/file-name$

# Optimized implementation:
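# Build the single (file name, metadata) entry directly from the configured pattern
# (the text after the last "/" and before the trailing "$"), skipping the storage scan.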
files = [(config.file_pattern.split("/")[-1].split("$")[0], {})]

Impact:
These adjustments have reduced processing time by roughly half, resulting in a more efficient and responsive indexing workflow.

0 replies

If it helps anyone, it's primarily because of requests_per_minute: auto. Versions before 2.6 use fnllm, and auto is hardcoded to 1 RPM in that library; change it to 200 or 300 and you will see it is much faster. It currently takes only half an hour for 1 GB worth of indexing.

2 replies
@natoverse

This can have a significant effect: the "auto" setting was resulting in many 429s, and therefore backoffs, which caused major slowdowns for us. We have changed the default to "null" in later versions. Also note that GraphRAG 2.6.0 supports using LiteLLM instead of fnllm (we are working toward making this a permanent replacement), so you can try experimenting with that.

One other note: depending on your use case, indexing in "fast" mode, which uses NLP for the graph extraction portion, will significantly speed up your indexing. See discussion in docs here.
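
For example, assuming a GraphRAG 2.x CLI that exposes the --method flag (the project root path is a placeholder):

graphrag index --root ./ragtest --method fast

Fast mode relies on the extract_graph_nlp settings shown in the config above, rather than LLM-based extraction, for the graph extraction stage.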

@gona-sreelatha

Thank you for the great library! 🙏 Yes, I had tried FastRAG but decided on StandardRAG given the criticality of quality for our use case. We are also upgrading to 2.6 — it’s a solid release with great features and solutions to many of the challenges we were facing.
