Text Splitter Customizations

Updating the Model Name

The default text splitter is a SentenceTransformersTokenTextSplitter instance. The text splitter uses the tokenizer of a pre-trained model from Hugging Face to count tokens and split text into chunks that fit the model's token window. You can change the model by setting the APP_TEXTSPLITTER_MODELNAME environment variable in the chain-server service of your docker-compose.yaml file, as in the following example:

services:
  chain-server:
    environment:
      APP_TEXTSPLITTER_MODELNAME: intfloat/e5-large-v2
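
As a rough illustration of how an environment-variable override like this is typically resolved (this sketch is not the chain server's actual code, and the default model name here is only a placeholder):

```python
import os

# Placeholder default; the real default is defined by the chain server's
# configuration, not by this sketch.
DEFAULT_MODEL = "sentence-transformers/all-MiniLM-L6-v2"

def resolve_model_name() -> str:
    # The environment variable takes precedence over the built-in default.
    return os.environ.get("APP_TEXTSPLITTER_MODELNAME", DEFAULT_MODEL)

os.environ["APP_TEXTSPLITTER_MODELNAME"] = "intfloat/e5-large-v2"
print(resolve_model_name())  # intfloat/e5-large-v2
```

Because the variable is read when the service starts, restart the chain-server container after changing it.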

Adjusting Chunk Size and Overlap

The text splitter divides documents into smaller chunks for processing. You can control the chunk size and overlap using environment variables in the chain-server service of your docker-compose.yaml file:

  • APP_TEXTSPLITTER_CHUNKSIZE: Sets the maximum number of tokens allowed in each chunk.
  • APP_TEXTSPLITTER_CHUNKOVERLAP: Defines the number of tokens that overlap between consecutive chunks.

services:
  chain-server:
    environment:
      APP_TEXTSPLITTER_CHUNKSIZE: 256
      APP_TEXTSPLITTER_CHUNKOVERLAP: 128
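
To see how these two settings interact, the following sketch (a hypothetical helper, not part of the chain server) computes the token spans that result from a given chunk size and overlap:

```python
def chunk_spans(total_tokens: int, chunk_size: int, chunk_overlap: int) -> list:
    # Each chunk advances by (chunk_size - chunk_overlap) tokens, so
    # consecutive chunks share chunk_overlap tokens of context.
    step = chunk_size - chunk_overlap
    spans = []
    start = 0
    while start < total_tokens:
        spans.append((start, min(start + chunk_size, total_tokens)))
        if start + chunk_size >= total_tokens:
            break
        start += step
    return spans

# With the values above, a 512-token document yields three overlapping chunks.
print(chunk_spans(512, 256, 128))  # [(0, 256), (128, 384), (256, 512)]
```

A larger overlap preserves more context across chunk boundaries at the cost of indexing more tokens.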

Using a Custom Text Splitter

While the default text splitter works well, you can also implement a custom splitter for specific needs.

  1. Modify the get_text_splitter method in RAG/src/chain_server/utils.py. Update it to incorporate your custom text splitter class.

    def get_text_splitter():
        from langchain.text_splitter import RecursiveCharacterTextSplitter

        return RecursiveCharacterTextSplitter(
            chunk_size=get_config().text_splitter.chunk_size - 2,
            chunk_overlap=get_config().text_splitter.chunk_overlap,
        )

    Make sure that the chunks created by the function contain fewer tokens than the context length of the embedding model.
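
Any object that exposes the same splitting interface can be returned instead of a LangChain class. As a minimal, self-contained sketch (this class and its character-based limit are hypothetical, not part of the repository):

```python
class SimpleSentenceSplitter:
    # Hypothetical custom splitter exposing the split_text method that
    # LangChain text splitters provide, so get_text_splitter could return it.
    def __init__(self, max_chars: int = 200):
        self.max_chars = max_chars

    def split_text(self, text: str) -> list:
        chunks = []
        current = ""
        for sentence in text.split(". "):
            sentence = sentence.strip()
            if not sentence:
                continue
            candidate = (current + " " + sentence).strip()
            if len(candidate) <= self.max_chars:
                current = candidate
            else:
                if current:
                    chunks.append(current)
                current = sentence
        if current:
            chunks.append(current)
        return chunks
```

Note that this sketch limits chunks by character count; when you use a character-based splitter, keep the limit comfortably below the embedding model's context length in tokens.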

Build and Start the Container

After you change the get_text_splitter function, build and start the container.

  1. Navigate to the example directory.

    cd RAG/examples/basic_rag/llamaindex
  2. Build and deploy the microservice.

    docker compose up -d --build