The RAG documentation is divided into the following sections:
This section helps you get started quickly with the sample RAG example.
- Installation guide: This guide walks you through the process of setting up your environment and utilizing the sample RAG workflow.
- Getting Started guides: A series of quick-start steps that help you understand the core concepts and get the pipeline running quickly. These guides include Jupyter notebooks you can experiment with.
The user guides cover the core details of the provided example and explain how to configure and use its features to build your own chains.
- LLM Inference Server: Learn about the service that accelerates LLM inference using TRT-LLM.
- Integration with NVIDIA AI Playground: Understand how to access NVIDIA AI Playground on NGC, which allows developers to experience state-of-the-art LLMs accelerated on NVIDIA DGX Cloud with NVIDIA TensorRT and Triton Inference Server.
- Configuration Guide: The complete guide to all the configuration options available in the `config.yaml` file.
- Frontend: Learn more about the sample playground provided as part of the workflow.
- Chat Server Guide: Learn about the chat server, which exposes the core APIs for the end user.
- Jupyter Server Guide: Learn about the different notebooks available and the server which can be used to access them.
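To give a feel for the kind of settings the Configuration Guide covers, the sketch below shows one plausible shape for a `config.yaml` that groups options by service. The keys and values here are illustrative placeholders, not the actual schema; consult the Configuration Guide for the real options.

```yaml
# Hypothetical config.yaml layout (illustrative only).
# The authoritative list of options is in the Configuration Guide.
llm:
  server_url: "localhost:8001"    # assumed LLM Inference Server endpoint
  model_name: "example-model"     # placeholder model name
vector_store:
  url: "http://localhost:19530"   # example vector database endpoint
text_splitter:
  chunk_size: 512                 # example chunking parameters
  chunk_overlap: 200
```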
This section covers the infrastructure details and the execution flow of a query through the runtime:
- Architecture: Understand the architecture of the sample RAG workflow.
The sample RAG workflow provides a set of evaluation pipelines via notebooks, which developers can use for benchmarking. There are also detailed guides on how to reproduce results and create datasets for the evaluation.
- RAG Evaluation: Understand the different notebooks available.