19 changes: 19 additions & 0 deletions .dockerignore
@@ -0,0 +1,19 @@
# Ignore git objects
.git/
.gitignore
.gitlab-ci.yml
.gitmodules

# Ignore temporary volumes
deploy/compose/volumes

# Ignore files used only for creating a docker image
.dockerignore

# Ignore any virtual environment configuration files
.env*
.venv/
env/
# Ignore python bytecode files
*.pyc
__pycache__/
4 changes: 4 additions & 0 deletions .gitignore
@@ -24,3 +24,7 @@ docs/_*
docs/notebooks
docs/experimental
docs/tools

# Developing examples
RetrievalAugmentedGeneration/examples/simple_rag_api_catalog/
deploy/compose/simple-rag-api-catalog.yaml
14 changes: 14 additions & 0 deletions .pre-commit-config.yaml
@@ -9,3 +9,17 @@ repos:
args:
- --license-filepath
- RetrievalAugmentedGeneration/LICENSE.md
- repo: https://github.com/psf/black
rev: 19.10b0
hooks:
- id: black
args: ["--skip-string-normalization", "--line-length=119"]
additional_dependencies: ['click==8.0.4']
- repo: https://github.com/pycqa/isort
rev: 5.12.0
hooks:
- id: isort
name: isort (python)
args: ["--multi-line=3", "--trailing-comma", "--force-grid-wrap=0", "--use-parenthese", "--line-width=119", "--ws"]


32 changes: 32 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,38 @@ All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).


## [0.7.0] - 2024-06-18

This release switches all examples to use cloud-hosted, GPU-accelerated LLM and embedding models from the [Nvidia API Catalog](https://build.nvidia.com) by default. It also deprecates support for deploying on-prem models with the NeMo Inference Framework Container and adds support for deploying accelerated generative AI models across the cloud, data center, and workstations using the [latest Nvidia NIM-LLM](https://docs.nvidia.com/nim/large-language-models/latest/introduction.html).
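As an illustration of the new default flow, the sketch below calls a cloud-hosted API Catalog model through `langchain-nvidia-ai-endpoints` and then points the same client at a locally deployed NIM endpoint. The model IDs, local URL, and the `NVIDIA_API_KEY` environment variable are assumptions for illustration only; this is not code from the examples themselves.

```python
# Minimal sketch (not the chain server's actual code): query a cloud-hosted
# API Catalog model, then point the same client at a locally deployed NIM.
# Assumes `pip install langchain-nvidia-ai-endpoints` and that NVIDIA_API_KEY
# is set; the model IDs and local URL are illustrative.
from langchain_nvidia_ai_endpoints import ChatNVIDIA

# Default path in 0.7.0: a hosted model from https://build.nvidia.com
cloud_llm = ChatNVIDIA(model="meta/llama3-70b-instruct")
print(cloud_llm.invoke("What is retrieval-augmented generation?").content)

# Optional on-prem path: a NIM-LLM container exposing an OpenAI-compatible API
local_llm = ChatNVIDIA(base_url="http://localhost:8000/v1", model="meta/llama3-8b-instruct")
print(local_llm.invoke("Summarize this release in one sentence.").content)
```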

### Added
- Added model [auto download and caching support for `nemo-retriever-embedding-microservice` and `nemo-retriever-reranking-microservice`](./deploy/compose/docker-compose-nim-ms.yaml). Updated steps to deploy the services can be found [here](https://nvidia.github.io/GenerativeAIExamples/latest/nim-llms.html).
- [Multimodal RAG Example enhancements](https://nvidia.github.io/GenerativeAIExamples/latest/multimodal-data.html)
  - Moved to the [PDF Plumber library](https://pypi.org/project/pdfplumber/) for parsing text and images (see the usage sketch after this list).
  - Added `pgvector` vector DB support.
  - Added support for ingesting files with the `.pptx` extension.
  - Improved accuracy of image parsing by using [tesseract-ocr](https://pypi.org/project/tesseract-ocr/).
- Added a [new notebook showcasing a RAG use case with accelerated, NIM-based, on-prem deployed models](./notebooks/08_RAG_Langchain_with_Local_NIM.ipynb).
- Added a [new experimental example](./experimental/rag-developer-chatbot/) showcasing how to create a developer-focused RAG chatbot using RAPIDS cuDF source code and API documentation.
- Added a [new experimental example](./experimental/event-driven-rag-cve-analysis/) demonstrating how NVIDIA Morpheus, NIMs, and RAG pipelines can be integrated to create LLM-based agent pipelines.
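The PDF Plumber switch noted above can be pictured with a minimal sketch of generic `pdfplumber` usage; the input file name is hypothetical and this is not the multimodal example's actual ingestion code.

```python
# Minimal pdfplumber sketch: extract text and count embedded images per page.
# Generic library usage for illustration; not the multimodal example's code.
import pdfplumber

with pdfplumber.open("sample_report.pdf") as pdf:  # hypothetical input file
    for page_number, page in enumerate(pdf.pages, start=1):
        text = page.extract_text() or ""
        print(f"page {page_number}: {len(text)} characters, {len(page.images)} images")
```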

### Changed
- All examples now use llama3 models from the [Nvidia API Catalog](https://build.nvidia.com/search?term=llama3) by default. A summary of the updated examples and the models they use is available [here](https://nvidia.github.io/GenerativeAIExamples/latest/index.html#developer-rag-examples).
- Switched the default embedding model of all examples to the [Snowflake arctic-embed-l model](https://build.nvidia.com/snowflake/arctic-embed-l).
- Added more verbose logs and support for configuring the [log level of the chain server using the LOG_LEVEL environment variable](https://nvidia.github.io/GenerativeAIExamples/latest/configuration.html#chain-server) (a minimal sketch of the pattern follows this list).
- Bumped the versions of the `langchain-nvidia-ai-endpoints` and `sentence-transformers` packages and the `milvus` containers.
- Updated base containers to use the Ubuntu 22.04 image `nvcr.io/nvidia/base/ubuntu:22.04_20240212`.
- Added `llama-index-readers-file` as a dependency to avoid runtime package installation within the chain server.
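The `LOG_LEVEL` support mentioned above follows the common pattern of reading an environment variable at startup; the sketch below illustrates that pattern only and is not the chain server's actual logging setup.

```python
# Minimal sketch of LOG_LEVEL-driven logging, assuming the usual pattern of
# reading an environment variable at startup; not the chain server's real code.
import logging
import os

level_name = os.environ.get("LOG_LEVEL", "INFO").upper()
logging.basicConfig(level=getattr(logging, level_name, logging.INFO))
logging.getLogger("chain_server").info("Log level set to %s", level_name)
```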


### Deprecated
- Deprecated support for on-prem LLM model deployment using the [NeMo Inference Framework Container](https://github.com/NVIDIA/GenerativeAIExamples/blob/v0.6.0/deploy/compose/rag-app-text-chatbot.yaml#L2). Developers can use [Nvidia NIM-LLM to deploy TensorRT-optimized models on-prem and plug them into the existing examples](https://nvidia.github.io/GenerativeAIExamples/latest/nim-llms.html).
- Deprecated [Kubernetes operator support](https://github.com/NVIDIA/GenerativeAIExamples/tree/v0.6.0/deploy/k8s-operator/kube-trailblazer).
- The `nvolveqa_40k` embedding model was deprecated from the [Nvidia API Catalog](https://build.nvidia.com). Updated all [notebooks](./notebooks/) and [experimental artifacts](./experimental/) to use the [Nvidia embed-qa-4 model](https://build.nvidia.com/nvidia/embed-qa-4) instead.
- Removed [notebooks numbered 00-04](https://github.com/NVIDIA/GenerativeAIExamples/tree/v0.6.0/notebooks), which relied on on-prem LLM deployment with the deprecated [NeMo Inference Framework Container](https://github.com/NVIDIA/GenerativeAIExamples/blob/v0.6.0/deploy/compose/rag-app-text-chatbot.yaml#L2).


## [0.6.0] - 2024-05-07

### Added
32 changes: 20 additions & 12 deletions README.md
@@ -8,7 +8,7 @@ State-of-the-art Generative AI examples that are easy to deploy, test, and exten

## NVIDIA NGC

Generative AI Examples can use models and GPUs from the [NVIDIA NGC: AI Development Catalog](https://catalog.ngc.nvidia.com).
Generative AI Examples can use models and GPUs from the [NVIDIA API Catalog](https://catalog.ngc.nvidia.com).

Sign up for a [free NGC developer account](https://ngc.nvidia.com/signin) to access:

@@ -27,34 +27,32 @@ The examples demonstrate how to combine NVIDIA GPU acceleration with popular LLM
The examples are easy to deploy with [Docker Compose](https://docs.docker.com/compose/).

Examples support local and remote inference endpoints.
If you have a GPU, you can run inference locally with [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM).
If you have a GPU, you can run inference locally with an [NVIDIA NIM for LLMs](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nim/containers/nim_llm).
If you don't have a GPU, you can run inference and embedding remotely with [NVIDIA API Catalog endpoints](https://build.nvidia.com/explore/discover).
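As a rough sketch of the remote path, the snippet below embeds a query with a hosted API Catalog embedding model via `langchain-nvidia-ai-endpoints`; the model ID and the `NVIDIA_API_KEY` environment variable are illustrative assumptions rather than code from these examples.

```python
# Rough sketch of remote embedding against an API Catalog endpoint; assumes
# `pip install langchain-nvidia-ai-endpoints` and that NVIDIA_API_KEY is set.
# The model ID is illustrative, not a requirement of these examples.
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings

embedder = NVIDIAEmbeddings(model="snowflake/arctic-embed-l")
vector = embedder.embed_query("How do I deploy the examples with Docker Compose?")
print(len(vector))  # embedding dimensionality
```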

| Model | Embedding | Framework | Description | Multi-GPU | TRT-LLM | NVIDIA Endpoints | Triton | Vector Database |
| ---------------------------------- | ---------------- | ---------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------- | ------- | ---------------- | ------ | ------------------ |
| mixtral_8x7b | ai-embed-qa-4 | LangChain | NVIDIA API Catalog endpoints chat bot [[code](./RetrievalAugmentedGeneration/examples/nvidia_api_catalog/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/api-catalog.html)] | No | No | Yes | Yes | Milvus or pgvector |
| llama-2 | UAE-Large-V1 | LlamaIndex | Canonical QA Chatbot [[code](./RetrievalAugmentedGeneration/examples/developer_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/local-gpu.html)] | [Yes](https://nvidia.github.io/GenerativeAIExamples/latest/multi-gpu.html) | Yes | No | Yes | Milvus or pgvector |
| llama-2 | all-MiniLM-L6-v2 | LlamaIndex | Chat bot, GeForce, Windows [[repo](https://github.com/NVIDIA/trt-llm-rag-windows/tree/release/1.0)] | No | Yes | No | No | FAISS |
| llama-2 | ai-embed-qa-4 | LangChain | Chat bot with query decomposition agent [[code](./RetrievalAugmentedGeneration/examples/query_decomposition_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/query-decomposition.html)] | No | No | Yes | Yes | Milvus or pgvector |
| mixtral_8x7b | ai-embed-qa-4 | LangChain | Minimalistic example: RAG with NVIDIA AI Foundation Models [[code](./examples/5_mins_rag_no_gpu/), [README](./examples/README.md#rag-in-5-minutes-example)] | No | No | Yes | Yes | FAISS |
| mixtral_8x7b<br>Deplot<br>Neva-22b | ai-embed-qa-4 | Custom | Chat bot with multimodal data [[code](./RetrievalAugmentedGeneration/examples/multimodal_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/multimodal-data.html)] | No | No | Yes | No | Milvus or pgvector |
| llama-2 | UAE-Large-V1 | LlamaIndex | Chat bot with quantized LLM model [[docs](https://nvidia.github.io/GenerativeAIExamples/latest/quantized-llm-model.html)] | Yes | Yes | No | Yes | Milvus or pgvector |
| llama3-70b | snowflake-arctic-embed-l | LangChain | NVIDIA API Catalog endpoints chat bot [[code](./RetrievalAugmentedGeneration/examples/nvidia_api_catalog/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/api-catalog.html)] | No | No | Yes | Yes | Milvus or pgvector |
| llama3-8b | snowflake-arctic-embed-l | LlamaIndex | Canonical QA Chatbot [[code](./RetrievalAugmentedGeneration/examples/developer_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/api-catalog.html#using-the-llamaindex-data-framework)] | [Yes](https://nvidia.github.io/GenerativeAIExamples/latest/multi-gpu.html) | Yes | No | Yes | Milvus or pgvector |
| llama3-70b | snowflake-arctic-embed-l | LangChain | Chat bot with query decomposition agent [[code](./RetrievalAugmentedGeneration/examples/query_decomposition_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/query-decomposition.html)] | No | No | Yes | Yes | Milvus or pgvector |
| llama3-70b | ai-embed-qa-4 | LangChain | Minimalistic example: RAG with NVIDIA AI Foundation Models [[code](./examples/5_mins_rag_no_gpu/), [README](./examples/README.md#rag-in-5-minutes-example)] | No | No | Yes | Yes | FAISS |
| llama3-8b<br>Deplot<br>Neva-22b | snowflake-arctic-embed-l | Custom | Chat bot with multimodal data [[code](./RetrievalAugmentedGeneration/examples/multimodal_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/multimodal-data.html)] | No | No | Yes | No | Milvus or pgvector |
| llama3-70b | none | PandasAI | Chat bot with structured data [[code](./RetrievalAugmentedGeneration/examples/structured_data_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/structured-data.html)] | No | No | Yes | No | none |
| llama-2 | ai-embed-qa-4 | LangChain | Chat bot with multi-turn conversation [[code](./RetrievalAugmentedGeneration/examples/multi_turn_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/multi-turn.html)] | No | No | Yes | No | Milvus or pgvector |
| llama3-8b | snowflake-arctic-embed-l | LangChain | Chat bot with multi-turn conversation [[code](./RetrievalAugmentedGeneration/examples/multi_turn_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/multi-turn.html)] | No | No | Yes | No | Milvus or pgvector |

### Enterprise RAG Examples

The enterprise RAG examples run as microservices distributed across multiple VMs and GPUs.
These examples show how to orchestrate RAG pipelines with [Kubernetes](https://kubernetes.io/) and deploy them with [Helm](https://helm.sh/).

Enterprise RAG examples include a [Kubernetes operator](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/) for LLM lifecycle management.
It is compatible with the [NVIDIA GPU operator](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/gpu-operator) that automates GPU discovery and lifecycle management in a Kubernetes cluster.
It is compatible with the [NVIDIA GPU Operator](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/gpu-operator) that automates GPU discovery and lifecycle management in a Kubernetes cluster.

Enterprise RAG examples also support local and remote inference with [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) and [NVIDIA API Catalog endpoints](https://build.nvidia.com/explore/discover).

| Model | Embedding | Framework | Description | Multi-GPU | Multi-node | TRT-LLM | NVIDIA Endpoints | Triton | Vector Database |
| ------- | ----------- | ---------- | -------------------------------------------------------------------------- | --------- | ---------- | ------- | ---------------- | ------ | --------------- |
| llama-2 | NV-Embed-QA | LlamaIndex | Chat bot, Kubernetes deployment [[README](./docs/developer-llm-operator/)] | No | No | Yes | No | Yes | Milvus |
| llama-3 | nv-embed-qa-4 | LlamaIndex | Chat bot, Kubernetes deployment [[chart](https://registry.ngc.nvidia.com/orgs/ohlfw0olaadg/teams/ea-participants/helm-charts/rag-app-text-chatbot)] | No | No | Yes | No | Yes | Milvus |


### Generative AI Model Examples
@@ -89,6 +87,16 @@ These are open source connectors for NVIDIA-hosted and self-hosted API endpoints
|[NVIDIA Triton Inference Server](https://docs.llamaindex.ai/en/stable/examples/llm/nvidia_triton.html) | [LlamaIndex](https://www.llamaindex.ai/) |Yes|Yes|No|Triton inference server provides API access to hosted LLM models over gRPC. |
|[NVIDIA TensorRT-LLM](https://docs.llamaindex.ai/en/stable/examples/llm/nvidia_tensorrt.html) | [LlamaIndex](https://www.llamaindex.ai/) |Yes|Yes|No|TensorRT-LLM provides a Python API to build TensorRT engines with state-of-the-art optimizations for LLM inference on NVIDIA GPUs. |


## Related NVIDIA RAG Projects

- [NVIDIA Tokkio LLM-RAG](https://docs.nvidia.com/ace/latest/workflows/tokkio/text/Tokkio_LLM_RAG_Bot.html): Use Tokkio to add avatar animation for RAG responses.

- [RAG on Windows using TensorRT-LLM and LlamaIndex](https://github.com/NVIDIA/ChatRTX): Create RAG chatbots on Windows using TensorRT-LLM.

- [Hybrid RAG Project on AI Workbench](https://github.com/NVIDIA/workbench-example-hybrid-rag): Run an NVIDIA AI Workbench example project for RAG.


## Support, Feedback, and Contributing

We're posting these examples on GitHub to support the NVIDIA LLM community and facilitate feedback.
18 changes: 15 additions & 3 deletions RetrievalAugmentedGeneration/Dockerfile
@@ -1,5 +1,5 @@
ARG BASE_IMAGE_URL=nvcr.io/nvidia/base/ubuntu
ARG BASE_IMAGE_TAG=20.04_x64_2022-09-23
ARG BASE_IMAGE_TAG=22.04_20240212

FROM ${BASE_IMAGE_URL}:${BASE_IMAGE_TAG}

@@ -11,7 +11,7 @@ RUN apt update && \
apt install -y curl software-properties-common libgl1 libglib2.0-0 && \
add-apt-repository ppa:deadsnakes/ppa && \
apt update && apt install -y python3.10 python3.10-dev python3.10-distutils && \
apt-get clean
apt-get clean

# Install pip for python3.10
RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.10
@@ -24,20 +24,32 @@ RUN apt autoremove -y curl software-properties-common
# Install common dependencies for all examples
RUN --mount=type=bind,source=RetrievalAugmentedGeneration/requirements.txt,target=/opt/requirements.txt \
pip3 install --no-cache-dir -r /opt/requirements.txt

# Install any example specific dependency if available
ARG EXAMPLE_NAME
COPY RetrievalAugmentedGeneration/examples/${EXAMPLE_NAME} /opt/RetrievalAugmentedGeneration/example
RUN if [ -f "/opt/RetrievalAugmentedGeneration/example/requirements.txt" ] ; then \
pip3 install --no-cache-dir -r /opt/RetrievalAugmentedGeneration/example/requirements.txt ; else \
echo "Skipping example dependency installation, since requirements.txt was not found" ; \
fi
RUN python3.10 -m nltk.downloader averaged_perceptron_tagger

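# Extra system packages for the multimodal example: LibreOffice for document conversion and Tesseract for OCR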
RUN if [ "${EXAMPLE_NAME}" = "multimodal_rag" ] ; then \
apt update && \
apt install -y libreoffice && \
apt install -y tesseract-ocr ; \
fi
# Copy required common modules for all examples
COPY RetrievalAugmentedGeneration/__init__.py /opt/RetrievalAugmentedGeneration/
COPY RetrievalAugmentedGeneration/common /opt/RetrievalAugmentedGeneration/common
COPY integrations /opt/integrations
COPY tools /opt/tools

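# Prepare a world-writable cache directory for NLTK data and Hugging Face downloads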
RUN mkdir /tmp-data/; mkdir /tmp-data/nltk_data/
RUN chmod 777 -R /tmp-data
RUN chown 1000:1000 -R /tmp-data
ENV NLTK_DATA=/tmp-data/nltk_data/
ENV HF_HOME=/tmp-data

WORKDIR /opt
ENTRYPOINT ["uvicorn", "RetrievalAugmentedGeneration.common.server:app"]