Evaluation Tool

Tool Details

Evaluation is crucial for retrieval-augmented generation (RAG) pipelines because it verifies the accuracy and relevance of both the retrieved information and the generated content.

Evaluating the performance of a RAG pipeline requires three components:

Data for testing.
Automated metrics that measure the performance of both context retrieval and response generation.
Human-like evaluation of the responses generated by the end-to-end pipeline.

This tool provides a set of notebooks showing how to address each of these requirements in an automated fashion.

Synthetic Data Generation

Using an existing knowledge base, we can synthetically generate question|answer|context triplets with an LLM. This tool uses the Llama 2 70B model on Nvidia AI Playground for data generation.
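A minimal sketch of the generation loop, assuming the knowledge base has already been split into text chunks and that `llm` is any callable mapping a prompt string to the model's completion (in this tool that role is played by Llama 2 70B on Nvidia AI Playground; here it is a stub). The prompt wording, the JSON reply format, and the `QATriplet` field names are hypothetical choices for illustration, not the notebook's actual implementation.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, List


@dataclass
class QATriplet:
    question: str
    answer: str
    context: str


# Hypothetical prompt; doubled braces render as literal JSON braces after .format().
PROMPT_TEMPLATE = (
    "Read the context below and write one question that can be answered "
    "from the context alone, together with its answer.\n"
    'Reply only with JSON: {{"question": "...", "answer": "..."}}\n\n'
    "Context:\n{context}"
)


def generate_triplets(chunks: List[str], llm: Callable[[str], str]) -> List[QATriplet]:
    """Ask the LLM for one question/answer pair per chunk; the chunk becomes the context."""
    triplets: List[QATriplet] = []
    for chunk in chunks:
        raw = llm(PROMPT_TEMPLATE.format(context=chunk))
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip generations that are not valid JSON
        if isinstance(parsed, dict) and "question" in parsed and "answer" in parsed:
            triplets.append(QATriplet(parsed["question"], parsed["answer"], chunk))
    return triplets


if __name__ == "__main__":
    # Stub standing in for the real LLM call.
    def fake_llm(prompt: str) -> str:
        if "Eiffel" in prompt:
            return '{"question": "Where is the Eiffel Tower?", "answer": "Paris"}'
        return "not json"

    kb_chunks = ["The Eiffel Tower is in Paris.", "A chunk the stub cannot handle."]
    data = generate_triplets(kb_chunks, fake_llm)
    print(json.dumps([asdict(t) for t in data], indent=2))
```

Keeping the source chunk as the `context` field means each synthetic question carries its own ground-truth context, which the retrieval metrics can later compare against what the pipeline actually retrieved. Malformed generations are dropped rather than repaired, trading some yield for a cleaner test set.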

Automated Metrics

RAGAS is an automated-metrics framework for measuring the performance of both the retriever and the generator. We use the Nvidia AI Playground LangChain wrapper to run RAGAS evaluation on our example RAG pipeline.
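One retriever metric RAGAS reports is context precision, which rewards a pipeline for ranking relevant chunks near the top. The sketch below is a simplified, plain-Python version modeled on the rank-weighted formula in the RAGAS documentation. It assumes the per-chunk relevance verdicts (1 = relevant, 0 = not) are already known; RAGAS itself obtains those verdicts from an LLM, and its exact formula may vary between versions. It illustrates what the metric measures and is not a substitute for the RAGAS implementation.

```python
from typing import Sequence


def context_precision(verdicts: Sequence[int]) -> float:
    """Rank-weighted precision of retrieved chunks.

    verdicts[i] is 1 if the chunk retrieved at rank i + 1 is relevant, else 0.
    Each relevant position k contributes precision@k; the sum is divided by the
    number of relevant chunks, so relevant chunks ranked lower count for less.
    """
    total_relevant = sum(verdicts)
    if total_relevant == 0:
        return 0.0
    hits = 0
    score = 0.0
    for k, verdict in enumerate(verdicts, start=1):
        hits += verdict
        if verdict:
            score += hits / k  # precision@k at a relevant position
    return score / total_relevant


if __name__ == "__main__":
    print(context_precision([1, 1, 0]))  # 1.0: relevant chunks ranked first
    print(context_precision([0, 1, 1]))  # lower: same chunks ranked below an irrelevant one
```

Averaging this score over every question in the synthetic test set gives a single retrieval number that can be tracked as the chunking, embedding model, or top-k setting changes.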

LLM-as-a-Judge

LLMs can provide human-like feedback and Likert-scale evaluation scores for full end-to-end RAG pipelines. This tool uses Llama 2 70B as the judge LLM.
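A minimal sketch of an LLM-as-a-judge loop, assuming `judge` is any callable that sends a prompt to the judge model (Llama 2 70B in this tool; a stub here) and returns its text. The grading prompt, the wording of the 1-5 scale, and the `Rating: <n>` reply convention are hypothetical choices for illustration. Asking for a fixed final line makes the Likert score easy to parse, and replies without a valid rating are recorded as `None` rather than guessed.

```python
import re
from statistics import mean
from typing import Callable, Dict, List, Optional

# Hypothetical grading prompt; the notebook's actual prompt may differ.
JUDGE_TEMPLATE = (
    "You are evaluating the answer produced by a question-answering system.\n"
    "Question: {question}\n"
    "Reference answer: {reference}\n"
    "System answer: {answer}\n"
    "Rate the system answer on a 1-5 Likert scale "
    "(1 = wrong or irrelevant, 5 = fully correct and helpful). "
    "Give a short justification, then finish with a line 'Rating: <1-5>'."
)

# A single digit 1-5 after "Rating:"; "Rating: 45" and "Rating: 6" do not match.
RATING_RE = re.compile(r"Rating:\s*([1-5])\b")


def parse_rating(judge_output: str) -> Optional[int]:
    """Return the last rating in the judge's reply, or None if there is none."""
    matches = RATING_RE.findall(judge_output)
    return int(matches[-1]) if matches else None


def judge_answers(
    samples: List[Dict[str, str]], judge: Callable[[str], str]
) -> List[Optional[int]]:
    """Score each sample (keys: question, reference, answer) with the judge LLM."""
    return [parse_rating(judge(JUDGE_TEMPLATE.format(**s))) for s in samples]


def mean_rating(ratings: List[Optional[int]]) -> Optional[float]:
    """Average the valid ratings, ignoring unparseable replies."""
    valid = [r for r in ratings if r is not None]
    return float(mean(valid)) if valid else None


if __name__ == "__main__":
    def fake_judge(prompt: str) -> str:
        # Stub: approves the answer "Paris", rejects everything else.
        if "System answer: Paris" in prompt:
            return "Matches the reference.\nRating: 5"
        return "Does not answer the question.\nRating: 1"

    samples = [
        {"question": "Where is the Eiffel Tower?", "reference": "Paris", "answer": "Paris"},
        {"question": "Where is the Eiffel Tower?", "reference": "Paris", "answer": "Rome"},
    ]
    ratings = judge_answers(samples, fake_judge)
    print(ratings, mean_rating(ratings))
```

Tracking how many replies come back as `None` is a cheap check on whether the judge is following the output format.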

Folders and files

imgs
01_synthetic_data_generation.ipynb
02_filling_RAG_outputs_for_Evaluation.ipynb
03_eval_ragas.ipynb
04_Human_Like_RAG_Evaluation-AIP.ipynb
Dockerfile.eval
README.md
qa_generation.json
requirements.txt
