Benchmark data set for RAG on PDF files #381

Sep 5, 2024

jhmuller

Does anyone know of any benchmark datasets I could use to
evaluate LlamaParse against other, simpler existing solutions?

The example code includes one comparison against not using LlamaParse on the PDF,
but I want to do more than just one-off comparisons.

Thanks greatly in advance

john

Feb 27, 2025

AlbertDoesProgramming

This is a very tricky topic and something I've been grappling with too. Consider using an independent measure of your overall RAG pipeline (like the LlamaIndex implementation of BEIR) and assess whether there are any performance differences between two parsing strategies. For direct benchmarks, have a flick through arXiv; I found this after a couple of searches, and it might be worth your while: https://arxiv.org/pdf/2412.07626
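A minimal sketch of the comparison suggested above, assuming you already have a fixed set of questions with the relevant PDF page(s) labeled for each one, and that each parsing strategy's retriever returns a ranked list of page IDs. Labeling relevance at the page level (an assumption on my part) keeps the gold answers independent of how each parser chunks the document. The names `RetrievalResult`, `hit_rate_at_k`, `mrr_at_k` and `compare` are hypothetical; this does not use LlamaIndex's BEIR evaluator, it just computes two common retrieval metrics (hit rate and mean reciprocal rank) so the parser is the only thing that varies between runs.

```python
from dataclasses import dataclass


@dataclass
class RetrievalResult:
    """One query: ranked page IDs the retriever returned, plus the gold page IDs."""
    retrieved: list[str]
    relevant: set[str]


def hit_rate_at_k(results: list[RetrievalResult], k: int) -> float:
    """Fraction of queries with at least one relevant page in the top k."""
    hits = sum(1 for r in results if any(d in r.relevant for d in r.retrieved[:k]))
    return hits / len(results)


def mrr_at_k(results: list[RetrievalResult], k: int) -> float:
    """Mean reciprocal rank of the first relevant page in the top k (0 if none)."""
    total = 0.0
    for r in results:
        for rank, doc_id in enumerate(r.retrieved[:k], start=1):
            if doc_id in r.relevant:
                total += 1.0 / rank
                break
    return total / len(results)


def compare(strategies: dict[str, list[RetrievalResult]], k: int = 5) -> dict[str, dict[str, float]]:
    """Score every parsing strategy on the same question set."""
    return {
        name: {"hit_rate": hit_rate_at_k(res, k), "mrr": mrr_at_k(res, k)}
        for name, res in strategies.items()
    }


if __name__ == "__main__":
    # Toy data: two questions, results from two hypothetical parsing pipelines.
    simple = [
        RetrievalResult(["p3", "p1"], {"p1"}),
        RetrievalResult(["p9", "p8"], {"p2"}),
    ]
    llamaparse = [
        RetrievalResult(["p1", "p3"], {"p1"}),
        RetrievalResult(["p2", "p9"], {"p2"}),
    ]
    print(compare({"simple": simple, "llamaparse": llamaparse}))
```

With a few hundred labeled questions instead of two, differences in these scores give you something more systematic than one-off comparisons, and the same harness works for any number of parsers.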
