Filtering links to crawl using LLMs #1084

Unanswered
kashyab12 asked this question in Q&A
Mar 13, 2025 · 1 comment · 1 reply

Hey! Love crawlee so far. The main issue I'm facing: I want to filter the URLs to crawl for a given page using LLMs. Is there a clean way to do this? So far I've implemented a transformer for enqueue_links that saves the links to a dict, and I then process those dicts later with another crawler object (rough sketch below). Any other suggestions for solving this? I don't want to make the LLM call inside the transform function, since that would mean one LLM call per URL found, which is quite expensive.
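
Roughly what I have now (a minimal sketch; I'm assuming the `transform_request_function` hook with a `'skip'` return and crawlee ≥ 0.5-style imports, so adjust for your version):

```python
import asyncio

from crawlee import RequestOptions, RequestTransformAction
from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext

collected_urls: list[str] = []

crawler = BeautifulSoupCrawler()


@crawler.router.default_handler
async def handler(context: BeautifulSoupCrawlingContext) -> None:
    def collect(options: RequestOptions) -> RequestOptions | RequestTransformAction:
        # Record the discovered URL for later batch filtering, and skip
        # enqueueing it so no per-URL LLM call happens here.
        collected_urls.append(options['url'])
        return 'skip'

    await context.enqueue_links(transform_request_function=collect)


async def main() -> None:
    await crawler.run(['https://crawlee.dev'])
    # Later: one batched LLM call over collected_urls, then feed the
    # approved URLs to a second crawler object.


if __name__ == '__main__':
    asyncio.run(main())
```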

Replies: 1 comment · 1 reply


Hello, do I understand it correctly that you want to store the links you find on a page, and only filter and enqueue them after you have gathered a bigger batch?

1 reply
@vdusek

Hi, once we merge PR #1024, it should help with your use case. You'll be able to just call extract_links (without enqueueing anything), filter the results as needed, store the links somewhere, and enqueue them later using add_requests(extracted_links). I believe that's what you need.
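
Something along these lines (a rough sketch against the not-yet-merged API, so the exact names and return types may change; `llm_filter_urls` here is a hypothetical batched helper):

```python
import asyncio

from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext


async def llm_filter_urls(urls: list[str]) -> set[str]:
    # Hypothetical helper: one batched LLM call that decides which URLs
    # are worth crawling. Placeholder logic keeps everything.
    return set(urls)


crawler = BeautifulSoupCrawler()


@crawler.router.default_handler
async def handler(context: BeautifulSoupCrawlingContext) -> None:
    # extract_links returns the discovered requests without enqueueing them.
    requests = await context.extract_links()

    # One LLM call for the whole batch instead of one call per URL.
    keep = await llm_filter_urls([r.url for r in requests])

    # Enqueue only the links the LLM approved.
    await context.add_requests([r for r in requests if r.url in keep])


async def main() -> None:
    await crawler.run(['https://crawlee.dev'])


if __name__ == '__main__':
    asyncio.run(main())
```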
