Filtering links to crawl using LLMs #1084

Unanswered
kashyab12 asked this question in Q&A
Mar 13, 2025 · 1 comment · 1 reply

Hey! Love crawlee so far. The main issue I'm facing: I want to filter the URLs to crawl for a given page using LLMs. Is there a clean way to do this? So far I've implemented a transformer for enqueue_links that saves the links to a dict, and I then process those dicts later with another crawler object (rough sketch below). Any other suggestions for solving this? I don't want to make the LLM call inside the transform function, since that would mean one LLM call per URL found, which is quite expensive.
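
Roughly what I have now (a minimal sketch; I'm assuming the `transform_request_function` hook with a `'skip'` return and crawlee ≥ 0.5-style imports, so adjust for your version):

```python
import asyncio

from crawlee import RequestOptions, RequestTransformAction
from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext

collected_urls: list[str] = []

crawler = BeautifulSoupCrawler()


@crawler.router.default_handler
async def handler(context: BeautifulSoupCrawlingContext) -> None:
    def collect(options: RequestOptions) -> RequestOptions | RequestTransformAction:
        # Record the discovered URL for later batch filtering, and skip
        # enqueueing it so no per-URL LLM call happens here.
        collected_urls.append(options['url'])
        return 'skip'

    await context.enqueue_links(transform_request_function=collect)


async def main() -> None:
    await crawler.run(['https://crawlee.dev'])
    # Later: one batched LLM call over collected_urls, then feed the
    # approved URLs to a second crawler object.


if __name__ == '__main__':
    asyncio.run(main())
```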

Replies: 1 comment · 1 reply


Hello, do I understand it correctly that you want to store the links you find on a page, and only filter and enqueue them after you have gathered a bigger batch?

1 reply
@vdusek

Hi, once we merge PR #1024, it should help with your use case. You'll be able to just call extract_links (without enqueueing anything), filter the results as needed, store the links somewhere, and enqueue them later using add_requests(extracted_links). I believe that's what you need.
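
Something along these lines (a rough sketch against the not-yet-merged API, so the exact names and return types may change; `llm_filter_urls` here is a hypothetical batched helper):

```python
import asyncio

from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext


async def llm_filter_urls(urls: list[str]) -> set[str]:
    # Hypothetical helper: one batched LLM call that decides which URLs
    # are worth crawling. Placeholder logic keeps everything.
    return set(urls)


crawler = BeautifulSoupCrawler()


@crawler.router.default_handler
async def handler(context: BeautifulSoupCrawlingContext) -> None:
    # extract_links returns the discovered requests without enqueueing them.
    requests = await context.extract_links()

    # One LLM call for the whole batch instead of one call per URL.
    keep = await llm_filter_urls([r.url for r in requests])

    # Enqueue only the links the LLM approved.
    await context.add_requests([r for r in requests if r.url in keep])


async def main() -> None:
    await crawler.run(['https://crawlee.dev'])


if __name__ == '__main__':
    asyncio.run(main())
```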
