Request queue stuck in infinite loop due to ID mismatch in Apify-Scrapy integration #392

Closed
@vdusek

Description


When crawling apify.com using the Apify-Scrapy integration, some specific pages cause request queue inconsistencies, leading to an infinite run loop where requests cannot be removed from the queue.

This issue can be reproduced using the code from the Scrapy guide or the Scrapy template.

Problematic URLs:

Observed behavior

Warnings about mismatched request IDs and unique keys during Spider processing:

[title_spider] INFO TitleSpider is parsing <200 https://apify.com/professional-services>... [({"spider": "<TitleSpider 'title_spider' at 0x7f564d5a52b0>"})]
...
[crawlee.storage_clients._memory._request_queue_client] WARN The request ID does not match the ID from the unique_key (request.id=xTOtjOm0Pqt40fn, id=U6aWc4CgOSWQl13).
...
[crawlee.storage_clients._memory._request_queue_client] WARN The request ID does not match the ID from the unique_key (request.id=QGWv6iVEdK2bHHD, id=DmSWl7p6JKFfUYx).
...
[crawlee.storage_clients._memory._request_queue_client] WARN The request ID does not match the ID from the unique_key (request.id=RWm45nonkuuAh9r, id=5lI8IiodqKAguLk).
...
[title_spider] INFO TitleSpider is parsing <200 https://apify.com/templates>... [({"spider": "<TitleSpider 'title_spider' at 0x7f564d5a52b0>"})]
...
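
The warning text implies that the storage client expects `request.id` to be derivable from `unique_key`. The exact derivation is not shown in this issue, so the following is only a minimal sketch of that kind of consistency check. The SHA-256/Base64 scheme and the helper names `derive_request_id` and `id_matches_unique_key` are assumptions made for illustration.

```python
import base64
import hashlib
import re


def derive_request_id(unique_key: str, length: int = 15) -> str:
    """Hypothetical helper: derive a short, deterministic ID from a unique key."""
    digest = hashlib.sha256(unique_key.encode("utf-8")).digest()
    encoded = base64.b64encode(digest).decode("utf-8")
    # Drop characters that are awkward in URLs, then truncate.
    return re.sub(r"[+/=]", "", encoded)[:length]


def id_matches_unique_key(request_id: str, unique_key: str) -> bool:
    """Return True when the ID is the one derived from the unique key."""
    return request_id == derive_request_id(unique_key)


unique_key = "https://apify.com/templates"
consistent_id = derive_request_id(unique_key)

# An ID derived from the key passes the check.
print(id_matches_unique_key(consistent_id, unique_key))
# An ID assigned independently of the key (as an integration might do) fails it.
print(id_matches_unique_key("independent-id", unique_key))
```

Under this assumption, any component that assigns IDs on its own, without going through the same derivation, would trigger the mismatch warning for every request it creates.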

Requests remain stuck in the request queue, causing repeated processing attempts and an endless loop of the following logs:

...
[crawlee._utils.system] DEBUG Calling get_memory_info()...
[crawlee._utils.system] DEBUG Calling get_cpu_info()...
[crawlee._utils.system] DEBUG Calling get_memory_info()...
[crawlee._utils.system] DEBUG Calling get_cpu_info()...
[crawlee.storages._request_queue] DEBUG Skipping request from queue head, already in progress or recently handled ({"id": "xTOtjOm0Pqt40fn", "unique_key": "GET|e3b0c442|e3b0c442|https://blog.apify.com/intercom-customer-support-ai-chatbot-web-scraping", "in_progress": false, "recently_handled": true})
[crawlee.storages._request_queue] DEBUG Skipping request from queue head, already in progress or recently handled ({"id": "QGWv6iVEdK2bHHD", "unique_key": "GET|e3b0c442|e3b0c442|https://blog.apify.com/how-web-scraping-ai-and-the-eu-have-come-together-to-sweep-away-fake-discounts-in-europe", "in_progress": false, "recently_handled": true})
[crawlee.storages._request_queue] DEBUG Skipping request from queue head, already in progress or recently handled ({"id": "RWm45nonkuuAh9r", "unique_key": "GET|e3b0c442|e3b0c442|https://blog.apify.com/groupon-reaches-new-merchants-with-web-data-collection", "in_progress": false, "recently_handled": true})
[crawlee.storages._request_queue] DEBUG Queue head still returned requests that need to be processed (or that are locked by other clients)
[crawlee._utils.system] DEBUG Calling get_memory_info()...
[crawlee._utils.system] DEBUG Calling get_cpu_info()...
[crawlee._utils.system] DEBUG Calling get_memory_info()...
[crawlee._utils.system] DEBUG Calling get_cpu_info()...
...
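
One plausible way such a loop can arise is sketched below as a toy model. It is not crawlee's implementation, and the root cause is not confirmed in this issue. The assumption is that a request is stored under one ID but marked as handled under a different one, so the client skips it as "recently handled" while the queue still counts it as pending. `ToyQueue` and all of its methods are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyQueue:
    """Toy stand-in for a request queue (illustrative only)."""

    # Stored ID -> unique key of requests the queue still considers pending.
    pending: dict[str, str] = field(default_factory=dict)
    # Unique keys the client remembers as already handled.
    recently_handled: set[str] = field(default_factory=set)

    def add(self, stored_id: str, unique_key: str) -> None:
        self.pending[stored_id] = unique_key

    def mark_handled(self, request_id: str, unique_key: str) -> None:
        # The client remembers the request as handled...
        self.recently_handled.add(unique_key)
        # ...but removal is keyed by ID, so a mismatched ID removes nothing.
        self.pending.pop(request_id, None)

    def fetch_next(self) -> str | None:
        for stored_id, unique_key in self.pending.items():
            if unique_key in self.recently_handled:
                continue  # Skipped at the queue head, yet still pending.
            return stored_id
        return None

    def is_finished(self) -> bool:
        return not self.pending


queue = ToyQueue()
queue.add("stored-id", "GET|https://example.com/a")
queue.mark_handled("other-id", "GET|https://example.com/a")

# Nothing can be fetched, but the queue never reports that it is finished.
print(queue.fetch_next(), queue.is_finished())  # None False
```

In this model a crawler that polls until `is_finished()` returns `True` would spin forever, which matches the shape of the repeated "Skipping request from queue head" messages.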
Labels

bug, t-tooling
