Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

feat: Implement Scrapy HTTP cache backend #403

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Mar 18, 2025

Conversation

honzajavorek
Copy link
Contributor

@honzajavorek honzajavorek commented Feb 14, 2025

I successfully use this code in my project. In 381c044 I explicitly specify licensing. I didn't add docs or anything else (yet), just made sure the code passes linters and a type check.

Relates: apify/actor-templates#303

@honzajavorek honzajavorek changed the title Implement Scrapy HTTP cache backend feat: Implement Scrapy HTTP cache backend Feb 14, 2025
@honzajavorek honzajavorek had a problem deploying to fork-pr-integration-tests February 14, 2025 17:01 — with GitHub Actions Failure
Copy link
Contributor

@vdusek vdusek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make this work by default, also adjust the apify.scrapy.utils.apply_apify_settings. If this allows us to handle platform migration, we should turn caching by default. Also please add a reasonable comments there.

src/apify/scrapy/cache.py Outdated Show resolved Hide resolved
src/apify/scrapy/cache.py Outdated Show resolved Hide resolved
src/apify/scrapy/cache.py Outdated Show resolved Hide resolved
src/apify/scrapy/cache.py Outdated Show resolved Hide resolved
src/apify/scrapy/cache.py Outdated Show resolved Hide resolved
src/apify/scrapy/cache.py Outdated Show resolved Hide resolved
src/apify/scrapy/cache.py Outdated Show resolved Hide resolved
@vdusek vdusek added t-tooling Issues with this label are in the ownership of the tooling team. enhancement New feature or request. labels Feb 18, 2025
@vdusek vdusek added this to the 108th sprint - Tooling team milestone Feb 18, 2025
@honzajavorek
Copy link
Contributor Author

Thanks for the review! I considered this to be a kick off which will definitely get a review with changes requested. If #404 lets me, I'll work on addressing the comments in the upcoming days.

@honzajavorek
Copy link
Contributor Author

I bet I've seen comments about adding this to Apify settings by default, but I can't see them now. I think it's a good idea to set the cache storage by default, but I wouldn't turn on caching by default, because it's application-specific how caching should work exactly (e.g. expiration time). We should just document somewhere that this is a preferred solution to handling the forced restarts, if they happen.

@vdusek
Copy link
Contributor

vdusek commented Feb 19, 2025

I bet I've seen comments about adding this to Apify settings by default, but I can't see them now. I think it's a good idea to set the cache storage by default, but I wouldn't turn on caching by default, because it's application-specific how caching should work exactly (e.g. expiration time). We should just document somewhere that this is a preferred solution to handling the forced restarts, if they happen.

Agreed, thanks 🙂.

@honzajavorek
Copy link
Contributor Author

Just for the record, the comment I've seen didn't get lost, it's the final comment of the review 😄

@honzajavorek
Copy link
Contributor Author

I think I'm done for now!

@honzajavorek honzajavorek requested a review from vdusek March 10, 2025 14:20
docs/02_guides/05_scrapy.mdx Show resolved Hide resolved
This code has been originally developed for the
https://github.com/juniorguru/plucker/ project, which is licensed under
AGPL-3.0-only. I am the sole author of that code and hereby I grant
the https://github.com/apify/apify-sdk-python/ project the right to use
it under an Apache-2.0 license, without the copyleft taking effect.
My intention is to contribute this code to the upstream, remove
it from my project, and only import the component from the apify
package as a dependency, as I believe this component could be useful
to other users of the apify package.
This error was introduced when I tried to make all
the linters happy, uh.
@honzajavorek honzajavorek requested a review from vdusek March 14, 2025 11:05
vdusek

This comment was marked as resolved.

src/apify/scrapy/extensions/_httpcache.py Outdated Show resolved Hide resolved
@honzajavorek honzajavorek had a problem deploying to fork-pr-integration-tests March 14, 2025 11:29 — with GitHub Actions Failure
@vdusek vdusek had a problem deploying to fork-pr-integration-tests March 14, 2025 11:52 — with GitHub Actions Error
src/apify/scrapy/extensions/_httpcache.py Show resolved Hide resolved
@honzajavorek honzajavorek had a problem deploying to fork-pr-integration-tests March 18, 2025 13:13 — with GitHub Actions Error
Copy link
Contributor

@vdusek vdusek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks

@vdusek vdusek merged commit 137e3c8 into apify:master Mar 18, 2025
25 of 27 checks passed
@honzajavorek
Copy link
Contributor Author

cat-highfive

@honzajavorek honzajavorek deleted the honzajavorek/cache branch March 18, 2025 22:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request. t-tooling Issues with this label are in the ownership of the tooling team.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.