fix: Fix calculation of CPU utilization from SystemInfo events #447


Merged: 1 commit merged into master from fix-cpu-utilization-calculation on Mar 27, 2025

Conversation

janbuchar (Contributor)

Apify reports CPU utilization as the sum of utilization of all CPUs, but Crawlee expects a number between 0 and 1. Because of this, it is impossible for AutoscaledPool to scale beyond one CPU.
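For context, a minimal sketch of the normalization this fix describes, assuming a hypothetical `SystemInfo` payload field and helper name; the actual event structure and attribute names in Crawlee and the Apify platform may differ:

```python
import os
from dataclasses import dataclass


@dataclass
class SystemInfo:
    """Hypothetical stand-in for the Apify SystemInfo event payload."""

    # Sum of per-CPU utilization, e.g. up to 4.0 on a 4-CPU machine.
    cpu_current_usage: float


def cpu_utilization_ratio(event: SystemInfo, cpu_count: int = 0) -> float:
    """Normalize the summed per-CPU utilization into the 0..1 range AutoscaledPool expects."""
    cpus = cpu_count or os.cpu_count() or 1
    # Divide by the number of CPUs and clamp, so a fully loaded 4-CPU machine
    # yields 1.0 rather than 4.0, allowing the pool to scale past one CPU.
    return min(event.cpu_current_usage / cpus, 1.0)


# Example: on a 4-CPU machine, a summed utilization of 2.0 maps to 0.5.
assert cpu_utilization_ratio(SystemInfo(cpu_current_usage=2.0), cpu_count=4) == 0.5
```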

@janbuchar added the adhoc (Ad-hoc unplanned task added during the sprint) and t-tooling (Issues with this label are in the ownership of the tooling team) labels on Mar 26, 2025
@janbuchar requested review from vdusek and Pijukatel on March 26, 2025 at 17:07
@github-actions added this to the 111th sprint - Tooling team milestone on Mar 26, 2025
@vdusek force-pushed the fix-cpu-utilization-calculation branch from f388d1f to b63a243 on March 27, 2025 at 07:41
@vdusek (Contributor) left a comment:


LGTM

@Pijukatel (Contributor) left a comment:


Just something to think about: does this scaling based on multi-CPU load work well in the Python context?

I think we are mostly running on one core all the time, and I'm not sure whether there are any load-relevant sub-processes. So scaling based on other cores being underutilized does not seem very compatible with the current, somewhat single-process Crawlee architecture.

@janbuchar (Contributor, Author) replied:

> Just something to think about: does this scaling based on multi-CPU load work well in the Python context?

Well, about the same as in the JavaScript context, really 🤷

> I think we are mostly running on one core all the time, and I'm not sure whether there are any load-relevant sub-processes. So scaling based on other cores being underutilized does not seem very compatible with the current, somewhat single-process Crawlee architecture.

Playwright is the most relevant one, I suppose. For HTTP-based crawlers, I agree that we probably won't be able to utilize more than one core. But also, most people probably won't try to run those on thicker units than your standard 4GB one.

@janbuchar merged commit eb4c8e4 into master on Mar 27, 2025
27 checks passed
@janbuchar deleted the fix-cpu-utilization-calculation branch on March 27, 2025 at 09:52