fix: Fix calculation of CPU utilization from SystemInfo events #447


Merged: 1 commit merged into master from fix-cpu-utilization-calculation on Mar 27, 2025

Conversation

janbuchar (Contributor)

Apify reports CPU utilization as the sum of utilization of all CPUs, but Crawlee expects a number between 0 and 1. Because of this, it is impossible for AutoscaledPool to scale beyond one CPU.
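For context, a minimal sketch of the normalization this fix describes, assuming a hypothetical `SystemInfo` payload field and helper name; the actual event structure and attribute names in Crawlee and the Apify platform may differ:

```python
import os
from dataclasses import dataclass


@dataclass
class SystemInfo:
    """Hypothetical stand-in for the Apify SystemInfo event payload."""

    # Sum of per-CPU utilization, e.g. up to 4.0 on a 4-CPU machine.
    cpu_current_usage: float


def cpu_utilization_ratio(event: SystemInfo, cpu_count: int = 0) -> float:
    """Normalize the summed per-CPU utilization into the 0..1 range AutoscaledPool expects."""
    cpus = cpu_count or os.cpu_count() or 1
    # Divide by the number of CPUs and clamp, so a fully loaded 4-CPU machine
    # yields 1.0 rather than 4.0, allowing the pool to scale past one CPU.
    return min(event.cpu_current_usage / cpus, 1.0)


# Example: on a 4-CPU machine, a summed utilization of 2.0 maps to 0.5.
assert cpu_utilization_ratio(SystemInfo(cpu_current_usage=2.0), cpu_count=4) == 0.5
```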

@janbuchar added the adhoc (Ad-hoc unplanned task added during the sprint) and t-tooling (Issues with this label are in the ownership of the tooling team) labels on Mar 26, 2025
@janbuchar requested review from vdusek and Pijukatel on March 26, 2025 at 17:07
@github-actions added this to the 111th sprint - Tooling team milestone on Mar 26, 2025
@vdusek force-pushed the fix-cpu-utilization-calculation branch from f388d1f to b63a243 on March 27, 2025 at 07:41
@vdusek (Contributor) left a comment:


LGTM

@Pijukatel (Contributor) left a comment:


Just something to think about: does this scaling based on multi-CPU load work well in the Python context?

I think we are mostly running on one core all the time, and I'm not sure whether there are any load-relevant sub-processes. So scaling based on other cores being underutilized does not seem very compatible with the current, somewhat single-process Crawlee architecture.

@janbuchar (Contributor, Author) replied:

> Just something to think about: does this scaling based on multi-CPU load work well in the Python context?

Well, about the same as in the JavaScript context, really 🤷

> I think we are mostly running on one core all the time, and I'm not sure whether there are any load-relevant sub-processes. So scaling based on other cores being underutilized does not seem very compatible with the current, somewhat single-process Crawlee architecture.

Playwright is the most relevant one, I suppose. For HTTP-based crawlers, I agree that we probably won't be able to utilize more than one core. But also, most people probably won't try to run those on thicker units than your standard 4GB one.

@janbuchar merged commit eb4c8e4 into master on Mar 27, 2025
27 checks passed
@janbuchar deleted the fix-cpu-utilization-calculation branch on March 27, 2025 at 09:52