fix: Use PSS instead of RSS to estimate children process memory usage on Linux #1210
    total_size_bytes = psutil.virtual_memory().total

    return MemoryInfo(
        total_size=ByteSize(total_size_bytes),
        current_size=ByteSize(current_size_bytes),
    )


def _get_used_memory(memory_full_info: Any) -> int:
This internal type hint does not seem to be available. The actual type is dependent on the OS as well.
Based on the docs: Using
Title changed: "USS instead of RSS to estimate children process memory usage" to "PSS instead of RSS to estimate children process memory usage on Linux"
Coming up with the test was really hard. The test is not nice at all, but testing memory usage estimation is really tricky because Python is too high-level for precise memory control.
LGTM, thanks for looking into this.
Btw, so on Windows we are giving up, since PSS isn't a concept there? Or is there a chance to use some alternative metric? If so, maybe we can open a follow-up issue?
Couple of questions 🙂
So far I can only guess. I need to run some experiments with the JS version to get some data. But if it relies on RSS only, then I think it could also overestimate used memory.
Yes, I think it could be a good safety measure to bound it like that, regardless of this change.
With Crawlee alone, probably not. But my guess is that multiple Playwright processes could actually use some shared memory, which would be overestimated by RSS and probably underestimated by USS. So PSS seems to me like the best fit in our case, as it accounts for shared memory in a somewhat predictable way.
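To make the RSS / USS / PSS comparison above concrete, here is a small arithmetic sketch. The numbers and the `Proc` class are hypothetical, and it assumes the simplest case where every process in the group maps the same shared region; real processes share memory in less uniform ways.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    """Hypothetical process model, sizes in MiB."""
    private_mib: int  # memory mapped only by this process
    shared_mib: int   # memory mapped by every process in the group


def rss(p: Proc) -> int:
    # RSS counts the whole shared region for each process.
    return p.private_mib + p.shared_mib


def uss(p: Proc) -> int:
    # USS ignores shared memory entirely.
    return p.private_mib


def pss(p: Proc, sharers: int) -> int:
    # PSS splits the shared region evenly among the processes mapping it.
    return p.private_mib + p.shared_mib // sharers


# Three processes, each with 100 MiB private, all mapping the same 300 MiB.
procs = [Proc(private_mib=100, shared_mib=300) for _ in range(3)]

rss_total = sum(rss(p) for p in procs)
uss_total = sum(uss(p) for p in procs)
pss_total = sum(pss(p, sharers=len(procs)) for p in procs)

# Physical memory really in use: all private parts plus one copy of the shared part.
actual_total = sum(p.private_mib for p in procs) + procs[0].shared_mib
```

In this toy setup the RSS sum counts the shared region three times, the USS sum drops it, and only the PSS sum matches `actual_total`.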
Keep in mind that on the platform, memory usage (and pretty much all the scaling metrics) comes over websockets; we don't measure it ourselves. So it's very much possible that we don't do it perfectly and nobody noticed, since on localhost we use 1/4 of the available memory by default. Also, given that memory scales with CPU, you usually run things with enough memory.
Description
Use Proportional Set Size (PSS) to estimate the memory usage of the process and all its children. This avoids overestimating used memory, which happens with Resident Set Size (RSS) because the same shared memory is counted multiple times.
PSS is available only on Linux, so this improved estimation will work only there.
Add test.
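The approach described above could look roughly like the sketch below. This is not the code from this PR: `get_used_memory` and `estimate_tree_usage` are hypothetical names, and the sketch only assumes that each input object exposes an `rss` attribute and, where the platform supports it, a `pss` attribute (as the results of psutil's `Process.memory_full_info()` do on Linux). Stand-in objects are used so the example runs without psutil.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def get_used_memory(memory_full_info: Any) -> int:
    """Return PSS when the object reports it, otherwise fall back to RSS."""
    pss = getattr(memory_full_info, "pss", None)
    if pss is not None:
        return int(pss)
    return int(memory_full_info.rss)


def estimate_tree_usage(infos: Iterable[Any]) -> int:
    """Sum the per-process estimates for a process and its children."""
    return sum(get_used_memory(info) for info in infos)


# Stand-ins for memory info of a parent and one child (values in bytes).
linux_like = [SimpleNamespace(rss=400, pss=200), SimpleNamespace(rss=400, pss=200)]
no_pss = [SimpleNamespace(rss=400), SimpleNamespace(rss=400)]

linux_estimate = estimate_tree_usage(linux_like)  # uses PSS
fallback_estimate = estimate_tree_usage(no_pss)   # falls back to RSS
```

Probing for the attribute rather than checking the OS name keeps the fallback in one place, at the cost of hiding which metric was actually used.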
Issues