Description
While trying to optimize GC runs, it has been observed that the weak generational hypothesis may not apply that well to Python. The argument is that with a mixed cycle GC + refcounting strategy, young objects are mostly cleaned up by reference counting, not by the cycle GC. It is also important that there is no segregation between the two strategies: the cycle GC still has to deal with objects that, according to this argument, will mostly be cleaned up by reference counting alone.
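A minimal sketch of that effect, assuming CPython's refcounting semantics (the `Node` class is just a placeholder): even with the cycle GC disabled, a short-lived acyclic object is reclaimed the moment its reference count drops to zero.

```python
import gc
import weakref

class Node:
    pass

gc.disable()  # turn the cyclic GC off entirely

obj = Node()
# The finalizer fires when the object is deallocated, whoever frees it.
weakref.finalize(obj, print, "reclaimed by refcounting, no cycle GC involved")

del obj  # refcount drops to zero -> CPython frees the object immediately

gc.enable()
```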
This calls into question the utility of segregating the GC by generations, and there is some evidence supporting that. I have been benchmarking the success rate of the different generations (the percentage of tracked objects actually collected per GC run) in several programs, such as black, mypy and a bunch of HTTP servers, and the success rate of the lower generations is generally small. Here is an example of running black over the whole standard library (a sketch of how these per-generation numbers can be gathered follows the tables):
Statistics for generation 0
Statistic | Value |
---|---|
count | 157917.000000 |
mean | 1.192775 |
std | 3.560391 |
min | 0.000000 |
25% | 0.000000 |
50% | 0.000000 |
75% | 0.480192 |
max | 86.407768 |
Statistics for generation 1
Statistic | Value |
---|---|
count | 14346.000000 |
mean | 2.670852 |
std | 11.642815 |
min | 0.000000 |
25% | 0.000000 |
50% | 0.000000 |
75% | 0.388794 |
max | 97.406097 |
Statistics for generation 2
Statistic | Value |
---|---|
count | 1280.000000 |
mean | 45.698135 |
std | 27.066735 |
min | 0.000000 |
25% | 31.965862 |
50% | 54.618008 |
75% | 67.038842 |
max | 90.592705 |
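For reference, here is a rough sketch of the kind of instrumentation that can produce numbers like the ones above. It is an approximation, not the exact script used for these measurements: it hooks `gc.callbacks` and estimates each run's success rate from `gc.get_objects(generation=...)` counts taken right before the collection.

```python
import gc
from collections import defaultdict

# generation -> list of per-run success rates (% of examined objects collected)
success_rates = defaultdict(list)
_examined = {}

def _gc_callback(phase, info):
    gen = info["generation"]
    if phase == "start":
        # Collecting generation N also examines the younger generations,
        # which get merged into it, so count objects in generations 0..N.
        _examined[gen] = sum(
            len(gc.get_objects(generation=g)) for g in range(gen + 1)
        )
    else:  # phase == "stop"
        examined = _examined.pop(gen, 0)
        if examined:
            success_rates[gen].append(100.0 * info["collected"] / examined)

gc.callbacks.append(_gc_callback)

# ... run the workload (e.g. black over the standard library) ...

for gen, rates in sorted(success_rates.items()):
    print(f"generation {gen}: {len(rates)} runs, "
          f"mean success {sum(rates) / len(rates):.2f}%")
```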
I am currently investigating whether a single generation with a dynamic threshold, similar to the strategy we currently use for the last generation, would generally give better performance.
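For context, the oldest generation is already collected on a dynamic condition (roughly, only when the objects added since the last full run exceed 25% of the long-lived set). Purely as an illustration of what a single-generation variant of that idea could look like (names and numbers are placeholders, not a concrete proposal):

```python
class SingleGenerationPolicy:
    """Illustrative sketch only: a single-generation trigger that mirrors the
    proportional rule the oldest generation already uses (collect when the
    objects added since the last run exceed a fraction of the survivors)."""

    def __init__(self, fraction=0.25, floor=700):
        self.fraction = fraction   # dynamic part, proportional to the live heap
        self.floor = floor         # fixed minimum, in the spirit of the gen0 threshold
        self.survivors = 0         # tracked objects alive after the last collection

    def should_collect(self, allocated_since_last):
        threshold = max(self.floor, self.fraction * self.survivors)
        return allocated_since_last > threshold

    def note_collection(self, surviving_objects):
        self.survivors = surviving_objects
```

The point of the dynamic part is that the threshold grows with the surviving heap, so collection work stays amortised instead of firing on a fixed allocation count.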
What do you think?