Skip to content

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

[Messenger] Add call to gc_collect_cycles() after each message is handled #52253

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 25, 2023
Merged

[Messenger] Add call to gc_collect_cycles() after each message is handled #52253

merged 1 commit into from
Oct 25, 2023

Conversation

jwage
Copy link
Contributor

@jwage jwage commented Oct 23, 2023

Q A
Branch? 6.4
Bug fix? no
New feature? no
Deprecations? no
License MIT

I have been working on improving memory usage in my messenger worker queues and I had been chasing what I thought was a memory leak, but I haven't been able to find it. In my troubleshooting, I pushed a change to production which simply logs the result of gc_status() after each message is handled, and I noticed that it was taking a very long time for the automatic garbage collection to run. Additionally, each time it would finally run, the PHP algorithm internally would extend the threshold, making it take even longer to run the next time. So the number of objects in memory waiting to be collected would keep growing and the memory per process would grow. In my application, it is important for the memory for each worker process to stay relatively flat so that I can know how many worker processes I can fit on each worker server.

I added a call to gc_collect_cycles() in my application after each message is handled, and it seems to have improved things for me. I suppose it is still possible I have a memory leak, but I haven't been able to find it.

Here is a screenshot of memory usage from one of my worker servers after I deployed that change.

Screenshot 2023-10-23 at 12 32 46 PM

In my application, it takes < 1ms to call gc_collect_cycles() after each message is handled and that is satisfactory for me in order to keep memory usage lower.

If this is accepted, should we make it optionally enabled?

@carsonbot carsonbot added this to the 6.4 milestone Oct 23, 2023
@nicolas-grekas nicolas-grekas added the ❄️ Feature Freeze Important Pull Requests to finish before the next Symfony "feature freeze" label Oct 23, 2023
@goetas
Copy link
Contributor

goetas commented Oct 24, 2023

We have almost the same middleware on all our projects, it brings the same memory improvement. My guess is that the number of roots the php garbage collector waits is too high for most of the situations

@lyrixx
Copy link
Member

lyrixx commented Oct 24, 2023

Can you share the impact on CPU?
Usually RAM is much cheaper than CPU

@OskarStark OskarStark changed the title [Messenger] Add call to gc_collect_cycles() after each message is handled. [Messenger] Add call to gc_collect_cycles() after each message is handled Oct 24, 2023
@jwage
Copy link
Contributor Author

jwage commented Oct 24, 2023

Can you share the impact on CPU?

Usually RAM is much cheaper than CPU

We haven't seen any noticeable difference in the CPU load graphs but our worker servers aren't under heavy load. It takes around 1ms each time we call gc_collect_cycles().

image

@jwage
Copy link
Contributor Author

jwage commented Oct 24, 2023

We have almost the same middleware on all our projects, it brings the same memory improvement. My guess is that the number of roots the php garbage collector waits is too high for most of the situations

Yes, that was my conclusion as well. And it is my understanding that PHP will adjust that threshold higher or lower each time gc runs depending on how much was cleaned up. If not much was cleaned up, then it will wait longer. For some reason in my case, the threshold gets extended higher and higher and it takes longer and longer for gc to run on its own.

@lyrixx
Copy link
Member

lyrixx commented Oct 24, 2023

And it is my understanding that PHP will adjust that threshold higher or lower each time gc runs depending on how much was cleaned up

PHP run GC when the number of object is > 10K (it's configurable, but only during compile time)

@jwage
Copy link
Contributor Author

jwage commented Oct 24, 2023

And it is my understanding that PHP will adjust that threshold higher or lower each time gc runs depending on how much was cleaned up

PHP run GC when the number of object is > 10K (it's configurable, but only during compile time)

👍 and the threshold was getting extended to 20k, then 30k, etc. each time the automatic gc would run.

@stof
Copy link
Member

stof commented Oct 24, 2023

We discussed that on Slack and I investigated the way the GC works in PHP.

Manual runs of the GC (triggered by the gc_collect_cycles() function don't adjust the threshold at all.
Automatic runs are adjusting the threshold based on the number of collected cycles: if less than 100, the threshold gets increased by 10000 (it starts at 10001) otherwise it gets decreased by 10000 if higher than the initial one.

But this means that if an automatic GC run cannot collect objects (because you have lots of alive objects at that time), it will increase the threshold and that threshold won't ever decrease again if those objects are cleaned by the manual run.

@PhilETaylor
Copy link
Contributor

PhilETaylor commented Oct 24, 2023

I added a call to gc_collect_cycles() in my application after each message is handled,

I have applied this PR (along with an echo statement to tell me when that line is run) to my dev stack, and started my 5 workers (We have over 1000 workers in production),

Without running any messages, just sitting there with a worker started, the gc_collect_cycles is running over and over and over again - continuously every few seconds (evidenced by logging out to the terminal)

This is not "after each message is handled" like you stated, its near continuous... and obviously if there are more than worker on the same server, that "continuous" is x the n number of workers also.

If you needed it "after each message is handled" surely then it should be listened with WorkerMessageHandledEvent?

Or am I missing something?

@jwage
Copy link
Contributor Author

jwage commented Oct 24, 2023

@PhilETaylor initially, it was inside the foreach ($envelopes as $envelope) { loop, but in the conversation here #52253 (comment), we discussed moving it outside of the foreach and that was done in this commit 06b0713. We are missing some additional logic around the gc_collect_cycles() call now to only run it when the receivers return envelopes. I will work on another commit to address this.

Edit: Try this commit 0084c45

@PhilETaylor
Copy link
Contributor

@jwage Thanks - the 0084c45 fixes the "continuous" calling. I'm deploying this to "production" now, if you dont hear from me, it didn't break haha.

@jwage
Copy link
Contributor Author

jwage commented Oct 24, 2023

@jwage Thanks - the 0084c45 fixes the "continuous" calling. I'm deploying this to "production" now, if you dont hear from me, it didn't break haha.

@PhilETaylor 😮 you live on the bleeding edge :) :)

@PhilETaylor
Copy link
Contributor

PhilETaylor commented Oct 24, 2023

@jwage nothing like running a SaaS with 1000s of paying users on decent code :) Symfony makes me powerful
Current status:
ScreenShot-2023-10-24-23 30 23

edit: Nothing broke :-) but no huge decrease in memory for me either... dip is the deployment

ScreenShot-2023-10-24-23 32 56

@jwage
Copy link
Contributor Author

jwage commented Oct 24, 2023

@PhilETaylor it is my understanding that if your code doesn't have any cyclic references, then the gc_collect_cycles() won't have to do anything because the objects will be destroyed immediately when the refcount === 0. So if your code doesn't have any cyclic references, then you won't see any benefit. Did you see any difference in your cpu/load graphs?

@PhilETaylor
Copy link
Contributor

The CPU and Load graphs have their normal spike during deployment but other than that the loads before and after are pretty consistent with each other with no major decrease and no major increase

@jwage
Copy link
Contributor Author

jwage commented Oct 24, 2023

The CPU and Load graphs have their normal spike during deployment but other than that the loads before and after are pretty consistent with each other with no major decrease and no major increase

Thanks. I am not expecting a drop, but there was some concern that it could increase cpu usage since we're calling gc more often than we were before. Thanks for testing!

@nicolas-grekas
Copy link
Member

Thank you @jwage.

@nicolas-grekas nicolas-grekas merged commit 8720b78 into symfony:6.4 Oct 25, 2023
@jwage jwage deleted the messenger-gc-collect-cycles branch October 25, 2023 17:16
fabpot added a commit that referenced this pull request Apr 21, 2025
…er each message is handled" (jwage)

This PR was merged into the 6.4 branch.

Discussion
----------

[Messenger] Revert " Add call to `gc_collect_cycles()` after each message is handled"

This reverts commit b0df65a.

Reverting changes introduced in #52253 because of the issue reported here #59788

Commits
-------

fc9c551 Revert "[Messenger] Add call to `gc_collect_cycles()` after each message is handled"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Messenger ❄️ Feature Freeze Important Pull Requests to finish before the next Symfony "feature freeze" Status: Reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.