[Messenger] Add call to `gc_collect_cycles()` after each message is handled #52253
Conversation
We have almost the same middleware on all our projects, and it brings the same memory improvement. My guess is that the number of roots the PHP garbage collector waits for is too high in most situations.
Can you share the impact on CPU?
Yes, that was my conclusion as well. And it is my understanding that PHP will adjust that threshold higher or lower each time gc runs, depending on how much was cleaned up. If not much was cleaned up, then it will wait longer. For some reason in my case, the threshold gets extended higher and higher and it takes longer and longer for gc to run on its own.
PHP runs GC when the number of objects is > 10K (it's configurable, but only at compile time).
👍 and the threshold was getting extended to 20k, then 30k, etc., each time the automatic gc ran.
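The threshold growth discussed here can be observed directly with `gc_status()` (PHP >= 7.3). A minimal standalone script to inspect it (an illustration, not code from the PR):

```php
<?php
// Inspect the GC state described above. "threshold" starts around
// 10001 roots; PHP raises it after automatic runs that collect
// little, which is the 20k/30k growth mentioned in this thread.
$status = gc_status();

printf(
    "runs: %d, collected: %d, threshold: %d, roots: %d\n",
    $status['runs'],      // how many times the collector has run
    $status['collected'], // total objects collected so far
    $status['threshold'], // root-buffer size that triggers the next run
    $status['roots']      // roots currently waiting in the buffer
);
```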
We discussed that on Slack and I investigated the way the GC works in PHP. Manual runs of the GC (triggered by the `gc_collect_cycles()` function) do not adjust the threshold. But this means that if an automatic GC run cannot collect objects (because you have lots of alive objects at that time), it will increase the threshold, and that threshold won't ever decrease again if those objects are later cleaned by a manual run.
I have applied this PR (along with an echo statement to tell me when that line is run) to my dev stack and started my 5 workers (we have over 1000 workers in production). Without running any messages, just sitting there with a worker started, gc_collect_cycles() is running over and over and over again, continuously every few seconds (evidenced by the logging to the terminal).

This is not "after each message is handled" like you stated, it's near continuous... and obviously if there is more than one worker on the same server, that "continuous" is multiplied by the number of workers too. If you need it "after each message is handled", surely it should be listened for on the message-handled event? Or am I missing something?
@PhilETaylor initially, it was inside the worker's main loop, so it ran on every iteration, even when idle. Edit: Try this commit 0084c45
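Hooking the call to the message-handled event rather than the worker loop can be sketched like this (an illustration using Symfony Messenger's `WorkerMessageHandledEvent`; the class name `GcCollectCyclesListener` is hypothetical and this is not the actual commit diff):

```php
<?php
// Sketch: run cycle collection only after a message has been
// handled, instead of on every worker loop iteration, so an idle
// worker does not call the collector continuously.

use Symfony\Component\EventDispatcher\EventSubscriberInterface;
use Symfony\Component\Messenger\Event\WorkerMessageHandledEvent;

final class GcCollectCyclesListener implements EventSubscriberInterface
{
    public static function getSubscribedEvents(): array
    {
        return [WorkerMessageHandledEvent::class => 'onMessageHandled'];
    }

    public function onMessageHandled(WorkerMessageHandledEvent $event): void
    {
        // Collect reference cycles; cheap when there is little
        // garbage, and keeps per-worker memory relatively flat.
        gc_collect_cycles();
    }
}
```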
@PhilETaylor 😮 you live on the bleeding edge :) :) |
@jwage nothing like running a SaaS with 1000s of paying users on decent code :) Symfony makes me powerful. Edit: Nothing broke :-) but no huge decrease in memory for me either... the dip is the deployment.
@PhilETaylor it is my understanding that if your code doesn't have any cyclic references, then the `gc_collect_cycles()` call won't find anything to collect and should be very cheap.
The CPU and Load graphs have their normal spike during deployment, but other than that the loads before and after are pretty consistent with each other, with no major decrease and no major increase.
Thanks. I am not expecting a drop, but there was some concern that it could increase CPU usage since we're calling gc more often than before. Thanks for testing!
Thank you @jwage. |
[Messenger] Revert "Add call to `gc_collect_cycles()` after each message is handled" (jwage)

This PR was merged into the 6.4 branch.

Discussion
----------

This reverts commit b0df65a. Reverting changes introduced in #52253 because of the issue reported in #59788.

Commits
-------

fc9c551 Revert "[Messenger] Add call to `gc_collect_cycles()` after each message is handled"
I have been working on improving memory usage in my messenger worker queues, and I had been chasing what I thought was a memory leak, but I haven't been able to find it. In my troubleshooting, I pushed a change to production which simply logs the result of `gc_status()` after each message is handled, and I noticed that it was taking a very long time for the automatic garbage collection to run. Additionally, each time it would finally run, the PHP algorithm internally would extend the threshold, making it take even longer to run the next time. So the number of objects in memory waiting to be collected would keep growing, and the memory per process would grow. In my application, it is important for the memory of each worker process to stay relatively flat so that I can know how many worker processes I can fit on each worker server.

I added a call to `gc_collect_cycles()` in my application after each message is handled, and it seems to have improved things for me. I suppose it is still possible I have a memory leak, but I haven't been able to find it. Here is a screenshot of memory usage from one of my worker servers after I deployed that change.

In my application, it takes < 1ms to call `gc_collect_cycles()` after each message is handled, and that is satisfactory for me in order to keep memory usage lower. If this is accepted, should we make it optionally enabled?
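To make the behavior concrete, here is a small standalone script (not from the PR) that creates reference cycles, the only kind of garbage `gc_collect_cycles()` can reclaim, and then forces a collection the way the change does after each message:

```php
<?php
// Hypothetical demo: refcounting alone cannot free objects that
// reference each other, so they sit in memory until the cycle
// collector runs (automatically at the threshold, or manually).

function makeCycle(): void
{
    $a = new stdClass();
    $b = new stdClass();
    $a->other = $b; // $a and $b point at each other, so their
    $b->other = $a; // refcounts never drop to zero on their own
}

for ($i = 0; $i < 1000; $i++) {
    makeCycle();
}

// Force a collection, as the PR does after each handled message.
$collected = gc_collect_cycles();
printf("collected: %d\n", $collected);
print_r(gc_status());
```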