[ISSUE #8875] Fix HAConnection leak #8876
Merged
RongtongJin merged 1 commit into apache:develop (apache/rocketmq:develop) on Oct 30, 2024
Codecov Report
All modified and coverable lines are covered by tests.

@@             Coverage Diff              @@
##            develop    #8876      +/-   ##
=============================================
- Coverage       47.63%   47.56%   -0.07%
+ Complexity      11756    11734      -22
=============================================
  Files            1304     1304
  Lines           91043    91043
  Branches        11675    11675
=============================================
- Hits            43364    43301      -63
- Misses          42346    42401      +55
- Partials         5333     5341       +8
RongtongJin approved these changes on Oct 30, 2024
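This problem is real; we ran into it ourselves. We watched it for a long time and assumed it was caused by a message backlog, but the OOM also occurred when no messages were backed up. It was also very regular, with an OOM roughly every 20 days. Our production environment runs a RocketMQ health check that probes the port to decide whether the broker is alive, and as a result HAConnection objects took up a large amount of memory. I will post screenshots of the dump analysis later.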
Which Issue(s) This PR Fixes
Fixes #issue_id
Brief Description
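In some abnormal scenarios, which can be simulated by running liveness probes against the HA port, a connection leaks. Because conn.start starts its threads immediately, a connection that fails quickly enough runs removeConnection first, and addConnection runs only afterwards. The connection is then never removed: it stays in the HaService connectionList, and the accumulation creates a risk of running out of memory.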
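The ordering problem described above can be illustrated with a small, self-contained Python sketch. It is not the RocketMQ code: `ConnectionRegistry`, `Connection`, and both `accept_*` functions are hypothetical stand-ins, and the `join()` call exists only to force the "fails fast enough" interleaving deterministically. Registering before starting is shown as one ordering that avoids the leak; the actual change made in this PR is not shown on this page.

```python
import threading


class ConnectionRegistry:
    """Hypothetical stand-in for the service that tracks live connections."""

    def __init__(self):
        self._lock = threading.Lock()
        self.connections = []

    def add(self, conn):
        with self._lock:
            self.connections.append(conn)

    def remove(self, conn):
        with self._lock:
            if conn in self.connections:
                self.connections.remove(conn)


class Connection:
    """Hypothetical connection whose worker fails at once, like a probe that disconnects."""

    def __init__(self, registry):
        self.registry = registry
        self._thread = threading.Thread(target=self._run)

    def _run(self):
        # The peer is already gone, so the worker cleans up right away.
        self.registry.remove(self)

    def start(self):
        self._thread.start()

    def join(self):
        self._thread.join()


def accept_start_then_add(registry):
    """Leaky ordering: the worker may call remove() before add() has run."""
    conn = Connection(registry)
    conn.start()
    conn.join()         # force the fast-failure interleaving
    registry.add(conn)  # added after remove(); nothing removes it again


def accept_add_then_start(registry):
    """Safe ordering: the connection is registered before its worker can remove it."""
    conn = Connection(registry)
    registry.add(conn)
    conn.start()
    conn.join()


leaky = ConnectionRegistry()
safe = ConnectionRegistry()
for _ in range(100):
    accept_start_then_add(leaky)
    accept_add_then_start(safe)

print(len(leaky.connections), len(safe.connections))  # prints "100 0"
```

In the leaky ordering every short-lived connection stays in the list, which matches the steady growth a periodic port probe would produce.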
How Did You Test This Change?
Used a Python script to run high-frequency liveness probes against HA port 10912 and verify whether the memory leak still occurs.
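# -*- coding:UTF-8 -*-
import sys
import socket
import time

sys.path.append(".")


def tcp_health_check(host, port):
    """Open a TCP connection, send a timestamp, and close it; return True on success."""
    now = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
    sock = None  # defined up front so the except branch can test it safely
    try:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(1)
        sock.connect((host, port))
        sock.sendall(now.encode())
        print("{} TCP connection succeeded: {}:{}".format(now, host, port))
        sock.close()
        return True
    except socket.error as e:
        print("TCP connection failed: {}".format(e))
        if sock:
            sock.close()
        return False


def main():
    host = '127.0.0.1'
    port = 8081  # value from the original script; the description above names HA port 10912
    # The original snippet stops after setting host and port; a single probe call is assumed here.
    tcp_health_check(host, port)


if __name__ == "__main__":
    main()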
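The test description calls for high-frequency probing, which needs a loop around the single check. A minimal sketch of such a loop follows; `probe_once`, `probe_repeatedly`, and the `count` and `interval` parameters are illustrative names that do not appear in the original script.

```python
import socket
import time


def probe_once(host, port, timeout=1.0):
    """Open a TCP connection and close it immediately; return True on success."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe_repeatedly(host, port, count, interval=0.0):
    """Run `count` probes in sequence and return how many of them connected."""
    succeeded = 0
    for _ in range(count):
        if probe_once(host, port):
            succeeded += 1
        if interval:
            time.sleep(interval)  # optional pause between probes
    return succeeded
```

Against a broker this might be called as `probe_repeatedly("127.0.0.1", 10912, count=10000)`, with the host and count chosen by the tester, while the broker's heap usage is watched for growth.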