-
Notifications
You must be signed in to change notification settings - Fork 44
[Bug] Fix ping monitor timeout issues and add enhanced timeout controls #212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[Bug] Fix ping monitor timeout issues and add enhanced timeout controls #212
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #212 +/- ##
==========================================
+ Coverage 19.90% 20.28% +0.37%
==========================================
Files 181 181
Lines 18829 18900 +71
==========================================
+ Hits 3748 3833 +85
+ Misses 14897 14878 -19
- Partials 184 189 +5 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
0xfurai
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, @tafaust — huge thanks for stepping up and fixing this one! 🙌
This issue had been sitting for a while and no one wanted to tackle it, but you did — and that really means a lot.
Appreciate your initiative and contribution to the project 💪
| } | ||
|
|
||
| // Validate per-request timeout range | ||
| if pingCfg.PerRequestTimeout < PING_PER_REQUEST_TIMEOUT_MIN || pingCfg.PerRequestTimeout > PING_PER_REQUEST_TIMEOUT_MAX { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
feels little weird to have bunch of different timeouts to setup from client. Can we handle everything with just global timeout? just add some time on top for evaluation, to simplify UI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I show calculated global timeout as read-only in the frontend?
For the two different pings:
- Per-request timeout: Controls how long to wait for each packet (network responsiveness)
- Global timeout: Prevents infinite waits (safety net)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a3615c8 (not tested yet) which should comply with a more simplified UI and some reasonable safe guard in the backend.
…cancellation - Replace net.ResolveIPAddr with context-aware net.DefaultResolver.LookupIPAddr - Add IPv4 address filtering for DNS resolution results - Implement context cancellation checks before sending ICMP packets - Refactor ICMP reply reading with goroutine and select for timeout handling - Improve error handling for address resolution and message parsing - Add proper timeout support using context cancellation
- Introduce new configuration fields for ping count and per-request timeout in PingConfig. - Implement validation for count and timeout ranges to ensure proper configuration. - Update the native ping implementation to support multiple pings with customizable timeout settings. - Enhance logging to provide detailed feedback on ping attempts and results.
- Introduce a new test file for PingExecutor, covering various scenarios including timeout validation, context cancellation, and default value handling. - Implement tests for multi-ping configurations and ensure proper responses for valid and invalid configurations. - Address specific edge cases, including context cancellation and device offline scenarios, to enhance reliability and robustness of the ping monitoring feature.
…ation - Introduce new fields for count and per-request timeout in the PingConfig interface. - Implement validation for these fields in the ping schema to ensure proper configuration. - Update the default values and serialization/deserialization logic to accommodate the new fields. - Enhance the UI to include input fields for count and per-request timeout with appropriate labels and descriptions in the localization files.
…files for ping settings - Introduced new localization entries for per-request timeout and count in multiple languages. - Updated existing ping configuration translations to include descriptions for the new fields. - Ensured consistency across all localization files to enhance user experience in configuring ping settings.
- Added default values for PacketSize, Count, and PerRequestTimeout fields in the PingConfig struct using struct tags. - Introduced applyDefaults function to set default values for zero-value fields during configuration. - Removed manual default value checks from the Validate and Execute methods, streamlining the code and enhancing maintainability.
- Implemented CalculateGlobalTimeout method in PingExecutor to compute the global timeout based on ping configuration, capping it at 60 seconds. - Updated Execute method to utilize the calculated global timeout for ping operations. - Enhanced the UI to display the calculated global timeout in the PingForm, including a description for better user understanding.
b6daec3 to
a3615c8
Compare
- Deleted the CalculateGlobalTimeout method to streamline the PingExecutor functionality. - This change simplifies the codebase by removing unused timeout calculation logic, as it is now handled elsewhere.
- Renamed test case to reflect the automatic calculation of global timeout. - Adjusted assertions to validate the expected behavior for a valid host, ensuring the global timeout is calculated as 3 * per_request_timeout + 2, capped at 60 seconds. - Removed outdated comments and improved clarity in test descriptions.
|
Not sure how better do timeout thing but something is off here. We do have parental control on the context cancellation at the moment in
maybe it was not the best solution and we need to pass timeout control to executor. anyway we need to make sure we don't get over the interval limit so code evaluation of this solution will take some time. |
|
@0xfurai what does that mean? |
Problem
Ping monitors don't report devices as down in a timely manner when they go offline. The monitor continues showing "up" status for minutes instead of immediately detecting the failure within the configured timeout period (default 20 seconds with 1 retry).
Issue Source: GitHub Issue #157 - User reported Uptime Kuma handles this correctly while Peekaping does not.
Root Causes
1. Context Ignored in Native ICMP Implementation
ctxparameter was passed but never usednet.ResolveIPAddr(non-context-aware)2. Missing Advanced Timeout Controls
Uptime Kuma uses a proven dual timeout system:
Solution
Phase 1: Critical Context Fix ✅
net.ResolveIPAddrwithnet.DefaultResolver.LookupIPAddr(ctx, host)selectstatements to monitorctx.Done()before ICMP operationsPhase 2: Enhanced Timeout System ✅
countfield (1-100 pings, default 1)per_request_timeoutfield (1-60s, default 2s)Changes Made
Core Files Modified
apps/server/src/modules/healthcheck/executor/ping.goPingConfigstruct withCountandPerRequestTimeoutfieldsNew Features
Testing
TestPingExecutor_GitHubIssue157that reproduces the exact scenarioResults
Before Fix
After Fix
Testing
All tests pass including the new GitHub Issue #157 reproduction tests:
=== RUN TestPingExecutor_GitHubIssue157 === RUN TestPingExecutor_GitHubIssue157/DeviceOfflineScenario === RUN TestPingExecutor_GitHubIssue157/DeviceOfflineWithMultiplePings === RUN TestPingExecutor_GitHubIssue157/DeviceOfflineFastTimeout === RUN TestPingExecutor_GitHubIssue157/DeviceOfflineContextCancellation --- PASS: TestPingExecutor_GitHubIssue157 (50.05s) --- PASS: TestPingExecutor_GitHubIssue157/DeviceOfflineScenario (21.01s) --- PASS: TestPingExecutor_GitHubIssue157/DeviceOfflineWithMultiplePings (21.02s) --- PASS: TestPingExecutor_GitHubIssue157/DeviceOfflineFastTimeout (6.02s) --- PASS: TestPingExecutor_GitHubIssue157/DeviceOfflineContextCancellation (2.00s) PASSBackward Compatibility
Future Enhancements
Screenshots
Fixes #157