Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

Conversation

@tafaust
Copy link
Contributor

@tafaust tafaust commented Oct 13, 2025

Problem

Ping monitors don't report devices as down in a timely manner when they go offline. The monitor continues showing "up" status for minutes instead of immediately detecting the failure within the configured timeout period (default 20 seconds with 1 retry).

Issue Source: GitHub Issue #157 - User reported Uptime Kuma handles this correctly while Peekaping does not.

Root Causes

1. Context Ignored in Native ICMP Implementation

  • The ctx parameter was passed but never used
  • DNS resolution used blocking net.ResolveIPAddr (non-context-aware)
  • ICMP socket operations didn't check for context cancellation
  • When a device goes offline, operations would hang indefinitely

2. Missing Advanced Timeout Controls

Uptime Kuma uses a proven dual timeout system:

  • Per-request timeout (2s default) + global deadline (10s default)
  • Multiple ping attempts per check (1 default)
  • Proper timeout hierarchy: global timeout ≥ per-request timeout

Solution

Phase 1: Critical Context Fix ✅

  • Context-aware DNS resolution: Replaced net.ResolveIPAddr with net.DefaultResolver.LookupIPAddr(ctx, host)
  • Context cancellation checks: Added select statements to monitor ctx.Done() before ICMP operations
  • Context-aware ICMP read: Implemented goroutine-based read with parallel context monitoring
  • Result: Monitor now detects offline devices within timeout period instead of hanging indefinitely

Phase 2: Enhanced Timeout System ✅

  • Multi-ping support: Added count field (1-100 pings, default 1)
  • Per-request timeouts: Added per_request_timeout field (1-60s, default 2s)
  • Validation: Ensures global timeout ≥ per_request_timeout × count
  • Backward compatibility: Default values for existing monitors
  • Uptime Kuma parity: Matches proven timeout behavior

Changes Made

Core Files Modified

  • apps/server/src/modules/healthcheck/executor/ping.go
    • Enhanced PingConfig struct with Count and PerRequestTimeout fields
    • Added timeout constants and validation
    • Implemented multi-ping logic with per-request timeouts
    • Made all operations context-aware

New Features

  • Multi-ping: Send multiple ICMP packets per check for better reliability
  • Per-request timeout: Individual timeout for each ping attempt
  • Global timeout validation: Prevents misconfiguration
  • Enhanced error handling: Better timeout detection and reporting

Testing

  • Comprehensive test suite: Added TestPingExecutor_GitHubIssue157 that reproduces the exact scenario
  • Timeout validation tests: Verify proper timeout enforcement
  • Context cancellation tests: Ensure context respect
  • Multi-ping tests: Validate enhanced functionality

Results

Before Fix

  • Ping monitors would hang indefinitely when devices went offline
  • No timeout respect, no context cancellation
  • User reported: "Monitor will still report device is up" after waiting longer than timeout

After Fix

  • DeviceOfflineScenario: Completes in ~21s (within 20s + buffer)
  • DeviceOfflineWithMultiplePings: Completes in ~21s (multi-ping works)
  • DeviceOfflineFastTimeout: Completes in ~6s (fast detection)
  • DeviceOfflineContextCancellation: Completes in ~2s (context respect)

Testing

All tests pass including the new GitHub Issue #157 reproduction tests:

=== RUN   TestPingExecutor_GitHubIssue157
=== RUN   TestPingExecutor_GitHubIssue157/DeviceOfflineScenario
=== RUN   TestPingExecutor_GitHubIssue157/DeviceOfflineWithMultiplePings
=== RUN   TestPingExecutor_GitHubIssue157/DeviceOfflineFastTimeout
=== RUN   TestPingExecutor_GitHubIssue157/DeviceOfflineContextCancellation
--- PASS: TestPingExecutor_GitHubIssue157 (50.05s)
    --- PASS: TestPingExecutor_GitHubIssue157/DeviceOfflineScenario (21.01s)
    --- PASS: TestPingExecutor_GitHubIssue157/DeviceOfflineWithMultiplePings (21.02s)
    --- PASS: TestPingExecutor_GitHubIssue157/DeviceOfflineFastTimeout (6.02s)
    --- PASS: TestPingExecutor_GitHubIssue157/DeviceOfflineContextCancellation (2.00s)
PASS

Backward Compatibility

  • Existing ping monitors continue to work with default values
  • New fields are optional with sensible defaults
  • No database migration required for basic functionality
  • Enhanced features available when explicitly configured

Future Enhancements

  • UI updates to expose new timeout controls
  • Database migration for persistent configuration
  • Advanced monitoring metrics (RTT statistics, packet loss)

Screenshots

image

Fixes #157

@codecov
Copy link

codecov bot commented Oct 13, 2025

Codecov Report

❌ Patch coverage is 31.03448% with 80 lines in your changes missing coverage. Please review.
✅ Project coverage is 20.28%. Comparing base (4b3adfa) to head (05fd862).
⚠️ Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
...ps/server/src/modules/healthcheck/executor/ping.go 31.03% 78 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #212      +/-   ##
==========================================
+ Coverage   19.90%   20.28%   +0.37%     
==========================================
  Files         181      181              
  Lines       18829    18900      +71     
==========================================
+ Hits         3748     3833      +85     
+ Misses      14897    14878      -19     
- Partials      184      189       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Owner

@0xfurai 0xfurai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, @tafaust — huge thanks for stepping up and fixing this one! 🙌
This issue had been sitting for a while and no one wanted to tackle it, but you did — and that really means a lot.
Appreciate your initiative and contribution to the project 💪

apps/server/src/modules/healthcheck/executor/ping.go Outdated Show resolved Hide resolved
apps/server/src/modules/healthcheck/executor/ping.go Outdated Show resolved Hide resolved
}

// Validate per-request timeout range
if pingCfg.PerRequestTimeout < PING_PER_REQUEST_TIMEOUT_MIN || pingCfg.PerRequestTimeout > PING_PER_REQUEST_TIMEOUT_MAX {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feels little weird to have bunch of different timeouts to setup from client. Can we handle everything with just global timeout? just add some time on top for evaluation, to simplify UI?

Copy link
Contributor Author

@tafaust tafaust Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I show calculated global timeout as read-only in the frontend?

For the two different pings:

  • Per-request timeout: Controls how long to wait for each packet (network responsiveness)
  • Global timeout: Prevents infinite waits (safety net)

Copy link
Contributor Author

@tafaust tafaust Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a3615c8 (not tested yet) which should comply with a more simplified UI and some reasonable safe guard in the backend.

…cancellation

- Replace net.ResolveIPAddr with context-aware net.DefaultResolver.LookupIPAddr
- Add IPv4 address filtering for DNS resolution results
- Implement context cancellation checks before sending ICMP packets
- Refactor ICMP reply reading with goroutine and select for timeout handling
- Improve error handling for address resolution and message parsing
- Add proper timeout support using context cancellation
- Introduce new configuration fields for ping count and per-request timeout in PingConfig.
- Implement validation for count and timeout ranges to ensure proper configuration.
- Update the native ping implementation to support multiple pings with customizable timeout settings.
- Enhance logging to provide detailed feedback on ping attempts and results.
- Introduce a new test file for PingExecutor, covering various scenarios including timeout validation, context cancellation, and default value handling.
- Implement tests for multi-ping configurations and ensure proper responses for valid and invalid configurations.
- Address specific edge cases, including context cancellation and device offline scenarios, to enhance reliability and robustness of the ping monitoring feature.
…ation

- Introduce new fields for count and per-request timeout in the PingConfig interface.
- Implement validation for these fields in the ping schema to ensure proper configuration.
- Update the default values and serialization/deserialization logic to accommodate the new fields.
- Enhance the UI to include input fields for count and per-request timeout with appropriate labels and descriptions in the localization files.
…files for ping settings

- Introduced new localization entries for per-request timeout and count in multiple languages.
- Updated existing ping configuration translations to include descriptions for the new fields.
- Ensured consistency across all localization files to enhance user experience in configuring ping settings.
- Added default values for PacketSize, Count, and PerRequestTimeout fields in the PingConfig struct using struct tags.
- Introduced applyDefaults function to set default values for zero-value fields during configuration.
- Removed manual default value checks from the Validate and Execute methods, streamlining the code and enhancing maintainability.
- Implemented CalculateGlobalTimeout method in PingExecutor to compute the global timeout based on ping configuration, capping it at 60 seconds.
- Updated Execute method to utilize the calculated global timeout for ping operations.
- Enhanced the UI to display the calculated global timeout in the PingForm, including a description for better user understanding.
@tafaust tafaust force-pushed the bug/157-ping-monitor-detection-delay branch from b6daec3 to a3615c8 Compare October 14, 2025 06:30
- Deleted the CalculateGlobalTimeout method to streamline the PingExecutor functionality.
- This change simplifies the codebase by removing unused timeout calculation logic, as it is now handled elsewhere.
- Renamed test case to reflect the automatic calculation of global timeout.
- Adjusted assertions to validate the expected behavior for a valid host, ensuring the global timeout is calculated as 3 * per_request_timeout + 2, capped at 60 seconds.
- Removed outdated comments and improved clarity in test descriptions.
@0xfurai
Copy link
Owner

0xfurai commented Oct 16, 2025

Not sure how better do timeout thing but something is off here. We do have parental control on the context cancellation at the moment in

callCtx, cCancel := context.WithTimeout(

maybe it was not the best solution and we need to pass timeout control to executor. anyway we need to make sure we don't get over the interval limit

so code evaluation of this solution will take some time.

@tafaust
Copy link
Contributor Author

tafaust commented Oct 18, 2025

@0xfurai what does that mean?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Ping monitor doesn't report it is down for minutes

2 participants

Morty Proxy This is a proxified and sanitized view of the page, visit original site.