Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

Conversation

polarathene
Copy link
Member

Description

The current example healthcheck ss --listening --tcp | grep -P 'LISTEN.+:smtp' || exit 1 can be simplified to nc -z localhost 25 (our tests use this via a helper to wait on Postfix being ready), which effectively does the same check (that a service is listening on the port), but without the extra output or need to set an exit status (both of which could be resolved with grep -q).

The nc command is a bit more terser and direct though. The port 25 can be alternatively substituted for the service port smtp, (nc -z localhost smtp) just like the ss output displays (resolved via grep smtp /etc/services), but I figured most would be more familiar with the port number itself.


This has a benefit of less noisy healthcheck logs, which currently looks like this every 30 secs:

$ docker inspect --format='{{json .State.Health}}' dms | jq

{
  "Status": "healthy",
  "FailingStreak": 0,
  "Log": [
    {
      "Start": "2025-06-01T07:16:35.79643154Z",
      "End": "2025-06-01T07:16:35.842821137Z",
      "ExitCode": 0,
      "Output": "LISTEN 0      100          0.0.0.0:smtp             0.0.0.0:*          \nLISTEN 0      100             [::]:smtp                [::]:*          \n"
    },
    {
      "Start": "2025-06-01T07:17:05.843928898Z",
      "End": "2025-06-01T07:17:05.891460075Z",
      "ExitCode": 0,
      "Output": "LISTEN 0      100          0.0.0.0:smtp             0.0.0.0:*          \nLISTEN 0      100             [::]:smtp                [::]:*          \n"
    },
    {
      "Start": "2025-06-01T07:17:35.892520906Z",
      "End": "2025-06-01T07:17:35.938991775Z",
      "ExitCode": 0,
      "Output": "LISTEN 0      100          0.0.0.0:smtp             0.0.0.0:*          \nLISTEN 0      100             [::]:smtp                [::]:*          \n"
    },
    {
      "Start": "2025-06-01T07:18:05.939841294Z",
      "End": "2025-06-01T07:18:06.003672308Z",
      "ExitCode": 0,
      "Output": "LISTEN 0      100          0.0.0.0:smtp             0.0.0.0:*          \nLISTEN 0      100             [::]:smtp                [::]:*          \n"
    },
    {
      "Start": "2025-06-01T07:18:36.004749936Z",
      "End": "2025-06-01T07:18:36.048725208Z",
      "ExitCode": 0,
      "Output": "LISTEN 0      100          0.0.0.0:smtp             0.0.0.0:*          \nLISTEN 0      100             [::]:smtp                [::]:*          \n"
    }
  ]
}

With nc instead:

$ docker inspect --format='{{json .State.Health}}' dms | jq

{
  "Status": "healthy",
  "FailingStreak": 0,
  "Log": [
    {
      "Start": "2025-06-01T07:20:00.464604128Z",
      "End": "2025-06-01T07:20:00.516787463Z",
      "ExitCode": 0,
      "Output": ""
    },
    {
      "Start": "2025-06-01T07:20:30.517462343Z",
      "End": "2025-06-01T07:20:30.578389769Z",
      "ExitCode": 0,
      "Output": ""
    },
    {
      "Start": "2025-06-01T07:21:00.579295302Z",
      "End": "2025-06-01T07:21:00.621559922Z",
      "ExitCode": 0,
      "Output": ""
    },
    {
      "Start": "2025-06-01T07:21:30.622531376Z",
      "End": "2025-06-01T07:21:30.67123444Z",
      "ExitCode": 0,
      "Output": ""
    }
  ]
}

Type of change

  • Improvement (non-breaking change that does improve existing functionality)

@polarathene polarathene added this to the v15.1.0 milestone Jun 1, 2025
@polarathene polarathene self-assigned this Jun 1, 2025
@polarathene polarathene added the kind/improvement Improve an existing feature, configuration file or the documentation label Jun 1, 2025
georglauterbach
georglauterbach previously approved these changes Jun 1, 2025
@casperklein
Copy link
Member

casperklein commented Jun 1, 2025

[..] or need to set an exit status

Curl uses a lot of different return codes. The healthcheck however expects just 0 or 1. || exit 1 converts return codes > 1 to 1.


I don't like the new approach, because it pollutes the mail.log every 30 seconds with something like:

Jun  1 12:45:32 mail postfix/postscreen[18492]: CONNECT from [127.0.0.1]:46020 to [127.0.0.1]:25
Jun  1 12:45:32 mail postfix/postscreen[18492]: ALLOWLISTED [127.0.0.1]:46020
Jun  1 12:45:32 mail postfix/smtpd[18493]: connect from localhost[127.0.0.1]
Jun  1 12:45:32 mail opendmarc[2397]: ignoring connection from localhost
Jun  1 12:45:32 mail postfix/smtpd[18493]: lost connection after CONNECT from localhost[127.0.0.1]
Jun  1 12:45:32 mail postfix/smtpd[18493]: disconnect from localhost[127.0.0.1] commands=0/0

This might also introduce problems with fail2ban. IIRC using mode ddos or agressive, it looks for "connect" / "disconnect" patterns.


Edit:

To achieve a cleaner output, we could use: ss --ipv4 --listening --tcp --numeric | grep -o '0.0.0.0:25', which only returns 0.0.0.0:25

@polarathene
Copy link
Member Author

Curl uses a lot of different return codes. The healthcheck however expects just 0 or 1. || exit 1 converts return codes > 1 to 1.

Where is curl involved here? Isn't the exit status coming from the grep command otherwise?

I don't like the new approach, because it pollutes the mail.log every 30 seconds with something lik

I was not aware of that and you raise very valid points, thanks for pointing that out! 🙏


To achieve a cleaner output, we could use: ss --ipv4 --listening --tcp --numeric | grep -o '0.0.0.0:25', which only returns 0.0.0.0:25

Is there value in that output being stored repeated in the healthcheck log?

Shouldn't we just use grep -q for 0/1 exit status? As per the grep docs:

Exit Status

Normally, the exit status is 0 if selected lines are found and 1 otherwise.
But the exit status is 2 if an error occurred, unless the -q or --quiet or --silent option is used and a selected line is found.

Copy link
Member Author

@polarathene polarathene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified this check adjusted to interval: 1s:

$ docker inspect --format='{{json .State.Health}}' dms | jq
{
  "Status": "healthy",
  "FailingStreak": 0,
  "Log": [
    {
      "Start": "2025-06-01T21:57:33.747428151Z",
      "End": "2025-06-01T21:57:33.827584046Z",
      "ExitCode": 1,
      "Output": ""
    },
    {
      "Start": "2025-06-01T21:57:34.828241261Z",
      "End": "2025-06-01T21:57:34.919379591Z",
      "ExitCode": 1,
      "Output": ""
    },
    {
      "Start": "2025-06-01T21:57:35.919784141Z",
      "End": "2025-06-01T21:57:35.965090221Z",
      "ExitCode": 0,
      "Output": ""
    },
    {
      "Start": "2025-06-01T21:57:36.965467119Z",
      "End": "2025-06-01T21:57:37.012927453Z",
      "ExitCode": 0,
      "Output": ""
    },
    {
      "Start": "2025-06-01T21:57:38.013727139Z",
      "End": "2025-06-01T21:57:38.059783409Z",
      "ExitCode": 0,
      "Output": ""
    }
  ]
}

NOTE: Only the last 5 checks are stored in the log output, hence the reduced interval to confirm.

compose.yaml Outdated Show resolved Hide resolved
test/tests/serial/tests.bats Outdated Show resolved Hide resolved
@casperklein
Copy link
Member

Where is curl involved here? Isn't the exit status coming from the grep command otherwise?

My bad. I was looking at another healthcheck example while writing. It was however just to explain the || exit 1 usage / best-practice.

Shouldn't we just use grep -q for 0/1 exit status?

That will work. Only disadvantage: when grep fails (return code 2) for whatever reason, it will happen silently.

compose.yaml Outdated
# - NET_ADMIN
healthcheck:
test: "ss --listening --tcp | grep -P 'LISTEN.+:smtp' || exit 1"
test: "ss --listening --tcp | grep --silent ':smtp'"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Afaik our postfix listens only on IPv4. Therefore we could make it more explicit with:

Suggested change
test: "ss --listening --tcp | grep --silent ':smtp'"
test: "ss --listening --ipv4 --tcp | grep --silent ':smtp'"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we should? If you had an IPv6 only container (I think this is possible in Docker now at least), the healthcheck could be agnostic to that.

DMS itself though does have a fair bit of 127.0.0.1 hard-coded I think instead of localhost, so it may not support IPv6 only out of the box 😅

If there is a good reason to specifically filter to IPv4 only though, we could do that. But I don't think there's any issues with also detecting listening on IPv6?

@polarathene
Copy link
Member Author

Shouldn't we just use grep -q for 0/1 exit status?

That will work. Only disadvantage: when grep fails (return code 2) for whatever reason, it will happen silently.

I assume the exit status of 2 becomes 1 in that case, but I'm not sure how to verify that.

If grep is given an invalid arg or invalid filepath to search you still get an exit status of 2, despite --quiet / --silent. Presumably a valid command can error and instead of 2 you'd get 1?

I suppose we could keep || exit 1 tacked on as a precaution? 🤷‍♂️


https://docs.docker.com/reference/dockerfile/#healthcheck

The command's exit status indicates the health status of the container.

The possible values are:

  • 0: success - the container is healthy and ready for use
  • 1: unhealthy - the container isn't working correctly
  • 2: reserved - don't use this exit code

Copy link
Member Author

@polarathene polarathene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding --ipv4 as suggested by @casperklein , should someone bring up IPv6 only DMS, we can discuss dropping it then 🤔

Adding back || exit 1 for broader compatibility. Some errors encountered by grep still return a 2 despite --silent/--quiet option being set.

compose.yaml Outdated Show resolved Hide resolved
test/tests/serial/tests.bats Outdated Show resolved Hide resolved
@polarathene polarathene enabled auto-merge (squash) June 2, 2025 07:27
@polarathene polarathene merged commit 3c193a1 into master Jun 2, 2025
2 checks passed
@polarathene polarathene deleted the chore/simplify-example-healthcheck branch June 2, 2025 07:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/improvement Improve an existing feature, configuration file or the documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

Morty Proxy This is a proxified and sanitized view of the page, visit original site.