-
-
Notifications
You must be signed in to change notification settings - Fork 2k
fix: Better support container restarts #4323
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TLDR is a checklist of suggested changes at the end of one of the review comments.
Context for those change suggestions is covered by the rest of the feedback 👍
7245558
to
8bd7e3d
Compare
Alrighty - tests are currently running locally, but I think I finished the analysis and implementation for this one. Thank you for the feedback @casperklein and @polarathene - as always. I have split everything into digestible commits, please review commit by commit. @polarathene In your #4323 (comment), I have ticked what I implemented, and I explained what I had to do differently with comments in my commits. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary of change requests from review feedback. Not everything needs to be addressed, but the env export needs to be resolved either as described below or by reverting that change (it doesn't look like ENV can be adjusted for restarts, that results in a fresh container AFAIK, thus may not be a bug to restrict it to _setup()
)
-
__environment_variables_export
is a regression now. To prevent that you'll need to:- Migrate
_check_log_level()
fromcheck-stack.sh
intovariables-stack.sh
- Migrate the feature flag co-located ldap, oauth, saslauthd
variable-stack.sh
method calls from_register_functions
(start-mailserver.sh
) to a new method invariable-stack.sh
with equivalent feature flag ENV checks. - Have the two methods (log-level check and feature-flag) called prior to the export method.
- NOTE: This would make the
_early_variables_setup
name a bit redundant, but I'm fine leaving that as-is for now in this PR so the diff is easier to grok / follow 😅
- NOTE: This would make the
- Migrate
-
_setup_adjust_state_permissions
and_setup_directory_and_file_permissions
are both suitable to be managed during_check
, but I won't enforce this change request for the PR.- NOTE:
_setup_adjust_state_permissions
is partially dependent upon_setup_save_states
on a fresh container start should the volume not already provide the service dir for a feature. Running_setup_save_states
in_check
is also valid AFAIK, but the split is ofmail_state.sh
into two methods is still good to have.
- NOTE:
-
_setup_adjust_state_permissions()
should inverse the debug log condition, as suggested. I don't see much value in debug log for notifying the user of skipping the logic, as per review feedback.
From prior feedback:
- Optionally consider dropping
_early_supervisor_setup()
fromstart-mailserver.sh
and moving that method fromsetup-stack.sh
todaemons-stack.sh
where_start_daemons()
can call it before the for loop.
_register_setup_function '_setup_save_states' | ||
_register_setup_function '_setup_apply_fixes_after_configuration' | ||
_register_setup_function '_environment_variables_export' | ||
_register_setup_function '_setup_adjust_state_permissions' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with this, but you could also have _setup_save_states()
internally call this follow-up method directly at the end of it's function since that is the intention.
EDIT: Per other review feedback, if _setup_save_states()
isn't actually a required dependency, better to call separately as you're doing here.
I think it's a dependency in the sense of the operations being called on paths that are expected to exist when the feature flags are enabled, in which case the first time enabled _setup_save_states()
should have run first.
I just tested with compose and adjusting an ENV value (hard-coded or via the external variable interpolation) of environment
or adding a new ENV cleared the internal state, same with env_file
. So it shouldn't be possible to restart with ENV adjusted, it'd go through the fresh container setup.
Per other review feedback, it does seem like _setup_save_states()
could still run during a restart, thus keeping the ownership/permissions fix called internally at the end of _setup_save_states()
seems viable, along with moving it into _check()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% sure what to make out of this now. Are you satisfied with the way it is or should I change something?
# Ensure DMS only adjusts config files for a new container. | ||
# Container restarts should skip as they retain the modified config. | ||
if [[ -f /CONTAINER_START ]]; then | ||
_log 'info' 'Container was restarted. Skipping most setup routines.' | ||
# We cannot skip all setup routines because some need to run _after_ | ||
# the initial setup (and hence, they cannot be moved to the check stack). | ||
_setup_directory_and_file_permissions | ||
_setup_adjust_state_permissions | ||
else | ||
_setup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_setup_adjust_state_permissions
AFAIK only has a dependency upon the feature flags ENV being set (and now a global var), thus it's possible to group under _check
/ check-stack.sh
.
_setup_directory_and_file_permissions
likewise was suggested to be migrated to check-stack.sh
, it's dependent upon ENV with logic that should be fine before _setup
. It's mostly a collection of misc checks/fixes, there's _chown_var_mail_if_necessary()
, but AFAIK we shouldn't need that post _setup
during first container start as Dovecot + setup
CLI should create /var/mail
folders with correct permissions, this has always been a restart related fix I think.
Thus moving these to _check
should resolve the concern. For mail_state.sh
a fresh container would however output a bit of a misleading debug log for _setup_adjust_state_permissions()
, since it ran before _setup_save_states()
.. so technically that could be run after this conditional block, but that has the caveat that user-patches.sh
should also be run after that (but we don't want to run that on a restart.. 🫤)
In my prior review the concern was if _setup_save_states()
could potentially do something undesirable on a restart, but we were running that for many releases prior to the introduction of better container restart support via /CONTAINER_START
. So presumably it could join other preparation methods in _check
to keep everything simple.
The ownership/permission fix that follows is mostly for upgrades or other scenarios which may have fumbled the ownership/permissions AFAIK, so I don't think it actually has a hard dependency to run after _setup_save_states()
? If so we could probably point that out via comment above the function name in mail_state.sh
so maintainers don't have to think about it 😝
Is the debug log for _setup_adjust_state_permissions()
the right approach to output when the directory is missing during a restart? (I'm not sure how common that event would be for a container restart)
For a fresh container start, it seems irrelevant to log when the prior _setup_save_states()
emitted a very similar log for that condition. Perhaps it should be a silent return that only emits a debug log when it's actually met the condition to modify the state dir? (_setup_save_states()
is always logging for a fresh container, so I think that's sufficient)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tried putting _setup_directory_and_file_permissions
into _check_
..., but this resulted in failing tests with /var/mail/...
having wrong permissions. Due to a lack of time, I did not investigate further but assumed something during setup touched the directories. Hence, it needs to run after all other setup functions. Consequently, I was unable to move it into the check stack.
Please give it a try if you think I made a mistake elsewhere (entirely possible, really), but I think we cannot move it yet without more significant adjustments, which I'm reluctant towards considering we want to release a new major.
Concerning the log if _setup_adjust_state_permissions
, I'll change that.
if [[ ! -d ${DMS_STATE_DIR:?DMS_STATE_DIR is not set} ]]; then | ||
_log 'debug' "'${DMS_STATE_DIR}' is not present - not adjusting state permissions" | ||
return 0 | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per related review feedback, flipping the debug log here makes more sense to only emit when changes will occur, the lack of the directory should only be relevant to a fresh container startup, in which case the prior dependency upon _setup_save_states()
has us covered in equivalent skipping log.
if [[ ! -d ${DMS_STATE_DIR:?DMS_STATE_DIR is not set} ]]; then | |
_log 'debug' "'${DMS_STATE_DIR}' is not present - not adjusting state permissions" | |
return 0 | |
fi | |
if [[ ! -d ${DMS_STATE_DIR:?DMS_STATE_DIR is not set} ]]; then | |
return 0 | |
fi | |
_log 'debug' "Ensuring correct ownership + permissions for DMS state dir: '${DMS_STATE_DIR}'" |
EDIT: I'm not sure if there's an actual dependency to _setup_save_states()
? I haven't verified if that method correctly syncs over ownership + permissions. I believe that's handled by the mv
command, which should preserve everything from the initial internal copy (probably fails if a restart can toggle a feature off and then back on again due to the (containers cannot be started with internal state persisted if ENV is changed from what I've observed).rm
logic of the other conditional though)
Still, I'm not sure how useful a debug log when the state dir is missing is helpful. It'd only seem relevant to a container restart and for troubleshooting we generally advise ensuring a fresh container is run which would provide this same context via _setup_save_states()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed the log as per e549b73
Signed-off-by: georglauterbach <44545919+georglauterbach@users.noreply.github.com>
Signed-off-by: georglauterbach <44545919+georglauterbach@users.noreply.github.com>
Signed-off-by: georglauterbach <44545919+georglauterbach@users.noreply.github.com>
Signed-off-by: georglauterbach <44545919+georglauterbach@users.noreply.github.com>
Signed-off-by: georglauterbach <44545919+georglauterbach@users.noreply.github.com>
Signed-off-by: georglauterbach <44545919+georglauterbach@users.noreply.github.com>
Signed-off-by: georglauterbach <44545919+georglauterbach@users.noreply.github.com>
Signed-off-by: georglauterbach <44545919+georglauterbach@users.noreply.github.com>
Signed-off-by: georglauterbach <44545919+georglauterbach@users.noreply.github.com>
Signed-off-by: georglauterbach <44545919+georglauterbach@users.noreply.github.com>
Thanks to @polarathene for his feedback in #4323 (review) Signed-off-by: georglauterbach <44545919+georglauterbach@users.noreply.github.com>
Signed-off-by: georglauterbach <44545919+georglauterbach@users.noreply.github.com>
Signed-off-by: georglauterbach <44545919+georglauterbach@users.noreply.github.com>
Done
I raised my concerns and problems with this in #4323 (comment).
Done (I think)
I do not think this would be a viable approach, as the script calls New commits begin here: 70eedac |
69b5350
to
e549b73
Compare
Bumps [anchore/scan-action](https://github.com/anchore/scan-action) from 6.0.0 to 6.1.0. - [Release notes](https://github.com/anchore/scan-action/releases) - [Changelog](https://github.com/anchore/scan-action/blob/main/RELEASE.md) - [Commits](anchore/scan-action@v6.0.0...v6.1.0) --- updated-dependencies: - dependency-name: anchore/scan-action dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 6.12.0 to 6.13.0. - [Release notes](https://github.com/docker/build-push-action/releases) - [Commits](docker/build-push-action@v6.12.0...v6.13.0) --- updated-dependencies: - dependency-name: docker/build-push-action dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Georg Lauterbach <44545919+georglauterbach@users.noreply.github.com>
Signed-off-by: georglauterbach <44545919+georglauterbach@users.noreply.github.com>
I applied the changes and appeased the linting gods. |
Some update from my side: This went from a small change into something bigger + a lot of discussion which I haven't fully read yet. To make a proper review, I'll have to take some time. At the moment, that's a bit challenging for me. I'd like to review asap, but unfortunately I can't give an ETA right now. |
I'm in no rush, I'll be working on the podman PR in the meantime 👍 PR Overview
If it helps here is a quick overview for the diff.
Primary change is adjusting the restart check for _early_supervisor_setup
_early_variables_setup
_log 'info' "Welcome to docker-mailserver ${DMS_RELEASE}"
_register_functions
_check
# Ensure DMS only adjusts config files for a new container.
# Container restarts should skip as they retain the modified config.
if [[ -f /CONTAINER_START ]]; then
_log 'info' 'Container was restarted. Skipping most setup routines.'
# We cannot skip all setup routines because some need to run _after_
# the initial setup (and hence, they cannot be moved to the check stack).
_setup_directory_and_file_permissions
_setup_adjust_state_permissions
else
_setup
fi
Additional changes:
|
Indeed, this has become bigger than I anticipated as well. I'd still like to merge ASAP TBH, because I'd really like publish v15.0.0. Here is my proposal: I'd merge this and prepare the v15 release. We'll pick 1.5 weeks for the freeze because of this change. Please let me know whether this is okay for you both. @polarathene thanks for the overview :) I added the |
I have resolved the merge conflicts with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll apply these and shift the changelog breaking items to more appropriate sections.
@georglauterbach Good structured commits 👍 The only nitpick I have is the "DMS_STATE_DIR" change:
I just think, that this makes things unnecessary complicated. Like @polarathene said, Edit: Side effect of |
I felt the same way but I didn't want to nitpick heavily on it. I understand the intention to minimize caveats from using Bash. It's one of the main friction points we have with maintaining the project when variables / methods are being pulled into scope from other scripts, something we shouldn't have to feel like we're walking on glass with. If we do explore migrating some scripts away from bash at some point, that's one of the main benefits. So far NuShell seems like a good compromise for that (a rust-based shell that provides optional type annotation, structured data, module imports, etc). I understand wanting to avoid Python and concerns with languages that'd involve compilation vs an interpreter for scripts, Nu might be a good candidate for such. We've had this discussion quite a bit so I won't drag it out again here 😅 I do agree that I'm not fond of the added complexity / abstraction to add preventative measures prematurely though. |
I will remove this part again after seeing your feedback :) |
Description
The DMS v14 release introduced improved support for container restarts, but had some regressions that this PR resolves.
Fixes #4108
Fixes #4150
ref #4302
Type of change
Checklist
docs/
)CHANGELOG.md