Proper way to upgrade external modules? #4309

ggyurchev
Nov 12, 2024

Hi! We have an ejabberd cluster running with a few external modules, installed with ejabberdctl module_install. We are actively developing and changing the modules. What is the correct procedure for upgrading them in production?

I already have something in mind, but there are a few things that are a bit unclear.

My starting point is the upgrade guide for ejabberd here, since the situation is similar to upgrading the ejabberd version: https://docs.ejabberd.im/admin/upgrade/

Questions to "Soft upgrade process":

S.1. Before running leave_cluster, we should stop all connections to that node, right? My gut tells me bad things will happen if clients continue to connect through that node.
S.2. Why is it necessary to leave the cluster? I guess it's related to mnesia tables but if there are no changes to the schemas, can we do without leaving the cluster?
S.3. I've noticed that after reinstalling the module, the old version continues to run sometimes. Is this behavior expected?

Questions to "Module update process":

M.1. I tried installing the module on another ejabberd instance and replacing the beam files on the destination server, but update_list and update don't seem to work for external modules. Is that assumption correct, or is something wrong?
M.2. If step 1 succeeds somehow and we run restart_module, is this action atomic, or is it possible that the module will "miss" a few hooks during the restart? It is important for us that we do not miss anything from user_send_packet, for example.

badlop
Nov 13, 2024
Maintainer

Questions to "Soft upgrade process":

That long procedure is not needed in your case, as you only modified a few Erlang modules to fix bugs, and those fixes don't require a module restart to take effect, right? (No mnesia or ets changes, no changes to records/tuples/internal state.) Instead of the "soft upgrade process", you can probably use the "module update process".

S.1. Before running leave_cluster, we should stop all connections to that node, right? My gut tells me bad things will happen if clients continue to connect through that node.

You are right.

During the few seconds that pass between the step "run leave_cluster on node B" and the step "stop old node B", some clients may modify the node B database (modify roster, receive offline messages, etc).

Later, the step "run join_cluster on node B, passing node A as parameter" is destructive: it deletes the content of the mnesia tables in node B and then copies the contents from node A.

Consequently, any changes made to the node B database in that window are never synchronized with node A and are deleted.

There is a new command "evacuate_kindly" in git that allows you to kick clients and rooms. After that you can run leave_cluster, sure that there are no clients around modifying the database.
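
The order described above (evacuate first, then leave the cluster) could be wrapped in a small helper. This is only a sketch under assumptions: the function name drain_then_leave, the EJABBERDCTL variable and the node name are hypothetical, and the arguments accepted by evacuate_kindly are not stated in this thread, so they are passed through untouched. Check "ejabberdctl help evacuate_kindly" on your version, and also whether the command returns only once the node is drained.

```shell
#!/bin/sh
# Hypothetical default; override via the environment.
EJABBERDCTL="${EJABBERDCTL:-./bin/ejabberdctl}"

# drain_then_leave NODE [EVACUATE_ARGS...]
# NODE is the Erlang node name to remove from the cluster (hypothetical
# example: ejabberd@nodeB). Any further arguments are handed to
# evacuate_kindly unchanged; their exact form is an assumption here.
drain_then_leave() {
    if [ "$#" -lt 1 ]; then
        echo "usage: drain_then_leave NODE [EVACUATE_ARGS...]" >&2
        return 64
    fi
    node="$1"
    shift

    # Kick clients and rooms first, so nobody keeps writing to this node's database.
    "$EJABBERDCTL" evacuate_kindly "$@" || return 1

    # Only then detach the node from the cluster.
    "$EJABBERDCTL" leave_cluster "$node"
}
```

If evacuate_kindly fails, the helper stops and leave_cluster is not run, so the node is never detached while clients may still be connected.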

S.2. Why is it necessary to leave the cluster? I guess it's related to mnesia tables but if there are no changes to the schemas, can we do without leaving the cluster?

Right.

Content of a mnesia table is synchronized with other nodes immediately.
But a mnesia table schema is created or synchronized when joining the cluster.

S.3. I've noticed that after reinstalling the module, the old version continues to run sometimes. Is this behavior expected?

This is undesirable, both when using method A and B described later. Maybe the code keeps running because there are processes still running based on it?

Questions to "Module update process":

M.1. I tried installing the module on another ejabberd instance and replacing the beam files on the destination server, but update_list and update don't seem to work for external modules. Is that assumption correct, or is something wrong?

You are right.

There are two methods to use new modules in ejabberd:

A) When ejabberd is compiled from source code, it is possible to copy more modules source code to src/ and let ejabberd compile them. In that case, the update procedure is:

1. Modify the module source code.
2. Compile.
3. Copy the beam file to overwrite the existing one in the ejabberd ebin path.
4. Check that ejabberd detects it's ready to update:

   $ ./bin/ejabberdctl update_list
   mod_vcard

5. Update the module beam into memory (this does not stop+start the module, it simply updates the beam in memory):

   $ ./bin/ejabberdctl update mod_vcard
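
The Copy, Check and Update steps of method A could be scripted roughly as follows. This is a minimal sketch, not an official procedure: the function name hot_update, the EJABBERDCTL variable and all paths are hypothetical, and it assumes update_list prints one module name per line, as in the output shown above.

```shell
#!/bin/sh
# Hypothetical default; override via the environment.
EJABBERDCTL="${EJABBERDCTL:-./bin/ejabberdctl}"

# hot_update MODULE BEAM_FILE EBIN_DIR
#   MODULE     module name, e.g. mod_vcard
#   BEAM_FILE  freshly compiled beam file for that module
#   EBIN_DIR   ejabberd ebin path on this server (installation-specific)
hot_update() {
    module="$1"
    beam="$2"
    ebin_dir="$3"

    # Copy: overwrite the existing beam in the ejabberd ebin path.
    cp "$beam" "$ebin_dir/$module.beam" || return 1

    # Check: the module should now be listed by update_list.
    # Assumes one module name per output line.
    if ! "$EJABBERDCTL" update_list | grep -Fqx -- "$module"; then
        echo "$module is not listed by update_list; not updating" >&2
        return 2
    fi

    # Update: load the new beam into memory without stop+start of the module.
    "$EJABBERDCTL" update "$module"
}
```

A call might look like: hot_update mod_vcard ./mod_vcard.beam /path/to/ejabberd/ebin, where all three arguments are placeholders for your own module and installation.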

B) On the other hand, it is possible to add a new module to ejabberd-modules and install with "module_install". In that case, the update procedure is:

1. Modify the module source code (probably it's somewhere in $HOME/.ejabberd-modules/sources/...).
2. Upgrade: compile the module source code, stop old version, uninstall old beam, install new beam, update in memory, start new module:

   $ bin/ejabberdctl module_upgrade mod_shcommands

In your case, as you already have the module compiled (using B on a development machine), you can use method A on the production server, going directly to the steps Copy + Check + Update.

M.2. If step 1 succeeds somehow and we run restart_module, is this action atomic, or is it possible that the module will "miss" a few hooks during the restart? It is important for us that we do not miss anything from user_send_packet, for example.

Method B restarts the module, that is, stop followed by start: the hooks are removed and then added again a few moments later. During that interval the module's hooks are not registered in ejabberd, and consequently they are not called.

Method A acts at a lower level: it tells Erlang to replace the binary code of the module, so the functions get replaced (atomically, I imagine). In your case that's what you want.

ggyurchev
Nov 14, 2024
Author

Thank you so much! This is really detailed and helpful!

badlop Nov 14, 2024
Maintainer

Check S.1, now it mentions the new command "evacuate_kindly".

ggyurchev Nov 14, 2024
Author

The new command is indeed very helpful! Saves a lot of headaches managing the load balancer during upgrades. Thanks again!
