Hi! We have an ejabberd cluster running a few external modules installed with ejabberdctl module_install. We are actively developing and changing these modules. What is the correct procedure for upgrading them in production?

I already have something in mind, but there are a few things that are a bit unclear.

My starting point is the ejabberd upgrade guide, since the situation is similar to upgrading ejabberd itself: https://docs.ejabberd.im/admin/upgrade/

Questions about the "Soft upgrade process":

S.1. Before running leave_cluster, we should stop all connections to that node, right? My gut tells me bad things will happen if clients continue to connect through that node.
S.2. Why is it necessary to leave the cluster? I guess it's related to mnesia tables but if there are no changes to the schemas, can we do without leaving the cluster?
S.3. I've noticed that after reinstalling the module, the old version continues to run sometimes. Is this behavior expected?

Questions about the "Module update process":

M.1. I tried installing the module on another ejabberd instance and replacing the beam files on the destination server, but it seems that update_list and update don't work for external modules. Is that correct, or is something wrong?
M.2. If step 1 somehow succeeds and we run restart_module, is this action atomic, or is it possible that the module will "miss" a few hooks during the restart? It is important for us not to miss anything from user_send_packet, for example.


Questions about the "Soft upgrade process":

That long procedure is not needed in your case, as you only modified a few Erlang modules to fix bugs, and those fixes don't need a module restart to take effect correctly (no mnesia or ETS changes, no changes to records, tuples, or internal state). Instead of the "soft upgrade process", you can probably use the "module update process".


S.1. Before running leave_cluster, we should stop all connections to that node, right? My gut tells me bad things will happen if clients continue to connect through that node.

You are right.

During the few seconds between the steps "run leave_cluster on node B" and "stop old node B", some clients may modify node B's database (modify their roster, receive offline messages, etc.).

Later, the step "run join_cluster on node B, passing node A as parameter" is destructive: it deletes the contents of the mnesia tables on node B and then copies the contents from node A.

Consequently, any changes made to node B's database during that window are never synchronized with node A and are lost.

There is a new command in git, "evacuate_kindly", that kicks clients and rooms; after running it, you can run leave_cluster knowing that no clients remain to modify the database.
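A minimal sketch of that order of operations, written as shell functions meant to be sourced and run on node B. It assumes the argument forms "evacuate_kindly DELAY ANNOUNCEMENT" and "leave_cluster NODE" (check them with "ejabberdctl help evacuate_kindly" on your version); ctl, EJABBERDCTL, and drain_and_leave are hypothetical names used only for illustration.

```shell
# Hypothetical wrapper: set EJABBERDCTL to use another ejabberdctl path.
ctl() { "${EJABBERDCTL:-./bin/ejabberdctl}" "$@"; }

# drain_and_leave NODE [DELAY_SECONDS] [ANNOUNCEMENT]
# Assumed argument order for evacuate_kindly: delay, then announcement.
drain_and_leave() {
    local node="$1"
    local delay="${2:-60}"
    local message="${3:-Server maintenance, please reconnect shortly}"

    # 1. Announce, then kick clients and rooms, so nobody keeps writing
    #    to this node's mnesia tables.
    ctl evacuate_kindly "$delay" "$message" || return 1

    # 2. Only once the node is empty, remove it from the cluster.
    #    If step 1 fails, leave_cluster is never called.
    ctl leave_cluster "$node" || return 1
}
```

For example, with a hypothetical node name: drain_and_leave ejabberd@nodeB 120 "Upgrading, back in a few minutes". Stopping at the first failure is deliberate: leaving the cluster while clients are still connected is exactly the data-loss window described above.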


S.2. Why is it necessary to leave the cluster? I guess it's related to mnesia tables but if there are no changes to the schemas, can we do without leaving the cluster?

Right.

The content of a mnesia table is synchronized with the other nodes immediately, but a table's schema is only created or synchronized when the node joins the cluster. So if no table schemas change, leaving and rejoining the cluster is not required.
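When a node does have to leave and rejoin, the rejoin step can be sketched as below, run on node B after its upgrade. This assumes join_cluster takes the name of a node already in the cluster and that list_cluster prints one node name per line; rejoin, ctl, and EJABBERDCTL are hypothetical names.

```shell
# Hypothetical wrapper around ejabberdctl.
ctl() { "${EJABBERDCTL:-./bin/ejabberdctl}" "$@"; }

# rejoin NODE_A NODE_B: run on node B once its upgrade is done.
rejoin() {
    local node_a="$1" node_b="$2" members
    # Destructive step: node B's mnesia tables are replaced by node A's
    # copies, and table schemas are synchronized at this point.
    ctl join_cluster "$node_a" || return 1
    # Assumption: list_cluster prints one node name per line.
    members="$(ctl list_cluster)" || return 1
    # Succeed only if both nodes now appear in the listing.
    printf '%s\n' "$members" | grep -qx "$node_a" || return 1
    printf '%s\n' "$members" | grep -qx "$node_b"
}
```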


S.3. I've noticed that after reinstalling the module, the old version continues to run sometimes. Is this behavior expected?

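This is undesirable with both method A and method B, described below. Perhaps the old code keeps running because some processes are still executing it? (Erlang can keep an old and a current version of a module loaded at the same time, and a process executing the old version keeps using it until it makes a fully qualified call into the module.)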


Questions about the "Module update process":

M.1. I tried installing the module on another ejabberd instance and replacing the beam files on the destination server, but it seems that update_list and update don't work for external modules. Is that correct, or is something wrong?

You are right.

There are two ways to add new modules to ejabberd:

A) When ejabberd is compiled from source code, you can copy the source code of additional modules into src/ and let ejabberd compile them. In that case, the update procedure is:

  • Modify the module source code
  • Compile
  • Copy the beam file to overwrite the existing one in the ejabberd ebin path
  • Check that ejabberd detects the module as ready to update:
    $ ./bin/ejabberdctl update_list
    mod_vcard
  • Load the new beam into memory (this does not stop and start the module; it only updates the code in memory):
    $ ./bin/ejabberdctl update mod_vcard

B) Alternatively, you can add a new module to ejabberd-modules and install it with "module_install". In that case, the update procedure is:

  • Modify the module source code (it is probably somewhere in $HOME/.ejabberd-modules/sources/...)
  • Upgrade: compile the module source code, stop the old version, uninstall the old beam, install the new beam, update it in memory, and start the new module:
    $ bin/ejabberdctl module_upgrade mod_shcommands
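If several external modules change at once, method B can be run for each in turn. A minimal sketch, assuming module names are passed as arguments; upgrade_modules, ctl, and EJABBERDCTL are hypothetical names.

```shell
# Hypothetical wrapper around ejabberdctl.
ctl() { "${EJABBERDCTL:-bin/ejabberdctl}" "$@"; }

# upgrade_modules MODULE...: method B for several modules in a row.
# Each module_upgrade call stops and restarts that module, so its hooks
# are briefly unregistered (see M.2).
upgrade_modules() {
    local mod
    for mod in "$@"; do
        echo "Upgrading $mod" >&2
        if ! ctl module_upgrade "$mod"; then
            # Stop at the first failure instead of upgrading the rest.
            echo "module_upgrade failed for $mod; skipping the rest" >&2
            return 1
        fi
    done
}
```

For example: upgrade_modules mod_shcommands mod_other (mod_other is a placeholder).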

In your case, since you already have the module compiled (using method B on a development machine), you can use method A on the production server, going directly to the Copy, Check, and Update steps.
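The Copy, Check, and Update steps can be sketched as one shell function. Assumptions: EBIN_DIR is ejabberd's own ebin directory (as in method A), and the beam was compiled against the same ejabberd and Erlang/OTP versions as production. hot_update, ctl, and EJABBERDCTL are hypothetical names. Loaded code is per Erlang node, so in a cluster this would need to run on every node.

```shell
# Hypothetical wrapper around ejabberdctl.
ctl() { "${EJABBERDCTL:-./bin/ejabberdctl}" "$@"; }

# hot_update MODULE BEAM_FILE EBIN_DIR
# Copy + Check + Update for one module on the local node.
hot_update() {
    local mod="$1" beam="$2" ebin="$3"

    # Copy: overwrite the existing beam in the ebin path.
    cp "$beam" "$ebin/$mod.beam" || return 1

    # Check: update_list prints the modules whose beam changed on disk.
    if ! ctl update_list | grep -qx "$mod"; then
        echo "$mod is not listed by update_list; not updating" >&2
        return 1
    fi

    # Update: load the new code into memory without stop+start,
    # so the module's hooks stay registered.
    ctl update "$mod"
}
```

Refusing to run update when update_list does not mention the module makes a wrong ebin path show up as an error instead of a silent no-op.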


M.2. If step 1 somehow succeeds and we run restart_module, is this action atomic, or is it possible that the module will "miss" a few hooks during the restart? It is important for us not to miss anything from user_send_packet, for example.

Method B restarts the module, that is, it stops and starts it: the module's hooks are removed and added again a few moments later. During those moments the module's hooks are not registered in ejabberd, so they are not called; for example, user_send_packet events in that window would not reach the module.

Method A acts at a lower level: it tells Erlang to load the new binary code of the module, so its functions are replaced (presumably atomically). In your case, that is what you want.

Answer selected by ggyurchev

Thank you so much! This is really detailed and helpful!

@badlop

Check S.1; it now mentions the new command "evacuate_kindly".

@ggyurchev

The new command is indeed very helpful! It saves a lot of headaches managing the load balancer during upgrades. Thanks again!
