improvement: systemvm slow operations#1604
improvement: systemvm slow operations#1604romain-dartigues wants to merge 1 commit intoapache:4.9apache/cloudstack:4.9from romain-dartigues:systemvm-none-gatewayromain-dartigues/cloudstack:systemvm-none-gatewayCopy head branch name to clipboard
Conversation
| dp['gateway'] = self.qFile.data['cmd_line']['localgw'] | ||
| else: | ||
| dp['gateway'] = 'None' | ||
| dp['gateway'] = self.qFile.data['cmd_line'].get('gateway', 'None') |
There was a problem hiding this comment.
there could be a possibility of having an ipv6 gateway, in that case, we would be looking at ip6gateway.
|
@romain-dartigues do we know why the |
Right, but IPv6 does not use ARP but NDP (RFC 4861). It seems that a
Apparently because the file "/etc/cloudstack/ips.json" is generated with a gateway at "None" even when the gateway is present in the input "cmd_line.json" (but my memory could fail me at this time of the day). One month ago we found the "problem" and wanted a fix; trying to understand why it did happen made me start #1575 (now #1603). I though it would be good (at least for me) to get a better understanding of the code; doing hundreds of arping everytime is probably inefficient as each of them is supposed to refresh the ARP table of all equipment on the way... |
|
Thanks @romain-dartigues based on your comments the code LGTM |
|
we do have to remember that arping is ipv4 only so we need to fix this for ipv6 too |
|
@blueorangutan package |
|
@rhtyd a Trillian-Jenkins job has been kicked to build packages. I'll keep you posted as I make progress. |
|
Packaging result: ✔centos6 ✔centos7 ✖debian repo: http://packages.shapeblue.com/cloudstack/pr/1604 |
| cmd = "arping -c 1 -I %s -A -U -s %s %s" % ( | ||
| self.dev, self.address['public_ip'], self.address['gateway']) | ||
| CsHelper.execute(cmd) | ||
| CsHelper.execute(cmd, wait=False) |
There was a problem hiding this comment.
Flag arguments are an anti-pattern. Please consider introducing and executeAndWait function instead.
|
@romain-dartigues can you fix the merge conflict, squash the changes, open a JIRA issue referencing that in the commit message. |
|
@rhtyd < JIRA issue #9568 Rebase done, change merge request target done. |
fce10c5 to
9c47e97
Compare
|
@murali-reddy can you help review this one? |
21f4e6d to
67bfc7c
Compare
|
@romain-dartigues @murali-reddy ping @blueorangutan package |
|
@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress. |
|
Packaging result: ✔centos6 ✔centos7 ✔debian. JID-208 |
67bfc7c to
ebe6537
Compare
|
@rhtyd pong (is it OK like that? please advise) |
|
Thanks @romain-dartigues that looks good. I'll kick a test round soon. |
|
@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress. |
|
Packaging result: ✔centos6 ✔centos7 ✔debian. JID-227 |
|
@blueorangutan test |
|
@rhtyd a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests |
|
Trillian test result (tid-379)
|
|
I'll need help to understand what gone wrong. I just peeked at But |
|
@romain-dartigues I'll have a look at the failure, meanwhile can you see my comments against the changes https://github.com/apache/cloudstack/pull/1604/files |
ebe6537 to
a0d0c8d
Compare
|
@rhtyd < separate function done. |
|
okay @romain-dartigues I'll kick some tests |
|
@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress. |
|
Packaging result: ✖centos6 ✖centos7 ✖debian. JID-238 |
|
@blueorangutan package |
|
@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress. |
|
Packaging result: ✔centos6 ✔centos7 ✔debian. JID-243 |
|
@murali-reddy @abhinandanprateek can you help review this, thanks |
|
@rhtyd a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests |
|
Trillian test result (tid-425)
|
|
@romain-dartigues we fixed issue where ips.json has wrong gateway due to error in parsing the cmd_line.json. its committed in 4.8 and above [1]. Do you still see (or can you please verify) performance hit due to arpping to wrong gateway on 4.8/4.9/master? [1] 93ac134 |
|
@murali-reddy I'm not in position to run tests anymore |
|
Several test failures, unless the PR author can fix, this is on hold. |
|
@romain-dartigues @dpassante any update on this? |
|
No comment, since it does not seems required anymore I close this request. |
On our setup we had issues with vrouters when a large number of VM were created at once.
The vrouters try to arping the gateway (to refresh the ARP cache I suppose), but in some case it take too much time and the vrouters got killed for being unresponsive.
The following patches attempt to solve this by:
This patch has been running for a month on our test environment and seems to solve the initial problem.
An operation (starting 220 VM on XenServer, shared storage) which took 46 mn, now take 4 mn (without errors).