Reduce overhead of indy plumbing #8748

headius · Apr 4, 2025

JRuby 10 enables invokedynamic for all optimization by default. Unfortunately, some of the plumbing around invokedynamic use has never seen much in the way of profiling and optimization. This can hurt early boot times for applications, as well as early execution performance and warmup.

This PR will be a pass over the key parts of indy infrastructure with a goal of "harm reduction" when boot-time and early runtime common cases hit this plumbing heavily.

Optimizations included:

Reduce overhead from hierarchy-based invalidation by right-sizing lists and avoiding re-invalidation of unused SwitchPoints.

Profiling repeated high-ancestor method definitions revealed much slower performance for all that invalidation when using indy and SwitchPoint, around 4-6x slower than the simple generation-based invalidator. Much of the overhead is in building the invalidator list, building the SwitchPoint array, and invalidating those SwitchPoints over and over. This patch makes the following improvements:

* Guess at the right size for the invalidator list based on previous invalidation events. We use `last size * 1.25` to give room for a bit of growth, because reallocation along this path was the top allocation in the benchmark.
* Only add a SwitchPointInvalidator to the list if it has an actively used SwitchPoint. When invalidated, we do not immediately create a new SwitchPoint, instead replacing the just-invalidated SP with an invalid dummy SP. Only when the SP is directly requested do we populate it with a live SP. Dummy SwitchPoints do not need to be re-invalidated, so we avoid adding them to the list.
* Avoid allocating zero-length SwitchPoint lists when no SPs are in use.
* Avoid allocating iterators along non-SP invalidation paths.
* Tweaks to the dummy logic to actually use the dummy whenever we invalidate or prepare for invalidation.

Given the following benchmark:

```ruby
require 'benchmark'

loop {
  puts Benchmark.measure {
    10000.times {
      module Kernel
        def foo1 = nil
        def foo2 = nil
        def foo3 = nil
        def foo4 = nil
        def foo5 = nil
      end
    }
  }
}
```

Performance goes from:

```
  7.450000   0.140000   7.590000 (  6.643895)
  6.590000   0.060000   6.650000 (  6.532706)
  6.570000   0.070000   6.640000 (  6.555341)
```

to:

```
  1.510000   0.040000   1.550000 (  1.165226)
  1.140000   0.010000   1.150000 (  1.074266)
  1.050000   0.000000   1.050000 (  1.034040)
```

Compared with non-indy:

```
  1.840000   0.050000   1.890000 (  1.138749)
  1.220000   0.000000   1.220000 (  1.064950)
  1.020000   0.010000   1.030000 (  1.013060)
```

Note that this optimization is most effective when few call sites are being populated, such as early in boot when most invalidation takes place. Heavy root invalidations at runtime while call sites are active throughout the subhierarchy will still suffer from SP churn and excess allocation of supporting structures. A move to aggregate SP invalidation at the call site (gathering all parent SPs as the call site guard) would push the cost into the initial binds and re-binds of those sites, which can be expected to either stabilize or eventually give up and use simple caching.
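For readers who want the lazy-dummy idea in concrete form, here is a rough Ruby toy model of the behavior described in the list above. It is only a sketch and not the actual change, which lives in JRuby's Java code. `ToySwitchPoint` is a hypothetical stand-in for `java.lang.invoke.SwitchPoint`, and `ToyInvalidator`, `ToyModule`, and all method names are made up for illustration. Ruby arrays expose no capacity setting, so the `last size * 1.25` guess is only computed here, not applied.

```ruby
# Toy model only: none of these class or method names come from JRuby.
class ToySwitchPoint
  def initialize(valid)
    @valid = valid
  end

  def valid?
    @valid
  end

  def invalidate!
    @valid = false
  end
end

# Shared, permanently invalid placeholder; it never needs re-invalidation.
DUMMY = ToySwitchPoint.new(false).freeze

class ToyInvalidator
  def initialize
    @switch_point = DUMMY
  end

  # Called when a call site needs a guard: allocate a live one on demand.
  def switch_point
    @switch_point = ToySwitchPoint.new(true) if @switch_point.equal?(DUMMY)
    @switch_point
  end

  def active?
    !@switch_point.equal?(DUMMY)
  end

  # Invalidate the live guard, then fall back to the dummy.
  def invalidate
    return unless active?
    @switch_point.invalidate!
    @switch_point = DUMMY
  end
end

class ToyModule
  attr_reader :invalidator

  def initialize
    @invalidator = ToyInvalidator.new
    @children = []
    @last_size = 0
  end

  def add_child
    child = ToyModule.new
    @children << child
    child
  end

  # Gather only invalidators whose guard is in use, then invalidate them.
  # Returns how many were visited.
  def invalidate_hierarchy
    active = []
    collect_active(active)
    active.each(&:invalidate)
    @last_size = active.size
  end

  # Capacity guess for the next pass: last size * 1.25, rounded up.
  def size_hint
    (@last_size * 1.25).ceil
  end

  protected

  def collect_active(list)
    list << @invalidator if @invalidator.active?
    @children.each { |child| child.collect_active(list) }
  end
end

root = ToyModule.new
kids = Array.new(8) { root.add_child }
guard = kids[0].invalidator.switch_point # one call site asks for a guard
first = root.invalidate_hierarchy        # visits only that one invalidator
hint = root.size_hint                    # guess derived from the last pass
second = root.invalidate_hierarchy       # no live guards, nothing to visit
```

In this model the second root invalidation visits no invalidators at all, which mirrors why the benchmark above benefits: repeated definitions on `Kernel` mostly hit modules whose guards nobody has requested since the previous invalidation.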

headius added this to the JRuby 10.0.0.0 milestone Apr 4, 2025

headius force-pushed the indy_cost_reduction branch from a253915 to 513f869 Compare April 4, 2025 14:31

headius force-pushed the indy_cost_reduction branch from 513f869 to 0663077 Compare April 7, 2025 21:13

headius merged commit e2909f9 into jruby:master Apr 8, 2025
72 checks passed

headius deleted the indy_cost_reduction branch April 8, 2025 22:33

headius mentioned this pull request Apr 8, 2025

Reduce overhead of invokedynamic binding #8756

Open
