-
-
Notifications
You must be signed in to change notification settings - Fork 929
refactor Vnode.normalizeChildren
#3052
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Quick question: how is memory usage impacted? Sparse arrays take more memory than equivalent-length dense arrays, so I just want a quick sanity check here on its implications. Also, how does it compare to an explicit |
@dead-claudia Thank you for your review. I briefly verified the followings.
I logged
If the // (1) About the same performance as v2.3.7
var children = []
var numKeyed = 0
for (var i = 0; i < input.length; i++) {
children.push(Vnode.normalize(input[i]))
if (children[i] != null && children[i].key != null) numKeyed++
}
// (2) It is slightly faster than v2.3.7, but it is not as good as this PR.
var children = []
var numKeyed = 0
var normalized
for (var i = 0; i < input.length; i++) {
children.push(normalized = Vnode.normalize(input[i]))
if (normalized != null && normalized.key != null) numKeyed++
}
// (3) Almost the same performance as (2)
var children = []
var numKeyed = 0
for (var i = 0; i < input.length; i++) {
var normalized = Vnode.normalize(input[i])
children.push(normalized)
if (normalized != null && normalized.key != null) numKeyed++
} |
@kfule Spin up a local server from the repo root, open If you use Firefox, the process is similar, just with stuff in different locations. |
@dead-claudia For clarity, I assigned the following vnode to mithril.js/performance/test-perf.js Line 105 in 2a89704
// I assigned the vnode to `largeVnodes` like this
this.largeVnodes = m(".foo.bar[data-foo=bar]", {p: 2}, The screenshots of the results from the two branches (main (v2.3.7 mithril.js bundle) and this PR build bundle) are as follows. While they are indeed running different mithril.js scripts, they appear to show no differences. Additionally, there were These do appear to capture the vnode results, but since I'm unfamiliar with this, there might be other areas I should be looking at. Also, this PR might have an advantage since reallocation disappears with large arrays, though. |
A long time ago, V8 used to always allocate |
for (var i = 0; i < input.length; i++) { | ||
children[i] = Vnode.normalize(input[i]) | ||
if (children[i] != null && children[i].key != null) numKeyed++ | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this boost performance further?
for (var i = 0; i < input.length; i++) { | |
children[i] = Vnode.normalize(input[i]) | |
if (children[i] != null && children[i].key != null) numKeyed++ | |
} | |
for (var i = 0; i < input.length; i++) { | |
children[i] = Vnode.normalize(input[i]) | |
if (children[i] !== null) numKeyed += children[i].key != null | |
} |
And how does this compare against the above?
for (var i = 0; i < input.length; i++) { | |
children[i] = Vnode.normalize(input[i]) | |
if (children[i] != null && children[i].key != null) numKeyed++ | |
} | |
var i = 0 | |
for (var child of input) { | |
children[i] = child = Vnode.normalize(child) | |
if (child !== null) numKeyed += child.key != null | |
} |
Engines do weird things here, but the idea is to reduce branches.
- I've seen engines convert
num += bool
to branch-free but notif (bool) num++
, despite having the type info to do it. - V8 can elide bounds checks more easily with
for ... of
, and Node's seen measurable perf gains from switching. foo != null
is technically sugar forfoo !== undefined && foo !== null && !isHTMLDocumentAll(foo)
. Inline caches can reduce that, butfoo !== null
doesn't require an inline cache at all, just a branch predictor cache.- The
foo.key
requires a type cache no matter what. The JIT will figure out thatVnode.normalize
can only return vnodes andnull
, and so it may be able to skip theundefined
check entirely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dead-claudia
Thank you for your comment. I had overlooked the strict null checks, so I’ve pushed that version for now.
As for performance, the boolean-addition variant appears slightly slower than the current PR. However, note that the current benchmark doesn’t use keyed vnodes, so that might affect fairness.
The final for-of
version appears slower than both of those.
In any case, I plan to summarize and share the comparison results with the boolean-addition version later. Since the difference seems small, I’ll try measuring it again on a non-laptop machine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are the results from running on a Mac mini (M2) with Node v22. The !==
variant (current PR) appears to be slightly better.
# !== null
construct large vnode tree x 85,558 ops/sec ±0.13% (135 runs sampled)
rerender identical vnode x 16,952,524 ops/sec ±0.07% (138 runs sampled)
rerender same tree x 336,489 ops/sec ±0.16% (135 runs sampled)
rerender same tree (with selector-generated attrs) x 432,880 ops/sec ±0.10% (134 runs sampled)
add large nested tree x 31,277 ops/sec ±0.13% (136 runs sampled)
mutate styles/properties x 647 ops/sec ±1.83% (126 runs sampled)
repeated add/removal x 12,115 ops/sec ±2.49% (122 runs sampled)
Completed perf tests in 58038ms
# != null
construct large vnode tree x 85,002 ops/sec ±0.12% (133 runs sampled)
rerender identical vnode x 16,645,591 ops/sec ±0.12% (137 runs sampled)
rerender same tree x 335,902 ops/sec ±0.18% (136 runs sampled)
rerender same tree (with selector-generated attrs) x 423,986 ops/sec ±0.12% (135 runs sampled)
add large nested tree x 30,986 ops/sec ±0.08% (136 runs sampled)
mutate styles/properties x 647 ops/sec ±1.65% (126 runs sampled)
repeated add/removal x 12,234 ops/sec ±2.36% (123 runs sampled)
Completed perf tests in 58207ms
# += bool
construct large vnode tree x 84,533 ops/sec ±0.12% (136 runs sampled)
rerender identical vnode x 16,589,058 ops/sec ±0.09% (138 runs sampled)
rerender same tree x 332,196 ops/sec ±0.19% (135 runs sampled)
rerender same tree (with selector-generated attrs) x 428,660 ops/sec ±0.13% (132 runs sampled)
add large nested tree x 30,983 ops/sec ±0.12% (133 runs sampled)
mutate styles/properties x 639 ops/sec ±1.87% (127 runs sampled)
repeated add/removal x 12,257 ops/sec ±2.52% (123 runs sampled)
Completed perf tests in 58208ms
# for-of
construct large vnode tree x 78,225 ops/sec ±0.14% (137 runs sampled)
rerender identical vnode x 14,888,496 ops/sec ±0.09% (138 runs sampled)
rerender same tree x 324,334 ops/sec ±0.12% (137 runs sampled)
rerender same tree (with selector-generated attrs) x 422,006 ops/sec ±0.08% (136 runs sampled)
add large nested tree x 30,417 ops/sec ±0.24% (136 runs sampled)
mutate styles/properties x 638 ops/sec ±1.68% (127 runs sampled)
repeated add/removal x 12,251 ops/sec ±2.83% (123 runs sampled)
Completed perf tests in 58272ms
# v2.3.7
construct large vnode tree x 78,501 ops/sec ±0.17% (135 runs sampled)
rerender identical vnode x 15,177,095 ops/sec ±0.11% (136 runs sampled)
rerender same tree x 318,919 ops/sec ±0.14% (137 runs sampled)
rerender same tree (with selector-generated attrs) x 410,400 ops/sec ±0.08% (130 runs sampled)
add large nested tree x 29,534 ops/sec ±0.30% (135 runs sampled)
mutate styles/properties x 622 ops/sec ±1.95% (125 runs sampled)
repeated add/removal x 12,160 ops/sec ±2.75% (122 runs sampled)
Completed perf tests in 57669ms
…cy checks after normalization This change preallocates the array to the input length and collapses multiple loops into a single pass. Assigning immediately after preallocation improves performance on V8 (generally neutral elsewhere). Key checks are now performed on normalized vnodes, making the consistency validation more accurate and clarifying the correspondence between error messages and code. Perf-sensitive comments have been clarified to reflect the original intent of commit 6c562d2. Behavior is unchanged, except that the timing/order of related errors may differ slightly. All existing tests pass. Additionally, bundle size is slightly reduced.
281d9aa
to
888f16a
Compare
@dead-claudia |
Vnode.normalizeChildren
now preallocates the array length and performs key-consistency checks after normalization. This improves performance, check accuracy, and bundle size respectively.Description
This PR makes the following changes to Vnode.normalizeChildren:
These changes will not break existing behavior (except for slight differences in the timing or order in which the check errors are thrown.) And the bundle size has been slightly reduced to 8,972 bytes gzipped.
result of
npm run perf
npm run perf
on my laptopThis represents performance roughly middle between #3041 and #3051 (about 3% gain in “rerender same tree”).
Motivation and Context
This is like the re-opening of #3051.
Optimizing the loop alone resulted in only a small performance improvement, so #3051 was closed (#3051 (comment)). After further investigation, I found that combining it with pre-allocating the array length improved performance.
Unlike #3051, this PR makes improvements without reusing arrays completely, avoiding breaking or major changes.
How Has This Been Tested?
npm run test
Types of changes
Checklist