Use a shared buffer in calls of git_treebuilder_write to avoid heap contention#3892
Use a shared buffer in calls of git_treebuilder_write to avoid heap contention#3892ethomson merged 1 commit intolibgit2:masterlibgit2/libgit2:masterfrom
Conversation
|
Cool, that sounds like a great improvement! First, please revert any changes that are only whitespace - it seems like a bunch of those snuck in. This will help avoid unnecessary churn. Second, I think that a more straightforward change would be to give |
src/tree.c
Outdated
|
|
||
| /* Grow the buffer beforehand to an estimated size */ | ||
| error = git_buf_grow(&buffer, entrycount * 72); | ||
| if (error >= 0) |
There was a problem hiding this comment.
We usually avoid braces around single-statement branches
|
Small nit on the commit message itself: please try to keep the initial line short, as otherwise many git commands become hard to read (e.g. And I agree with @ethomson on that it would be nicer to have a buffer at the treebuilder itself which would be reused for other functions, as well. Anyway, this heads in a good direction. Thanks for contributing :) |
|
Placing the buffer on the builder is a bit tricky, I think. write_tree creates a new treebuilder each call and only calls git_treebuilder_write once with it, so I don't think just that change would reduce the allocations. The builder can't be shared across write_tree calls because of the recursion in the loop. I could put the buffer on the treebuilder, but I'd have to manage the buffer the same way I do now and just set bld->buf in write_tree. That doesn't seem better though. |
|
Aha - thanks for clarifying, @mitesch - obviously you're quite right and I didn't understand the lifecycle there when I was re-reading it. In that case, yes, I think this is great. @pks-t 's correct about our style - we follow the kernel / git styles. If you wouldn't mind squashing a fixup commit that matches the style, that would be great! |
src/tree.c
Outdated
| if ((ret = git_buf_grow(&shared_buf, 1024)) >= 0) | ||
| { | ||
| ret = write_tree(oid, repo, index, "", 0, &shared_buf); | ||
| git_buf_free(&shared_buf); |
There was a problem hiding this comment.
The free doesn't need to be conditional, it'll do the right thing in case of OOM.
src/tree.c
Outdated
| entrycount = git_strmap_num_entries(bld->map); | ||
|
|
||
| /* Grow the buffer beforehand to an estimated size */ | ||
| error = git_buf_grow(&buffer, entrycount * 72); |
There was a problem hiding this comment.
This is the opposite of how we do error handling in the library. We use early returns or jumps to cleanup if things go wrong instead of putting the success path inside ifs.
|
Mostly some style/nomenclature comments, but overall it's looking good. It might be worth making this part of the public API so external users of this function can make use of it as well. The tree update might also benefit from doing something like this, which is another semi-heavy user of this. |
|
Pushed a fixup with most of the requested changes. I'll squash once everything is 100%
|
10c752f to
d368e87
Compare
src/tree.h
Outdated
| void git_tree__free(void *tree); | ||
| int git_tree__parse(void *tree, git_odb_object *obj); | ||
|
|
||
| int git_treebuilder_write_with_buffer(git_oid *oid, git_treebuilder *bld, git_buf *tree); |
There was a problem hiding this comment.
Sorry - one more minor nit. For functions that are internal to the library, we use a double-underscore. For example, this would be git_treebuilder__write_with_buffer.
But there's no need to put this in a header at all; what I would prefer to see is to revert the changes to tree.h and instead declare write_with_buffer as static in tree.c. Would you be so kind? Thanks again!
There was a problem hiding this comment.
The other direction to go to would be to make it part of the public API. This declaration would then be moved to include/git2/tree.h and would need some documentation. I'd be in favour of making it public as it can be useful for external users of the tree builder as well.
There was a problem hiding this comment.
👍 to just making this public. That would be great.
The function to write trees allocates a new buffer for each tree. This causes problems with performance when performing a lot of actions involving writing trees, e.g. when doing many merges. Fix the issue by instead handing in a shared buffer, which is then re-used across the calls without having to re-allocate between calls.
d368e87 to
87aaefe
Compare
When doing many merges in parallel in a large repository, we saw performance degrade rapidly due to memory allocations in git_treebuilder_write. This change uses a shared buffer created in git_tree__write_index to avoid the small allocation per tree in git_treebuilder_write.
In our tests against that repository we saw improvements of ~50% in merges under heavy load.