Add `range-diff`, a `tbdiff` lookalike by dscho · Pull Request #1 · gitgitgadget/git

dscho · Jun 5, 2018

The incredibly useful git-tbdiff tool to compare patch series (say, to see what changed between two iterations sent to the Git mailing list) is slightly less useful for this developer due to the fact that it requires the hungarian and numpy Python packages which are for some reason really hard to build in MSYS2. So hard that I even had to give up, because it was simply easier to re-implement the whole shebang as a builtin command.

The project at https://github.com/trast/tbdiff seems to be dormant, anyway. Funny (and true) story: I looked at the open Pull Requests to see how active that project is, only to find to my surprise that I had submitted one in August 2015, and that it was still unanswered let alone merged.

While at it, I forward-ported AEvar's patch to force --decorate=no because git -p tbdiff would fail otherwise.

Side note: I work on implementing range-diff not only to make life easier for reviewers who have to suffer through v2, v3, ... of my patch series, but also to verify my changes before submitting a new iteration. And also, maybe even more importantly, I plan to use it to verify my merging-rebases of Git for
Windows (for which I previously used to redirect the pre-rebase/post-rebase diffs vs upstream and then compare them using git diff --no-index). And of course any interested person can see what changes were necessary e.g. in the merging-rebase of Git for Windows onto v2.17.0 by running a command like:

        base=^{/Start.the.merging-rebase}
        tag=v2.17.0.windows.1
        pre=$tag$base^2
        git range-diff $pre$base..$pre $tag$base..$tag

The command uses what it calls the "dual color mode" (can be disabled via --no-dual-color) which helps identifying what actually changed: it prefixes lines with a - (and red background) that correspond to the first commit range, and with a + (and green background) that correspond to the second range. The rest of the lines will be colored according to the original diffs.

Changes since v4:

Fixed a typo in the commit message of "range-diff: add tests" that was introduced in v4.
White-space fixes.
Fixed the length of the first header underline in the man page.
Changed the preprocessor guard in linear-assignment.h to reflect the new name (instead of the old name, which was hungarian.h).
Likewise, changed the preprocessor guards in range-diff.h to hide the history of the thrice-renamed command.
Fixed indentation in the completion.
Instead of trying to paper over white-space error handling that does not apply to "diffs of diffs", dual color mode now simply disables all white-space warnings.
When showing the "single arg must be symmetric range" error message, git range-diff now also shows the usage.
Adjusted the commit message of "range-diff: adjust the output of the commit pairs" to avoid the surprise of the reviewer when onelines are printed all of a sudden, too.
"range-diff: adjust the output of the commit pairs" is now using a simpler way to print onelines.
We are now sandwiching the diff_opt_parse() loop between two parse_options(), to make sure that we caught all options, and that the -- separator is handled.
Adjusted the lookup_commit_reference() call to the newest master (it now takes a the_repository parameter).

Changes since v3:

The cover letter was adjusted to reflect the new reality (the command is called range-diff now, not branch-diff, and --dual-color is the default).
The documentation was adjusted a bit more in the patch that makes --dual-color the default.
Clarified the calculation of the cost matrix, as per Stefan Beller's request.
The man page now spells out that merge commits are ignored in the commit ranges (not merges per se).
The code in linear-assignment.c was adjusted to use the SWAP() macro.
The commit message of the patch introducing the first rudimentary implementation no longer talks about the "Hungarian" algorithm, but about the "linear assignment algorithm" instead.
A bogus indentation change was backed out from the patch introducing the first rudimentary implementation.
Instead of merely warning about missing .. in the 2-parameter invocation, we now exit with the error message.
The diff_opt_parse() function is allowed to return a value larger than 1, indicating that more than just one command-line parameter was parsed. We now advance by the indicated value instead of always advancing exactly 1 (which is still correct much of the time).
A lengthy if...else if...else if...else was simplified (from a logical point of view) by reordering it.
The unnecessarily static variable dashes was turned into a local variable of the caller.
The commit message talking about the new man page still referred to git branch --diff, which has been fixed.
A forgotten t7910 reference was changed to t3206.
An unbalanced double-tick was fixed in the man page.
Fixed grammar both of the commit message and the description of the --no-dual-color option.
To fix the build, a blank man page is now introduced together with the new range-diff command, even if it is populated for real only at a later patch (i.e. at the same time as before).
The headaches Junio fears would be incurred by that simple workaround to avoid bogus white-space error reporting are fended off: a more complex patch is now in place that adds (and uses) a new white-space flag. Sadly, as is all too common when Junio "encourages" me to replace a simple workaround by something "proper", it caused all kinds of headaches to get this right, so I am rather less certain that the "proper" fix will cause us less headaches than the simple workaround would have done. But whatever.
The dual color mode now also dims the changes that are exclusively in the first specified commit range, and uses bold face on the changes exclusively in the second one. This matches the intuition when using range-diff to compare an older iteration of a patch series to a newer one: the changes from the previous iteration that were replaced by new ones "fade", while the changes that replace them are "shiny new".

Changes since v2:

Right-aligned the patch numbers in the commit pairs.
Used ALLOC_ARRAY() in hungarian.c instead of xmalloc(sizeof()*size).
Changed compute_assignment()s return type from int to void, as it always succeeds.
Changed the Hungarian Algorithm to use an integer cost matrix.
Changed the --creation-weight option to --creation-factor where is an integer.
Retitled 1/19 and 2/19 to better conform with the current conventions, as pointed out (and suggested) by Junio.
Shut up Coverity, and at the same time avoided passing the unnecessary i and j parameters to output_pair_header().
Removed support for the --no-patches option: we inherit diff_options' support for -s already (and much more).
Removed the ugly _INV enum values, and introduced a beautiful GIT_COLOR_REVERSE instead. This way, whatever the user configured as color.diff.new (or .old) will be used in reverse in the dual color mode.
Instead of overriding the fragment header color, the dual color mode will now reverse the "outer" fragment headers, too.
Turned the stand-alone branch-diff command into the --diff option of git branch. Adjusted pretty much all commit messages to account for this. This change should no longer be visible: see below.
Pretty much re-wrote the completion, to support the new --diff mode of git-branch. See below: it was reverted for range-diff.
Renamed t7910 to t3206, to be closer to the git-branch tests.
Ensured that git_diff_ui_config() gets called, and therefore color.diff.* respected.
Avoided leaking four_spaces.
Fixed a declaration in a for (;;) statement (which Junio had as a fixup! that I almost missed).
Renamed branch --diff, which had been renamed from branch-diff (which was picked to avoid re-using tbdiff) to range-diff.
Renamed hungarian.c and its header to linear-assignment.c
Made --dual-color the default, and changed it to still auto-detect whether color should be used rather than forcing it

gitgitgadget · Jun 7, 2018

An error occurred while submitting:

Error: Branch f8c4e30c63bbfa7efec44ff0f6d0404326723d35 is not rebased to upstream/master

Change "fetch" to treat "+" in refspecs (aka --force) to mean we should clobber a local tag of the same name. This changes the long-standing behavior of "fetch" added in 853a369 ("[PATCH] Multi-head fetch.", 2005-08-20), before this change all tag fetches effectively had --force enabled. The original rationale in that change was: > Tags need not be pointing at commits so there is no way to > guarantee "fast-forward" anyway. That comment and the rest of the history of "fetch" shows that the "+" (--force) part of refpecs was only conceived for branch updates, while tags have accepted any changes from upstream unconditionally and clobbered the local tag object. Changing this behavior has been discussed as early as 2011[1]. I the current behavior doesn't make sense, it easily results in local tags accidentally being clobbered. Ideally we'd namespace our tags per-remote, but as with my 97716d2 ("fetch: add a --prune-tags option and fetch.pruneTags config", 2018-02-09) it's easier to work around the current implementation than to fix the root cause, so this implements suggestion #1 from [1], "fetch" now only clobbers the tag if either "+" is provided as part of the refspec, or if "--force" is provided on the command-line. This also makes it nicely symmetrical with how "tag" itself works. We'll now refuse to clobber any existing tags unless "--force" is supplied, whether that clobbering would happen by clobbering a local tag with "tag", or by fetching it from the remote with "fetch". It's still not at all nicely symmetrical with how "git push" works, as discussed in the updated pull-fetch-param.txt documentation, but this change brings them more into line with one another. I don't think there's any reason "fetch" couldn't fully converge with the behavior used by "push", but that's a topic for another change. One of the tests added in 31b808a ("clone --single: limit the fetch refspec to fetched branch", 2012-09-20) is being changed to use --force where a clone would clobber a tag. This changes nothing about the existing behavior of the test. 1. https://public-inbox.org/git/20111123221658.GA22313@sigill.intra.peff.net/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>

dscho · Jul 3, 2018

/submit

gitgitgadget · Jul 3, 2018

An error occurred while submitting:

Error: git tag -F - -a pr-1/dscho/branch-diff-v3 4a68b95 failed: 128,
fatal: tag 'pr-1/dscho/branch-diff-v3' already exists

dscho · Jul 3, 2018

/submit

gitgitgadget · Jul 3, 2018

Submitted as pull.1.v3.git.gitgitgadget@gmail.com

dscho · Jul 21, 2018

/submit

gitgitgadget · Jul 21, 2018

Submitted as pull.1.v4.git.gitgitgadget@gmail.com

find_cfg_ent() allocates a struct reflog_expire_entry_option via FLEX_ALLOC_MEM and inserts it into a linked list in the reflog_expire_options structure. The entries in this list are never freed, resulting in a leak in cmd_reflog_expire and the gc reflog expire maintenance task: Direct leak of 39 byte(s) in 1 object(s) allocated from: #0 0x7ff975ee6883 in calloc (/lib64/libasan.so.8+0xe6883) #1 0x0000010edada in xcalloc ../wrapper.c:154 #2 0x000000df0898 in find_cfg_ent ../reflog.c:28 #3 0x000000df0898 in reflog_expire_config ../reflog.c:70 #4 0x00000095c451 in configset_iter ../config.c:2116 #5 0x0000006d29e7 in git_config ../config.h:724 #6 0x0000006d29e7 in cmd_reflog_expire ../builtin/reflog.c:205 #7 0x0000006d504c in cmd_reflog ../builtin/reflog.c:419 #8 0x0000007e4054 in run_builtin ../git.c:480 #9 0x0000007e4054 in handle_builtin ../git.c:746 #10 0x0000007e8a35 in run_argv ../git.c:813 #11 0x0000007e8a35 in cmd_main ../git.c:953 #12 0x000000441e8f in main ../common-main.c:9 #13 0x7ff9754115f4 in __libc_start_call_main (/lib64/libc.so.6+0x35f4) #14 0x7ff9754116a7 in __libc_start_main@@GLIBC_2.34 (/lib64/libc.so.6+0x36a7) #15 0x000000444184 in _start (/home/jekeller/libexec/git-core/git+0x444184) Close this leak by adding a reflog_clear_expire_config() function which iterates the linked list and frees its elements. Call it upon exit of cmd_reflog_expire() and reflog_expire_condition(). Add a basic test which covers this leak. While at it, cover the functionality from commit commit 3cb22b8 (Per-ref reflog expiry configuration, 2008-06-15). We've had this support for years, but lacked any tests. Co-developed-by: Jeff King <peff@peff.net> Signed-off-by: Jacob Keller <jacob.keller@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>

The fill_packs_from_midx() method was refactored in fcb2205 (midx: implement support for writing incremental MIDX chains, 2024-08-06) to allow for preferred packfiles and incremental multi-pack-indexes. However, this led to some conditions that can cause improperly initialized memory in the context's list of packfiles. The conditions caring about the preferred pack name or the incremental flag are currently necessary to load a packfile. But the context is still being populated with pack_info structs based on the packfile array for the existing multi-pack-index even if prepare_midx_pack() isn't called. Add a new test that breaks under --stress when compiled with SANITIZE=address. The chosen number of 100 packfiles was selected to get the --stress output to fail about 50% of the time, while 50 packfiles could not get a failure in most --stress runs. This test has a very minor check at the end confirming only one packfile remaining. The failing nature of this test actually relies on auto-GC cleaning up some packfiles during the creation of the commits, as tests setting gc.auto to zero make the packfile count match the number of added commits but also avoids hitting the memory issue. The test case is marked as EXPENSIVE not only because of the number of packfiles it creates, but because some CI environments were reporting errors during the test that I could not reproduce, specifically around being unable to open the packfiles or their pack-indexes. When it fails under SANITIZE=address, it provides the following error: AddressSanitizer:DEADLYSIGNAL ================================================================= ==3263517==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000027 ==3263517==The signal is caused by a READ memory access. ==3263517==Hint: address points to the zero page. #0 0x562d5d82d1fb in close_pack_windows packfile.c:299 #1 0x562d5d82d3ab in close_pack packfile.c:354 #2 0x562d5d7bfdb4 in write_midx_internal midx-write.c:1490 #3 0x562d5d7c7aec in midx_repack midx-write.c:1795 #4 0x562d5d46fff6 in cmd_multi_pack_index builtin/multi-pack-index.c:305 ... This failure stack trace is disconnected from the real fix because it the bad pointers are accessed later when closing the packfiles from the context. There are a few different aspects to this fix that are worth noting: 1. We return to the previous behavior of fill_packs_from_midx to not rely on the incremental flag or existence of a preferred pack. 2. The behavior to scan all layers of an incremental midx is kept, so this is not a full revert of the change. 3. We skip allocating more room in the pack_info array if the pack fails prepare_midx_pack(). 4. The method has always returned 0 for success and 1 for failure, but the condition checking for error added a check for a negative result for failure, so that is now updated. 5. The call to open_pack_index() is removed, but this is needed later in the case of a preferred pack. That call is moved to immediately before its result is needed (checking for the object count). Signed-off-by: Derrick Stolee <stolee@gmail.com>

The fill_packs_from_midx() method was refactored in fcb2205 (midx: implement support for writing incremental MIDX chains, 2024-08-06) to allow for preferred packfiles and incremental multi-pack-indexes. However, this led to some conditions that can cause improperly initialized memory in the context's list of packfiles. The conditions caring about the preferred pack name or the incremental flag are currently necessary to load a packfile. But the context is still being populated with pack_info structs based on the packfile array for the existing multi-pack-index even if prepare_midx_pack() isn't called. Add a new test that breaks under --stress when compiled with SANITIZE=address. The chosen number of 100 packfiles was selected to get the --stress output to fail about 50% of the time, while 50 packfiles could not get a failure in most --stress runs. The test case is marked as EXPENSIVE not only because of the number of packfiles it creates, but because some CI environments were reporting errors during the test that I could not reproduce, specifically around being unable to open the packfiles or their pack-indexes. When it fails under SANITIZE=address, it provides the following error: AddressSanitizer:DEADLYSIGNAL ================================================================= ==3263517==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000027 ==3263517==The signal is caused by a READ memory access. ==3263517==Hint: address points to the zero page. #0 0x562d5d82d1fb in close_pack_windows packfile.c:299 #1 0x562d5d82d3ab in close_pack packfile.c:354 #2 0x562d5d7bfdb4 in write_midx_internal midx-write.c:1490 #3 0x562d5d7c7aec in midx_repack midx-write.c:1795 #4 0x562d5d46fff6 in cmd_multi_pack_index builtin/multi-pack-index.c:305 ... This failure stack trace is disconnected from the real fix because the bad pointers are accessed later when closing the packfiles from the context. There are a few different aspects to this fix that are worth noting: 1. We return to the previous behavior of fill_packs_from_midx to not rely on the incremental flag or existence of a preferred pack. 2. The behavior to scan all layers of an incremental midx is kept, so this is not a full revert of the change. 3. We skip allocating more room in the pack_info array if the pack fails prepare_midx_pack(). 4. The method has always returned 0 for success and 1 for failure, but the condition checking for error added a check for a negative result for failure, so that is now updated. 5. The call to open_pack_index() is removed, but this is needed later in the case of a preferred pack. That call is moved to immediately before its result is needed (checking for the object count). Signed-off-by: Derrick Stolee <stolee@gmail.com>

The fill_packs_from_midx() method was refactored in fcb2205 (midx: implement support for writing incremental MIDX chains, 2024-08-06) to allow for preferred packfiles and incremental multi-pack-indexes. However, this led to some conditions that can cause improperly initialized memory in the context's list of packfiles. The conditions caring about the preferred pack name or the incremental flag are currently necessary to load a packfile. But the context is still being populated with pack_info structs based on the packfile array for the existing multi-pack-index even if prepare_midx_pack() isn't called. Add a new test that breaks under --stress when compiled with SANITIZE=address. The chosen number of 100 packfiles was selected to get the --stress output to fail about 50% of the time, while 50 packfiles could not get a failure in most --stress runs. The test case is marked as EXPENSIVE not only because of the number of packfiles it creates, but because some CI environments were reporting errors during the test that I could not reproduce, specifically around being unable to open the packfiles or their pack-indexes. When it fails under SANITIZE=address, it provides the following error: AddressSanitizer:DEADLYSIGNAL ================================================================= ==3263517==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000027 ==3263517==The signal is caused by a READ memory access. ==3263517==Hint: address points to the zero page. #0 0x562d5d82d1fb in close_pack_windows packfile.c:299 #1 0x562d5d82d3ab in close_pack packfile.c:354 #2 0x562d5d7bfdb4 in write_midx_internal midx-write.c:1490 #3 0x562d5d7c7aec in midx_repack midx-write.c:1795 #4 0x562d5d46fff6 in cmd_multi_pack_index builtin/multi-pack-index.c:305 ... This failure stack trace is disconnected from the real fix because the bad pointers are accessed later when closing the packfiles from the context. There are a few different aspects to this fix that are worth noting: 1. We return to the previous behavior of fill_packs_from_midx to not rely on the incremental flag or existence of a preferred pack. 2. The behavior to scan all layers of an incremental midx is kept, so this is not a full revert of the change. 3. We skip allocating more room in the pack_info array if the pack fails prepare_midx_pack(). 4. The method has always returned 0 for success and 1 for failure, but the condition checking for error added a check for a negative result for failure, so that is now updated. 5. The call to open_pack_index() is removed, but this is needed later in the case of a preferred pack. That call is moved to immediately before its result is needed (checking for the object count). Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>

When performing auto-maintenance we check whether commit graphs need to be generated by counting the number of commits that are reachable by any reference, but not covered by a commit graph. This search is performed by iterating through all references and then doing a depth-first search until we have found enough commits that are not present in the commit graph. This logic has a memory leak though: Direct leak of 16 byte(s) in 1 object(s) allocated from: #0 0x55555562e433 in malloc (git+0xda433) #1 0x555555964322 in do_xmalloc ../wrapper.c:55:8 #2 0x5555559642e6 in xmalloc ../wrapper.c:76:9 #3 0x55555579bf29 in commit_list_append ../commit.c:1872:35 #4 0x55555569f160 in dfs_on_ref ../builtin/gc.c:1165:4 #5 0x5555558c33fd in do_for_each_ref_iterator ../refs/iterator.c:431:12 #6 0x5555558af520 in do_for_each_ref ../refs.c:1828:9 #7 0x5555558ac317 in refs_for_each_ref ../refs.c:1833:9 #8 0x55555569e207 in should_write_commit_graph ../builtin/gc.c:1188:11 #9 0x55555569c915 in maintenance_is_needed ../builtin/gc.c:3492:8 #10 0x55555569b76a in cmd_maintenance ../builtin/gc.c:3542:9 #11 0x55555575166a in run_builtin ../git.c:506:11 #12 0x5555557502f0 in handle_builtin ../git.c:779:9 #13 0x555555751127 in run_argv ../git.c:862:4 #14 0x55555575007b in cmd_main ../git.c:984:19 #15 0x5555557523aa in main ../common-main.c:9:11 #16 0x7ffff7a2a4d7 in __libc_start_call_main (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x2a4d7) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab) #17 0x7ffff7a2a59a in __libc_start_main@GLIBC_2.2.5 (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x2a59a) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab) #18 0x5555555f0934 in _start (git+0x9c934) The root cause of this memory leak is our use of `commit_list_append()`. This function expects as parameters the item to append and the _tail_ of the list to append. This tail will then be overwritten with the new tail of the list so that it can be used in subsequent calls. But we call it with `commit_list_append(parent->item, &stack)`, so we end up losing everything but the new item. This issue only surfaces when counting merge commits. Next to being a memory leak, it also shows that we're in fact miscounting as we only respect children of the last parent. All previous parents are discarded, so their children will be disregarded unless they are hit via another reference. While crafting a test case for the issue I was puzzled that I couldn't establish the proper border at which the auto-condition would be fulfilled. As it turns out, there's another bug: if an object is at the tip of any reference we don't mark it as seen. Consequently, if it is reachable via any other reference, we'd count that object twice. Fix both of these bugs so that we properly count objects without leaking any memory. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>

In the next commit we are about to move the packfile store into the ODB source so that we have one store per source. This will lead to a memory leak in the following commit when reading data from a submodule via git-grep(1): Direct leak of 192 byte(s) in 1 object(s) allocated from: #0 0x55555562e726 in calloc (git+0xda726) #1 0x555555964734 in xcalloc ../wrapper.c:154:8 #2 0x555555835136 in load_multi_pack_index_one ../midx.c:135:2 #3 0x555555834fd6 in load_multi_pack_index ../midx.c:382:6 #4 0x5555558365b6 in prepare_multi_pack_index_one ../midx.c:716:17 #5 0x55555586c605 in packfile_store_prepare ../packfile.c:1103:3 #6 0x55555586c90c in packfile_store_reprepare ../packfile.c:1118:2 #7 0x5555558546b3 in odb_reprepare ../odb.c:1106:2 #8 0x5555558539e4 in do_oid_object_info_extended ../odb.c:715:4 #9 0x5555558533d1 in odb_read_object_info_extended ../odb.c:862:8 #10 0x5555558540bd in odb_read_object ../odb.c:920:6 #11 0x55555580a330 in grep_source_load_oid ../grep.c:1934:12 #12 0x55555580a13a in grep_source_load ../grep.c:1986:10 #13 0x555555809103 in grep_source_is_binary ../grep.c:2014:7 #14 0x555555807574 in grep_source_1 ../grep.c:1625:8 #15 0x555555807322 in grep_source ../grep.c:1837:10 #16 0x5555556a5c58 in run ../builtin/grep.c:208:10 #17 0x55555562bb42 in void* ThreadStartFunc<false>(void*) lsan_interceptors.cpp.o #18 0x7ffff7a9a979 in start_thread (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x9a979) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab) #19 0x7ffff7b22d2b in __GI___clone3 (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x122d2b) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab) The root caues of this leak is the way we set up and release the submodule: 1. We use `repo_submodule_init()` to initialize a new repository. This repository is stored in `repos_to_free`. 2. We now read data from the submodule repository. 3. We then call `repo_clear()` on the submodule repositories. 4. `repo_clear()` calls `odb_free()`. 5. `odb_free()` calls `odb_free_sources()` followed by `odb_close()`. The issue here is the 5th step: we call `odb_free_sources()` _before_ we call `odb_close()`. But `odb_free_sources()` already frees all sources, so the logic that closes them in `odb_close()` now becomes a no-op. As a consequence, we never explicitly close sources at all. Fix the leak by closing the store before we free the sources. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>

The MIDX file format currently requires that pack files be identified by the lexicographic ordering of their names (that is, a pack having a checksum beginning with "abc" would have a numeric pack_int_id which is smaller than the same value for a pack beginning with "bcd"). As a result, it is impossible to combine adjacent MIDX layers together without permuting bits from bitmaps that are in more recent layer(s). To see why, consider the following example: | packs | preferred pack --------+-------------+--------------- MIDX #0 | { X, Y, Z } | Y MIDX #1 | { A, B, C } | B MIDX #2 | { D, E, F } | D , where MIDX #2's base MIDX is MIDX #1, and so on. Suppose that we want to combine MIDX layers #0 and #1, to create a new layer #0' containing the packs from both layers. With the original three MIDX layers, objects are laid out in the bitmap in the order they appear in their source pack, and the packs themselves are arranged according to the pseudo-pack order. In this case, that ordering is Y, X, Z, B, A, C. But recall that the pseudo-pack ordering is defined by the order that packs appear in the MIDX, with the exception of the preferred pack, which sorts ahead of all other packs regardless of its position within the MIDX. In the above example, that means that pack 'Y' could be placed anywhere (so long as it is designated as preferred), however, all other packs must be placed in the location listed above. Because that ordering isn't sorted lexicographically, it is impossible to compact MIDX layers in the above configuration without permuting the object-to-bit-position mapping. Changing this mapping would affect all bitmaps belonging to newer layers, rendering the bitmaps associated with MIDX #2 unreadable. One of the goals of MIDX compaction is that we are able to shrink the length of the MIDX chain *without* invalidating bitmaps that belong to newer layers, and the lexicographic ordering constraint is at odds with this goal. However, packs do not *need* to be lexicographically ordered within the MIDX. As far as I can gather, the only reason they are sorted lexically is to make it possible to perform a binary search over the pack names in a MIDX, necessary to make `midx_contains_pack()`'s performance logarithmic in the number of packs rather than linear. Relax this constraint by allowing MIDX writes to proceed with packs that are not arranged in lexicographic order. `midx_contains_pack()` will lazily instantiate a `pack_names_sorted` array on the MIDX, which will be used to implement the binary search over pack names. Note that this produces MIDXs which may be incompatible with earlier versions of Git that have stricter requirements on the layout of packs within a MIDX. This patch does *not* modify the version number of the MIDX format, since existing versions of Git already know to gracefully ignore a MIDX with packs that appear out-of-order. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>

When pushing to a set of remotes using a nickname for the group, the client initializes the connection to each remote, talks to the remote and reads and parses capabilities line, and holds the capabilities in a file-scope static variable server_capabilities_v1. There are a few other such file-scope static variables, and these connections cannot be parallelized until they are refactored to a structure that keeps track of active connections. Which is *not* the theme of this patch ;-) For a single connection, the server_capabilities_v1 variable is initialized to NULL (at the program initialization), populated when we talk to the other side, used to look up capabilities of the other sdie possible multiple times, and the memory is held by the variable until program exit, without leaking. When talking to multiple remotes, however, the server capabilities from the second connection overwrites without freeing the one from the first connection, which leaks. ==1080970==ERROR: LeakSanitizer: detected memory leaks Direct leak of 421 byte(s) in 2 object(s) allocated from: #0 0x5615305f849e in strdup (/home/gitster/g/git-jch/bin/bin/git+0x2b349e) (BuildId: 54d149994c9e85374831958f694bd0aa3b8b1e26) #1 0x561530e76cc4 in xstrdup /home/gitster/w/build/wrapper.c:43:14 #2 0x5615309cd7fa in process_capabilities /home/gitster/w/build/connect.c:243:27 #3 0x5615309cd502 in get_remote_heads /home/gitster/w/build/connect.c:366:4 #4 0x561530e2cb0b in handshake /home/gitster/w/build/transport.c:372:3 #5 0x561530e29ed7 in get_refs_via_connect /home/gitster/w/build/transport.c:398:9 #6 0x561530e26464 in transport_push /home/gitster/w/build/transport.c:1421:16 #7 0x561530800bec in push_with_options /home/gitster/w/build/builtin/push.c:387:8 #8 0x5615307ffb99 in do_push /home/gitster/w/build/builtin/push.c:442:7 #9 0x5615307fe926 in cmd_push /home/gitster/w/build/builtin/push.c:664:7 #10 0x56153065673f in run_builtin /home/gitster/w/build/git.c:506:11 #11 0x56153065342f in handle_builtin /home/gitster/w/build/git.c:779:9 #12 0x561530655b89 in run_argv /home/gitster/w/build/git.c:862:4 #13 0x561530652cba in cmd_main /home/gitster/w/build/git.c:984:19 #14 0x5615308dda0a in main /home/gitster/w/build/common-main.c:9:11 #15 0x7f051651bca7 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16 SUMMARY: AddressSanitizer: 421 byte(s) leaked in 2 allocation(s). Free the capablities data for the previous server before overwriting it with the next server to plug this leak. The added test fails without the freeing with SANITIZE=leak; I somehow couldn't get it fail reliably with SANITIZE=leak,address though. Signed-off-by: Junio C Hamano <gitster@pobox.com>

When pushing to a set of remotes using a nickname for the group, the client initializes the connection to each remote, talks to the remote and reads and parses capabilities line, and holds the capabilities in a file-scope static variable server_capabilities_v1. There are a few other such file-scope static variables, and these connections cannot be parallelized until they are refactored to a structure that keeps track of active connections. Which is *not* the theme of this patch ;-) For a single connection, the server_capabilities_v1 variable is initialized to NULL (at the program initialization), populated when we talk to the other side, used to look up capabilities of the other side possibly multiple times, and the memory is held by the variable until program exit, without leaking. When talking to multiple remotes, however, the server capabilities from the second connection overwrites without freeing the one from the first connection, which leaks. ==1080970==ERROR: LeakSanitizer: detected memory leaks Direct leak of 421 byte(s) in 2 object(s) allocated from: #0 0x5615305f849e in strdup (/home/gitster/g/git-jch/bin/bin/git+0x2b349e) (BuildId: 54d149994c9e85374831958f694bd0aa3b8b1e26) #1 0x561530e76cc4 in xstrdup /home/gitster/w/build/wrapper.c:43:14 #2 0x5615309cd7fa in process_capabilities /home/gitster/w/build/connect.c:243:27 #3 0x5615309cd502 in get_remote_heads /home/gitster/w/build/connect.c:366:4 #4 0x561530e2cb0b in handshake /home/gitster/w/build/transport.c:372:3 #5 0x561530e29ed7 in get_refs_via_connect /home/gitster/w/build/transport.c:398:9 #6 0x561530e26464 in transport_push /home/gitster/w/build/transport.c:1421:16 #7 0x561530800bec in push_with_options /home/gitster/w/build/builtin/push.c:387:8 #8 0x5615307ffb99 in do_push /home/gitster/w/build/builtin/push.c:442:7 #9 0x5615307fe926 in cmd_push /home/gitster/w/build/builtin/push.c:664:7 #10 0x56153065673f in run_builtin /home/gitster/w/build/git.c:506:11 #11 0x56153065342f in handle_builtin /home/gitster/w/build/git.c:779:9 #12 0x561530655b89 in run_argv /home/gitster/w/build/git.c:862:4 #13 0x561530652cba in cmd_main /home/gitster/w/build/git.c:984:19 #14 0x5615308dda0a in main /home/gitster/w/build/common-main.c:9:11 #15 0x7f051651bca7 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16 SUMMARY: AddressSanitizer: 421 byte(s) leaked in 2 allocation(s). Free the capablities data for the previous server before overwriting it with the next server to plug this leak. The added test fails without the freeing with SANITIZE=leak; I somehow couldn't get it fail reliably with SANITIZE=leak,address though. Signed-off-by: Junio C Hamano <gitster@pobox.com>

When performing auto-maintenance we check whether commit graphs need to be generated by counting the number of commits that are reachable by any reference, but not covered by a commit graph. This search is performed by iterating through all references and then doing a depth-first search until we have found enough commits that are not present in the commit graph. This logic has a memory leak though: Direct leak of 16 byte(s) in 1 object(s) allocated from: #0 0x55555562e433 in malloc (git+0xda433) #1 0x555555964322 in do_xmalloc ../wrapper.c:55:8 #2 0x5555559642e6 in xmalloc ../wrapper.c:76:9 #3 0x55555579bf29 in commit_list_append ../commit.c:1872:35 #4 0x55555569f160 in dfs_on_ref ../builtin/gc.c:1165:4 #5 0x5555558c33fd in do_for_each_ref_iterator ../refs/iterator.c:431:12 #6 0x5555558af520 in do_for_each_ref ../refs.c:1828:9 #7 0x5555558ac317 in refs_for_each_ref ../refs.c:1833:9 #8 0x55555569e207 in should_write_commit_graph ../builtin/gc.c:1188:11 #9 0x55555569c915 in maintenance_is_needed ../builtin/gc.c:3492:8 #10 0x55555569b76a in cmd_maintenance ../builtin/gc.c:3542:9 #11 0x55555575166a in run_builtin ../git.c:506:11 #12 0x5555557502f0 in handle_builtin ../git.c:779:9 #13 0x555555751127 in run_argv ../git.c:862:4 #14 0x55555575007b in cmd_main ../git.c:984:19 #15 0x5555557523aa in main ../common-main.c:9:11 #16 0x7ffff7a2a4d7 in __libc_start_call_main (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x2a4d7) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab) #17 0x7ffff7a2a59a in __libc_start_main@GLIBC_2.2.5 (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x2a59a) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab) #18 0x5555555f0934 in _start (git+0x9c934) The root cause of this memory leak is our use of `commit_list_append()`. This function expects as parameters the item to append and the _tail_ of the list to append. This tail will then be overwritten with the new tail of the list so that it can be used in subsequent calls. But we call it with `commit_list_append(parent->item, &stack)`, so we end up losing everything but the new item. This issue only surfaces when counting merge commits. Next to being a memory leak, it also shows that we're in fact miscounting as we only respect children of the last parent. All previous parents are discarded, so their children will be disregarded unless they are hit via another reference. While crafting a test case for the issue I was puzzled that I couldn't establish the proper border at which the auto-condition would be fulfilled. As it turns out, there's another bug: if an object is at the tip of any reference we don't mark it as seen. Consequently, if it is reachable via any other reference, we'd count that object twice. Fix both of these bugs so that we properly count objects without leaking any memory. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>

It is possible to hit a memory leak when reading data from a submodule via git-grep(1): Direct leak of 192 byte(s) in 1 object(s) allocated from: #0 0x55555562e726 in calloc (git+0xda726) #1 0x555555964734 in xcalloc ../wrapper.c:154:8 #2 0x555555835136 in load_multi_pack_index_one ../midx.c:135:2 #3 0x555555834fd6 in load_multi_pack_index ../midx.c:382:6 #4 0x5555558365b6 in prepare_multi_pack_index_one ../midx.c:716:17 #5 0x55555586c605 in packfile_store_prepare ../packfile.c:1103:3 #6 0x55555586c90c in packfile_store_reprepare ../packfile.c:1118:2 #7 0x5555558546b3 in odb_reprepare ../odb.c:1106:2 #8 0x5555558539e4 in do_oid_object_info_extended ../odb.c:715:4 #9 0x5555558533d1 in odb_read_object_info_extended ../odb.c:862:8 #10 0x5555558540bd in odb_read_object ../odb.c:920:6 #11 0x55555580a330 in grep_source_load_oid ../grep.c:1934:12 #12 0x55555580a13a in grep_source_load ../grep.c:1986:10 #13 0x555555809103 in grep_source_is_binary ../grep.c:2014:7 #14 0x555555807574 in grep_source_1 ../grep.c:1625:8 #15 0x555555807322 in grep_source ../grep.c:1837:10 #16 0x5555556a5c58 in run ../builtin/grep.c:208:10 #17 0x55555562bb42 in void* ThreadStartFunc<false>(void*) lsan_interceptors.cpp.o #18 0x7ffff7a9a979 in start_thread (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x9a979) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab) #19 0x7ffff7b22d2b in __GI___clone3 (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x122d2b) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab) The root caues of this leak is the way we set up and release the submodule: 1. We use `repo_submodule_init()` to initialize a new repository. This repository is stored in `repos_to_free`. 2. We now read data from the submodule repository. 3. We then call `repo_clear()` on the submodule repositories. 4. `repo_clear()` calls `odb_free()`. 5. `odb_free()` calls `odb_free_sources()` followed by `odb_close()`. The issue here is the 5th step: we call `odb_free_sources()` _before_ we call `odb_close()`. But `odb_free_sources()` already frees all sources, so the logic that closes them in `odb_close()` now becomes a no-op. As a consequence, we never explicitly close sources at all. Fix the leak by closing the store before we free the sources. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>

When performing auto-maintenance we check whether commit graphs need to be generated by counting the number of commits that are reachable by any reference, but not covered by a commit graph. This search is performed by iterating through all references and then doing a depth-first search until we have found enough commits that are not present in the commit graph. This logic has a memory leak though: Direct leak of 16 byte(s) in 1 object(s) allocated from: #0 0x55555562e433 in malloc (git+0xda433) #1 0x555555964322 in do_xmalloc ../wrapper.c:55:8 #2 0x5555559642e6 in xmalloc ../wrapper.c:76:9 #3 0x55555579bf29 in commit_list_append ../commit.c:1872:35 #4 0x55555569f160 in dfs_on_ref ../builtin/gc.c:1165:4 #5 0x5555558c33fd in do_for_each_ref_iterator ../refs/iterator.c:431:12 #6 0x5555558af520 in do_for_each_ref ../refs.c:1828:9 #7 0x5555558ac317 in refs_for_each_ref ../refs.c:1833:9 #8 0x55555569e207 in should_write_commit_graph ../builtin/gc.c:1188:11 #9 0x55555569c915 in maintenance_is_needed ../builtin/gc.c:3492:8 #10 0x55555569b76a in cmd_maintenance ../builtin/gc.c:3542:9 #11 0x55555575166a in run_builtin ../git.c:506:11 #12 0x5555557502f0 in handle_builtin ../git.c:779:9 #13 0x555555751127 in run_argv ../git.c:862:4 #14 0x55555575007b in cmd_main ../git.c:984:19 #15 0x5555557523aa in main ../common-main.c:9:11 #16 0x7ffff7a2a4d7 in __libc_start_call_main (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x2a4d7) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab) #17 0x7ffff7a2a59a in __libc_start_main@GLIBC_2.2.5 (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x2a59a) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab) #18 0x5555555f0934 in _start (git+0x9c934) The root cause of this memory leak is our use of `commit_list_append()`. This function expects as parameters the item to append and the _tail_ of the list to append. This tail will then be overwritten with the new tail of the list so that it can be used in subsequent calls. But we call it with `commit_list_append(parent->item, &stack)`, so we end up losing everything but the new item. This issue only surfaces when counting merge commits. Next to being a memory leak, it also shows that we're in fact miscounting as we only respect children of the last parent. All previous parents are discarded, so their children will be disregarded unless they are hit via another reference. While crafting a test case for the issue I was puzzled that I couldn't establish the proper border at which the auto-condition would be fulfilled. As it turns out, there's another bug: if an object is at the tip of any reference we don't mark it as seen. Consequently, if it is the tip of or reachable via another ref, we'd count that object multiple times. Fix both of these bugs so that we properly count objects without leaking any memory. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>

It is possible to hit a memory leak when reading data from a submodule via git-grep(1): Direct leak of 192 byte(s) in 1 object(s) allocated from: #0 0x55555562e726 in calloc (git+0xda726) #1 0x555555964734 in xcalloc ../wrapper.c:154:8 #2 0x555555835136 in load_multi_pack_index_one ../midx.c:135:2 #3 0x555555834fd6 in load_multi_pack_index ../midx.c:382:6 #4 0x5555558365b6 in prepare_multi_pack_index_one ../midx.c:716:17 #5 0x55555586c605 in packfile_store_prepare ../packfile.c:1103:3 #6 0x55555586c90c in packfile_store_reprepare ../packfile.c:1118:2 #7 0x5555558546b3 in odb_reprepare ../odb.c:1106:2 #8 0x5555558539e4 in do_oid_object_info_extended ../odb.c:715:4 #9 0x5555558533d1 in odb_read_object_info_extended ../odb.c:862:8 #10 0x5555558540bd in odb_read_object ../odb.c:920:6 #11 0x55555580a330 in grep_source_load_oid ../grep.c:1934:12 #12 0x55555580a13a in grep_source_load ../grep.c:1986:10 #13 0x555555809103 in grep_source_is_binary ../grep.c:2014:7 #14 0x555555807574 in grep_source_1 ../grep.c:1625:8 #15 0x555555807322 in grep_source ../grep.c:1837:10 #16 0x5555556a5c58 in run ../builtin/grep.c:208:10 #17 0x55555562bb42 in void* ThreadStartFunc<false>(void*) lsan_interceptors.cpp.o #18 0x7ffff7a9a979 in start_thread (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x9a979) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab) #19 0x7ffff7b22d2b in __GI___clone3 (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x122d2b) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab) The root caues of this leak is the way we set up and release the submodule: 1. We use `repo_submodule_init()` to initialize a new repository. This repository is stored in `repos_to_free`. 2. We now read data from the submodule repository. 3. We then call `repo_clear()` on the submodule repositories. 4. `repo_clear()` calls `odb_free()`. 5. `odb_free()` calls `odb_free_sources()` followed by `odb_close()`. The issue here is the 5th step: we call `odb_free_sources()` _before_ we call `odb_close()`. But `odb_free_sources()` already frees all sources, so the logic that closes them in `odb_close()` now becomes a no-op. As a consequence, we never explicitly close sources at all. Fix the leak by closing the store before we free the sources. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>

The MIDX file format currently requires that pack files be identified by the lexicographic ordering of their names (that is, a pack having a checksum beginning with "abc" would have a numeric pack_int_id which is smaller than the same value for a pack beginning with "bcd"). As a result, it is impossible to combine adjacent MIDX layers together without permuting bits from bitmaps that are in more recent layer(s). To see why, consider the following example: | packs | preferred pack --------+-------------+--------------- MIDX #0 | { X, Y, Z } | Y MIDX #1 | { A, B, C } | B MIDX #2 | { D, E, F } | D , where MIDX #2's base MIDX is MIDX #1, and so on. Suppose that we want to combine MIDX layers #0 and #1, to create a new layer #0' containing the packs from both layers. With the original three MIDX layers, objects are laid out in the bitmap in the order they appear in their source pack, and the packs themselves are arranged according to the pseudo-pack order. In this case, that ordering is Y, X, Z, B, A, C. But recall that the pseudo-pack ordering is defined by the order that packs appear in the MIDX, with the exception of the preferred pack, which sorts ahead of all other packs regardless of its position within the MIDX. In the above example, that means that pack 'Y' could be placed anywhere (so long as it is designated as preferred), however, all other packs must be placed in the location listed above. Because that ordering isn't sorted lexicographically, it is impossible to compact MIDX layers in the above configuration without permuting the object-to-bit-position mapping. Changing this mapping would affect all bitmaps belonging to newer layers, rendering the bitmaps associated with MIDX #2 unreadable. One of the goals of MIDX compaction is that we are able to shrink the length of the MIDX chain *without* invalidating bitmaps that belong to newer layers, and the lexicographic ordering constraint is at odds with this goal. However, packs do not *need* to be lexicographically ordered within the MIDX. As far as I can gather, the only reason they are sorted lexically is to make it possible to perform a binary search over the pack names in a MIDX, necessary to make `midx_contains_pack()`'s performance logarithmic in the number of packs rather than linear. Relax this constraint by allowing MIDX writes to proceed with packs that are not arranged in lexicographic order. `midx_contains_pack()` will lazily instantiate a `pack_names_sorted` array on the MIDX, which will be used to implement the binary search over pack names. Because this change produces MIDXs which may not be correctly read with external tools or older versions of Git. Though older versions of Git know how to gracefully degrade and ignore any MIDX(s) they consider corrupt, external tools may not be as robust. To avoid unintentionally breaking any such tools, guard this change behind a version bump in the MIDX's on-disk format. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>

The is_work_tree_watched() function in fsmonitor-watchman.sample has two bugs: 1. Wrong variable in error check: After calling watchman_clock(), the result is stored in $o, but the code checks $output->{error} instead of $o->{error}. This means errors from the clock command are silently ignored. 2. Double output violates protocol: When the retry path triggers (the directory wasn't initially watched), output_result() is called with the "/" flag, then launch_watchman() is called recursively which calls output_result() again. This outputs two clock tokens to stdout, but git's fsmonitor v2 protocol expects exactly one response. Fix #1 by checking $o->{error} after watchman_clock(). Fix #2 by removing the recursive launch_watchman() call. The "/" "everything is dirty" flag already tells git to do a full scan, and git will call the hook again on the next invocation with a valid clock token. Apply the same fixes to the test helper scripts in t/t7519/. Signed-off-by: Paul Tarjan <github@paulisageek.com>

For Git to recognize a directory as a Git directory, it requires the directory to contain: 1. 'HEAD' file 2. object/ directory 3. 'refs/' directory Here, #1 and #3 are part of the reference storage mechanism, specifically the files backend. Since then, newer backends such as the reftable backend have moved to using their own path ('reftable/') for storing references. But to ensure git still recognizes the directory as a Git directory, we create stubs. There are two locations we create stubs: - In 'refs/reftable-backend.c' when creating the reftable backend. - In 'clone.c' before spawning transport helpers. In a following commit, we'll add another instance. So instead of repeating the code, let's extract out this code to `refs_create_refdir_stubs()` and use it. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>

For Git to recognize a directory as a Git directory, it requires the directory to contain: 1. 'HEAD' file 2. 'objects/' directory 3. 'refs/' directory Here, #1 and #3 are part of the reference storage mechanism, specifically the files backend. Since then, newer backends such as the reftable backend have moved to using their own path ('reftable/') for storing references. But to ensure Git still recognizes the directory as a Git directory, we create stubs. There are two locations where we create stubs: - In 'refs/reftable-backend.c' when creating the reftable backend. - In 'clone.c' before spawning transport helpers. In a following commit, we'll add another instance. So instead of repeating the code, let's extract out this code to `refs_create_refdir_stubs()` and use it. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>

The MIDX file format currently requires that pack files be identified by the lexicographic ordering of their names (that is, a pack having a checksum beginning with "abc" would have a numeric pack_int_id which is smaller than the same value for a pack beginning with "bcd"). As a result, it is impossible to combine adjacent MIDX layers together without permuting bits from bitmaps that are in more recent layer(s). To see why, consider the following example: | packs | preferred pack --------+-------------+--------------- MIDX #0 | { X, Y, Z } | Y MIDX #1 | { A, B, C } | B MIDX #2 | { D, E, F } | D , where MIDX #2's base MIDX is MIDX #1, and so on. Suppose that we want to combine MIDX layers #0 and #1, to create a new layer #0' containing the packs from both layers. With the original three MIDX layers, objects are laid out in the bitmap in the order they appear in their source pack, and the packs themselves are arranged according to the pseudo-pack order. In this case, that ordering is Y, X, Z, B, A, C. But recall that the pseudo-pack ordering is defined by the order that packs appear in the MIDX, with the exception of the preferred pack, which sorts ahead of all other packs regardless of its position within the MIDX. In the above example, that means that pack 'Y' could be placed anywhere (so long as it is designated as preferred), however, all other packs must be placed in the location listed above. Because that ordering isn't sorted lexicographically, it is impossible to compact MIDX layers in the above configuration without permuting the object-to-bit-position mapping. Changing this mapping would affect all bitmaps belonging to newer layers, rendering the bitmaps associated with MIDX #2 unreadable. One of the goals of MIDX compaction is that we are able to shrink the length of the MIDX chain *without* invalidating bitmaps that belong to newer layers, and the lexicographic ordering constraint is at odds with this goal. However, packs do not *need* to be lexicographically ordered within the MIDX. As far as I can gather, the only reason they are sorted lexically is to make it possible to perform a binary search over the pack names in a MIDX, necessary to make `midx_contains_pack()`'s performance logarithmic in the number of packs rather than linear. Relax this constraint by allowing MIDX writes to proceed with packs that are not arranged in lexicographic order. `midx_contains_pack()` will lazily instantiate a `pack_names_sorted` array on the MIDX, which will be used to implement the binary search over pack names. This change produces MIDXs which may not be correctly read with external tools or older versions of Git. Though older versions of Git know how to gracefully degrade and ignore any MIDX(s) they consider corrupt, external tools may not be as robust. To avoid unintentionally breaking any such tools, guard this change behind a version bump in the MIDX's on-disk format. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>

For Git to recognize a directory as a Git directory, it requires the directory to contain: 1. 'HEAD' file 2. 'objects/' directory 3. 'refs/' directory Here, #1 and #3 are part of the reference storage mechanism, specifically the files backend. Since then, newer backends such as the reftable backend have moved to using their own path ('reftable/') for storing references. But to ensure Git still recognizes the directory as a Git directory, we create stubs. There are two locations where we create stubs: - In 'refs/reftable-backend.c' when creating the reftable backend. - In 'clone.c' before spawning transport helpers. In a following commit, we'll add another instance. So instead of repeating the code, let's extract out this code to `refs_create_refdir_stubs()` and use it. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>

gitgitgadget deleted a comment from gitgitgadget bot Jun 5, 2018

gitgitgadget deleted a comment from gitgitgadget bot Jun 7, 2018

dscho force-pushed the branch-diff branch from f8c4e30 to 1304201 Compare June 7, 2018 20:58

gitgitgadget deleted a comment from gitgitgadget bot Jun 8, 2018

dscho force-pushed the branch-diff branch 2 times, most recently from 79ce337 to 4a68b95 Compare June 30, 2018 21:36

dscho force-pushed the branch-diff branch 3 times, most recently from ebf3fea to d4e27c6 Compare July 9, 2018 20:39

dscho force-pushed the branch-diff branch 4 times, most recently from 5a2cf0c to d8498fb Compare July 21, 2018 21:40

stefanbeller mentioned this pull request Jul 23, 2018

Cover letters are hard to read on the mailing list (break lines?) gitgitgadget/gitgitgadget#26

Closed

gitgitgadget bot pushed a commit that referenced this pull request Aug 4, 2025

What's cooking (2025/08 #1)

edf3134

gitgitgadget bot pushed a commit that referenced this pull request Sep 3, 2025

What's cooking (2025/09 #1)

0bfa849

gitgitgadget bot pushed a commit that referenced this pull request Oct 2, 2025

What's cooking (2025/10 #1)

9641a3b

gitgitgadget bot pushed a commit that referenced this pull request Nov 3, 2025

What's cooking (2025/11 #1)

7dad50d

gitgitgadget bot pushed a commit that referenced this pull request Dec 7, 2025

What's cooking (2025/12 #1)

765304b

gitgitgadget bot pushed a commit that referenced this pull request Jan 1, 2026

What's cooking (2026/01 #1)

60cad93

gitgitgadget bot pushed a commit that referenced this pull request Feb 2, 2026

What's cooking (2026/02 #1)

376d73e

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add `range-diff`, a `tbdiff` lookalike#1

Add `range-diff`, a `tbdiff` lookalike#1
dscho wants to merge 21 commits intogitgitgadget:mastergitgitgadget/git:masterfrom
dscho:branch-diffdscho/git:branch-diffCopy head branch name to clipboard

dscho commented Jun 5, 2018 •

edited

Loading

Uh oh!

gitgitgadget bot commented Jun 7, 2018

Uh oh!

dscho commented Jul 3, 2018

Uh oh!

gitgitgadget bot commented Jul 3, 2018

Uh oh!

dscho commented Jul 3, 2018

Uh oh!

gitgitgadget bot commented Jul 3, 2018

Uh oh!

dscho commented Jul 21, 2018

Uh oh!

gitgitgadget bot commented Jul 21, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Search code, repositories, users, issues, pull requests...

Conversation

dscho commented Jun 5, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gitgitgadget bot commented Jun 7, 2018

Uh oh!

dscho commented Jul 3, 2018

Uh oh!

gitgitgadget bot commented Jul 3, 2018

Uh oh!

dscho commented Jul 3, 2018

Uh oh!

gitgitgadget bot commented Jul 3, 2018

Uh oh!

dscho commented Jul 21, 2018

Uh oh!

gitgitgadget bot commented Jul 21, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

dscho commented Jun 5, 2018 •

edited

Loading