Add range-diff, a tbdiff lookalike#1
Closed
dscho wants to merge 21 commits intogitgitgadget:mastergitgitgadget/git:masterfrom
Closed
Add range-diff, a tbdiff lookalike#1dscho wants to merge 21 commits intogitgitgadget:mastergitgitgadget/git:masterfrom
range-diff, a tbdiff lookalike#1dscho wants to merge 21 commits intogitgitgadget:mastergitgitgadget/git:masterfrom
Conversation
|
An error occurred while submitting: Error: Branch f8c4e30c63bbfa7efec44ff0f6d0404326723d35 is not rebased to upstream/master |
dscho
pushed a commit
that referenced
this pull request
Jun 20, 2018
Change "fetch" to treat "+" in refspecs (aka --force) to mean we should clobber a local tag of the same name. This changes the long-standing behavior of "fetch" added in 853a369 ("[PATCH] Multi-head fetch.", 2005-08-20), before this change all tag fetches effectively had --force enabled. The original rationale in that change was: > Tags need not be pointing at commits so there is no way to > guarantee "fast-forward" anyway. That comment and the rest of the history of "fetch" shows that the "+" (--force) part of refpecs was only conceived for branch updates, while tags have accepted any changes from upstream unconditionally and clobbered the local tag object. Changing this behavior has been discussed as early as 2011[1]. I the current behavior doesn't make sense, it easily results in local tags accidentally being clobbered. Ideally we'd namespace our tags per-remote, but as with my 97716d2 ("fetch: add a --prune-tags option and fetch.pruneTags config", 2018-02-09) it's easier to work around the current implementation than to fix the root cause, so this implements suggestion #1 from [1], "fetch" now only clobbers the tag if either "+" is provided as part of the refspec, or if "--force" is provided on the command-line. This also makes it nicely symmetrical with how "tag" itself works. We'll now refuse to clobber any existing tags unless "--force" is supplied, whether that clobbering would happen by clobbering a local tag with "tag", or by fetching it from the remote with "fetch". It's still not at all nicely symmetrical with how "git push" works, as discussed in the updated pull-fetch-param.txt documentation, but this change brings them more into line with one another. I don't think there's any reason "fetch" couldn't fully converge with the behavior used by "push", but that's a topic for another change. One of the tests added in 31b808a ("clone --single: limit the fetch refspec to fetched branch", 2012-09-20) is being changed to use --force where a clone would clobber a tag. This changes nothing about the existing behavior of the test. 1. https://public-inbox.org/git/20111123221658.GA22313@sigill.intra.peff.net/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
79ce337 to
4a68b95
Compare
Member
Author
|
/submit |
|
An error occurred while submitting: Error: git tag -F - -a pr-1/dscho/branch-diff-v3 4a68b95 failed: 128, |
Member
Author
|
/submit |
|
Submitted as pull.1.v3.git.gitgitgadget@gmail.com |
ebf3fea to
d4e27c6
Compare
5a2cf0c to
d8498fb
Compare
Member
Author
|
/submit |
|
Submitted as pull.1.v4.git.gitgitgadget@gmail.com |
gitgitgadget bot
pushed a commit
that referenced
this pull request
Jul 23, 2025
find_cfg_ent() allocates a struct reflog_expire_entry_option via
FLEX_ALLOC_MEM and inserts it into a linked list in the
reflog_expire_options structure. The entries in this list are never
freed, resulting in a leak in cmd_reflog_expire and the gc reflog expire
maintenance task:
Direct leak of 39 byte(s) in 1 object(s) allocated from:
#0 0x7ff975ee6883 in calloc (/lib64/libasan.so.8+0xe6883)
#1 0x0000010edada in xcalloc ../wrapper.c:154
#2 0x000000df0898 in find_cfg_ent ../reflog.c:28
#3 0x000000df0898 in reflog_expire_config ../reflog.c:70
#4 0x00000095c451 in configset_iter ../config.c:2116
#5 0x0000006d29e7 in git_config ../config.h:724
#6 0x0000006d29e7 in cmd_reflog_expire ../builtin/reflog.c:205
#7 0x0000006d504c in cmd_reflog ../builtin/reflog.c:419
#8 0x0000007e4054 in run_builtin ../git.c:480
#9 0x0000007e4054 in handle_builtin ../git.c:746
#10 0x0000007e8a35 in run_argv ../git.c:813
#11 0x0000007e8a35 in cmd_main ../git.c:953
#12 0x000000441e8f in main ../common-main.c:9
#13 0x7ff9754115f4 in __libc_start_call_main (/lib64/libc.so.6+0x35f4)
#14 0x7ff9754116a7 in __libc_start_main@@GLIBC_2.34 (/lib64/libc.so.6+0x36a7)
#15 0x000000444184 in _start (/home/jekeller/libexec/git-core/git+0x444184)
Close this leak by adding a reflog_clear_expire_config() function which
iterates the linked list and frees its elements. Call it upon exit of
cmd_reflog_expire() and reflog_expire_condition().
Add a basic test which covers this leak. While at it, cover the
functionality from commit commit 3cb22b8 (Per-ref reflog expiry
configuration, 2008-06-15). We've had this support for years, but lacked
any tests.
Co-developed-by: Jeff King <peff@peff.net>
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitgitgadget bot
pushed a commit
that referenced
this pull request
Aug 28, 2025
The fill_packs_from_midx() method was refactored in fcb2205 (midx: implement support for writing incremental MIDX chains, 2024-08-06) to allow for preferred packfiles and incremental multi-pack-indexes. However, this led to some conditions that can cause improperly initialized memory in the context's list of packfiles. The conditions caring about the preferred pack name or the incremental flag are currently necessary to load a packfile. But the context is still being populated with pack_info structs based on the packfile array for the existing multi-pack-index even if prepare_midx_pack() isn't called. Add a new test that breaks under --stress when compiled with SANITIZE=address. The chosen number of 100 packfiles was selected to get the --stress output to fail about 50% of the time, while 50 packfiles could not get a failure in most --stress runs. This test has a very minor check at the end confirming only one packfile remaining. The failing nature of this test actually relies on auto-GC cleaning up some packfiles during the creation of the commits, as tests setting gc.auto to zero make the packfile count match the number of added commits but also avoids hitting the memory issue. The test case is marked as EXPENSIVE not only because of the number of packfiles it creates, but because some CI environments were reporting errors during the test that I could not reproduce, specifically around being unable to open the packfiles or their pack-indexes. When it fails under SANITIZE=address, it provides the following error: AddressSanitizer:DEADLYSIGNAL ================================================================= ==3263517==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000027 ==3263517==The signal is caused by a READ memory access. ==3263517==Hint: address points to the zero page. #0 0x562d5d82d1fb in close_pack_windows packfile.c:299 #1 0x562d5d82d3ab in close_pack packfile.c:354 #2 0x562d5d7bfdb4 in write_midx_internal midx-write.c:1490 #3 0x562d5d7c7aec in midx_repack midx-write.c:1795 #4 0x562d5d46fff6 in cmd_multi_pack_index builtin/multi-pack-index.c:305 ... This failure stack trace is disconnected from the real fix because it the bad pointers are accessed later when closing the packfiles from the context. There are a few different aspects to this fix that are worth noting: 1. We return to the previous behavior of fill_packs_from_midx to not rely on the incremental flag or existence of a preferred pack. 2. The behavior to scan all layers of an incremental midx is kept, so this is not a full revert of the change. 3. We skip allocating more room in the pack_info array if the pack fails prepare_midx_pack(). 4. The method has always returned 0 for success and 1 for failure, but the condition checking for error added a check for a negative result for failure, so that is now updated. 5. The call to open_pack_index() is removed, but this is needed later in the case of a preferred pack. That call is moved to immediately before its result is needed (checking for the object count). Signed-off-by: Derrick Stolee <stolee@gmail.com>
gitgitgadget bot
pushed a commit
that referenced
this pull request
Aug 30, 2025
The fill_packs_from_midx() method was refactored in fcb2205 (midx: implement support for writing incremental MIDX chains, 2024-08-06) to allow for preferred packfiles and incremental multi-pack-indexes. However, this led to some conditions that can cause improperly initialized memory in the context's list of packfiles. The conditions caring about the preferred pack name or the incremental flag are currently necessary to load a packfile. But the context is still being populated with pack_info structs based on the packfile array for the existing multi-pack-index even if prepare_midx_pack() isn't called. Add a new test that breaks under --stress when compiled with SANITIZE=address. The chosen number of 100 packfiles was selected to get the --stress output to fail about 50% of the time, while 50 packfiles could not get a failure in most --stress runs. The test case is marked as EXPENSIVE not only because of the number of packfiles it creates, but because some CI environments were reporting errors during the test that I could not reproduce, specifically around being unable to open the packfiles or their pack-indexes. When it fails under SANITIZE=address, it provides the following error: AddressSanitizer:DEADLYSIGNAL ================================================================= ==3263517==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000027 ==3263517==The signal is caused by a READ memory access. ==3263517==Hint: address points to the zero page. #0 0x562d5d82d1fb in close_pack_windows packfile.c:299 #1 0x562d5d82d3ab in close_pack packfile.c:354 #2 0x562d5d7bfdb4 in write_midx_internal midx-write.c:1490 #3 0x562d5d7c7aec in midx_repack midx-write.c:1795 #4 0x562d5d46fff6 in cmd_multi_pack_index builtin/multi-pack-index.c:305 ... This failure stack trace is disconnected from the real fix because the bad pointers are accessed later when closing the packfiles from the context. There are a few different aspects to this fix that are worth noting: 1. We return to the previous behavior of fill_packs_from_midx to not rely on the incremental flag or existence of a preferred pack. 2. The behavior to scan all layers of an incremental midx is kept, so this is not a full revert of the change. 3. We skip allocating more room in the pack_info array if the pack fails prepare_midx_pack(). 4. The method has always returned 0 for success and 1 for failure, but the condition checking for error added a check for a negative result for failure, so that is now updated. 5. The call to open_pack_index() is removed, but this is needed later in the case of a preferred pack. That call is moved to immediately before its result is needed (checking for the object count). Signed-off-by: Derrick Stolee <stolee@gmail.com>
gitgitgadget bot
pushed a commit
that referenced
this pull request
Sep 2, 2025
The fill_packs_from_midx() method was refactored in fcb2205 (midx: implement support for writing incremental MIDX chains, 2024-08-06) to allow for preferred packfiles and incremental multi-pack-indexes. However, this led to some conditions that can cause improperly initialized memory in the context's list of packfiles. The conditions caring about the preferred pack name or the incremental flag are currently necessary to load a packfile. But the context is still being populated with pack_info structs based on the packfile array for the existing multi-pack-index even if prepare_midx_pack() isn't called. Add a new test that breaks under --stress when compiled with SANITIZE=address. The chosen number of 100 packfiles was selected to get the --stress output to fail about 50% of the time, while 50 packfiles could not get a failure in most --stress runs. The test case is marked as EXPENSIVE not only because of the number of packfiles it creates, but because some CI environments were reporting errors during the test that I could not reproduce, specifically around being unable to open the packfiles or their pack-indexes. When it fails under SANITIZE=address, it provides the following error: AddressSanitizer:DEADLYSIGNAL ================================================================= ==3263517==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000027 ==3263517==The signal is caused by a READ memory access. ==3263517==Hint: address points to the zero page. #0 0x562d5d82d1fb in close_pack_windows packfile.c:299 #1 0x562d5d82d3ab in close_pack packfile.c:354 #2 0x562d5d7bfdb4 in write_midx_internal midx-write.c:1490 #3 0x562d5d7c7aec in midx_repack midx-write.c:1795 #4 0x562d5d46fff6 in cmd_multi_pack_index builtin/multi-pack-index.c:305 ... This failure stack trace is disconnected from the real fix because the bad pointers are accessed later when closing the packfiles from the context. There are a few different aspects to this fix that are worth noting: 1. We return to the previous behavior of fill_packs_from_midx to not rely on the incremental flag or existence of a preferred pack. 2. The behavior to scan all layers of an incremental midx is kept, so this is not a full revert of the change. 3. We skip allocating more room in the pack_info array if the pack fails prepare_midx_pack(). 4. The method has always returned 0 for success and 1 for failure, but the condition checking for error added a check for a negative result for failure, so that is now updated. 5. The call to open_pack_index() is removed, but this is needed later in the case of a preferred pack. That call is moved to immediately before its result is needed (checking for the object count). Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitgitgadget bot
pushed a commit
that referenced
this pull request
Sep 5, 2025
The fill_packs_from_midx() method was refactored in fcb2205 (midx: implement support for writing incremental MIDX chains, 2024-08-06) to allow for preferred packfiles and incremental multi-pack-indexes. However, this led to some conditions that can cause improperly initialized memory in the context's list of packfiles. The conditions caring about the preferred pack name or the incremental flag are currently necessary to load a packfile. But the context is still being populated with pack_info structs based on the packfile array for the existing multi-pack-index even if prepare_midx_pack() isn't called. Add a new test that breaks under --stress when compiled with SANITIZE=address. The chosen number of 100 packfiles was selected to get the --stress output to fail about 50% of the time, while 50 packfiles could not get a failure in most --stress runs. The test case is marked as EXPENSIVE not only because of the number of packfiles it creates, but because some CI environments were reporting errors during the test that I could not reproduce, specifically around being unable to open the packfiles or their pack-indexes. When it fails under SANITIZE=address, it provides the following error: AddressSanitizer:DEADLYSIGNAL ================================================================= ==3263517==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000027 ==3263517==The signal is caused by a READ memory access. ==3263517==Hint: address points to the zero page. #0 0x562d5d82d1fb in close_pack_windows packfile.c:299 #1 0x562d5d82d3ab in close_pack packfile.c:354 #2 0x562d5d7bfdb4 in write_midx_internal midx-write.c:1490 #3 0x562d5d7c7aec in midx_repack midx-write.c:1795 #4 0x562d5d46fff6 in cmd_multi_pack_index builtin/multi-pack-index.c:305 ... This failure stack trace is disconnected from the real fix because the bad pointers are accessed later when closing the packfiles from the context. There are a few different aspects to this fix that are worth noting: 1. We return to the previous behavior of fill_packs_from_midx to not rely on the incremental flag or existence of a preferred pack. 2. The behavior to scan all layers of an incremental midx is kept, so this is not a full revert of the change. 3. We skip allocating more room in the pack_info array if the pack fails prepare_midx_pack(). 4. The method has always returned 0 for success and 1 for failure, but the condition checking for error added a check for a negative result for failure, so that is now updated. 5. The call to open_pack_index() is removed, but this is needed later in the case of a preferred pack. That call is moved to immediately before its result is needed (checking for the object count). Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitgitgadget bot
pushed a commit
that referenced
this pull request
Dec 7, 2025
When performing auto-maintenance we check whether commit graphs need to
be generated by counting the number of commits that are reachable by any
reference, but not covered by a commit graph. This search is performed
by iterating through all references and then doing a depth-first search
until we have found enough commits that are not present in the commit
graph.
This logic has a memory leak though:
Direct leak of 16 byte(s) in 1 object(s) allocated from:
#0 0x55555562e433 in malloc (git+0xda433)
#1 0x555555964322 in do_xmalloc ../wrapper.c:55:8
#2 0x5555559642e6 in xmalloc ../wrapper.c:76:9
#3 0x55555579bf29 in commit_list_append ../commit.c:1872:35
#4 0x55555569f160 in dfs_on_ref ../builtin/gc.c:1165:4
#5 0x5555558c33fd in do_for_each_ref_iterator ../refs/iterator.c:431:12
#6 0x5555558af520 in do_for_each_ref ../refs.c:1828:9
#7 0x5555558ac317 in refs_for_each_ref ../refs.c:1833:9
#8 0x55555569e207 in should_write_commit_graph ../builtin/gc.c:1188:11
#9 0x55555569c915 in maintenance_is_needed ../builtin/gc.c:3492:8
#10 0x55555569b76a in cmd_maintenance ../builtin/gc.c:3542:9
#11 0x55555575166a in run_builtin ../git.c:506:11
#12 0x5555557502f0 in handle_builtin ../git.c:779:9
#13 0x555555751127 in run_argv ../git.c:862:4
#14 0x55555575007b in cmd_main ../git.c:984:19
#15 0x5555557523aa in main ../common-main.c:9:11
#16 0x7ffff7a2a4d7 in __libc_start_call_main (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x2a4d7) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab)
#17 0x7ffff7a2a59a in __libc_start_main@GLIBC_2.2.5 (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x2a59a) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab)
#18 0x5555555f0934 in _start (git+0x9c934)
The root cause of this memory leak is our use of `commit_list_append()`.
This function expects as parameters the item to append and the _tail_ of
the list to append. This tail will then be overwritten with the new tail
of the list so that it can be used in subsequent calls. But we call it
with `commit_list_append(parent->item, &stack)`, so we end up losing
everything but the new item.
This issue only surfaces when counting merge commits. Next to being a
memory leak, it also shows that we're in fact miscounting as we only
respect children of the last parent. All previous parents are discarded,
so their children will be disregarded unless they are hit via another
reference.
While crafting a test case for the issue I was puzzled that I couldn't
establish the proper border at which the auto-condition would be
fulfilled. As it turns out, there's another bug: if an object is at the
tip of any reference we don't mark it as seen. Consequently, if it is
reachable via any other reference, we'd count that object twice.
Fix both of these bugs so that we properly count objects without leaking
any memory.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitgitgadget bot
pushed a commit
that referenced
this pull request
Dec 7, 2025
In the next commit we are about to move the packfile store into the ODB
source so that we have one store per source. This will lead to a memory
leak in the following commit when reading data from a submodule via
git-grep(1):
Direct leak of 192 byte(s) in 1 object(s) allocated from:
#0 0x55555562e726 in calloc (git+0xda726)
#1 0x555555964734 in xcalloc ../wrapper.c:154:8
#2 0x555555835136 in load_multi_pack_index_one ../midx.c:135:2
#3 0x555555834fd6 in load_multi_pack_index ../midx.c:382:6
#4 0x5555558365b6 in prepare_multi_pack_index_one ../midx.c:716:17
#5 0x55555586c605 in packfile_store_prepare ../packfile.c:1103:3
#6 0x55555586c90c in packfile_store_reprepare ../packfile.c:1118:2
#7 0x5555558546b3 in odb_reprepare ../odb.c:1106:2
#8 0x5555558539e4 in do_oid_object_info_extended ../odb.c:715:4
#9 0x5555558533d1 in odb_read_object_info_extended ../odb.c:862:8
#10 0x5555558540bd in odb_read_object ../odb.c:920:6
#11 0x55555580a330 in grep_source_load_oid ../grep.c:1934:12
#12 0x55555580a13a in grep_source_load ../grep.c:1986:10
#13 0x555555809103 in grep_source_is_binary ../grep.c:2014:7
#14 0x555555807574 in grep_source_1 ../grep.c:1625:8
#15 0x555555807322 in grep_source ../grep.c:1837:10
#16 0x5555556a5c58 in run ../builtin/grep.c:208:10
#17 0x55555562bb42 in void* ThreadStartFunc<false>(void*) lsan_interceptors.cpp.o
#18 0x7ffff7a9a979 in start_thread (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x9a979) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab)
#19 0x7ffff7b22d2b in __GI___clone3 (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x122d2b) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab)
The root caues of this leak is the way we set up and release the
submodule:
1. We use `repo_submodule_init()` to initialize a new repository. This
repository is stored in `repos_to_free`.
2. We now read data from the submodule repository.
3. We then call `repo_clear()` on the submodule repositories.
4. `repo_clear()` calls `odb_free()`.
5. `odb_free()` calls `odb_free_sources()` followed by `odb_close()`.
The issue here is the 5th step: we call `odb_free_sources()` _before_ we
call `odb_close()`. But `odb_free_sources()` already frees all sources,
so the logic that closes them in `odb_close()` now becomes a no-op. As a
consequence, we never explicitly close sources at all.
Fix the leak by closing the store before we free the sources.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitgitgadget bot
pushed a commit
that referenced
this pull request
Dec 7, 2025
The MIDX file format currently requires that pack files be identified by
the lexicographic ordering of their names (that is, a pack having a
checksum beginning with "abc" would have a numeric pack_int_id which is
smaller than the same value for a pack beginning with "bcd").
As a result, it is impossible to combine adjacent MIDX layers together
without permuting bits from bitmaps that are in more recent layer(s).
To see why, consider the following example:
| packs | preferred pack
--------+-------------+---------------
MIDX #0 | { X, Y, Z } | Y
MIDX #1 | { A, B, C } | B
MIDX #2 | { D, E, F } | D
, where MIDX #2's base MIDX is MIDX #1, and so on. Suppose that we want
to combine MIDX layers #0 and #1, to create a new layer #0' containing
the packs from both layers. With the original three MIDX layers, objects
are laid out in the bitmap in the order they appear in their source
pack, and the packs themselves are arranged according to the pseudo-pack
order. In this case, that ordering is Y, X, Z, B, A, C.
But recall that the pseudo-pack ordering is defined by the order that
packs appear in the MIDX, with the exception of the preferred pack,
which sorts ahead of all other packs regardless of its position within
the MIDX. In the above example, that means that pack 'Y' could be placed
anywhere (so long as it is designated as preferred), however, all other
packs must be placed in the location listed above.
Because that ordering isn't sorted lexicographically, it is impossible
to compact MIDX layers in the above configuration without permuting the
object-to-bit-position mapping. Changing this mapping would affect all
bitmaps belonging to newer layers, rendering the bitmaps associated with
MIDX #2 unreadable.
One of the goals of MIDX compaction is that we are able to shrink the
length of the MIDX chain *without* invalidating bitmaps that belong to
newer layers, and the lexicographic ordering constraint is at odds with
this goal.
However, packs do not *need* to be lexicographically ordered within the
MIDX. As far as I can gather, the only reason they are sorted lexically
is to make it possible to perform a binary search over the pack names in
a MIDX, necessary to make `midx_contains_pack()`'s performance
logarithmic in the number of packs rather than linear.
Relax this constraint by allowing MIDX writes to proceed with packs that
are not arranged in lexicographic order. `midx_contains_pack()` will
lazily instantiate a `pack_names_sorted` array on the MIDX, which will
be used to implement the binary search over pack names.
Note that this produces MIDXs which may be incompatible with earlier
versions of Git that have stricter requirements on the layout of packs
within a MIDX. This patch does *not* modify the version number of the
MIDX format, since existing versions of Git already know to gracefully
ignore a MIDX with packs that appear out-of-order.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitgitgadget bot
pushed a commit
that referenced
this pull request
Dec 7, 2025
When pushing to a set of remotes using a nickname for the group, the
client initializes the connection to each remote, talks to the
remote and reads and parses capabilities line, and holds the
capabilities in a file-scope static variable server_capabilities_v1.
There are a few other such file-scope static variables, and these
connections cannot be parallelized until they are refactored to a
structure that keeps track of active connections.
Which is *not* the theme of this patch ;-)
For a single connection, the server_capabilities_v1 variable is
initialized to NULL (at the program initialization), populated when
we talk to the other side, used to look up capabilities of the other
sdie possible multiple times, and the memory is held by the variable
until program exit, without leaking. When talking to multiple remotes,
however, the server capabilities from the second connection overwrites
without freeing the one from the first connection, which leaks.
==1080970==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 421 byte(s) in 2 object(s) allocated from:
#0 0x5615305f849e in strdup (/home/gitster/g/git-jch/bin/bin/git+0x2b349e) (BuildId: 54d149994c9e85374831958f694bd0aa3b8b1e26)
#1 0x561530e76cc4 in xstrdup /home/gitster/w/build/wrapper.c:43:14
#2 0x5615309cd7fa in process_capabilities /home/gitster/w/build/connect.c:243:27
#3 0x5615309cd502 in get_remote_heads /home/gitster/w/build/connect.c:366:4
#4 0x561530e2cb0b in handshake /home/gitster/w/build/transport.c:372:3
#5 0x561530e29ed7 in get_refs_via_connect /home/gitster/w/build/transport.c:398:9
#6 0x561530e26464 in transport_push /home/gitster/w/build/transport.c:1421:16
#7 0x561530800bec in push_with_options /home/gitster/w/build/builtin/push.c:387:8
#8 0x5615307ffb99 in do_push /home/gitster/w/build/builtin/push.c:442:7
#9 0x5615307fe926 in cmd_push /home/gitster/w/build/builtin/push.c:664:7
#10 0x56153065673f in run_builtin /home/gitster/w/build/git.c:506:11
#11 0x56153065342f in handle_builtin /home/gitster/w/build/git.c:779:9
#12 0x561530655b89 in run_argv /home/gitster/w/build/git.c:862:4
#13 0x561530652cba in cmd_main /home/gitster/w/build/git.c:984:19
#14 0x5615308dda0a in main /home/gitster/w/build/common-main.c:9:11
#15 0x7f051651bca7 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
SUMMARY: AddressSanitizer: 421 byte(s) leaked in 2 allocation(s).
Free the capablities data for the previous server before overwriting
it with the next server to plug this leak.
The added test fails without the freeing with SANITIZE=leak; I
somehow couldn't get it fail reliably with SANITIZE=leak,address
though.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitgitgadget bot
pushed a commit
that referenced
this pull request
Dec 9, 2025
When pushing to a set of remotes using a nickname for the group, the
client initializes the connection to each remote, talks to the
remote and reads and parses capabilities line, and holds the
capabilities in a file-scope static variable server_capabilities_v1.
There are a few other such file-scope static variables, and these
connections cannot be parallelized until they are refactored to a
structure that keeps track of active connections.
Which is *not* the theme of this patch ;-)
For a single connection, the server_capabilities_v1 variable is
initialized to NULL (at the program initialization), populated when
we talk to the other side, used to look up capabilities of the other
side possibly multiple times, and the memory is held by the variable
until program exit, without leaking. When talking to multiple remotes,
however, the server capabilities from the second connection overwrites
without freeing the one from the first connection, which leaks.
==1080970==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 421 byte(s) in 2 object(s) allocated from:
#0 0x5615305f849e in strdup (/home/gitster/g/git-jch/bin/bin/git+0x2b349e) (BuildId: 54d149994c9e85374831958f694bd0aa3b8b1e26)
#1 0x561530e76cc4 in xstrdup /home/gitster/w/build/wrapper.c:43:14
#2 0x5615309cd7fa in process_capabilities /home/gitster/w/build/connect.c:243:27
#3 0x5615309cd502 in get_remote_heads /home/gitster/w/build/connect.c:366:4
#4 0x561530e2cb0b in handshake /home/gitster/w/build/transport.c:372:3
#5 0x561530e29ed7 in get_refs_via_connect /home/gitster/w/build/transport.c:398:9
#6 0x561530e26464 in transport_push /home/gitster/w/build/transport.c:1421:16
#7 0x561530800bec in push_with_options /home/gitster/w/build/builtin/push.c:387:8
#8 0x5615307ffb99 in do_push /home/gitster/w/build/builtin/push.c:442:7
#9 0x5615307fe926 in cmd_push /home/gitster/w/build/builtin/push.c:664:7
#10 0x56153065673f in run_builtin /home/gitster/w/build/git.c:506:11
#11 0x56153065342f in handle_builtin /home/gitster/w/build/git.c:779:9
#12 0x561530655b89 in run_argv /home/gitster/w/build/git.c:862:4
#13 0x561530652cba in cmd_main /home/gitster/w/build/git.c:984:19
#14 0x5615308dda0a in main /home/gitster/w/build/common-main.c:9:11
#15 0x7f051651bca7 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
SUMMARY: AddressSanitizer: 421 byte(s) leaked in 2 allocation(s).
Free the capablities data for the previous server before overwriting
it with the next server to plug this leak.
The added test fails without the freeing with SANITIZE=leak; I
somehow couldn't get it fail reliably with SANITIZE=leak,address
though.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitgitgadget bot
pushed a commit
that referenced
this pull request
Dec 11, 2025
When performing auto-maintenance we check whether commit graphs need to
be generated by counting the number of commits that are reachable by any
reference, but not covered by a commit graph. This search is performed
by iterating through all references and then doing a depth-first search
until we have found enough commits that are not present in the commit
graph.
This logic has a memory leak though:
Direct leak of 16 byte(s) in 1 object(s) allocated from:
#0 0x55555562e433 in malloc (git+0xda433)
#1 0x555555964322 in do_xmalloc ../wrapper.c:55:8
#2 0x5555559642e6 in xmalloc ../wrapper.c:76:9
#3 0x55555579bf29 in commit_list_append ../commit.c:1872:35
#4 0x55555569f160 in dfs_on_ref ../builtin/gc.c:1165:4
#5 0x5555558c33fd in do_for_each_ref_iterator ../refs/iterator.c:431:12
#6 0x5555558af520 in do_for_each_ref ../refs.c:1828:9
#7 0x5555558ac317 in refs_for_each_ref ../refs.c:1833:9
#8 0x55555569e207 in should_write_commit_graph ../builtin/gc.c:1188:11
#9 0x55555569c915 in maintenance_is_needed ../builtin/gc.c:3492:8
#10 0x55555569b76a in cmd_maintenance ../builtin/gc.c:3542:9
#11 0x55555575166a in run_builtin ../git.c:506:11
#12 0x5555557502f0 in handle_builtin ../git.c:779:9
#13 0x555555751127 in run_argv ../git.c:862:4
#14 0x55555575007b in cmd_main ../git.c:984:19
#15 0x5555557523aa in main ../common-main.c:9:11
#16 0x7ffff7a2a4d7 in __libc_start_call_main (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x2a4d7) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab)
#17 0x7ffff7a2a59a in __libc_start_main@GLIBC_2.2.5 (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x2a59a) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab)
#18 0x5555555f0934 in _start (git+0x9c934)
The root cause of this memory leak is our use of `commit_list_append()`.
This function expects as parameters the item to append and the _tail_ of
the list to append. This tail will then be overwritten with the new tail
of the list so that it can be used in subsequent calls. But we call it
with `commit_list_append(parent->item, &stack)`, so we end up losing
everything but the new item.
This issue only surfaces when counting merge commits. Next to being a
memory leak, it also shows that we're in fact miscounting as we only
respect children of the last parent. All previous parents are discarded,
so their children will be disregarded unless they are hit via another
reference.
While crafting a test case for the issue I was puzzled that I couldn't
establish the proper border at which the auto-condition would be
fulfilled. As it turns out, there's another bug: if an object is at the
tip of any reference we don't mark it as seen. Consequently, if it is
reachable via any other reference, we'd count that object twice.
Fix both of these bugs so that we properly count objects without leaking
any memory.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitgitgadget bot
pushed a commit
that referenced
this pull request
Dec 11, 2025
It is possible to hit a memory leak when reading data from a submodule
via git-grep(1):
Direct leak of 192 byte(s) in 1 object(s) allocated from:
#0 0x55555562e726 in calloc (git+0xda726)
#1 0x555555964734 in xcalloc ../wrapper.c:154:8
#2 0x555555835136 in load_multi_pack_index_one ../midx.c:135:2
#3 0x555555834fd6 in load_multi_pack_index ../midx.c:382:6
#4 0x5555558365b6 in prepare_multi_pack_index_one ../midx.c:716:17
#5 0x55555586c605 in packfile_store_prepare ../packfile.c:1103:3
#6 0x55555586c90c in packfile_store_reprepare ../packfile.c:1118:2
#7 0x5555558546b3 in odb_reprepare ../odb.c:1106:2
#8 0x5555558539e4 in do_oid_object_info_extended ../odb.c:715:4
#9 0x5555558533d1 in odb_read_object_info_extended ../odb.c:862:8
#10 0x5555558540bd in odb_read_object ../odb.c:920:6
#11 0x55555580a330 in grep_source_load_oid ../grep.c:1934:12
#12 0x55555580a13a in grep_source_load ../grep.c:1986:10
#13 0x555555809103 in grep_source_is_binary ../grep.c:2014:7
#14 0x555555807574 in grep_source_1 ../grep.c:1625:8
#15 0x555555807322 in grep_source ../grep.c:1837:10
#16 0x5555556a5c58 in run ../builtin/grep.c:208:10
#17 0x55555562bb42 in void* ThreadStartFunc<false>(void*) lsan_interceptors.cpp.o
#18 0x7ffff7a9a979 in start_thread (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x9a979) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab)
#19 0x7ffff7b22d2b in __GI___clone3 (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x122d2b) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab)
The root caues of this leak is the way we set up and release the
submodule:
1. We use `repo_submodule_init()` to initialize a new repository. This
repository is stored in `repos_to_free`.
2. We now read data from the submodule repository.
3. We then call `repo_clear()` on the submodule repositories.
4. `repo_clear()` calls `odb_free()`.
5. `odb_free()` calls `odb_free_sources()` followed by `odb_close()`.
The issue here is the 5th step: we call `odb_free_sources()` _before_ we
call `odb_close()`. But `odb_free_sources()` already frees all sources,
so the logic that closes them in `odb_close()` now becomes a no-op. As a
consequence, we never explicitly close sources at all.
Fix the leak by closing the store before we free the sources.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitgitgadget bot
pushed a commit
that referenced
this pull request
Jan 7, 2026
When performing auto-maintenance we check whether commit graphs need to
be generated by counting the number of commits that are reachable by any
reference, but not covered by a commit graph. This search is performed
by iterating through all references and then doing a depth-first search
until we have found enough commits that are not present in the commit
graph.
This logic has a memory leak though:
Direct leak of 16 byte(s) in 1 object(s) allocated from:
#0 0x55555562e433 in malloc (git+0xda433)
#1 0x555555964322 in do_xmalloc ../wrapper.c:55:8
#2 0x5555559642e6 in xmalloc ../wrapper.c:76:9
#3 0x55555579bf29 in commit_list_append ../commit.c:1872:35
#4 0x55555569f160 in dfs_on_ref ../builtin/gc.c:1165:4
#5 0x5555558c33fd in do_for_each_ref_iterator ../refs/iterator.c:431:12
#6 0x5555558af520 in do_for_each_ref ../refs.c:1828:9
#7 0x5555558ac317 in refs_for_each_ref ../refs.c:1833:9
#8 0x55555569e207 in should_write_commit_graph ../builtin/gc.c:1188:11
#9 0x55555569c915 in maintenance_is_needed ../builtin/gc.c:3492:8
#10 0x55555569b76a in cmd_maintenance ../builtin/gc.c:3542:9
#11 0x55555575166a in run_builtin ../git.c:506:11
#12 0x5555557502f0 in handle_builtin ../git.c:779:9
#13 0x555555751127 in run_argv ../git.c:862:4
#14 0x55555575007b in cmd_main ../git.c:984:19
#15 0x5555557523aa in main ../common-main.c:9:11
#16 0x7ffff7a2a4d7 in __libc_start_call_main (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x2a4d7) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab)
#17 0x7ffff7a2a59a in __libc_start_main@GLIBC_2.2.5 (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x2a59a) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab)
#18 0x5555555f0934 in _start (git+0x9c934)
The root cause of this memory leak is our use of `commit_list_append()`.
This function expects as parameters the item to append and the _tail_ of
the list to append. This tail will then be overwritten with the new tail
of the list so that it can be used in subsequent calls. But we call it
with `commit_list_append(parent->item, &stack)`, so we end up losing
everything but the new item.
This issue only surfaces when counting merge commits. Next to being a
memory leak, it also shows that we're in fact miscounting as we only
respect children of the last parent. All previous parents are discarded,
so their children will be disregarded unless they are hit via another
reference.
While crafting a test case for the issue I was puzzled that I couldn't
establish the proper border at which the auto-condition would be
fulfilled. As it turns out, there's another bug: if an object is at the
tip of any reference we don't mark it as seen. Consequently, if it is
the tip of or reachable via another ref, we'd count that object multiple
times.
Fix both of these bugs so that we properly count objects without leaking
any memory.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitgitgadget bot
pushed a commit
that referenced
this pull request
Jan 7, 2026
It is possible to hit a memory leak when reading data from a submodule
via git-grep(1):
Direct leak of 192 byte(s) in 1 object(s) allocated from:
#0 0x55555562e726 in calloc (git+0xda726)
#1 0x555555964734 in xcalloc ../wrapper.c:154:8
#2 0x555555835136 in load_multi_pack_index_one ../midx.c:135:2
#3 0x555555834fd6 in load_multi_pack_index ../midx.c:382:6
#4 0x5555558365b6 in prepare_multi_pack_index_one ../midx.c:716:17
#5 0x55555586c605 in packfile_store_prepare ../packfile.c:1103:3
#6 0x55555586c90c in packfile_store_reprepare ../packfile.c:1118:2
#7 0x5555558546b3 in odb_reprepare ../odb.c:1106:2
#8 0x5555558539e4 in do_oid_object_info_extended ../odb.c:715:4
#9 0x5555558533d1 in odb_read_object_info_extended ../odb.c:862:8
#10 0x5555558540bd in odb_read_object ../odb.c:920:6
#11 0x55555580a330 in grep_source_load_oid ../grep.c:1934:12
#12 0x55555580a13a in grep_source_load ../grep.c:1986:10
#13 0x555555809103 in grep_source_is_binary ../grep.c:2014:7
#14 0x555555807574 in grep_source_1 ../grep.c:1625:8
#15 0x555555807322 in grep_source ../grep.c:1837:10
#16 0x5555556a5c58 in run ../builtin/grep.c:208:10
#17 0x55555562bb42 in void* ThreadStartFunc<false>(void*) lsan_interceptors.cpp.o
#18 0x7ffff7a9a979 in start_thread (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x9a979) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab)
#19 0x7ffff7b22d2b in __GI___clone3 (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x122d2b) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab)
The root caues of this leak is the way we set up and release the
submodule:
1. We use `repo_submodule_init()` to initialize a new repository. This
repository is stored in `repos_to_free`.
2. We now read data from the submodule repository.
3. We then call `repo_clear()` on the submodule repositories.
4. `repo_clear()` calls `odb_free()`.
5. `odb_free()` calls `odb_free_sources()` followed by `odb_close()`.
The issue here is the 5th step: we call `odb_free_sources()` _before_ we
call `odb_close()`. But `odb_free_sources()` already frees all sources,
so the logic that closes them in `odb_close()` now becomes a no-op. As a
consequence, we never explicitly close sources at all.
Fix the leak by closing the store before we free the sources.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitgitgadget bot
pushed a commit
that referenced
this pull request
Jan 14, 2026
The MIDX file format currently requires that pack files be identified by
the lexicographic ordering of their names (that is, a pack having a
checksum beginning with "abc" would have a numeric pack_int_id which is
smaller than the same value for a pack beginning with "bcd").
As a result, it is impossible to combine adjacent MIDX layers together
without permuting bits from bitmaps that are in more recent layer(s).
To see why, consider the following example:
| packs | preferred pack
--------+-------------+---------------
MIDX #0 | { X, Y, Z } | Y
MIDX #1 | { A, B, C } | B
MIDX #2 | { D, E, F } | D
, where MIDX #2's base MIDX is MIDX #1, and so on. Suppose that we want
to combine MIDX layers #0 and #1, to create a new layer #0' containing
the packs from both layers. With the original three MIDX layers, objects
are laid out in the bitmap in the order they appear in their source
pack, and the packs themselves are arranged according to the pseudo-pack
order. In this case, that ordering is Y, X, Z, B, A, C.
But recall that the pseudo-pack ordering is defined by the order that
packs appear in the MIDX, with the exception of the preferred pack,
which sorts ahead of all other packs regardless of its position within
the MIDX. In the above example, that means that pack 'Y' could be placed
anywhere (so long as it is designated as preferred), however, all other
packs must be placed in the location listed above.
Because that ordering isn't sorted lexicographically, it is impossible
to compact MIDX layers in the above configuration without permuting the
object-to-bit-position mapping. Changing this mapping would affect all
bitmaps belonging to newer layers, rendering the bitmaps associated with
MIDX #2 unreadable.
One of the goals of MIDX compaction is that we are able to shrink the
length of the MIDX chain *without* invalidating bitmaps that belong to
newer layers, and the lexicographic ordering constraint is at odds with
this goal.
However, packs do not *need* to be lexicographically ordered within the
MIDX. As far as I can gather, the only reason they are sorted lexically
is to make it possible to perform a binary search over the pack names in
a MIDX, necessary to make `midx_contains_pack()`'s performance
logarithmic in the number of packs rather than linear.
Relax this constraint by allowing MIDX writes to proceed with packs that
are not arranged in lexicographic order. `midx_contains_pack()` will
lazily instantiate a `pack_names_sorted` array on the MIDX, which will
be used to implement the binary search over pack names.
Because this change produces MIDXs which may not be correctly read with
external tools or older versions of Git. Though older versions of Git
know how to gracefully degrade and ignore any MIDX(s) they consider
corrupt, external tools may not be as robust. To avoid unintentionally
breaking any such tools, guard this change behind a version bump in the
MIDX's on-disk format.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitgitgadget bot
pushed a commit
that referenced
this pull request
Jan 26, 2026
The is_work_tree_watched() function in fsmonitor-watchman.sample has
two bugs:
1. Wrong variable in error check: After calling watchman_clock(), the
result is stored in $o, but the code checks $output->{error} instead
of $o->{error}. This means errors from the clock command are silently
ignored.
2. Double output violates protocol: When the retry path triggers (the
directory wasn't initially watched), output_result() is called with
the "/" flag, then launch_watchman() is called recursively which
calls output_result() again. This outputs two clock tokens to stdout,
but git's fsmonitor v2 protocol expects exactly one response.
Fix #1 by checking $o->{error} after watchman_clock().
Fix #2 by removing the recursive launch_watchman() call. The "/"
"everything is dirty" flag already tells git to do a full scan, and
git will call the hook again on the next invocation with a valid clock
token.
Apply the same fixes to the test helper scripts in t/t7519/.
Signed-off-by: Paul Tarjan <github@paulisageek.com>
gitgitgadget bot
pushed a commit
that referenced
this pull request
Feb 2, 2026
For Git to recognize a directory as a Git directory, it requires the directory to contain: 1. 'HEAD' file 2. object/ directory 3. 'refs/' directory Here, #1 and #3 are part of the reference storage mechanism, specifically the files backend. Since then, newer backends such as the reftable backend have moved to using their own path ('reftable/') for storing references. But to ensure git still recognizes the directory as a Git directory, we create stubs. There are two locations we create stubs: - In 'refs/reftable-backend.c' when creating the reftable backend. - In 'clone.c' before spawning transport helpers. In a following commit, we'll add another instance. So instead of repeating the code, let's extract out this code to `refs_create_refdir_stubs()` and use it. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitgitgadget bot
pushed a commit
that referenced
this pull request
Feb 9, 2026
For Git to recognize a directory as a Git directory, it requires the directory to contain: 1. 'HEAD' file 2. 'objects/' directory 3. 'refs/' directory Here, #1 and #3 are part of the reference storage mechanism, specifically the files backend. Since then, newer backends such as the reftable backend have moved to using their own path ('reftable/') for storing references. But to ensure Git still recognizes the directory as a Git directory, we create stubs. There are two locations where we create stubs: - In 'refs/reftable-backend.c' when creating the reftable backend. - In 'clone.c' before spawning transport helpers. In a following commit, we'll add another instance. So instead of repeating the code, let's extract out this code to `refs_create_refdir_stubs()` and use it. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitgitgadget bot
pushed a commit
that referenced
this pull request
Feb 17, 2026
For Git to recognize a directory as a Git directory, it requires the directory to contain: 1. 'HEAD' file 2. 'objects/' directory 3. 'refs/' directory Here, #1 and #3 are part of the reference storage mechanism, specifically the files backend. Since then, newer backends such as the reftable backend have moved to using their own path ('reftable/') for storing references. But to ensure Git still recognizes the directory as a Git directory, we create stubs. There are two locations where we create stubs: - In 'refs/reftable-backend.c' when creating the reftable backend. - In 'clone.c' before spawning transport helpers. In a following commit, we'll add another instance. So instead of repeating the code, let's extract out this code to `refs_create_refdir_stubs()` and use it. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitgitgadget bot
pushed a commit
that referenced
this pull request
Feb 19, 2026
For Git to recognize a directory as a Git directory, it requires the directory to contain: 1. 'HEAD' file 2. 'objects/' directory 3. 'refs/' directory Here, #1 and #3 are part of the reference storage mechanism, specifically the files backend. Since then, newer backends such as the reftable backend have moved to using their own path ('reftable/') for storing references. But to ensure Git still recognizes the directory as a Git directory, we create stubs. There are two locations where we create stubs: - In 'refs/reftable-backend.c' when creating the reftable backend. - In 'clone.c' before spawning transport helpers. In a following commit, we'll add another instance. So instead of repeating the code, let's extract out this code to `refs_create_refdir_stubs()` and use it. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitgitgadget bot
pushed a commit
that referenced
this pull request
Feb 24, 2026
For Git to recognize a directory as a Git directory, it requires the directory to contain: 1. 'HEAD' file 2. 'objects/' directory 3. 'refs/' directory Here, #1 and #3 are part of the reference storage mechanism, specifically the files backend. Since then, newer backends such as the reftable backend have moved to using their own path ('reftable/') for storing references. But to ensure Git still recognizes the directory as a Git directory, we create stubs. There are two locations where we create stubs: - In 'refs/reftable-backend.c' when creating the reftable backend. - In 'clone.c' before spawning transport helpers. In a following commit, we'll add another instance. So instead of repeating the code, let's extract out this code to `refs_create_refdir_stubs()` and use it. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitgitgadget bot
pushed a commit
that referenced
this pull request
Feb 24, 2026
The MIDX file format currently requires that pack files be identified by
the lexicographic ordering of their names (that is, a pack having a
checksum beginning with "abc" would have a numeric pack_int_id which is
smaller than the same value for a pack beginning with "bcd").
As a result, it is impossible to combine adjacent MIDX layers together
without permuting bits from bitmaps that are in more recent layer(s).
To see why, consider the following example:
| packs | preferred pack
--------+-------------+---------------
MIDX #0 | { X, Y, Z } | Y
MIDX #1 | { A, B, C } | B
MIDX #2 | { D, E, F } | D
, where MIDX #2's base MIDX is MIDX #1, and so on. Suppose that we want
to combine MIDX layers #0 and #1, to create a new layer #0' containing
the packs from both layers. With the original three MIDX layers, objects
are laid out in the bitmap in the order they appear in their source
pack, and the packs themselves are arranged according to the pseudo-pack
order. In this case, that ordering is Y, X, Z, B, A, C.
But recall that the pseudo-pack ordering is defined by the order that
packs appear in the MIDX, with the exception of the preferred pack,
which sorts ahead of all other packs regardless of its position within
the MIDX. In the above example, that means that pack 'Y' could be placed
anywhere (so long as it is designated as preferred), however, all other
packs must be placed in the location listed above.
Because that ordering isn't sorted lexicographically, it is impossible
to compact MIDX layers in the above configuration without permuting the
object-to-bit-position mapping. Changing this mapping would affect all
bitmaps belonging to newer layers, rendering the bitmaps associated with
MIDX #2 unreadable.
One of the goals of MIDX compaction is that we are able to shrink the
length of the MIDX chain *without* invalidating bitmaps that belong to
newer layers, and the lexicographic ordering constraint is at odds with
this goal.
However, packs do not *need* to be lexicographically ordered within the
MIDX. As far as I can gather, the only reason they are sorted lexically
is to make it possible to perform a binary search over the pack names in
a MIDX, necessary to make `midx_contains_pack()`'s performance
logarithmic in the number of packs rather than linear.
Relax this constraint by allowing MIDX writes to proceed with packs that
are not arranged in lexicographic order. `midx_contains_pack()` will
lazily instantiate a `pack_names_sorted` array on the MIDX, which will
be used to implement the binary search over pack names.
This change produces MIDXs which may not be correctly read with external
tools or older versions of Git. Though older versions of Git know how to
gracefully degrade and ignore any MIDX(s) they consider corrupt,
external tools may not be as robust. To avoid unintentionally breaking
any such tools, guard this change behind a version bump in the MIDX's
on-disk format.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitgitgadget bot
pushed a commit
that referenced
this pull request
Feb 25, 2026
For Git to recognize a directory as a Git directory, it requires the directory to contain: 1. 'HEAD' file 2. 'objects/' directory 3. 'refs/' directory Here, #1 and #3 are part of the reference storage mechanism, specifically the files backend. Since then, newer backends such as the reftable backend have moved to using their own path ('reftable/') for storing references. But to ensure Git still recognizes the directory as a Git directory, we create stubs. There are two locations where we create stubs: - In 'refs/reftable-backend.c' when creating the reftable backend. - In 'clone.c' before spawning transport helpers. In a following commit, we'll add another instance. So instead of repeating the code, let's extract out this code to `refs_create_refdir_stubs()` and use it. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The incredibly useful
git-tbdifftool to compare patch series (say, to see what changed between two iterations sent to the Git mailing list) is slightly less useful for this developer due to the fact that it requires thehungarianandnumpyPython packages which are for some reason really hard to build in MSYS2. So hard that I even had to give up, because it was simply easier to re-implement the whole shebang as a builtin command.The project at https://github.com/trast/tbdiff seems to be dormant, anyway. Funny (and true) story: I looked at the open Pull Requests to see how active that project is, only to find to my surprise that I had submitted one in August 2015, and that it was still unanswered let alone merged.
While at it, I forward-ported AEvar's patch to force
--decorate=nobecausegit -p tbdiffwould fail otherwise.Side note: I work on implementing range-diff not only to make life easier for reviewers who have to suffer through v2, v3, ... of my patch series, but also to verify my changes before submitting a new iteration. And also, maybe even more importantly, I plan to use it to verify my merging-rebases of Git for
Windows (for which I previously used to redirect the pre-rebase/post-rebase diffs vs upstream and then compare them using
git diff --no-index). And of course any interested person can see what changes were necessary e.g. in the merging-rebase of Git for Windows onto v2.17.0 by running a command like:base=^{/Start.the.merging-rebase} tag=v2.17.0.windows.1 pre=$tag$base^2 git range-diff $pre$base..$pre $tag$base..$tagThe command uses what it calls the "dual color mode" (can be disabled via
--no-dual-color) which helps identifying what actually changed: it prefixes lines with a-(and red background) that correspond to the first commit range, and with a+(and green background) that correspond to the second range. The rest of the lines will be colored according to the original diffs.Changes since v4:
linear-assignment.hto reflect the new name (instead of the old name, which washungarian.h).range-diff.hto hide the history of the thrice-renamed command.git range-diffnow also shows the usage.diff_opt_parse()loop between twoparse_options(), to make sure that we caught all options, and that the--separator is handled.lookup_commit_reference()call to the newestmaster(it now takes athe_repositoryparameter).Changes since v3:
range-diffnow, notbranch-diff, and--dual-coloris the default).--dual-colorthe default.linear-assignment.cwas adjusted to use theSWAP()macro...in the 2-parameter invocation, we now exit with the error message.diff_opt_parse()function is allowed to return a value larger than 1, indicating that more than just one command-line parameter was parsed. We now advance by the indicated value instead of always advancing exactly 1 (which is still correct much of the time).if...else if...else if...elsewas simplified (from a logical point of view) by reordering it.staticvariabledasheswas turned into a local variable of the caller.git branch --diff, which has been fixed.--no-dual-coloroption.range-diffcommand, even if it is populated for real only at a later patch (i.e. at the same time as before).range-diffto compare an older iteration of a patch series to a newer one: the changes from the previous iteration that were replaced by new ones "fade", while the changes that replace them are "shiny new".Changes since v2:
iandjparameters to output_pair_header().--no-patchesoption: we inherit diff_options' support for-salready (and much more)._INVenum values, and introduced a beautiful GIT_COLOR_REVERSE instead. This way, whatever the user configured as color.diff.new (or .old) will be used in reverse in the dual color mode.--diffoption ofgit branch. Adjusted pretty much all commit messages to account for this. This change should no longer be visible: see below.four_spaces.branch --diff, which had been renamed frombranch-diff(which was picked to avoid re-usingtbdiff) torange-diff.hungarian.cand its header tolinear-assignment.c--dual-colorthe default, and changed it to still auto-detect whether color should be used rather than forcing it