diff --git a/Documentation/Makefile b/Documentation/Makefile
index d079d7c73aca1f..841e4f70560999 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -69,6 +69,8 @@ API_DOCS = $(patsubst %.txt,%,$(filter-out technical/api-index-skel.txt technica
 SP_ARTICLES += $(API_DOCS)
 
 TECH_DOCS += SubmittingPatches
+TECH_DOCS += technical/commit-graph
+TECH_DOCS += technical/commit-graph-format
 TECH_DOCS += technical/hash-function-transition
 TECH_DOCS += technical/http-protocol
 TECH_DOCS += technical/index-format
diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt
index c664acbd765d06..b5afc3a82bcf37 100644
--- a/Documentation/technical/commit-graph.txt
+++ b/Documentation/technical/commit-graph.txt
@@ -40,32 +40,32 @@ Values 1-4 satisfy the requirements of parse_commit_gently().
 
 Define the "generation number" of a commit recursively as follows:
 
- * A commit with no parents (a root commit) has generation number one.
+* A commit with no parents (a root commit) has generation number one.
 
- * A commit with at least one parent has generation number one more than
-   the largest generation number among its parents.
+* A commit with at least one parent has generation number one more than
+  the largest generation number among its parents.
 
 Equivalently, the generation number of a commit A is one more than the
 length of a longest path from A to a root commit. The recursive definition
 is easier to use for computation and observing the following property:
 
-    If A and B are commits with generation numbers N and M, respectively,
-    and N <= M, then A cannot reach B. That is, we know without searching
-    that B is not an ancestor of A because it is further from a root commit
-    than A.
+If A and B are commits with generation numbers N and M, respectively,
+and N <= M, then A cannot reach B. That is, we know without searching
+that B is not an ancestor of A because it is further from a root commit
+than A.
 
-    Conversely, when checking if A is an ancestor of B, then we only need
-    to walk commits until all commits on the walk boundary have generation
-    number at most N. If we walk commits using a priority queue seeded by
-    generation numbers, then we always expand the boundary commit with highest
-    generation number and can easily detect the stopping condition.
+Conversely, when checking if A is an ancestor of B, then we only need
+to walk commits until all commits on the walk boundary have generation
+number at most N. If we walk commits using a priority queue seeded by
+generation numbers, then we always expand the boundary commit with highest
+generation number and can easily detect the stopping condition.
 
 This property can be used to significantly reduce the time it takes to
 walk commits and determine topological relationships. Without generation
 numbers, the general heuristic is the following:
 
-    If A and B are commits with commit time X and Y, respectively, and
-    X < Y, then A _probably_ cannot reach B.
+If A and B are commits with commit time X and Y, respectively, and
+X < Y, then A _probably_ cannot reach B.
 
 This heuristic is currently used whenever the computation is allowed to
 violate topological relationships due to clock skew (such as "git log"
@@ -85,8 +85,11 @@ have generation number represented by the macro GENERATION_NUMBER_ZERO = 0.
 Since the commit-graph file is closed under reachability, we can guarantee
 the following weaker condition on all commits:
 
-    If A and B are commits with generation numbers N amd M, respectively,
-    and N < M, then A cannot reach B.
+[quote]
+_____________________________________________________________________
+If A and B are commits with generation numbers N and M, respectively,
+and N < M, then A cannot reach B.
+_____________________________________________________________________
 
 Note how the strict inequality differs from the inequality when we have
 fully-computed generation numbers. Using strict inequality may result in
@@ -121,11 +124,8 @@ Future Work
 - After computing and storing generation numbers, we must make graph
   walks aware of generation numbers to gain the performance benefits they
   enable. This will mostly be accomplished by swapping a commit-date-ordered
-  priority queue with one ordered by generation number. The following
-  operations are important candidates:
-
-    - 'log --topo-order'
-    - 'tag --merged'
+  priority queue with one ordered by generation number. Commands that could
+  improve include 'git log --topo-order' and 'git tag --merged'.
 
 - A server could provide a commit graph file as part of the network protocol
   to avoid extra calculations by clients. This feature is only of benefit if
@@ -148,13 +148,16 @@ Related Links
     More discussion about generation numbers and not storing them inside
     commit objects. A valuable quote:
 
-    "I think we should be moving more in the direction of keeping
-    repo-local caches for optimizations. Reachability bitmaps have been
-    a big performance win. I think we should be doing the same with our
-    properties of commits. Not just generation numbers, but making it
-    cheap to access the graph structure without zlib-inflating whole
-    commit objects (i.e., packv4 or something like the "metapacks" I
-    proposed a few years ago)."
+[quote, Jeff "Peff" King]
+____________________________________________________________________
+I think we should be moving more in the direction of keeping
+repo-local caches for optimizations. Reachability bitmaps have been
+a big performance win. I think we should be doing the same with our
+properties of commits. Not just generation numbers, but making it
+cheap to access the graph structure without zlib-inflating whole
+commit objects (i.e., packv4 or something like the "metapacks" I
+proposed a few years ago).
+____________________________________________________________________
 
 [4] https://public-inbox.org/git/20180108154822.54829-1-git@jeffhostetler.com/T/#u
     A patch to remove the ahead-behind calculation from 'status'.
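The generation-number definition and the walk cutoff documented in commit-graph.txt can be illustrated with a short Python sketch. It is not git's C implementation: the `parents` dictionary used to represent history and the helper names `generation_numbers` and `is_ancestor` are hypothetical, chosen only for illustration. `is_ancestor(a, b, ...)` asks whether A is reachable from B, walking from B with a max-priority queue keyed on generation number and never expanding a commit whose generation is at or below that of A.

```python
import heapq


def generation_numbers(parents):
    """Compute generation numbers for a DAG given as {commit: [parent, ...]}.

    Hypothetical representation: commits are strings, parents is a dict.
    A root commit gets 1; any other commit gets one more than the largest
    generation number among its parents.
    """
    gen = {}
    for start in parents:
        # Iterative post-order DFS so long histories avoid recursion limits.
        stack = [start]
        while stack:
            c = stack[-1]
            if c in gen:
                stack.pop()
                continue
            pending = [p for p in parents[c] if p not in gen]
            if pending:
                stack.extend(pending)
            else:
                gen[c] = 1 + max((gen[p] for p in parents[c]), default=0)
                stack.pop()
    return gen


def is_ancestor(a, b, parents, gen):
    """Return True if commit a is reachable from commit b (or a == b)."""
    n = gen[a]
    # heapq is a min-heap, so negate generations to pop the highest first.
    heap = [(-gen[b], b)]
    seen = {b}
    while heap:
        neg_g, c = heapq.heappop(heap)
        if c == a:
            return True
        g = -neg_g
        if g < n:
            # Every remaining boundary commit has generation below N,
            # so none of them is a and none can reach a: stop the walk.
            return False
        if g == n:
            # Same generation as a but a different commit: it cannot
            # reach a, so it is not expanded.
            continue
        for p in parents[c]:
            if p not in seen:
                seen.add(p)
                heapq.heappush(heap, (-gen[p], p))
    return False


if __name__ == "__main__":
    # A small hypothetical history: E merges C and D; A is the only root.
    parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["A"], "E": ["C", "D"]}
    gen = generation_numbers(parents)
    print(gen["E"])                             # 4
    print(is_ancestor("D", "E", parents, gen))  # True
    print(is_ancestor("C", "D", parents, gen))  # False
```

In the last call the walk stops after popping D alone: D has generation 2, below C's generation 3, so no further commits need to be visited. An ordering by commit date offers no such guarantee, since it only supports the "_probably_ cannot reach" heuristic.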