Implement `git lfs clone` #988

Merged
merged 15 commits into from

3 participants

@sinbad

Adds a new command git lfs clone which allows for more efficient cloning of LFS repositories by suppressing filters during clone/checkout then doing git lfs pull afterwards. Automatically suppresses error messages generated by this process.

Addresses #931, thanks to @larsxschneider for the extensive analysis, this is 100% based on his findings.

@sinbad

Well, that's interesting, Travis build failed on Mac (which is what I'm using) but not Linux. I think it's because the version of git in use is 1.9.x ("Apple Git-50" - which is really old, the latest Apple-supplied git is 2.5.4).

What's the oldest version of Git that LFS officially supports? We should probably require that in Travis.

@sinbad

I've updated the Travis config so it grabs a newer version of git via brew on OS X; this is useful actually since it will now be running the worktree tests and a couple other things that wouldn't run on git 1.9.x. I saw an issue with the cred helper test not being able to be turned off, I've tried to deal with that although it seems to have gone from the latest build.

@technoweenie
GitHub member

What's the oldest version of Git that LFS officially supports? We should probably require that in Travis.

I think the oldest version is 1.8.x, because of pre-push hook support. Though technically, we can support lower versions then that. They just may want to ensure their host has a pre-receive hook that makes sure LFS files are uploaded before accepting commits with LFS pointers.

If only the clone command is failing, maybe just that command should enforce a newer git version?

@technoweenie technoweenie referenced this pull request
Open

Git LFS v1.2 #844

6 of 23 tasks complete
@technoweenie
GitHub member

:metal:

OLD AND BUSTED:

$ time git clone https://git-server/technoweenie/lfs-pull-bug
Cloning into 'lfs-pull-bug'...
remote: Counting objects: 513, done.
remote: Compressing objects: 100% (508/508), done.
remote: Total 513 (delta 0), reused 513 (delta 0), pack-reused 0
Receiving objects: 100% (513/513), 70.20 KiB | 0 bytes/s, done.
Checking connectivity... done.
Downloading a.bin (5 B)
Checking out files: 100% (506/506), done.
git clone https://git-server/technoweenie/lfs-pull-bug  39.03s user 23.58s system 22% cpu 4:43.24 total

NEW HOTNESS

$ time git lfs clone https://git-server/technoweenie/lfs-pull-bug
Cloning into 'lfs-pull-bug'...
Git LFS: (501 of 501 files) 0 B / 1.85 KB
git lfs clone https://git-server/technoweenie/lfs-pull-bug  6.63s user 15.09s system 93% cpu 23.223 total

Looks like this is leaving out some of the output. Also, this doesn't handle cases where git clone fails:

$ git lfs clone
You must specify a repository to clone.
usage: git clone [<options>] [--] <repo> [<dir>]
    -v, --verbose         be more verbose
    -q, --quiet           be more quiet
    --progress            force progress reporting
    -n, --no-checkout     don't create a checkout
    --bare                create a bare repository
    --mirror              create a mirror repository (implies bare)
    -l, --local           to clone from a local repository
    --no-hardlinks        don't use local hardlinks, always copy
    -s, --shared          setup as shared repository
    --recursive           initialize submodules in the clone
    --recurse-submodules  initialize submodules in the clone
    --template <template-directory>
                          directory from which templates will be used
    --reference <repo>    reference repository
    --dissociate          use --reference only while cloning
    -o, --origin <name>   use <name> instead of 'origin' to track upstream
    -b, --branch <branch>
                          checkout <branch> instead of the remote's HEAD
    -u, --upload-pack <path>
                          path to git-upload-pack on the remote
    --depth <depth>       create a shallow clone of that depth
    --single-branch       clone only one branch, HEAD or --branch
    --separate-git-dir <gitdir>
                          separate git dir from working tree
    -c, --config <key=value>
                          set config inside the new repository
Error(s) during clone
Unable to log panic to : mkdir : no such file or directory

git-lfs/1.1.1 (GitHub; darwin amd64; go 1.5.3; git 7e8fa62)
git version 2.5.4 (Apple Git-61)

$ git-lfs clone
Error(s) during clone

git clone failed: exit status 129
goroutine 1 [running]:
github.com/github/git-lfs/lfs.Stack(0x0, 0x0, 0x0)
    /Users/rick/go/src/github.com/github/git-lfs/lfs/errors.go:557 +0x80
github.com/github/git-lfs/commands.logPanicToWriter(0xd641c0, 0xc820082010, 0xd60000, 0xc820073cc0)
    /Users/rick/go/src/github.com/github/git-lfs/commands/commands.go:184 +0xf7f
github.com/github/git-lfs/commands.logPanic(0xd60000, 0xc820073cc0, 0x0, 0x0)
    /Users/rick/go/src/github.com/github/git-lfs/commands/commands.go:148 +0x421
github.com/github/git-lfs/commands.handlePanic(0xd60000, 0xc820073cc0, 0x0, 0x0)
    /Users/rick/go/src/github.com/github/git-lfs/commands/commands.go:123 +0x4e
github.com/github/git-lfs/commands.LoggedError(0xd60000, 0xc820073cc0, 0x54b6c0, 0x15, 0x0, 0x0, 0x0)
    /Users/rick/go/src/github.com/github/git-lfs/commands/commands.go:73 +0x82
github.com/github/git-lfs/commands.Panic(0xd60000, 0xc820073cc0, 0x54b6c0, 0x15, 0x0, 0x0, 0x0)
    /Users/rick/go/src/github.com/github/git-lfs/commands/commands.go:83 +0x5d
github.com/github/git-lfs/commands.cloneCommand(0x75f000, 0x78a9a0, 0x0, 0x0)
    /Users/rick/go/src/github.com/github/git-lfs/commands/command_clone.go:26 +0xa7
github.com/github/git-lfs/vendor/_nuts/github.com/spf13/cobra.(*Command).execute(0x75f000, 0x78a9a0, 0x0, 0x0, 0x0, 0x0)
    /Users/rick/go/src/github.com/github/git-lfs/vendor/_nuts/github.com/spf13/cobra/command.go:477 +0x403
github.com/github/git-lfs/vendor/_nuts/github.com/spf13/cobra.(*Command).Execute(0x762260, 0x0, 0x0)
    /Users/rick/go/src/github.com/github/git-lfs/vendor/_nuts/github.com/spf13/cobra/command.go:551 +0x46a
github.com/github/git-lfs/commands.Run()
    /Users/rick/go/src/github.com/github/git-lfs/commands/commands.go:88 +0x23
main.main()
    /Users/rick/go/src/github.com/github/git-lfs/git-lfs.go:34 +0x12e

ENV:
LocalWorkingDir=
LocalGitDir=
LocalGitStorageDir=
LocalMediaDir=
TempDir=/var/folders/r6/qpmbzj194fj2ljwylsdc07hr0000gn/T/git-lfs
ConcurrentTransfers=3
BatchTransfer=true
GIT_LFS_TEST_DIR=/Users/rick/p/git-lfs-temp
GIT_LFS_TEST_MAXPROCS=4

Added this to Git LFS v1.2:

@larsxschneider

@sinbad Thanks for the credit :smile:

I recently updated Git on the Linux build, too:
#980

@technoweenie
Maybe we should add a Travis Matrix build. That could run the tests for different Git versions. See here for an example that compiles the source with different compiler versions:
https://github.com/genbattle/dkm/blob/29067b6e257016e3fbf9afe2773bada2fc333212/.travis.yml#L6-L41

@larsxschneider larsxschneider commented on an outdated diff
@@ -10,6 +10,8 @@ script: script/cibuild
notifications:
email: false
before_install:
+ - if [[ "$TRAVIS_OS_NAME" == "osx" ]]; then brew update ; fi
+ - if [[ "$TRAVIS_OS_NAME" == "osx" ]]; then brew install git ; fi

if you want to use multi line you can do it like this:
https://github.com/git/git/blob/master/.travis.yml#L37-L69

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@technoweenie
GitHub member

Maybe we should add a Travis Matrix build.

Yes, I was going to suggest that in your travis PR actually.

@larsxschneider larsxschneider and 1 other commented on an outdated diff
+ return fmt.Errorf("Failed to start git clone: %v", err)
+ }
+
+ // Filter stderr to exclude messages caused by disabling the filters
+ // As of git 2.7 it still tries to call the blank filter but required=false
+ scanner := bufio.NewScanner(stderr)
+ for scanner.Scan() {
+ s := scanner.Text()
+ // Swallow all the known messages from intentionally breaking filter
+ if strings.Contains(s, "error: external filter") ||
+ strings.Contains(s, "error: cannot fork") ||
+ // Linux / Mac messages
+ strings.Contains(s, "error: cannot run : No such file or directory") ||
+ strings.Contains(s, "warning: Clone succeeded, but checkout failed") ||
+ strings.Contains(s, "You can inspect what was checked out with 'git status'") ||
+ strings.Contains(s, "retry the checkout") ||

I wonder if we should filter for multi line message. Because individually these error messages could be legit in a different context.

E.g. here I tried to match on the clearly wrong error: external filter failed and then delete the lines around that:
https://gist.github.com/larsxschneider/85652462dcb442cc9344#file-git-lfsclone-sh-L7

@sinbad
sinbad added a note

Hmm, maybe, I wasn't keen on assuming the number of lines surrounding an error. Maybe rather than trying to be super-precise about parsing sequences of error text (which might change after all), perhaps it's best just to suppress all the known errors like this in the output but to write them all to the git-lfs logs. If git clone fails it'll stop with a non-zero anyway and we could just suggest running 'git lfs logs last' to get the full story. This is the same pattern we use elsewhere.

git-lfs logs sounds good! Plus it won't be a problem for future git versions anyways... the patch to suppress the errors just landed in Git master :tada: git/git@1a8630d

@sinbad
sinbad added a note

the patch to suppress the errors just landed in Git master :tada: git/git@1a8630d

W00t, congrats! :metal:

@sinbad
sinbad added a note

So in the end I sent the filtered stderr output to the trace so it's visible when you use GIT_TRACE=1 and it appears in context.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@larsxschneider
@sinbad

If only the clone command is failing, maybe just that command should enforce a newer git version?

Yeah, I was hoping our min version was higher :) To find exactly which git version started tolerating this approach, either I'm going to have to binary search git versions until I find which one it breaks at, or I need to find the error in the git source & see when it changed. Kinda tedious so I didn't want to do it unnecessarily, but I'll sort that out.

Looks like this is leaving out some of the output.

I think this might be because the stdin isn't attached & git doesn't think it's interactive so dials back the output. I'll try connecting stdin to see if that works.

Also, this doesn't handle cases where git clone fails:

I expect nothing but perfection!!11 ;) Oops, will fix.

sinbad added some commits
@sinbad sinbad Implemented 'git lfs clone' 894794b
@sinbad sinbad Remove copy/paste error a1fd022
@sinbad sinbad Added tests for 'git lfs clone' b37fe0f
@sinbad sinbad Get a more recent version of git 74b3f79
@sinbad sinbad Attempting to fix Travis with newer git, cred helper issue
0a6248b
@sinbad sinbad git lfs clone requires git version 2.2.0
Any lower than this and git errors when setting filter to blank value
657949e
@sinbad sinbad Use Exit instead of Panic in lfs clone e7d27dd
@sinbad sinbad Merge master to resolve conflicts in Travis config
a19b63b
@sinbad sinbad Document issues with output in git lfs clone 65b0652
@sinbad sinbad Send all git clone stderr output to trace so it's visible if needed
87cdc58
@sinbad sinbad Only send filtered stderr output to trace to avoid duplication
48c5cd1
@sinbad

Looks like this is leaving out some of the output.

I think this might be because the stdin isn't attached & git doesn't think it's interactive so dials back the output. I'll try connecting stdin to see if that works.

Turns out I was almost right, it's that the progress output is suppressed by Git when it detects a non-TTY, but unfortunately just attaching stdin isn't enough. To get this output I think I'd have to call clone via a pseudo-TTY implementation like https://github.com/kr/pty - I'll play with it a bit but it may be overkill. Other than that detail I think this is ready.

[edit]For future reference I did try reading the raw stderr (definitely where the progress is sent per git source) instead of using buffered I/O it in case that helped, but it didn't. The only thing that worked was assigning os.Stderr (a real TTY), or changing the git source to force the progress output to be sent to stderr in all cases regardless.

sinbad added some commits
@sinbad sinbad Skip clone test when git version < 2.2
df70bf5
@sinbad sinbad Use pseudo-tty to ensure we get full git clone output
a61be12
@sinbad

I found we already had a dependency on kr/pty via kr/pretty so I've used it to provide a pseudo-tty to git clone, output is now complete although still slightly delayed compared to direct on terminal. Better than before though.

@sinbad sinbad Clean up comments
45c4ec3
@larsxschneider larsxschneider and 1 other commented on an outdated diff
@@ -622,6 +626,94 @@ func IsVersionAtLeast(actualVersion, desiredVersion string) bool {
return actual >= atleast
}
+// CloneWithoutFilters clones a git repo but without the smudge filter enabled
+// so that files in the working copy will be pointers and not real LFS data
+func CloneWithoutFilters(args []string) error {
+
+ // We can only support this on git version 2.2.0+
+ // Before that, setting filters to blank would break clone
+ if !Config.IsGitVersionAtLeast("2.2.0") {
+ return errors.New("git lfs clone requires git version 2.2.0+")

Couldn't you set the smudge filter to cat in this case?

@sinbad
sinbad added a note

Good point! Done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@sinbad sinbad Instead of failing when git version <2.2, use slower cat fallback
d0e9536
@sinbad

Any chance of a merge on this one now @technoweenie? :)

@technoweenie
GitHub member

I was planning on a quick 1.1.2 release for some bug fixes. I think this would be more appropriate in 1.2. Maybe we can just cut 1.2's scope and ship that instead though :)

@sinbad

Maybe create a separate maintenance branch for 1.1.x? Then we could start pulling future 1.2 stuff together in master.

@technoweenie
GitHub member

Ah yeah, we even have a backport script for that :)

@technoweenie technoweenie merged commit e17f239 into github:master

2 checks passed

Details GitHub CLA @sinbad has accepted the GitHub Contributor License Agreement.
Details continuous-integration/travis-ci/pr The Travis CI build passed
@sinbad sinbad deleted the sinbad:git-lfs-clone branch
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment