Fix Cc: footer handling#163
Fix Cc: footer handling#163dscho merged 8 commits intogitgitgadget:mastergitgitgadget/gitgitgadget:masterfrom SyntevoAlex:#0227_cc_parsingSyntevoAlex/gitgitgadget:#0227_cc_parsingCopy head branch name to clipboard
Conversation
866ad62 to
a4e4cc9
Compare
tests/patch-series.test.ts
Outdated
| expect(log).toHaveBeenCalledTimes(1); // verify no more errors | ||
| }); | ||
|
|
||
| test("single 'Cc:' footer is parsed correctly", () => { |
There was a problem hiding this comment.
yet still more tests is better.
No, that is not true. That is the mindset that led to a regression test suite I encountered in one of my previous jobs that took a full two weeks to run. Obviously, such a regression test suite is so useless as to put all the time that was put into it to waste: nobody will spend their developer time waiting for such a test suite. Everybody will skip it.
In this instance, despite the explanation given by the commit message, I think that the balance between cost and benefit is slightly in favor of taking the additional tests. It is too bad that they disagree with the existing coding style, making the code more inconsistent than it was before, unnecessarily so (the new code refuses to use multi-line string constants, for example, and frequently ends lines with open parentheses, and it is totally not DRY: the same sequence is played out three separate times, with the only change being consisting of different descriptions and consequently of different expected values).
The balance would be struck much better, though, if a single test case was added: the last of the three. It basically includes coverage for the code paths that the first two exercise, so there is not actually any benefit in keeping the first two tests.
There was a problem hiding this comment.
I will delete the first two tests and try to follow the style better. Otherwise, if you like, I can remove the entire commit.
that took a full two weeks to run
Both extremes are usually suboptimal: very few tests or very many tests.
they disagree with the existing coding style
Sorry, not quite used to switching between different styles throughout the day.
refuses to use multi-line string constants
Curiously, one of the fixed bugs is related to line endings, and it's hard to say if it's CRLF or LF in multiline literals.
It basically includes coverage for the code paths that the first two exercise
I heard that good unit tests test one thing per test. I generally agree with that, but it's your project and your rules.
| "upstream/master:.github/PULL_REQUEST_TEMPLATE.md"], | ||
| { workDir: this.workDir }); | ||
| // github uses \r\n so make sure it is set | ||
| prTemplate = prTemplate.replace(/\r?\n/g, "\r\n"); |
There was a problem hiding this comment.
The commit message makes it sound as if the code did not expect CR/LF at all before. The truth is, though, that this part of the code expected CR/LF, but we need to move this transformation to GitHubGlue.getPRInfo().
Also, technically we now need to make sure that prTemplate does uses LF-only line endings, even if the code was checked out on Windows (our .gitattributes do not force *.md eol=lf, after all).
Keeping this transformation, but transforming to LF-only instead of CR/LF line endings, would make the commit also a lot more obvious.
There was a problem hiding this comment.
I will change the code to convert prTemplate to LF.
Wasn't sure if it's a good idea to keep code that does nothing at the moment, but could do something later. I guessed wrong.
There was a problem hiding this comment.
Wasn't sure if it's a good idea to keep code that does nothing at the moment, but could do something later.
As I pointed out, it already does something at the moment.
There was a problem hiding this comment.
Yes, I overlooked this case.
lib/gitgitgadget.ts
Outdated
| @@ -274,12 +274,10 @@ export class GitGitGadget { | ||
| // Remove template from description (if template exists) |
There was a problem hiding this comment.
At the same time,
PatchSeries.parsePullRequestDescriptionexpects LF text. These lines do not match regexp, because.is not allowed to match\r.
This is a good finding, but as there is no regular expression visible in the commit that uses ., this statement is actually a lot more puzzling than it is helpful.
A better commit message would look somewhat like this:
However,
PatchSeries.parsePullRequestDescription()takes the Pull Request description as obtained viaGitHub.getPRInfo(), first tries to match a footer, if found, splits it by LF characters, and then tries to match the footer values via(.*), and.does not match Carriage Returns:[...] const match = description.match(/^([^]+)\n\n([^]+)$/); if (match) { coverLetter = match[1]; cc = []; const footer: string[] = []; for (const line of match[2].trimRight().split("\n")) { const match2 = line.match(/^([-A-Za-z]+:) (.*)\n?$/); [...]As a consequence, only one footer was ever matched, the last one, and only if it did not end in a newline.
This way of presenting it actually points to the real bug here: after we split by newlines, there is no Line Feed (\n) to be matched, not even an optional one. However, what we should have matched here would be an optional Carriage Return (\r?).
Therefore, I think that a better version of this commit would leave lib/gitgitgadget.ts and lib/github-glue.ts alone and merely replace the \n? by \r? in this line:
const match2 = line.match(/^([-A-Za-z]+:) (.*)\n?$/);This would be much preferable a change because nobody, not you, not even me, knows what line ending convention the other users of GitHubGlue.getPRInfo() expect. Therefore, the current version of this commit might very well break other things, in addition to fixing one thing.
There was a problem hiding this comment.
I will change the patch to only touch \r vs \n.
after we split by newlines, there is no Line Feed (\n) to be matched
I was puzzled by this myself but decided to restrain myself to apparent problems only.
not you, not even me, knows what line ending convention the other users of GitHubGlue.getPRInfo() expect.
I did a search of \r everywhere and concluded that at best it's ignored. One other thing that I frankly didn't do is to search for references to IPullRequestInfo.body. I imagine that there's just one use for it.
and merely replace the
\n?by\r?
I considered that. However, there was other code that uses \n to append/concat lines, producing a mixture of /n and /r/n. With other patch in mind that separates PR descriptions from other things, this is a smaller issue, yet still the footer parsing code will also (unexpectedly) replace /r/n with /n. I didn't like this idea and thought that it would be better to just get rid of \r\n early. What do you think?
There was a problem hiding this comment.
and merely replace the
\n?by\r?I considered that. However, there was other code that uses
\nto append/concat lines, producing a mixture of/nand/r/n. With other patch in mind that separates PR descriptions from other things, this is a smaller issue, yet still the footer parsing code will also (unexpectedly) replace/r/nwith/n. I didn't like this idea and thought that it would be better to just get rid of\r\nearly. What do you think?
I think that if we really want to do a wholesale CR/LF -> LF, we have to be a lot more careful, and a PR that claims to fix the Cc: footers is probably the wrong venue for such an endeavor.
There was a problem hiding this comment.
Shall I also concat lines with \r\n to preserve old logic?
There was a problem hiding this comment.
Yes, I also thought that it would be reasonable to allow user to put Cc: anywhere and not require an empty line before it (that's kind of unexpected requirement). And yes, due to a bug, it always attempted to work this way, but always failed due to other bug where \r didn't match regex. So indeed maybe there's no reason to change that now. I only chose to fix (change?) code to parse "footer" because it looked like original code's intention.
So the change could be two bits?
1. GitHubGlue removes the `\r` 2. `parsePullRequestDescription()` uses less greedy `/^([^]+)\n\n?([^]+)$/` to process all body lines. As noted above I think all body lines should be processed (and they currently are).
If we choose to parse entire PR body, then with the other commit in mind, first regex /^([^]+)\n\n([^]+)$/ can be dropped completely, because with the other commit parsePullRequestDescription() will only receive pr body and nothing else.
There was a problem hiding this comment.
It's a bit hard to judge what the code should do when it fails to work in so many ways (currently the only working case is when there's a single Cc: line in the very end without trailing newline).
If it was my choice, I would allow Cc: to be at any position and relax the requirement to have an empty line before the footer. That would be more reliable for users who didn't know (or forgot) about the "empty line" rule. The chance of false positive parsing is pretty low in my understanding.
There was a problem hiding this comment.
If it was my choice, I would allow
Cc:to be at any position and relax the requirement to have an empty line before the footer. That would be more reliable for users who didn't know (or forgot) about the "empty line" rule. The chance of false positive parsing is pretty low in my understanding.
No, it would not be reliable at all, and the chance of a false positive is very real. Take for example git/git@831a488: if the line
"Cc: Stable kernel <stable@vger.kernel.org> #v3.4 v3.5 v3.6"
would not have double-quotes around it, and was part of the PR description, it would be a false positive and the contributor would be rightfully annoyed. Or if the commit message used this formatting:
In some cases it is useful to add additional
information after the email address on the
Cc: footer in a commit log, for instance:
"Cc: Stable kernel <stable@vger.kernel.org> #v3.4 v3.5 v3.6"
If you truly want to allow Cc:in anywhere in the PR description, you will have to make it a ton more robust, e.g. by requiring some sort of email address validation (which is hard although we could go for "good enough", and thank goodness we don't have to do this in Perl).
So if you want this, make sure to not only look for Cc: at the beginning of the line (and make sure to do this case-insensitively), but do verify that it is followed by a valid email address (or by a valid name plus email address in <...>, personally I would allow names that are all alpha-numerical characters plus - plus . plus double-quotes plus single-quotes plus single space, with at least one alphanumerical character in it), possibly followed by an arbitrary number of comma-separated recipients.
There was a problem hiding this comment.
No, it would not be reliable at all, and the chance of a false positive is very real.
Thanks, that sounds like a valid concern.
Alternatively, we could use some more unambiguous footers like GitGitGadget-Cc: but that would break existing habits, so probably not worth it either.
I will then keep with the idea to adjust the code so that only footer is parsed (which didn't work as expected before my patches).
There was a problem hiding this comment.
No, it would not be reliable at all, and the chance of a false positive is very real.
This chance exists now due to \n\n splitting on title/body and not footer within body (due to \r\n\r\n. I do not object to the concern - I did not realize there was a higher risk of it being in commit messages.
lib/patch-series.ts
Outdated
| pullRequestURL: string, | ||
| pullRequestTitle: string, | ||
| pullRequestSubmitter: string, | ||
| pullRequestDescription: string, |
There was a problem hiding this comment.
This is no longer the entire description, therefore it needs to be renamed to pullRequestDescriptionBody.
There was a problem hiding this comment.
If you like.
In my understanding, PR consists of title and description.
There was a problem hiding this comment.
In my understanding, PR consists of title and description.
That obviously disagrees with the usage of the term "PR description" in current master.
There was a problem hiding this comment.
History does show const description = ${pr.title}\n\n${pr.body}; in ci-helper.ts. Moving from strings to objects is a good thing.
One of the commits indicates a \n\n in the pr.body could cause a problem. The explanation was not clear. Since the title was always pre-pended with a pair of lf, was this a regex greedy problem?
There was a problem hiding this comment.
Yes, regex is greedy, which is a good thing according to code's intention: it should split on the very last \n\n to separate footer from the rest
lib/patch-series.ts
Outdated
| coverLetter, | ||
| } = PatchSeries.parsePullRequestDescription(pullRequestDescription); | ||
| const parsedPR = | ||
| PatchSeries.parsePullRequestDescription(pullRequestDescription); |
There was a problem hiding this comment.
This is not an improvement, and the commit message rightfully does not talk about it.
Let's keep the const { basedOn, cc, coverLetter} form.
There was a problem hiding this comment.
I will replace const with let, reassign to old variables and drop the new variables.
This came as a result of my inexperience with TypeScript and trying to preserve const on old variable.
lib/patch-series.ts
Outdated
| coverLetter = md2text(coverLetter, wrapCoverLetterAtColumn); | ||
|
|
||
| // if known, add submitter to email chain | ||
| const cc = parsedPR.cc || []; |
There was a problem hiding this comment.
Again, this refactoring makes the diff unnecessarily hard to read and to validate. Let's just not.
lib/gitgitgadget.ts
Outdated
| const description = `${pr.title}\n\n${prBody}${ccSubmitter}`; | ||
| const ccSubmitter = userInfo.email | ||
| ? `${userInfo.name} <${userInfo.email}>` | ||
| : ""; |
There was a problem hiding this comment.
If we're already going that route, it would be better to set ccSubmitter to false:
const ccSubmitter = userInfo.email && `${userInfo.name} <${userInfo.email}>`;There was a problem hiding this comment.
Thanks for the advice. That was my first TypeScript exercise :)
There was a problem hiding this comment.
Don't take TypeScript advice from me. I am definitely not an expert.
lib/patch-series.ts
Outdated
| // if known, add submitter to email chain | ||
| const cc = parsedPR.cc || []; | ||
| if (pullRequestSubmitter.length) { | ||
| cc.push(pullRequestSubmitter); |
There was a problem hiding this comment.
Now, this looks like a benign change, right? But it is not. It changes behavior, and in an undesired way.
A much, much better idea would be to just pass pullRequestSubmitter || [] as third parameter to parsePullRequestDescription().
That requires a bug fix, of course, if you look carefully: this line currently overrides that parameter and should read cc = cc || []:
cc = [];(That bug fix needs to be in its own commit, I would like to point out.)
There was a problem hiding this comment.
Hmm, my lack of Typescript understanding doesn't allow me to see the bug.
Trying to read code with general sense derived from C++, I see the following:
parsePullRequestDescriptionhas a local variable that it later ruturns. It can be eitherstring[]orundefined.getFromNotescallsparsePullRequestDescriptionand receives result intocc.- At this time, I want to add one more item, so I have to ensure that it is no longer
undefined, so I allocate a new[]if needed.
I can't understand where I made a mistake. Also, what do you mean by "third parameter to parsePullRequestDescription()" ?
There was a problem hiding this comment.
It was too late at night, I misread parsePullRequestDescription() to take cc as an optional parameter. But that optional parameter (that does not exist yet) would Just Make Sense in the context of this PR.
tests/gitgitgadget.test.ts
Outdated
| `; | ||
| const match2 = description.match(/^([^]+)\n\n([^]+)$/); | ||
| expect(match2).toBeTruthy(); | ||
| const description = `This Pull Request contains some really important changes that I would love to${ |
There was a problem hiding this comment.
This is no longer the description but the body, and it should be accompanied by the title.
Also, this change removes the very Cc: footer that the first commit in this PR claims was previously untested.
It would be better to keep that footer and add a separate submitter to the getFromNotes() call below.
There was a problem hiding this comment.
I will add a new variable title and rename description to body. I will add another Cc line.
Also, this change removes the very Cc: footer that the first commit in this PR claims was previously untested.
Since GitGitGadget always adds author to Cc: I decided that here the line is the auto-appended author. So I merely preserved the old test scenario. But yes, this can be seen in a different way.
tests/gitgitgadget.test.ts
Outdated
| await PatchSeries.getFromNotes(notes, pullRequestURL, | ||
| "My first Pull Request!", | ||
| "Some Body <somebody@example.com>", | ||
| description, |
There was a problem hiding this comment.
Only one of these two changed getFromNotes() calls should pass a valid submitter, the other one should pass false or undefined instead, to increase code coverage.
There was a problem hiding this comment.
I will add another patch with that.
Current patch doesn't touch the existing tests, only applying a fix elsewhere.
|
In all these examples, the missing Cc is in the text of the message (part of the footer). The last one has a cr-lf after the Cc. |
This comment has been minimized.
This comment has been minimized.
663bf7b to
0d27e5d
Compare
|
I have submitted a new version. Please review. I tried to take all previous comments into account, hopefully I didn't miss anything. Some of previous comments are no longer applicable. |
|
I have some worries about |
| const series = | ||
| await PatchSeries.getFromNotes(this.notes, pr.pullRequestURL, | ||
| description, pr.baseLabel, | ||
| pr.title, pr.body, pr.baseLabel, |
There was a problem hiding this comment.
Should the pr and userInfo be passed to simplify the parm list? This is a lot of parameters.
There was a problem hiding this comment.
Thanks for your review! Unfortunately I need to switch to my git PR at the moment, and hopefully I will get back to this PR and polish it in a couple days.
There was a problem hiding this comment.
Sorry for not replying when you mentioned it for the first time.
Yes, this sounds reasonable, but even before my patch, this was already the case. I thought that @dscho had some reasons for it. Maybe this is some kind of abstraction problem, or he wanted to make testing easier? I don't know. So I decided to just follow the same path.
| "upstream/master:.github/PULL_REQUEST_TEMPLATE.md"], | ||
| { workDir }); | ||
| // Depending on core.autocrlf setting, template may contain \r\n | ||
| prTemplate = prTemplate.replace(/\r\n/g, "\n"); |
There was a problem hiding this comment.
Should probably still have the \r? on the regex.
There was a problem hiding this comment.
Not to my understanding.
Previous regex tried to replace \n to \r\n but keep \r\n as is without changing to \r\r\n. \r?\n served this purpose.
Now, I want to change \r\n to \n. There's no need to change \n to \n because that's a no-op.
lib/patch-series.ts
Outdated
|
|
||
| // if known, add submitter to email chain | ||
| if (senderEmail) { | ||
| cc.push(`${senderName} <${senderEmail}>`); |
There was a problem hiding this comment.
Is this the best place for this? Two extra parms are passed. Might be cleaner to pass the pre-populated cc array in to the method.
There was a problem hiding this comment.
Thanks for pointing out.
I figured that it can be simplified even more if I move cc.push block into getFromNotes().
That required me to shuffle code between commits a little bit, and now Fix footer parsing when PR body ended with newline commit comes earlier then before.
lib/patch-series.ts
Outdated
| cc.push(`${senderName} <${senderEmail}>`); | ||
| } | ||
|
|
||
| coverLetter = `${prTitle}\n\n${coverLetter}`; |
There was a problem hiding this comment.
Why not just pass the template to md2text instead of updating coverLetter.
There was a problem hiding this comment.
It's easier to read when one line of code does one action.
There was a problem hiding this comment.
Linter forced me to introduce two additional variables, so I actually felt tempted to "just throw everything in super long line".
| cc: string[], | ||
| }> { | ||
| // Replace \r\n with \n to simplify remaining parsing. | ||
| // Note that md2text() in the end will do the replacement anyway. |
There was a problem hiding this comment.
A very good point. Is the \r\n to \n needed when we are now more aware of the issue and can code for it.
There was a problem hiding this comment.
Yes, otherwise we'll have two bad choices:
- Write extra code to preserve
\r\n- see previous discussion: Fix Cc: footer handling #163 (comment) - Ignore that and mix
\r\nwith\ninparsePullRequestBody()- that's dirty on its own, but also relies on hidden knowledge that current outside caller drops\r\nlater anyway, which is even more dirty.
So replacing with \n is best.
| let prTemplate = | ||
| await git(["show", | ||
| "upstream/master:.github/PULL_REQUEST_TEMPLATE.md"], | ||
| { workDir }); |
There was a problem hiding this comment.
Not sure why this was moved here from GitGitGadget. I understand the isolation of the 'mods' to the prBody but the function is parse and not cleanup.
There was a problem hiding this comment.
My biggest reason was to isolate everything that relies upon CRLF/LF in one place, so that I can replace with \n and see the consequences in just one place.
I actually think that it is well placed: the function parses cover letter out of pr body, and removing trash from pr body is reasonable part of that parsing.
tests/patch-series.test.ts
Outdated
| const parsed = await PatchSeries.parsePullRequest(".", prTitle, | ||
| prBody, | ||
| senderName, | ||
| senderEmail); |
There was a problem hiding this comment.
Passing '.' is passing the gitgitgadget repo when a git repo is expected. Try/catch will handle but could be interesting if gitgitgadget/gitgitgadget adds a pull request template that says some description goes here. 🙂
There was a problem hiding this comment.
Thanks for pointing out!
I also noticed that something is wrong #163 (comment) but then had to switch to other task.
Fixed now.
| pr.baseCommit, pr.headLabel, | ||
| pr.headCommit, options, | ||
| userInfo.name); | ||
| userInfo.name, userInfo.email); |
There was a problem hiding this comment.
fyi, userInfo.email may be null. Yes the interface needs to be updated to reflect that or return an empty string (some other code does not like null). Another argument for passing interfaces. The new gitgitgadget.test.ts test is passing undefined for the email.
There was a problem hiding this comment.
When I tried to pass null in test, I got this compile error:
error TS2345: Argument of type 'null' is not assignable to parameter of type 'string'
This is again where I lack Typescript understanding.
I have tested and everything seems to work as intended for cases where email returned by GitHub is null.
If there is something I need to change, what is the simplest fix?
There was a problem hiding this comment.
Testing locally gives mixed results so I don't have a quick fix and nothing for you to change. @dscho Any preference for IGitHubUser.email to support null or always return an empty string if github returns a null?
e90fe90 to
7334d65
Compare
Skip the extra work of making sure that `cc` is defined by always returning defined array from `parsePullRequestDescription()`
In this case, appending `\nCc: ${userName} <${userEmail}>` created
`\n\n` at append position and caused regex `/^([^]+)\n\n([^]+)$/` in
`parsePullRequestDescription()` to make an incorrect split, as if only
`Cc: ${userName} <${userEmail}>` was the footer. This ignored all other
footers.
7334d65 to
56c8025
Compare
Together with the following patches, this should make it easier to see everything that is related to PR parsing in one place. Had to use 'wrappedLetter' to silence linter on: Identifier 'basedOn' is never reassigned; use 'const' instead of 'let'' Identifier 'cc' is never reassigned; use 'const' instead of 'let''
On top of collecting all related code in one place, it also prepares grounds for next patches.
56c8025 to
74f4ee0
Compare
|
Pushed the new version, hopefully addressing all concerns. It already took a lot longer then I hoped. I admit, my lack of any at all Typescript experienced didn't help. Still, at this point, I kindly ask to limit review to important things only. If there's something you'd like to change, I would be happy if you do that for me. |
|
Are commit messages important? Still reviewing (maybe get through it tomorrow).
typo and I don't understand the start of the last paragraph of the commit message. |
|
I don't mind improving commit messages, that doesn't take too much time. You mean that this is hard to understand?
Will that be better?
|
When asking for the PR description, GitHub produces it with CR/LF line endings. This causes two different problems for the current implementation of `parsePullRequestDescription()`. Fix both problems by replacing `\r\n` with `\n` early. Problem 1 --------- The second regex `/^([-A-Za-z]+:) (.*)\n?$/` did not match lines that end with `\r\n`, because `(.*)` is not allowed to match `\r`. `\n?` was a mistake where `\r?` was meant. As a result, only the last footer was processed, and only if it wasn't ended with a newline. Problem 2 --------- The first regex `/^([^]+)\n\n([^]+)$/` wants to split footer on `\n\n` while it should in fact split on `\r\n\r\n`. As a result, the split didn't work. Still, it sometimes produced expected results because it instead split on the title-body boundary, considering the entire body as footers. This "footer anywhere" behavior is not preserved, because it is against the code's intention, and it was considered undesired in discussion for this patch. For example, consider commit message for git @ 831a488 : In some cases it is useful to add additional information after the email address on the Cc: footer in a commit log, for instance: "Cc: Stable kernel <stable@vger.kernel.org> #v3.4 v3.5 v3.6" However, git-send-email refuses to pick up such an invalid address when the Email::Valid perl module is available, or just uses the whole line as the email address. In this commit message, there are two Cc: occurrences. Luckily, both of them are not at the start of the line: one is in the middle and the other is preceded by quote. Hence they are not mistaken for `Cc:` footers by GitGitGadget. However, it is easy to imagine that either of them could have been at the beginning of the line, causing undesired parsing results.
With other patches already in place, this could be considered as a refactoring. Previously, it would have prevented unexpected splitting on title-body boundary, causing real use case bugs (see previous patch).
It is all too easy to write `CC:` (i.e. write the second `c` in upper case) by mistake. This is not hypothetical, it happened, see e.g. git/git#676 (comment) Let's be lenient and allow for mixed case, upper case and lower case. In fact, just make the comparison case-insensitive. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
74f4ee0 to
d1c73a1
Compare
Heh, that does not match my experience. To write really good commit messages, I regularly spend much more time writing those than to implement the actual fix. In fact, when I started contributing to the Git project, I regularly spent something like an hour on each commit message. It only got better by practicing hard. I still often spend something like 15 minutes to write an excellent commit message, trying most of all to imagine that I would read this, six months from now, after having forgotten pretty much all the details.
That is indeed much better. Since you indicated that you would be happy for me to push this over the finish line, I did exactly that. Additionally, I included a fix for the problem that occurred recently where a contributor wrote |
|
Yes, thanks for helping to get that merged! Also, I'd like to once again thank you for this project. It really helps a lot when dealing with git PRs!
Yes, I already have a lot of practice with writing commit messages from my previous job, where I also convinced other people to write detailed commit messages. I was quite happy to see git to follow the same practice. However, lately I've been caught into some time pressure loop and started to do things in a rush (i.e. not proof reading), which I now noticed, so hopefully will be back to my previous quality. |
This PR fixes two different problems, each of them prevented
Cc:footers from working for me.Quick digging shows that I'm not alone here:
--pathspec-from-fileoption git#445 --- only one CC is handledneeds_hiding()git#427 --- CC is ignored