-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
CI Add workflow to check Changelog entry. #19155
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @cmarmo
The original issue seemed to suggest to only trigger an error if tests were modified. Is this the case here? I think that most PRs don't require a changelog entry, so we probably don't want the error to be the default in all cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @cmarmo, I agree we should keep the labeling thing as not all PRs modifying tests would need a changelog entry (but probably most will).
My CI expertise is weak: any chance we can make the CI check issue a "warning" rather than an error (or anything not as critical as an error)?
When this is merged, we should ping all core devs to inform them that this is in (I think we should make it a habit to ping people for such infra changes).
I haven't double checked whether the check is correct but this isn't critical so LGTM
Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
I believe a warning could be issued but then the check board will show it as 'green'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this check. I thing we could reduce the number of false "negative" by maintaining a blacklist and whitelist of labels to enable or disable it:
Whitelist:
Bug
New Feature
Performance
Blacklist:
CI / Build
Documentation
Decision rule
- If the
No changelog
then skip the check. - Else if any of the whitelist labels are enabled then do the check.
- Else if any of the blacklist labels are enabled, skip the check.
- Else: do the check (this probably means that the PR labels should be updated).
What do you think about this suggestion?
then | ||
echo "Changelog has been updated." | ||
else | ||
echo "Changelog entry is missing. If no changelog entry is required for this PR, label the PR with no-changelog to bypass this check." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should be extra-explicit on case where we need a changelog entry:
- bug fixes
- new features (in particular with changes to the public API, e.g. with a new public class or function or a new parameter)
Changes that do not need a changelog entry:
- documentation fixes or improvements
- new or updated examples or benchmark
- fix or improvements in the test suite
- code style changes
- updated continuous integration configuration
- updated packaging configuration or tools (unless the previously released packages on pypi.org or conda-forge where invalid for some reason, in which case it's useful to mention the list of problematic packages in the changelog)
Note that it is ok to reference several PRs in the same changelog entry, for instance when a big task has been decomposed in subtasks or if a partial fix has been further improved in a follow-up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* fix or improvements in the test suite
wrt the current implementation this case will be checked because test files are likely to be modified. Should I get rid of this check?
scikit-learn/.github/workflows/check_changelog.yml
Lines 22 to 23 in 62a5068
tests=$(git diff --name-only origin/master) | |
if [[ "$tests" =~ tests ]] |
Note that it is ok to reference several PRs in the same changelog entry, for instance when a big task has been decomposed in subtasks or if a partial fix has been further improved in a follow-up PR.
The current implementation check if the submitted PR is cited in the changelog: this is always true when a changelog is needed, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the PR is only a about fixing a bad test or adding a new one, we shall not document it in a changelog entry because it's probably useless for our end-users. I mean the test fix is useful to us and our users but since the tests are not part of the public API, it's not that interesting to the readers of the changelog.
The current implementation check if the submitted PR is cited in the changelog: this is always true when a changelog is needed, right?
Yes, this is correct.
I like it , but I'm under the impression that labels are still not systematically used in the scikit-learn issue tracker, despite the current effort of automation. I would rather to keep the workflow simple for now: it could always be improved in the future. |
I'm personally more concerned about false positive than false negative. Quite frankly, I don't think there are tons of occurrences where we forgot to add a changelog entry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok let's keep it simple then.
I am fine with merging this as it is, unless @cmarmo would like to implement the version check thingy (https://github.com/scikit-learn/scikit-learn/pull/19155/files#r556562785).
Let me try to implement it : I think it is a useful test... at least this will force to move the milestones if they are obsoletes... and there are some cases in the issue tracker... :) |
I created the new label as Can you please update the PR accordingly (both for the skip condition and the user displayed to the users in case of failure of the check). |
then | ||
echo "Changelog has been updated." | ||
check_mlstn=$(cat ./doc/whats_new/v${MILESTONE_TITLE:0:4}.rst) | ||
if [[ "$check_mlstn" =~ :pr:\`$PR_NUMBER\` ]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what happens if there's no mileston label on the PR?
instead of checking all changelogs first and then the specific changelog, could we just check
- all changelogs if there's no label
- the expected changelog if there's a label?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to prefer the current solution for the following reason: I have seen PRs mile-stoned for the r+2 release while r+1 was under development.
Assuming that the changelog entry has been written in r+1
- current implementation: you will be notified that the entry has been written but there is a mis-versioning (edited: not really, because the changelog is not created yet)... but the check will be green anyway.
your suggestion: the check will fail. You should verify yourself if the entry is there.
I'm assuming that the furthest milestone has been assigned to notify low priority or to not put pressure ... I don't know... but as I've seen this quite often in the issue tracker I'm not comfortable in telling to people "don't do that" as long as an automatic check is affordable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not at all... sorry, the furthest changelog is not present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but the check will be green anyway
I'm a bit confused because as far as I understand the current check, it will fail if the entry isn't written in the corresponding changelog. Which is why it seems unnecessary to check all changelogs first
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm affected by a confirmation bias here ... :(
Another quite common case is an old milestone in a new PR:
- current implementation: the entry is there, the check will tell you that the milestone is wrong
- your suggestion: there is no new entry in the old changelog.
Again, I know that those cases should not happen, but they happen...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC your point is that the current version will provide a more informative error message? Fair point. I still find the proposed structure below easier to follow (and more efficient, which doesn't really matter anyway). Maybe we can just amend the general error message saying
This PR is modifying tests and we thus expect a changelog entry. If no changelog entry is needed, label the PR with
no changelog needed
. If you see this error and there is already a changelog entry then make sure that a) the PR number is correct, b) the tagged milestone for the PR (if any) and the changelog name properly match.
WDYT?
Basically here's what would seem natural to me:
I could be wrong but I believe that this is what the PR is currently doing, although it does not follow the above structure |
This PR does the following:
As I said before (our comments crossed) I prefer keep this way because it allows me to check mismatch between changelog and milestones as they are common in the issue tracker. But to be honest I'm unable to test all the use cases and I would rather to see it working to verify its behavior. |
Sorry if I'm missing something but I think this description is strictly equivalent to my proposal in terms of behavior: #19155 (comment) The only difference I see is in terms of error message opportunity, for which I proposed a general solution in #19155 (comment) |
To clarify: I'm proposing to implement the new error message + #19155 (comment), not just the new error messages alone. The new error message proposal is so that the messages become as informative as they are in the current version If the proposed structure leads to a different behavior then please let me know, but otherwise I think it will be simpler, cleaner, and easier to maintain. |
I'm quite convinced the current structure is different from your suggestion: that's why I needed to split the message in two parts.
Again, this is maybe not an expected use case but it happened already (in particular with the 1.0 version). Maybe this will not happen in the future but I can't be sure of that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
set -xe | ||
changed_files=$(git diff --name-only origin/master) | ||
# Changelog should be updated only if tests have been modified | ||
if [[ "$changed_files" =~ tests ]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd still suggest to exit early here if no tests are changed: this saves one nested block, which helps readability (especially in bash ;) )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've seen your previous comment and and replied there: I'm sorry but I can't understand what you are suggesting.
Should I add an else
with exit 0
? The end of this if
is the end of the script, I guess if the check is False there is nothing else to be executed...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I missed your reply up there. Basically I'm suggesting to do
if [[ ! "$changed_files" =~ tests ]] # added the "!" here
then
exit 0
fi # <- move very last fi here
# continue as before but everything indented left
I did another pass. Let's try this for real and we will see if we need to adjust the conditions or message based on contributors feedback in PRs. |
Merged! |
really just an innocuous commit to check scikit-learn#19155
I also note that with |
This is indeed what happens: if a PR is referenced in a previous changelog without being tagged the check errors. As milestones are not mandatory for now I can't check only the milestone changelog but I need to look for the PR everywhere. To fix the error the contributor should add or move the entry, or the team should change the milestone (this can be useful for minor releases), if any. |
I don't see this logic in the code. If a PR is referenced in a previous changelog, and the PR is not tagged with a milestone, then the check passes, regardless of which changelog the PR edits. I am suggesting, rather, that the PR enforces that the changelog entry is in the latest version's logs.
Perhaps with automation? Previously it's not been that helpful except for timely issues. |
Reference Issues/PRs
Fixes #10451
What does this implement/fix? Explain your changes.
Add a github action checking if the pull request is referenced in the changelog together with its author.
Any other comments?
Need the creation of a label "no-changelog" for pull requests that do not need to be listed.
This is the same approach used by astropy.
See its test implementation here.