Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

CI Add workflow to check Changelog entry. #19155

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Feb 23, 2021

Conversation

cmarmo
Copy link
Contributor

@cmarmo cmarmo commented Jan 11, 2021

Reference Issues/PRs

Fixes #10451

What does this implement/fix? Explain your changes.

Add a github action checking if the pull request is referenced in the changelog together with its author.

Any other comments?

Need the creation of a label "no-changelog" for pull requests that do not need to be listed.
This is the same approach used by astropy.

See its test implementation here.

@cmarmo cmarmo changed the title [BLD] Add workflow to check Changelog entry. CI Add workflow to check Changelog entry. Jan 11, 2021
Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @cmarmo

The original issue seemed to suggest to only trigger an error if tests were modified. Is this the case here? I think that most PRs don't require a changelog entry, so we probably don't want the error to be the default in all cases.

.github/workflows/check_changelog.yml Outdated Show resolved Hide resolved
Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @cmarmo, I agree we should keep the labeling thing as not all PRs modifying tests would need a changelog entry (but probably most will).

My CI expertise is weak: any chance we can make the CI check issue a "warning" rather than an error (or anything not as critical as an error)?

When this is merged, we should ping all core devs to inform them that this is in (I think we should make it a habit to ping people for such infra changes).

I haven't double checked whether the check is correct but this isn't critical so LGTM

.github/workflows/check_changelog.yml Outdated Show resolved Hide resolved
.github/workflows/check_changelog.yml Outdated Show resolved Hide resolved
Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
@cmarmo
Copy link
Contributor Author

cmarmo commented Jan 12, 2021

My CI expertise is weak: any chance we can make the CI check issue a "warning" rather than an error (or anything not as critical as an error)?

I believe a warning could be issued but then the check board will show it as 'green'.
I've seen that Astropy adds "Allowed failure" in the name of the check to explicit that its failure is not critical, we can pick this option here, if you think it could be useful.

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this check. I thing we could reduce the number of false "negative" by maintaining a blacklist and whitelist of labels to enable or disable it:

Whitelist:

  • Bug
  • New Feature
  • Performance

Blacklist:

  • CI / Build
  • Documentation

Decision rule

  • If the No changelog then skip the check.
  • Else if any of the whitelist labels are enabled then do the check.
  • Else if any of the blacklist labels are enabled, skip the check.
  • Else: do the check (this probably means that the PR labels should be updated).

What do you think about this suggestion?

then
echo "Changelog has been updated."
else
echo "Changelog entry is missing. If no changelog entry is required for this PR, label the PR with no-changelog to bypass this check."
Copy link
Member

@ogrisel ogrisel Jan 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should be extra-explicit on case where we need a changelog entry:

  • bug fixes
  • new features (in particular with changes to the public API, e.g. with a new public class or function or a new parameter)

Changes that do not need a changelog entry:

  • documentation fixes or improvements
  • new or updated examples or benchmark
  • fix or improvements in the test suite
  • code style changes
  • updated continuous integration configuration
  • updated packaging configuration or tools (unless the previously released packages on pypi.org or conda-forge where invalid for some reason, in which case it's useful to mention the list of problematic packages in the changelog)

Note that it is ok to reference several PRs in the same changelog entry, for instance when a big task has been decomposed in subtasks or if a partial fix has been further improved in a follow-up PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

* fix or improvements in the test suite

wrt the current implementation this case will be checked because test files are likely to be modified. Should I get rid of this check?

tests=$(git diff --name-only origin/master)
if [[ "$tests" =~ tests ]]

Note that it is ok to reference several PRs in the same changelog entry, for instance when a big task has been decomposed in subtasks or if a partial fix has been further improved in a follow-up PR.

The current implementation check if the submitted PR is cited in the changelog: this is always true when a changelog is needed, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the PR is only a about fixing a bad test or adding a new one, we shall not document it in a changelog entry because it's probably useless for our end-users. I mean the test fix is useful to us and our users but since the tests are not part of the public API, it's not that interesting to the readers of the changelog.

The current implementation check if the submitted PR is cited in the changelog: this is always true when a changelog is needed, right?

Yes, this is correct.

.github/workflows/check_changelog.yml Outdated Show resolved Hide resolved
@cmarmo
Copy link
Contributor Author

cmarmo commented Jan 13, 2021

What do you think about this suggestion?

I like it , but I'm under the impression that labels are still not systematically used in the scikit-learn issue tracker, despite the current effort of automation. I would rather to keep the workflow simple for now: it could always be improved in the future.
I think you are right about continuous integration ("Build-CI"), we can safely skip it, probably "Documentation" too.

@NicolasHug
Copy link
Member

I thing we could reduce the number of false "negative" by maintaining a blacklist and whitelist of labels to enable or disable it:

I'm personally more concerned about false positive than false negative. Quite frankly, I don't think there are tons of occurrences where we forgot to add a changelog entry.
I also agree with @cmarmo that we don't use labels consistently enough in the PRs ATM, so having all the proposed logic above might be over-engineering for little gain

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok let's keep it simple then.

I am fine with merging this as it is, unless @cmarmo would like to implement the version check thingy (https://github.com/scikit-learn/scikit-learn/pull/19155/files#r556562785).

@cmarmo
Copy link
Contributor Author

cmarmo commented Jan 15, 2021

I am fine with merging this as it is, unless @cmarmo would like to implement the version check thingy (https://github.com/scikit-learn/scikit-learn/pull/19155/files#r556562785).

Let me try to implement it : I think it is a useful test... at least this will force to move the milestones if they are obsoletes... and there are some cases in the issue tracker... :)
In the meanwhile may I ask to create the "no-changelog" label? I don't have rights for that. Thanks!

@ogrisel
Copy link
Member

ogrisel commented Jan 15, 2021

I created the new label as No Changelog Needed to be a bit more explicit and more consistent with the case style of the existing labels (even if we are not that consistent).

Can you please update the PR accordingly (both for the skip condition and the user displayed to the users in case of failure of the check).

.github/workflows/check-changelog.yml Outdated Show resolved Hide resolved
.github/workflows/check-changelog.yml Outdated Show resolved Hide resolved
.github/workflows/check-changelog.yml Outdated Show resolved Hide resolved
.github/workflows/check-changelog.yml Outdated Show resolved Hide resolved
then
echo "Changelog has been updated."
check_mlstn=$(cat ./doc/whats_new/v${MILESTONE_TITLE:0:4}.rst)
if [[ "$check_mlstn" =~ :pr:\`$PR_NUMBER\` ]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what happens if there's no mileston label on the PR?

instead of checking all changelogs first and then the specific changelog, could we just check

  • all changelogs if there's no label
  • the expected changelog if there's a label?

Copy link
Contributor Author

@cmarmo cmarmo Jan 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to prefer the current solution for the following reason: I have seen PRs mile-stoned for the r+2 release while r+1 was under development.
Assuming that the changelog entry has been written in r+1

  • current implementation: you will be notified that the entry has been written but there is a mis-versioning (edited: not really, because the changelog is not created yet)... but the check will be green anyway.
  • your suggestion: the check will fail. You should verify yourself if the entry is there.

I'm assuming that the furthest milestone has been assigned to notify low priority or to not put pressure ... I don't know... but as I've seen this quite often in the issue tracker I'm not comfortable in telling to people "don't do that" as long as an automatic check is affordable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not at all... sorry, the furthest changelog is not present.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but the check will be green anyway

I'm a bit confused because as far as I understand the current check, it will fail if the entry isn't written in the corresponding changelog. Which is why it seems unnecessary to check all changelogs first

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm affected by a confirmation bias here ... :(
Another quite common case is an old milestone in a new PR:

  • current implementation: the entry is there, the check will tell you that the milestone is wrong
  • your suggestion: there is no new entry in the old changelog.

Again, I know that those cases should not happen, but they happen...

Copy link
Member

@NicolasHug NicolasHug Jan 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC your point is that the current version will provide a more informative error message? Fair point. I still find the proposed structure below easier to follow (and more efficient, which doesn't really matter anyway). Maybe we can just amend the general error message saying

This PR is modifying tests and we thus expect a changelog entry. If no changelog entry is needed, label the PR with no changelog needed. If you see this error and there is already a changelog entry then make sure that a) the PR number is correct, b) the tagged milestone for the PR (if any) and the changelog name properly match.

WDYT?

.github/workflows/check-changelog.yml Outdated Show resolved Hide resolved
@NicolasHug
Copy link
Member

NicolasHug commented Jan 15, 2021

Basically here's what would seem natural to me:

if not tests are modified
	exit gracefully

If PR is milestoned: 
	if corresponding CL file doesn't exist:
		exit gracefully as there's probably nothing to do
	expected_CL = <that specific changelog>
else:
	expected_CL = <all changelogs>

<now ensure that the PR number is in expected_CL. If it fails, error with general informative error message>

I could be wrong but I believe that this is what the PR is currently doing, although it does not follow the above structure

@cmarmo
Copy link
Contributor Author

cmarmo commented Jan 15, 2021

This PR does the following:

  • skip the test if the label no-changelog is present
  • if tests files are not modified exits gracefully
  • look for the PR number in all changelogs: if no entry exit 1
  • check if a file correspondent to the PR milestone exists: if not exit gracefully
  • if yes check that if the PR is referenced there: if not exit 1 because of the mismatch

As I said before (our comments crossed) I prefer keep this way because it allows me to check mismatch between changelog and milestones as they are common in the issue tracker. But to be honest I'm unable to test all the use cases and I would rather to see it working to verify its behavior.

@NicolasHug
Copy link
Member

Sorry if I'm missing something but I think this description is strictly equivalent to my proposal in terms of behavior: #19155 (comment)

The only difference I see is in terms of error message opportunity, for which I proposed a general solution in #19155 (comment)

@NicolasHug
Copy link
Member

The only difference I see is in terms of error message opportunity, for which I proposed a general solution in #19155 (comment)

To clarify: I'm proposing to implement the new error message + #19155 (comment), not just the new error messages alone. The new error message proposal is so that the messages become as informative as they are in the current version

If the proposed structure leads to a different behavior then please let me know, but otherwise I think it will be simpler, cleaner, and easier to maintain.

@cmarmo
Copy link
Contributor Author

cmarmo commented Jan 15, 2021

If the proposed structure leads to a different behavior then please let me know, but otherwise I think it will be simpler, cleaner, and easier to maintain.

I'm quite convinced the current structure is different from your suggestion: that's why I needed to split the message in two parts.
If the PR is milestoned r+2 (for any reason) the file doesn't exist, if I understand correctly:

  • current implementation: the check will verify if there is an entry in all changelog (containing r+1), then tell you that there is a mismatch.
  • your suggestion: the check is not performed

Again, this is maybe not an expected use case but it happened already (in particular with the 1.0 version). Maybe this will not happen in the future but I can't be sure of that.

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, thanks for re-clarifying.

It is indeed an edge case and the fix for that use-case would be to update the PR's tag, not the entry (which is why it got me confused)... but OK. I made some last comments, maybe @ogrisel can make a last pass and merge

Thanks @cmarmo

.github/workflows/check-changelog.yml Outdated Show resolved Hide resolved
.github/workflows/check-changelog.yml Outdated Show resolved Hide resolved
set -xe
changed_files=$(git diff --name-only origin/master)
# Changelog should be updated only if tests have been modified
if [[ "$changed_files" =~ tests ]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd still suggest to exit early here if no tests are changed: this saves one nested block, which helps readability (especially in bash ;) )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've seen your previous comment and and replied there: I'm sorry but I can't understand what you are suggesting.
Should I add an else with exit 0? The end of this if is the end of the script, I guess if the check is False there is nothing else to be executed...

Copy link
Member

@NicolasHug NicolasHug Jan 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I missed your reply up there. Basically I'm suggesting to do

if [[ ! "$changed_files" =~ tests ]]  # added the "!" here
then
	exit 0
fi  # <- move very last fi here

# continue as before but everything indented left

Base automatically changed from master to main January 22, 2021 10:53
@ogrisel
Copy link
Member

ogrisel commented Feb 23, 2021

I did another pass. Let's try this for real and we will see if we need to adjust the conditions or message based on contributors feedback in PRs.

@ogrisel ogrisel merged commit fab739c into scikit-learn:main Feb 23, 2021
@ogrisel
Copy link
Member

ogrisel commented Feb 23, 2021

Merged!

@cmarmo cmarmo deleted the changelog-check branch February 23, 2021 17:23
jnothman added a commit that referenced this pull request Feb 24, 2021
jnothman added a commit to thomasjpfan/scikit-learn that referenced this pull request Feb 24, 2021
really just an innocuous commit to check scikit-learn#19155
@jnothman
Copy link
Member

I pushed a couple of small fixes: f2943c6 and 86445ab

Where did we decide on the logic that we should check all the changelogs? I might have thought it good to error if a PR added an entry to already released changelog, without being tagged with a corresponding milestone.

@jnothman
Copy link
Member

I also note that with set -x we're printing the entire changelog to the Github Check logs.

@cmarmo
Copy link
Contributor Author

cmarmo commented Feb 24, 2021

Where did we decide on the logic that we should check all the changelogs? I might have thought it good to error if a PR added an entry to already released changelog, without being tagged with a corresponding milestone.

This is indeed what happens: if a PR is referenced in a previous changelog without being tagged the check errors. As milestones are not mandatory for now I can't check only the milestone changelog but I need to look for the PR everywhere. To fix the error the contributor should add or move the entry, or the team should change the milestone (this can be useful for minor releases), if any.
I think that making the milestone mandatory would be great.... but I don't think this is doable for now.

@jnothman
Copy link
Member

if a PR is referenced in a previous changelog without being tagged the check errors.

I don't see this logic in the code. If a PR is referenced in a previous changelog, and the PR is not tagged with a milestone, then the check passes, regardless of which changelog the PR edits. I am suggesting, rather, that the PR enforces that the changelog entry is in the latest version's logs.

I think that making the milestone mandatory would be great....

Perhaps with automation? Previously it's not been that helpful except for timely issues.

@glemaitre glemaitre mentioned this pull request Apr 22, 2021
12 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Have Travis ensure that what's new is updated
4 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.