Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

MAINT Use towncrier for changelog management #30046

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 31 commits into from
Oct 16, 2024

Conversation

lesteve
Copy link
Member

@lesteve lesteve commented Oct 11, 2024

What does this implement/fix? Explain your changes.

This is a step towards using towncrier for easier management of our changelog.

The config and template are in place, so feed-back is more than welcome. I have added a short README.rst that explains how to create a fragment file (single file with a naming convention like <PR_NUMBER>.<TYPE>.rst that you need to add in the right folder for each PR to describe the change).

I added a few fragment files that I created manually to make sure that the template was working correctly and to give an idea how it would look.

A few choices I have made:

  • I added sklearn.<module> folders with .gitkeep files for all the subfolders (sklearn.linear_model, sklearn.tree, etc ...) and python files (sklearn.base, sklearn.calibration, etc ....), except for private modules or submodules I deemed not relevant (e.g. sklearn.experimental, sklearn.tests, etc ...)
  • there are a few usual top-level sections that I haved added based on recent changelog, "Security", "Changes affecting many modules", "Suport for Array API" etc ...
  • for all the rest there is custom-top-level folder where you can add custom section. There is an example with dropping setuptools support. For these sections you don't write a bullet point but a section so you need to use the right headline character which is - currently.

Fow now, I am envisioning that you use towncrier to generate a single file from the fragment files but that there is still some manual copy and paste + possibly light edit to add the content to doc/whats_new/v1.6.rst but this is up for discussion/refinement.

This is inspired from the numpy and astropy setups and I also tried to have a changelog similar to what we currently have (ordered first by submodules then by category).

To try it out:

towncrier build --version 1.6.0 --draft

Should produce something like this:

Version 1.6.0 (2024-10-11)
==========================

Changes impacting many modules
------------------------------
- |API| :func:`utils.validation.validate_data` is introduced and replaces previously
  private `base.BaseEstimator._validate_data` method. This is intended for third party
  estimator developers, who should use this function in most cases instead of
  :func:`utils.validation.check_array` and :func:`utils.validation.check_X_y`.
  `Adrin Jalali`_. :pr:`29696`

Dropping support for building with setuptools
---------------------------------------------

From scikit-learn 1.6 onwards, support for building with setuptools has been
removed. Meson is the only supported way to build scikit-learn, see
:ref:`Building from source <install_bleeding_edge>` for more details.

:user:`Loïc Estève <lesteve>` :pr:`29400`

:mod:`sklearn.ensemble`
-----------------------
- |Feature| :class:`ensemble.ExtraTreesClassifier` and
  :class:`ensemble.ExtraTreesRegressor` now support missing-values in the data matrix
  `X`. Missing-values are handled by randomly moving all of the samples to the left, or
  right child node as the tree is traversed.
  :user:`Adam Li <adam2392>`. :pr:`28268`

- |Efficiency| Small runtime improvement of fitting
  :class:`ensemble.HistGradientBoostingClassifier` and
  :class:`ensemble.HistGradientBoostingRegressor` by parallelizing the initial search
  for bin thresholds.
  by :user:`Christian Lorentzen <lorentzenchr>`. :pr:`28064`

- |Efficiency| :class:`ensemble.IsolationForest` now runs parallel jobs
  during :term:`predict` offering a speedup of up to 2-4x on sample sizes
  larger than 2000 using `joblib`.
  :user:`Adam Li <adam2392>` and
  :user:`Sérgio Pereira <sergiormpereira>`. :pr:`28622`

- |Enhancement| The verbosity of :class:`ensemble.HistGradientBoostingClassifier`
  and :class:`ensemble.HistGradientBoostingRegressor` got a more granular control. Now,
  `verbose = 1` prints only summary messages, `verbose >= 2` prints the full
  information as before.
  :user:`Christian Lorentzen <lorentzenchr>`. :pr:`28179`

- |API| The parameter `algorithm` of :class:`ensemble.AdaBoostClassifier` is deprecated
  and will be removed in 1.8.
  :user:`Jérémie du Boisberranger <jeremiedbb>`. :pr:`29997`

:mod:`sklearn.linear_model`
---------------------------
- |Fix| :class:`linear_model.LogisticRegressionCV` corrects sample weight handling
  for the calculation of test scores.
  :user:`Shruti Nath <snath-xoc>`. :pr:`29419`

- |Fix| :class:`linear_model.LassoCV` and :class:`linear_model.ElasticNetCV` now
  take sample weights into accounts to define the search grid for the internally tuned
  `alpha` hyper-parameter. :user:`John Hopfensperger <s-banach> and
  :user:`Shruti Nath <snath-xoc>`. :pr:`29442`

- |API| Deprecates `copy_X` in :class:`linear_model.TheilSenRegressor` as the parameter
  has no effect. `copy_X` will be removed in 1.8.
  :user:`Adam Li <adam2392>`. :pr:`29105`

:mod:`sklearn.tree`
-------------------
- |Feature| :class:`tree.ExtraTreeClassifier` and :class:`tree.ExtraTreeRegressor` now
  support missing-values in the data matrix ``X``. Missing-values are handled by
  randomly moving all of the samples to the left, or right child node as the tree is
  traversed.
  :user:`Adam Li <adam2392>`. :pr:`27966`

Any other comments?

There are some places where our changelog is a bit custom (notably "Support for Array API" where the ordering is manually done by functions, then by classes, then Other). This kind of flexibility would not be available through towncrier.

My experience writing (and reading) the towncrier template (that uses jinja2) is not very pleasant so I would avoid doing fancy stuff there unless really necessary.

There are things that I haven't added yet but may be worth it, like adding some description before the bullet points for some sections. We currently do this for Array API and Metadata routing to give a bit of context and a link for more details, see https://scikit-learn.org/dev/whats_new/v1.6.html#support-for-array-api.

@lesteve lesteve changed the title MAINT First MAINT Use towncrier for changelog management Oct 11, 2024
Copy link

github-actions bot commented Oct 11, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: c0d203b. Link to the linter CI: here

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this overall, but one thing I'd change is that having so many folders mirroring our modules kinda looks / feels dirty. I'd rather have the file name specify that as <PR#>.<TYPE>.<SECTION>.rst instead.

We should also fix our linter checking for changelog, and might also want a linter to check the file name itself.

@lesteve
Copy link
Member Author

lesteve commented Oct 14, 2024

I like this overall

🎉

I'd rather have the file name specify that as <PR#>.<TYPE>.<SECTION>.rst instead.

I don't think this is possible with towncrier unfortunately, feel free to have a look at the doc in case I missed something ...

We should also fix our linter checking for changelog, and might also want a linter to check the file name itself.

There is https://github.com/scientific-python/action-towncrier-changelog that does most of it I think. I have started testing it this morning in lesteve#37 and it looks good enough for our purposes.

@glemaitre glemaitre self-requested a review October 14, 2024 13:05
Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. We're most probably gonna need to tune / fix a few things during the release process anyway.

@glemaitre glemaitre merged commit d76b75b into scikit-learn:main Oct 16, 2024
30 checks passed
@thomasjpfan
Copy link
Member

@lesteve lesteve deleted the towncrier branch October 17, 2024 06:02
@lesteve
Copy link
Member Author

lesteve commented Oct 17, 2024

Does this mean we can simplify the code in main/.github/workflows/check-changelog.yml ?

I don't know about simplifying but the CI part will need to be adapted. As mentioned in #30046 (comment) there is scientific-python/action-towncrier-changelog that we can use. I have started testing in lesteve#37 and it looks good enough for our purposes.

I think we still have some custom logic that says if no tests were changed then a changelog is not required. If we want to keep this logic, this needs to be implemented I guess as a first step in the CI job.

@StefanieSenger
Copy link
Contributor

I want to point out that these categories that are listed in the changelog instructions:

    major-feature
    feature
    efficiency
    enhancement
    fix
    api

are now playing a much bigger role than ever. Before, we simply put [ENH] or [MNT] in front of the change and the changes were not sortable by these categories. Now they are, if I understand correctly.

But these are not really good categories, they should not overlap, they should cover everything.

Major concerns:

  1. api will always be affected with major-feature and feature and maybe enhancement. It is ambiguous for us where to put these and possibly confusing to users.
  2. There is not really a good place to put maintenance PRs and people would probably put them into enhancement (which it is not) and fix (which it is not).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.