-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
MAINT fragmenting the changelog of 1.6 #30081
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MAINT fragmenting the changelog of 1.6 #30081
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trusting you've already made sure nothing's removed from the changelog. Can't really easily check for that here. I'll let @lesteve have a look too and merge.
I moved each entry by hand with copy pasting. So no entry could have been lost. Tedious but this is done.
|
Turns out I had a hacky script almost ready to generate the fragments. At least this allowed to double-check that no entry were missing. There are some tiny differences due to different choices (or my script being too dumb) but I think that's fine. Here is my hacky script in case it can be used later down the line: """
Caveats right now (2024-10-16):
- custom sections like "dropping setuptools support" are not seen (because there
is no bullet points inside these section, easy to do by hand, there are 2 of them)
- Array API had custom grouping (by class, function, other), so needs to be
looked at closer. Needs content edition. Also everything is categorized
other, do we want different tags (maybe feature)?
- 2 entries have two PRs listed 29677 (other PR is 22606) 29143 (other PR is
27736), probably the content needs to be tweaked there is "By" or/and "and" too many
"""
import re
import textwrap
import warnings
from pathlib import Path
from docutils import nodes
from docutils.core import publish_doctree
from docutils.utils import Reporter
def parse_rst(rst_content):
doctree = publish_doctree(
# settings_override is used to avoid showing parsing issues we don't
# care about, for example for sphinx constructs
rst_content,
settings_overrides={"report_level": Reporter.SEVERE_LEVEL + 10},
)
return extract_sections_and_bullets(doctree)
def extract_sections_and_bullets(node, level=1):
result = []
if isinstance(node, nodes.section):
title = node.children[0].rawsource
result.append({"type": "section", "level": level, "title": title})
for child in node.children[1:]:
result.extend(extract_sections_and_bullets(child, level + 1))
elif isinstance(node, nodes.bullet_list):
for item in node.children:
bullet_text = item.children[0].rawsource
result.append({"type": "bullet", "text": bullet_text})
elif isinstance(node, nodes.Element):
for child in node.children:
result.extend(extract_sections_and_bullets(child, level))
return result
def section_to_folder(section):
if section.startswith(":mod:"):
return re.sub(r":mod:`(.+)`", r"\1", section)
section_mapping = {
"Changes impacting many modules": "many-modules",
"Support for Array API": "array-api",
"Metadata Routing": "metadata-routing",
}
return section_mapping[section]
def get_pr_number(content):
matches = re.findall(pr_pattern, content)
if len(matches) > 1:
warnings.warn(f"More than one PR {matches} in content {content}")
return matches[-1]
def get_fragment_type(content):
m = re.match(tag_pattern, content)
if m is None:
return "other"
return tag_to_type[m.group(1)]
def get_fragment_path(section, content):
pr_number = get_pr_number(content)
fragment_type = get_fragment_type(content)
subfolder = section_to_folder(section)
root_folder = Path("doc/whats_new/upcoming_changes")
return root_folder / subfolder / f"{pr_number}.{fragment_type}.rst"
def get_fragment_content(content):
# need to strip spaces hence + r"\s*" in two lines below
content = re.sub(tag_pattern + r"\s*", "", content)
content = re.sub(pr_pattern + r"\s*", "", content)
# Some people use shorthands rather than :user: ...
user_pattern = r"by(\s+)(:user:`[^`]+`|`[\w ]+`_)"
content = re.sub(user_pattern, r"By\1\2", content)
# Need to indent and add the bullet point
content = textwrap.indent(content, " " * 2)
return f"-{content[1:]}"
changelog = Path("~/dev/scikit-learn/doc/whats_new/v1.6.rst").expanduser()
# Remove includes which can cause errors and are not necessary for our purposes
content = re.sub(r"\.\. include.+", "", changelog.read_text())
parsed_content = parse_rst(content)
current_section = None
content_by_section = {}
for item in parsed_content:
if item["type"] == "section":
current_section = item["title"]
elif item["type"] == "bullet":
content_by_section.setdefault(current_section, []).append(item["text"])
pr_pattern = r":pr:`(\d+)`"
tag_to_type = {
"MajorFeature": "major-feature",
"Feature": "feature",
"Efficiency": "efficiency",
"Enhancement": "enhancement",
"Fix": "fix",
"API": "api",
}
joined_tags = "|".join(tag_to_type)
tag_pattern = rf"\|({joined_tags})\|"
for section, content_list in content_by_section.items():
for content in content_list:
path = get_fragment_path(section, content)
path.parent.mkdir(exist_ok=True)
path.write_text(get_fragment_content(content)) |
OK let's merge this one! I guess there are going to be conflicts in PR that updates |
now the PAIN starts 😁 |
What do you think about adding a link to https://github.com/scikit-learn/scikit-learn/blob/main/doc/whats_new/upcoming_changes/README.md in People who have to fix |
I was thinking the same. Even having a bunch of the important content of the readme in the 1.6 file to make it easier than having to open yet another link. |
I'm okay with that but I would mean updating 2 places if we needed to change things. |
It might be worth it, I personally end up having to spend more time than I like everytime I add a changelog entry. |
Follow-up of #30046
Fragmenting the changelog of 1.6 for each entry.
A follow-up to this PR is to activate the proper GitHub actions to check the fragment on PR.
For reference, the current fragmentation will generate the following RST file: