[ENH] V1 → V2 API Migration - studies#1610

Open
rohansen856 wants to merge 275 commits into openml:main from rohansen856:studies-migration

@rohansen856
Contributor

Stacked PR; depends on #1576.

This PR adds Studies v2 migration.

A question:
Because of the pre-commit hook, I could not give the function six arguments, so I worked around it like this:
openml\_api\resources\studies.py (lines 10-15)

        limit = kwargs.get("limit")
        offset = kwargs.get("offset")
        status = kwargs.get("status")
        main_entity_type = kwargs.get("main_entity_type")
        uploader = kwargs.get("uploader")
        benchmark_suite = kwargs.get("benchmark_suite")

I would like to confirm whether this approach is acceptable. Raising a draft PR for now.
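
For comparison, here is a minimal sketch of the trade-off behind this question. Both functions are hypothetical stand-ins, not code from this PR: `list_with_kwargs` mirrors the `kwargs.get` workaround, and `list_explicit` uses named keyword-only parameters. It only illustrates that a misspelled filter passes silently through `**kwargs`, while an explicit signature rejects it.

```python
from __future__ import annotations


def list_with_kwargs(**kwargs: object) -> dict[str, object]:
    """Hypothetical listing helper that reads its filters from **kwargs."""
    names = ("limit", "offset", "status")
    return {name: kwargs.get(name) for name in names}


def list_explicit(
    *,
    limit: int | None = None,
    offset: int | None = None,
    status: str | None = None,
) -> dict[str, object]:
    """Hypothetical listing helper with explicit keyword-only filters."""
    return {"limit": limit, "offset": offset, "status": status}


# A typo ("statsu") goes unnoticed with **kwargs: the filter is simply dropped.
silently_ignored = list_with_kwargs(limit=10, statsu="active")

# The explicit signature raises TypeError for the same typo.
try:
    list_explicit(limit=10, statsu="active")
    typo_detected = False
except TypeError:
    typo_detected = True
```

The explicit form also keeps the parameter names and types visible to type checkers and to readers of the signature.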

@codecov-commenter

codecov-commenter commented Jan 8, 2026

Codecov Report

❌ Patch coverage is 23.91304% with 35 lines in your changes missing coverage. Please review.
✅ Project coverage is 54.64%. Comparing base (e653ef6) to head (ca2cdc5).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| openml/_api/resources/study.py | 23.25% | 33 Missing ⚠️ |
| openml/study/functions.py | 33.33% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1610      +/-   ##
==========================================
- Coverage   54.67%   54.64%   -0.04%     
==========================================
  Files          63       63              
  Lines        5108     5124      +16     
==========================================
+ Hits         2793     2800       +7     
- Misses       2315     2324       +9     


@geetu040 geetu040 mentioned this pull request Jan 9, 2026
18 tasks
@rohansen856
Contributor Author

I am using noqa instead of the kwargs workaround, following the example in openml\testing.py:

    def _check_fold_timing_evaluations(  # noqa: PLR0913
        self,
        fold_evaluations: dict[str, dict[int, dict[int, float]]],
        num_repeats: int,
        num_folds: int,
        *,
        max_time_allowed: float = 60000.0,
        task_type: TaskType = TaskType.SUPERVISED_CLASSIFICATION,
        check_scores: bool = True,
    ) -> None:

Final function signature:

    def list(  # noqa: PLR0913
        self,
        limit: int | None = None,
        offset: int | None = None,
        status: str | None = None,
        main_entity_type: str | None = None,
        uploader: list[int] | None = None,
        benchmark_suite: int | None = None,
    ) -> Any:

Signed-off-by: rohansen856 <rohansen856@gmail.com>
@rohansen856 rohansen856 marked this pull request as ready for review January 13, 2026 07:21
Collaborator

@geetu040 geetu040 left a comment

Good work. Just use the listing as suggested in #1575 (comment) which is already similar to what you have done.

@rohansen856
Contributor Author

@geetu040 I reviewed the specific changes needed and have a small question about the pandas implementation.
As I understand it, I need to use a pandas DataFrame instead of Any as the return type in openml\_api\resources\base.py, like this:

class StudiesAPI(ResourceAPI, ABC):
    @abstractmethod
    def list(  # noqa: PLR0913
        self,
        limit: int | None = None,
        offset: int | None = None,
        status: str | None = None,
        main_entity_type: str | None = None,
        uploader: list[int] | None = None,
        benchmark_suite: int | None = None,
    ) -> pd.DataFrame: ...

Similarly, I have to change the return value in openml\_api\resources\studies.py from this: return response.text
to this:

        xml_string = response.text

        # Parse XML and convert to DataFrame
        study_dict = xmltodict.parse(xml_string, force_list=("oml:study",))

        # Minimalistic check if the XML is useful
        assert isinstance(study_dict["oml:study_list"]["oml:study"], list), type(
            study_dict["oml:study_list"]["oml:study"],
        )
        assert (
            study_dict["oml:study_list"]["@xmlns:oml"] == "http://openml.org/openml"
        ), study_dict["oml:study_list"]["@xmlns:oml"]

        studies = {}
        for study_ in study_dict["oml:study_list"]["oml:study"]:
            # maps from xml name to a tuple of (dict name, casting fn)
            expected_fields = {
                "oml:id": ("id", int),
                "oml:alias": ("alias", str),
                "oml:main_entity_type": ("main_entity_type", str),
                "oml:benchmark_suite": ("benchmark_suite", int),
                "oml:name": ("name", str),
                "oml:status": ("status", str),
                "oml:creation_date": ("creation_date", str),
                "oml:creator": ("creator", int),
            }
            study_id = int(study_["oml:id"])
            current_study = {}
            for oml_field_name, (real_field_name, cast_fn) in expected_fields.items():
                if oml_field_name in study_:
                    current_study[real_field_name] = cast_fn(study_[oml_field_name])
            current_study["id"] = int(current_study["id"])
            studies[study_id] = current_study

        return pd.DataFrame.from_dict(studies, orient="index")

A total of 3 files would be affected: openml\_api\resources\base.py, openml\_api\resources\studies.py and openml\study\functions.py

Can you please confirm this approach? After that I will update the PR.
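
To make the proposed parsing step easier to try in isolation, here is a dependency-free sketch. It is not the PR's implementation: it swaps xmltodict for the standard library's `xml.etree.ElementTree`, returns a plain dict instead of a DataFrame, and runs on a made-up payload (`SAMPLE_XML` and `parse_study_list` are hypothetical names). Only a subset of the fields from the snippet above is included.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample payload; real server responses may contain more fields.
SAMPLE_XML = """<oml:study_list xmlns:oml="http://openml.org/openml">
  <oml:study>
    <oml:id>1</oml:id>
    <oml:alias>demo-suite</oml:alias>
    <oml:main_entity_type>task</oml:main_entity_type>
    <oml:name>Demo suite</oml:name>
    <oml:status>active</oml:status>
    <oml:creator>42</oml:creator>
  </oml:study>
  <oml:study>
    <oml:id>2</oml:id>
    <oml:main_entity_type>run</oml:main_entity_type>
    <oml:name>Demo study</oml:name>
    <oml:status>in_preparation</oml:status>
  </oml:study>
</oml:study_list>"""

NS = "http://openml.org/openml"

# Maps a field name to the function used to cast its text content.
EXPECTED_FIELDS = {
    "id": int,
    "alias": str,
    "main_entity_type": str,
    "name": str,
    "status": str,
    "creator": int,
}


def parse_study_list(xml_string):
    """Return {study_id: {field: value}}, skipping fields absent from the XML."""
    root = ET.fromstring(xml_string)
    if root.tag != f"{{{NS}}}study_list":
        raise ValueError(f"unexpected root element: {root.tag}")

    studies = {}
    for study in root.findall(f"{{{NS}}}study"):
        row = {}
        for name, cast in EXPECTED_FIELDS.items():
            node = study.find(f"{{{NS}}}{name}")
            if node is not None and node.text is not None:
                row[name] = cast(node.text)
        studies[row["id"]] = row
    return studies


studies = parse_study_list(SAMPLE_XML)
```

The resulting dict is keyed by study id, which is the shape that `pd.DataFrame.from_dict(studies, orient="index")` in the snippet above expects.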

@geetu040
Collaborator

@rohansen856 yes sounds right

@rohansen856
Contributor Author

Updated! Ready for review.

Collaborator

@geetu040 geetu040 left a comment

Almost fine. Just remove _list_studies completely as well, and pass api_context.backend.studies.list instead of _list_studies as the argument to partial in list_studies. I hope I did not confuse you; just search for the exact method names in the code. Let me know if I am not clear enough.
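
As a rough illustration of the `partial` pattern mentioned here: the sketch below binds the fixed filters to a bound method once and leaves the paging arguments to the caller. `FakeStudiesBackend` is a hypothetical stand-in, not the real `api_context.backend.studies` object, and its `list` method only echoes its arguments.

```python
from functools import partial


class FakeStudiesBackend:
    """Hypothetical stand-in for the backend object named in the review."""

    def list(self, limit=None, offset=None, status=None, main_entity_type=None):
        # Echo the arguments so the binding performed by partial is visible.
        return {
            "limit": limit,
            "offset": offset,
            "status": status,
            "main_entity_type": main_entity_type,
        }


backend = FakeStudiesBackend()

# Bind the fixed filters once; only the paging arguments vary per call.
listing_call = partial(backend.list, status="active", main_entity_type="run")

first_page = listing_call(limit=2, offset=0)
second_page = listing_call(limit=2, offset=2)
```

Because `partial` accepts any callable, a bound method can be passed directly and a separate private wrapper function is not needed.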

@rohansen856
Contributor Author

rohansen856 commented Jan 16, 2026

> Almost fine. Just remove _list_studies completely as well, and pass api_context.backend.studies.list instead of _list_studies as the argument to partial in list_studies. I hope I did not confuse you; just search for the exact method names in the code. Let me know if I am not clear enough.

Oh, definitely! I probably missed that in openml\study\functions.py; I will push the change with the next commit.

Collaborator

@geetu040 geetu040 left a comment

update with #1576 (comment)

Collaborator

@geetu040 geetu040 left a comment

I think this branch is not synced with the base PR #1576; that's why the tests are failing.

@geetu040
Collaborator

Please add "Fixes #1594" to the description.

Collaborator

@geetu040 geetu040 left a comment

The base PR is merged now, please sync with main.

Collaborator

@geetu040 geetu040 left a comment

Nicely done overall. I am a bit skeptical about the overuse of mocking, since it adds a lot of hardcoded response content, but let's see whether Pieter thinks otherwise when he reviews.
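
To make the concern about mocking concrete, here is a small sketch using `unittest.mock`. The helper `delete_study`, the `study/7` path, and the status-code check are all hypothetical; they are not taken from this PR's tests. The point is that the test's outcome depends entirely on values hardcoded into the mock.

```python
from unittest.mock import Mock


def delete_study(http_client, study_id):
    """Hypothetical helper: True when the server answers with status 200."""
    response = http_client.delete(f"study/{study_id}")
    return response.status_code == 200


# The mock replaces the HTTP layer, so no real server is contacted.
# The status code is hardcoded here, which couples the test to this value.
fake_client = Mock()
fake_client.delete.return_value = Mock(status_code=200)

result = delete_study(fake_client, 7)

# The mock can confirm how it was called, but not how a real server behaves.
fake_client.delete.assert_called_once_with("study/7")
```

A test like this checks the client-side logic and the call it makes; it cannot detect a change in what the server actually returns.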

main_entity_type: str | None = None,
uploader: list[int] | None = None,
benchmark_suite: int | None = None,
) -> pd.DataFrame:
Collaborator

Use abstractmethod here. I'd also remove the docstring and simply use the placeholder ... in place of raising NotImplementedError.
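
A minimal sketch of what this suggestion looks like in practice. `StudiesAPISketch` and `InMemoryStudies` are hypothetical, trimmed-down classes with only two of the filters; they are not the PR's actual interface.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from collections.abc import Sequence


class StudiesAPISketch(ABC):
    """Hypothetical, trimmed-down interface."""

    @abstractmethod
    def list(
        self,
        limit: int | None = None,
        offset: int | None = None,
    ) -> Sequence[dict[str, int]]: ...


class InMemoryStudies(StudiesAPISketch):
    """Hypothetical concrete implementation backed by a fixed list."""

    def list(self, limit=None, offset=None):
        rows = [{"id": 1}, {"id": 2}, {"id": 3}]
        start = offset or 0
        end = None if limit is None else start + limit
        return rows[start:end]


# The abstract base cannot be instantiated, so a missing override
# fails at construction time instead of at call time.
try:
    StudiesAPISketch()
    abstract_instantiable = True
except TypeError:
    abstract_instantiable = False

page = InMemoryStudies().list(limit=2, offset=1)
```

With `@abstractmethod`, a subclass that forgets to implement `list` fails as soon as it is instantiated, whereas a body that raises NotImplementedError would only fail when the method is called.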

main_entity_type: str | None = None,
uploader: builtins.list[int] | None = None,
benchmark_suite: int | None = None,
) -> str:
Collaborator

It doesn't use self, so it's better to make this a static method. Apply the same to the other private methods in this file.
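
A small sketch of the static-method suggestion. The class `StudyPathBuilder`, the helper `_build_path`, and the path format it produces are hypothetical and chosen only for illustration; they do not reproduce the method under review.

```python
from __future__ import annotations


class StudyPathBuilder:
    """Hypothetical class; the path format is illustrative only."""

    @staticmethod
    def _build_path(
        limit: int | None = None,
        offset: int | None = None,
        status: str | None = None,
    ) -> str:
        # No instance state is read here, so the method needs no self.
        parts = ["study", "list"]
        for name, value in (("limit", limit), ("offset", offset), ("status", status)):
            if value is not None:
                parts += [name, str(value)]
        return "/".join(parts)

    def describe(self, **filters) -> str:
        # Instance methods can still call the helper through self.
        return self._build_path(**filters)


path_from_class = StudyPathBuilder._build_path(limit=5, status="active")
path_from_instance = StudyPathBuilder().describe(limit=5, status="active")
```

Marking the helper as static documents that it depends only on its arguments, and it lets tests call it without constructing an instance.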


result = study_v1.delete(study_id)

assert result is True
Collaborator

Suggested change
- assert result is True
+ assert result
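
For context, the two assertions are not equivalent. The sketch below uses a hypothetical return value to show the difference: `is True` passes only for the `True` singleton, while a bare `assert result` passes for any truthy value.

```python
# Hypothetical return value: truthy, but not the bool singleton True.
truthy_but_not_true = 1

identity_check = truthy_but_not_true is True
truthiness_check = bool(truthy_but_not_true)

# A real bool satisfies both forms.
real_bool = True
both_pass = (real_bool is True) and bool(real_bool)
```

The bare form is the more common idiom and the looser check; the identity form is stricter about the return type.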

Collaborator

@geetu040 geetu040 left a comment

Also fix the failing tests; you probably just need to fix the fixture. See the other test files for reference.

Signed-off-by: rohansen856 <rohansen856@gmail.com>
Signed-off-by: rohansen856 <rohansen856@gmail.com>
Signed-off-by: rohansen856 <rohansen856@gmail.com>
Collaborator

@geetu040 geetu040 left a comment

Thanks for the changes, looks good now.

@PGijsbers please review/merge.
