[ENH] V1 -> V2 Migration : Runs #1616

Omswastik-11 wants to merge 280 commits into openml/openml-python:main from Omswastik-11:runs-migration-stacked
Conversation
Codecov Report

❌ Patch coverage is

```
@@            Coverage Diff             @@
##             main    #1616      +/-   ##
==========================================
- Coverage   53.96%   53.11%   -0.86%
==========================================
  Files          61       62       +1
  Lines        5051     5202     +151
==========================================
+ Hits         2726     2763      +37
- Misses       2325     2439     +114
```
Please take a look at the CI failures; they look genuine. I think you need to update the tests to use the expected content from the local database.

I think this is the reason: #1611 (review)
Pull request overview
Copilot reviewed 44 out of 47 changed files in this pull request and generated 6 comments.
```python
from urllib.parse import urlparse
```

`urlparse` is imported but never used in this test module. This will fail linting in setups that enforce unused imports; please remove it or use it.
```python
def test_get_production_servers():
    assert openml.config.get_production_servers() == openml.config._get_servers("production")


def test_get_test_servers():
    assert openml.config.get_test_servers() == openml.config._get_servers("test")


def test_use_production_servers():
    openml.config.use_production_servers()
    servers_1 = openml.config.servers

    openml.config._set_servers("production")
    servers_2 = openml.config.servers

    assert servers_1 == servers_2


def test_use_test_servers():
    openml.config.use_test_servers()
    servers_1 = openml.config.servers

    openml.config._set_servers("test")
    servers_2 = openml.config.servers
```
These tests pass string literals ("production"/"test") into _get_servers/_set_servers, but those methods are typed/implemented to take ServerMode and will currently raise (or depend on implicit coercion). Use ServerMode.PRODUCTION/ServerMode.TEST here (or update the implementation to accept strings consistently).
```python
V2 server does not yet support POST /runs/ endpoint.
Expected availability: Q2 2025
```

The docstring says the V2 run publish endpoint is expected in "Q2 2025", which is now in the past. Please update/remove the date to avoid misleading users (e.g., point to an issue/roadmap link instead of a time estimate).

Suggested change:

```python
V2 server does not yet support the POST /runs/ endpoint.
See the OpenML roadmap or issue tracker for the current status.
```
```python
def _get_servers(mode: ServerMode) -> dict[APIVersion, dict[str, str | None]]:
    if mode not in ServerMode:
        raise ValueError(f'invalid mode="{mode}" allowed modes: {", ".join(list(ServerMode))}')
    return deepcopy(_SERVERS_REGISTRY[mode])
```

`_get_servers` only accepts ServerMode enum values, but it's being used with strings (e.g., "production"/"test") in the tests and will raise ValueError because `mode not in ServerMode` for a plain string. Consider accepting `str | ServerMode` here (e.g., coerce via `ServerMode(mode)` when `isinstance(mode, str)`), or update all call sites to pass `ServerMode.PRODUCTION`/`ServerMode.TEST` consistently.

Suggested change:

```python
def _get_servers(mode: ServerMode | str) -> dict[APIVersion, dict[str, str | None]]:
    if isinstance(mode, str):
        try:
            mode = ServerMode(mode)
        except ValueError as exc:
            allowed = ", ".join(m.value for m in ServerMode)
            raise ValueError(f'invalid mode="{mode}" allowed modes: {allowed}') from exc
    elif mode not in ServerMode:
        allowed = ", ".join(m.value for m in ServerMode)
        raise ValueError(f'invalid mode="{mode}" allowed modes: {allowed}')
    return deepcopy(_SERVERS_REGISTRY[cast(ServerMode, mode)])
```
```python
    return self.servers[self.api_version]["apikey"]

@apikey.setter
def apikey(self, value: str | None) -> None:
```

The apikey setter no longer enforces the previous type restriction (string or None). Since this is part of the public configuration surface, it's easy for callers to accidentally set non-strings (e.g., dict/int) and only fail later during request building. Consider restoring a simple type check in the setter to fail fast with a clear TypeError.

Suggested change:

```python
def apikey(self, value: str | None) -> None:
    if value is not None and not isinstance(value, str):
        raise TypeError(
            f"apikey must be a string or None, got {type(value).__name__}: {value!r}"
        )
```
```python
openml.config.server = "temp-server1"
openml.config.apikey = "temp-apikey1"
openml.config._get_servers(mode)["server"] = 'temp-server2'
openml.config._get_servers(mode)["apikey"] = 'temp-server2'
```

This test is mutating the dict returned by `_get_servers(mode)` using keys "server"/"apikey", but `_get_servers` returns a mapping keyed by APIVersion, so these assignments will raise KeyError. If the intent is to verify the returned value is a deep copy, mutate a nested entry (e.g., `servers_copy[api_version]["server"] = ...`) and then assert a fresh `_get_servers(mode)` call still equals `orig_servers`.
Pull request overview
Copilot reviewed 10 out of 12 changed files in this pull request and generated 5 comments.
```python
------
OpenMLNotSupportedError
    V2 server does not yet support POST /runs/ endpoint.
    Expected availability: Q2 2025
```

The docstring claims an "Expected availability: Q2 2025" for the V2 publish endpoint, which is now in the past and likely to become increasingly misleading. Consider removing the date or replacing it with a timeless note (e.g., link to an issue/roadmap entry) so the message stays accurate over time.

Suggested change: remove the `Expected availability: Q2 2025` line.
```python
import time
import unittest
import warnings
from urllib.parse import urljoin
```

`urljoin` is imported but not used anywhere in this test module. Please remove the import (or use it) to avoid lint/test failures in environments enforcing unused-import checks.

Suggested change: remove the `from urllib.parse import urljoin` line.
```python
@mock.patch.object(requests.Session, "request")
def test_delete_run_not_owned(mock_delete, test_files_directory, test_server_v1, test_apikey_v1):
    content_file = test_files_directory / "mock_responses" / "runs" / "run_delete_not_owned.xml"
    mock_delete.return_value = create_request_response(
```

The patched method is `requests.Session.request`, but the mock parameter is still named `mock_delete`, which is misleading. Rename the argument to something like `mock_request` (and update local variable names) to reflect what's being mocked.
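A small, self-contained illustration of the suggested rename. `Session` below is a local stand-in for `requests.Session` (so the example needs no network or third-party package), and the test body is hypothetical:

```python
from unittest import mock


class Session:
    # Local stand-in for requests.Session, keeping the example self-contained.
    def request(self, method, url):
        raise RuntimeError("network disabled in tests")


@mock.patch.object(Session, "request")
def test_delete_run_not_owned(mock_request):  # parameter name matches what is patched
    mock_request.return_value = "mocked-response"
    response = Session().request("DELETE", "https://example.invalid/runs/1")
    assert response == "mocked-response"
    mock_request.assert_called_once()


test_delete_run_not_owned()
```

Naming the injected mock after the patched attribute (`mock_request` for `Session.request`) makes it obvious which call path the test intercepts.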
```python
ignore_cache : bool
    Whether to ignore the cache. If ``true`` this will download and overwrite the run xml
    even if the requested run is already cached.
```

The `get_run` docstring repeats the `ignore_cache` parameter section twice. Please remove the duplicate block so the docstring stays accurate and easier to maintain.
```python
    f'"http://openml.org/openml": {runs_dict}',
)

assert isinstance(runs_dict["oml:runs"]["oml:run"], list), type(runs_dict["oml:runs"])
```

`assert isinstance(..., list)` is used for validating the parsed XML shape. Asserts can be disabled with Python optimizations (`-O`), which would skip this check and potentially lead to harder-to-debug errors later. Prefer raising a TypeError/ValueError (similar to `openml.runs.functions.__list_runs`) instead of using assert for input validation.

Suggested change:

```python
if not isinstance(runs_dict["oml:runs"]["oml:run"], list):
    raise TypeError(
        "Error in return XML, 'oml:runs/oml:run' is expected to be a list, "
        f"but got {type(runs_dict['oml:runs']['oml:run']).__name__}: "
        f"{runs_dict['oml:runs']['oml:run']}"
    )
```
Pull request overview
Copilot reviewed 10 out of 12 changed files in this pull request and generated 2 comments.
```python
assert isinstance(runs_dict["oml:runs"]["oml:run"], list), type(runs_dict["oml:runs"])
```

`_parse_list_xml` uses a bare assert to validate that `oml:run` is a list. Assertions can be stripped with `python -O`, and even when enabled they raise AssertionError (less actionable for API consumers). Prefer an explicit `isinstance(..., list)` check that raises TypeError/ValueError with a clear message (similar to `openml.runs.functions.__list_runs`).
```python
try:
    run_v1.get(run_id=TEST_RUN_ID)
    return TEST_RUN_ID
except Exception:
    runs_df = run_v1.list(limit=1, offset=0)
    if runs_df.empty:
        pytest.skip("No runs available on configured test server")
    return int(runs_df.iloc[0]["run_id"])
```

`_get_any_run_id` catches a broad Exception, which can hide real test failures (e.g., network issues, parsing errors) and make debugging harder. It would be more robust to only fall back to `list()` for the expected failure modes (e.g., an OpenML server "not found"/HTTP error for that specific run_id) and let unexpected exceptions fail the test.
geetu040
left a comment
Please see #1619 (review) regarding changed cache path
geetu040
left a comment
- Are you sure `main` was cleanly merged into this? I doubt that.
- You haven't moved the `tests/files` correctly, please see #1619 (comment)
- Since you opened the other PR as well, if any feedback is relevant from here for there or vice versa, please apply it.
merge was not clean here, right?
```python
def _parse_xml_response(self, payload: bytes | str, **kwargs: Any) -> Mapping[str, Any]:
    try:
        return cast("Mapping[str, Any]", xmltodict.parse(payload, **kwargs))
```

is this casting really needed? I don't think so

```python
        raise

    xml_text = payload_text[xml_start:]
    return cast("Mapping[str, Any]", xmltodict.parse(xml_text, **kwargs))
```
```python
        run_id,
        reset_cache=ignore_cache,
    ),
)
```

no need for explicit cast here
```python
runs_df = run_v1.list(limit=1, offset=0)
if runs_df.empty:
    pytest.skip("No runs available on configured test server")
return int(runs_df.iloc[0]["run_id"])
```

I think this function can be avoided; it's not clean. What's the problem with a fixed run id?
```python
@pytest.mark.test_server()
def test_run_v1_v2_contracts(run_v1, run_v2):
    run_id = _get_any_run_id(run_v1)

    run_from_v1 = run_v1.get(run_id=run_id)
    _assert_run_shape(run_from_v1)

    with pytest.raises(OpenMLNotSupportedError, match="does not support `get`"):
        run_v2.get(run_id=run_id)

    with pytest.raises(OpenMLNotSupportedError, match="does not support `list`"):
        run_v2.list(limit=5, offset=0)

    with pytest.raises(OpenMLNotSupportedError, match="does not support `publish`"):
        run_v2.publish(path="run", files={"description": "<run/>"})
```
```python
    self._test_local_evaluations(run)

@pytest.mark.sklearn()
@pytest.mark.skip("https://github.com/openml/openml-python/issues/1586")
```

but it's unrelated to this PR right? why remove the skip?

```python
    self._test_local_evaluations(run)

@pytest.mark.skip(reason="https://github.com/openml/openml-python/issues/1586")
```
geetu040
left a comment
I've updated #1619 (review). Please sync with main and update the cache files accordingly.
```python
    int
        The ID of the published run.
    """
    return super().publish(path=path, files=files)
```

no need to implement publish here in RunV1API
Metadata
fixes #1624