[ENH] V1 → V2 API Migration - Tasks#1611
[ENH] V1 → V2 API Migration - Tasks#1611satvshr wants to merge 393 commits intoopenml:mainopenml/openml-python:mainfrom
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1611 +/- ##
===========================================
- Coverage 54.71% 27.30% -27.42%
===========================================
Files 61 61
Lines 5086 5102 +16
===========================================
- Hits 2783 1393 -1390
- Misses 2303 3709 +1406 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
geetu040
left a comment
There was a problem hiding this comment.
From a high-level review, I noticed a few points that need adjustment:
- Caching can likely be removed from the SDK, since these concerns should be handled by the base client.
- I don't see the
api_contextbeing used intasks/functions, so it's not clear to me how the SDK is actually using the new API interface here. - Instead of moving entire methods out of
tasks/functions.py, it would be better to stick to the goal of minimal SDK changes while enabling v2 support. - API calls should be updated at the specific root functions (for example
_get_task_description,OpenMLTask._download_split). - For listing tasks, please follow the approach discussed in #1575 comment.
geetu040
left a comment
There was a problem hiding this comment.
I have left some comments, please take a look and make sure the signature of all methods in TasksAPI, TasksV1 and TasksV2 stay same.
|
@geetu040 I will explain the confusion I am facing to the best of my abilities, and I do feel communicating on the PR threads is yielding no results, hence I will just put everything here:
I seem to be getting 2 contradictory messages from your end which is where the confusion arises. |
I don't feel confused with the picture I have in my mind, I think I need to explain it even better. I'll try to document each method in detail what should be moved and what should stay. |
|
Kindly resolve the comments which have been solved, it would be easier to navigate! |
It doesn't look like I have the typical "Resolve" button. Edit: It's visible in the "Files changed" section, I'll resolve them there. |
geetu040
left a comment
There was a problem hiding this comment.
Really nice, the code is in a better shape now. Many lines have been removed in the functions.py which seem fine but I'll take a deeper look. In other comments mostly some cleaning is required.
There was a problem hiding this comment.
we have an api call in _download_split, what do you think about that? also does it relate with the datasets PR?
There was a problem hiding this comment.
we have an api call in
_download_split, what do you think about that?
I remember you and Shrivaths discussing cache paths for V1 and V2 and I think the understanding was that the paths would differ, if that is the case, we should change this later on and leave it as it is right now.
also does it relate with the datasets PR?
Only the get_dataset part, hence the Shrivaths comment
There was a problem hiding this comment.
this looks really good and clean now, just one open thread which needs discussion now: #1611 (comment)
There was a problem hiding this comment.
If you agreed with my approach, can we not close that too?
geetu040
left a comment
There was a problem hiding this comment.
sdk code look good so far, please take a look at #1575 (comment) and make changes accordingly where needed.
all tests (existing and new) should pass to make sure we are retaining the original functionality of the sdk
openml/_api/resources/tasks.py
Outdated
| task_type: TaskType | int | None = None, | ||
| **kwargs: Any, | ||
| ) -> pd.DataFrame: | ||
| raise NotImplementedError("Task listing is not available in API v2 yet.") |
| return tasks | ||
|
|
||
|
|
||
| @openml.utils.thread_safe_if_oslo_installed |
openml/tasks/task.py
Outdated
|
|
||
|
|
||
| class OpenMLTask(OpenMLBase): | ||
| class OpenMLTask(OpenMLBase, ResourceAPI): |
There was a problem hiding this comment.
should not inherit ResourceAPI
openml/tasks/task.py
Outdated
| source=str(split_url), | ||
| output_path=str(cache_file), | ||
| ) | ||
| self._http.download(url=str(split_url), file_name="datasplits.arff") |
There was a problem hiding this comment.
use api_context.backend.task.[something] here
There was a problem hiding this comment.
How am I supposed to do that? download is a method in the HTTPClient.
There was a problem hiding this comment.
Still this should be under a method now called openml._backend.task.[something] and then we can take this problem to TaskV1.[something].
I see this function depends on download of HTTPClient, I'll either add that method in the core PR or we can plan to merge datasets PR before this.
There was a problem hiding this comment.
Will be moving this to TaskAPI after discussion.
openml/tasks/functions.py
Outdated
| raise NotImplementedError( | ||
| f"Task type ID {task_type:d} is not supported. " | ||
| f"Supported task type IDs: {TaskType.SUPERVISED_CLASSIFICATION.value}," | ||
| f"{TaskType.SUPERVISED_REGRESSION.value}, " | ||
| f"{TaskType.CLUSTERING.value}, {TaskType.LEARNING_CURVE.value}. " | ||
| f"Please refer to the TaskType enum for valid task type identifiers." | ||
| ) | ||
| raise NotImplementedError(f"Task type {task_type:d} not supported.") |
This reverts commit fd43c48.
geetu040
left a comment
There was a problem hiding this comment.
update with #1576 (comment)
geetu040
left a comment
There was a problem hiding this comment.
the tests are failing because the cache path has changed
old: tests/files/org/openml/test/tasks/1882
new: tests/files/org/openml/test/api/v1/xml/task/1882
Let's discuss this on slack and see how we proceed with this change.
geetu040
left a comment
There was a problem hiding this comment.
Please see #1619 (review) regarding changed cache path
geetu040
left a comment
There was a problem hiding this comment.
The base PR is merged now, please sync with main.
|
@satvshr the tests are failing because the cache is not in the expected format, see the first point in #1619 (review) and how it's done in #1619 |
geetu040
left a comment
There was a problem hiding this comment.
delete the old files in tests/files so git identifies and marks them as renamed files which would make it easier to track
a lot of irrelevant files are touched in tests, probably from your other pr's pre-commit, please revert them
see the comments on _get_estimation_procedure_list
don't remove this function, keep it as it is, it's updated here #1604 (changes)
call it separately in list_tasks and pass onwards to openml._backend.task.list as a parameter
it should stay out of _api/
keep test__get_estimation_procedure_list
remove TaskV1API and TaskV2API import in openml/tasks/functions.py
| encoding: str = "utf-8", | ||
| file_name: str = "response.txt", | ||
| md5_checksum: str | None = None, | ||
| ) -> Path: |
There was a problem hiding this comment.
where is this used really, I cannot find?
There was a problem hiding this comment.
The place it was being used was removed by another PR, removing this
openml/_api/resources/task.py
Outdated
| response = self._http.get(f"task/{task_id}", enable_cache=True) | ||
| return self._create_task_from_xml(response.text) | ||
|
|
||
| def _create_task_from_xml(self, xml: str) -> OpenMLTask: |
There was a problem hiding this comment.
this method doesn't use self, make this a static method, apply this for other private methods as well.
openml/_api/resources/task.py
Outdated
|
|
||
| return pd.DataFrame.from_dict(tasks, orient="index") | ||
|
|
||
| def _get_estimation_procedure_list(self) -> builtins.list[dict[str, Any]]: |
There was a problem hiding this comment.
this should stay out of _api/
There was a problem hiding this comment.
though there are lot's of changes in this file but they are all clean, nicely done.
I see you are importing TaskV1API and TaskV2API, which is completely unacceptable and kills the purpose of migration. Please find a way to bypass this, move that functionality in the classes and access them through a common TaskAPI interface
| raise OpenMLCacheException(f"Task file for tid {tid} not cached") from e | ||
|
|
||
|
|
||
| def _get_estimation_procedure_list() -> list[dict[str, Any]]: |
There was a problem hiding this comment.
don't remove this function, keep it as it is, it's updated here #1604 (changes)
call it separately in list_tasks and pass onwards to openml._backend.task.list as a parameter
There was a problem hiding this comment.
We will then have to alter the signature of openml._backend.task.list to accept _get_estimation_procedure_list as a permanent argument, which doesnt make sense?
There was a problem hiding this comment.
I think that would make sense, why not?
| @pytest.mark.skipif( | ||
| os.getenv("OPENML_USE_LOCAL_SERVICES") == "true", | ||
| reason="Pending resolution of #1657", | ||
| ) |
There was a problem hiding this comment.
I don't think we should skip these
| reason="Pending resolution of #1657", | ||
| ) | ||
| @pytest.mark.test_server() | ||
| def test_download_split(self): |
There was a problem hiding this comment.
nicely done, do the same for tests/test_tasks/test_task_functions.py
|
|
||
| @pytest.mark.test_server() | ||
| def test_get_train_and_test_split_indices(self): | ||
| import unittest.mock |
There was a problem hiding this comment.
would fail if you delete the old files in tests/files, you can keep it like the suggestion in #1619 (review)
geetu040
left a comment
There was a problem hiding this comment.
I've updated #1619 (review). Please sync with main and update cache files accordingly.
openml/tasks/task.py
Outdated
| Tag to attach to the task. | ||
| """ | ||
| if self.task_id is None: | ||
| raise ValueError("Task does not have an ID. Please publish the task before tagging.") |
There was a problem hiding this comment.
use ObjectNotPublishedError as in _tag_openml_base
openml/tasks/task.py
Outdated
| raise ValueError( | ||
| "Dataset does not have an ID. Please publish the dataset before untagging." | ||
| ) |
There was a problem hiding this comment.
There is cache and api-calls which are needed to be updated in this file. I think you'd only need to update OpenMLTask.download_split. You should use _http.download here. Let me know if you need help.
Metadata