Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

[ENH] V1 → V2 API Migration - Tasks#1611

Open
satvshr wants to merge 393 commits intoopenml:mainopenml/openml-python:mainfrom
satvshr:taskssatvshr/openml-python:tasksCopy head branch name to clipboard
Open

[ENH] V1 → V2 API Migration - Tasks#1611
satvshr wants to merge 393 commits intoopenml:mainopenml/openml-python:mainfrom
satvshr:taskssatvshr/openml-python:tasksCopy head branch name to clipboard

Conversation

@satvshr
Copy link
Contributor

@satvshr satvshr commented Jan 9, 2026

Metadata

@codecov-commenter
Copy link

codecov-commenter commented Jan 9, 2026

Codecov Report

❌ Patch coverage is 13.37209% with 149 lines in your changes missing coverage. Please review.
✅ Project coverage is 27.30%. Comparing base (ec3ed63) to head (9a752eb).

Files with missing lines Patch % Lines
openml/_api/resources/task.py 10.63% 126 Missing ⚠️
openml/tasks/task.py 20.00% 12 Missing ⚠️
openml/tasks/functions.py 8.33% 11 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1611       +/-   ##
===========================================
- Coverage   54.71%   27.30%   -27.42%     
===========================================
  Files          61       61               
  Lines        5086     5102       +16     
===========================================
- Hits         2783     1393     -1390     
- Misses       2303     3709     +1406     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@geetu040 geetu040 mentioned this pull request Jan 9, 2026
18 tasks
@satvshr satvshr marked this pull request as ready for review January 12, 2026 15:29
Copy link
Collaborator

@geetu040 geetu040 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a high-level review, I noticed a few points that need adjustment:

  • Caching can likely be removed from the SDK, since these concerns should be handled by the base client.
  • I don't see the api_context being used in tasks/functions, so it's not clear to me how the SDK is actually using the new API interface here.
  • Instead of moving entire methods out of tasks/functions.py, it would be better to stick to the goal of minimal SDK changes while enabling v2 support.
  • API calls should be updated at the specific root functions (for example _get_task_description, OpenMLTask._download_split).
  • For listing tasks, please follow the approach discussed in #1575 comment.

@satvshr satvshr marked this pull request as draft January 14, 2026 20:25
@satvshr satvshr changed the title [ENH] Tasks Migration [ENH] V1 → V2 API Migration - Tasks Jan 15, 2026
Copy link
Collaborator

@geetu040 geetu040 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have left some comments, please take a look and make sure the signature of all methods in TasksAPI, TasksV1 and TasksV2 stay same.

openml/_api/resources/base.py Outdated Show resolved Hide resolved
openml/_api/resources/base.py Outdated Show resolved Hide resolved
openml/_api/resources/tasks.py Outdated Show resolved Hide resolved
openml/_api/resources/tasks.py Outdated Show resolved Hide resolved
openml/_api/resources/tasks.py Outdated Show resolved Hide resolved
openml/_api/resources/tasks.py Outdated Show resolved Hide resolved
openml/_api/resources/tasks.py Outdated Show resolved Hide resolved
openml/_api/resources/tasks.py Outdated Show resolved Hide resolved
openml/_api/resources/tasks.py Outdated Show resolved Hide resolved
openml/tasks/functions.py Show resolved Hide resolved
@satvshr
Copy link
Contributor Author

satvshr commented Jan 16, 2026

@geetu040 I will explain the confusion I am facing to the best of my abilities, and I do feel communicating on the PR threads is yielding no results, hence I will just put everything here:

  1. Over here you say the base class should have v1 functions like list, delete, and create. This means to me that all V1 related functions should be moved from functions.py to TasksV1 (from SDK to resources) and calls inside functions.py should be replaced with calls to api_context.backend.task.method. Over here, you say something similar, stating that TasksV1, V2, and TasksAPI should have the same function signatures.
  2. Over here and here you mention that we should try to keep most of the code in SDK and not resources (contradictory to point 1).
    Completely contrary to point 1 you mention get_tasks along with create_task and delete_task should stay in sdk over here.

I seem to be getting 2 contradictory messages from your end which is where the confusion arises.

@geetu040
Copy link
Collaborator

I seem to be getting 2 contradictory messages from your end which is where the confusion arises.

I don't feel confused with the picture I have in my mind, I think I need to explain it even better. I'll try to document each method in detail what should be moved and what should stay.

@satvshr
Copy link
Contributor Author

satvshr commented Jan 19, 2026

Kindly resolve the comments which have been solved, it would be easier to navigate!

@geetu040
Copy link
Collaborator

geetu040 commented Jan 19, 2026

Kindly resolve the comments which have been solved, it would be easier to navigate!

It doesn't look like I have the typical "Resolve" button.

Edit: It's visible in the "Files changed" section, I'll resolve them there.

Copy link
Collaborator

@geetu040 geetu040 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice, the code is in a better shape now. Many lines have been removed in the functions.py which seem fine but I'll take a deeper look. In other comments mostly some cleaning is required.

openml/tasks/task.py Outdated Show resolved Hide resolved
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have an api call in _download_split, what do you think about that? also does it relate with the datasets PR?

Copy link
Contributor Author

@satvshr satvshr Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have an api call in _download_split, what do you think about that?

I remember you and Shrivaths discussing cache paths for V1 and V2 and I think the understanding was that the paths would differ, if that is the case, we should change this later on and leave it as it is right now.

also does it relate with the datasets PR?

Only the get_dataset part, hence the Shrivaths comment

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks really good and clean now, just one open thread which needs discussion now: #1611 (comment)

Copy link
Contributor Author

@satvshr satvshr Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you agreed with my approach, can we not close that too?

openml/_api/resources/base.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@geetu040 geetu040 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sdk code look good so far, please take a look at #1575 (comment) and make changes accordingly where needed.
all tests (existing and new) should pass to make sure we are retaining the original functionality of the sdk

task_type: TaskType | int | None = None,
**kwargs: Any,
) -> pd.DataFrame:
raise NotImplementedError("Task listing is not available in API v2 yet.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see point 8 in #1575 (comment)

return tasks


@openml.utils.thread_safe_if_oslo_installed
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see point 3 in #1575 (comment)



class OpenMLTask(OpenMLBase):
class OpenMLTask(OpenMLBase, ResourceAPI):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should not inherit ResourceAPI

source=str(split_url),
output_path=str(cache_file),
)
self._http.download(url=str(split_url), file_name="datasplits.arff")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use api_context.backend.task.[something] here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How am I supposed to do that? download is a method in the HTTPClient.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still this should be under a method now called openml._backend.task.[something] and then we can take this problem to TaskV1.[something].

I see this function depends on download of HTTPClient, I'll either add that method in the core PR or we can plan to merge datasets PR before this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be moving this to TaskAPI after discussion.

Comment on lines +592 to +245
raise NotImplementedError(
f"Task type ID {task_type:d} is not supported. "
f"Supported task type IDs: {TaskType.SUPERVISED_CLASSIFICATION.value},"
f"{TaskType.SUPERVISED_REGRESSION.value}, "
f"{TaskType.CLUSTERING.value}, {TaskType.LEARNING_CURVE.value}. "
f"Please refer to the TaskType enum for valid task type identifiers."
)
raise NotImplementedError(f"Task type {task_type:d} not supported.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's keep it as it was

Copy link
Collaborator

@geetu040 geetu040 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update with #1576 (comment)

Copy link
Collaborator

@geetu040 geetu040 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the tests are failing because the cache path has changed

old: tests/files/org/openml/test/tasks/1882
new: tests/files/org/openml/test/api/v1/xml/task/1882

Let's discuss this on slack and see how we proceed with this change.

Copy link
Collaborator

@geetu040 geetu040 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see #1619 (review) regarding changed cache path

Copy link
Collaborator

@geetu040 geetu040 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The base PR is merged now, please sync with main.

@geetu040
Copy link
Collaborator

@satvshr the tests are failing because the cache is not in the expected format, see the first point in #1619 (review) and how it's done in #1619
so you need meta.json and headers.json which you can simply copy from the other pr and rename task.xml to body.bin

Copy link
Collaborator

@geetu040 geetu040 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete the old files in tests/files so git identifies and marks them as renamed files which would make it easier to track

a lot of irrelevant files are touched in tests, probably from your other pr's pre-commit, please revert them

see the comments on _get_estimation_procedure_list
don't remove this function, keep it as it is, it's updated here #1604 (changes)
call it separately in list_tasks and pass onwards to openml._backend.task.list as a parameter
it should stay out of _api/
keep test__get_estimation_procedure_list

remove TaskV1API and TaskV2API import in openml/tasks/functions.py

encoding: str = "utf-8",
file_name: str = "response.txt",
md5_checksum: str | None = None,
) -> Path:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is this used really, I cannot find?

Copy link
Contributor Author

@satvshr satvshr Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The place it was being used was removed by another PR, removing this

response = self._http.get(f"task/{task_id}", enable_cache=True)
return self._create_task_from_xml(response.text)

def _create_task_from_xml(self, xml: str) -> OpenMLTask:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this method doesn't use self, make this a static method, apply this for other private methods as well.


return pd.DataFrame.from_dict(tasks, orient="index")

def _get_estimation_procedure_list(self) -> builtins.list[dict[str, Any]]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should stay out of _api/

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

though there are lot's of changes in this file but they are all clean, nicely done.

I see you are importing TaskV1API and TaskV2API, which is completely unacceptable and kills the purpose of migration. Please find a way to bypass this, move that functionality in the classes and access them through a common TaskAPI interface

raise OpenMLCacheException(f"Task file for tid {tid} not cached") from e


def _get_estimation_procedure_list() -> list[dict[str, Any]]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't remove this function, keep it as it is, it's updated here #1604 (changes)
call it separately in list_tasks and pass onwards to openml._backend.task.list as a parameter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will then have to alter the signature of openml._backend.task.list to accept _get_estimation_procedure_list as a permanent argument, which doesnt make sense?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that would make sense, why not?

@pytest.mark.skipif(
os.getenv("OPENML_USE_LOCAL_SERVICES") == "true",
reason="Pending resolution of #1657",
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should skip these

reason="Pending resolution of #1657",
)
@pytest.mark.test_server()
def test_download_split(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as suggested in #1619 (review)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nicely done, do the same for tests/test_tasks/test_task_functions.py


@pytest.mark.test_server()
def test_get_train_and_test_split_indices(self):
import unittest.mock
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move import to the top

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would fail if you delete the old files in tests/files, you can keep it like the suggestion in #1619 (review)

Copy link
Collaborator

@geetu040 geetu040 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated #1619 (review). Please sync with main and update cache files accordingly.

Tag to attach to the task.
"""
if self.task_id is None:
raise ValueError("Task does not have an ID. Please publish the task before tagging.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use ObjectNotPublishedError as in _tag_openml_base

Comment on lines +279 to +281
raise ValueError(
"Dataset does not have an ID. Please publish the dataset before untagging."
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is cache and api-calls which are needed to be updated in this file. I think you'd only need to update OpenMLTask.download_split. You should use _http.download here. Let me know if you need help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[ENH] V1 → V2 API Migration - tasks

7 participants

Morty Proxy This is a proxified and sanitized view of the page, visit original site.