Releases: openml/openml-python

v0.15.1

25 Jan 10:46
0ec2f85

Will clean up release notes later, highlights:

Thanks to everyone who contributed in any way ❤️

What's Changed

Full Changelog: v0.15.0...v0.15.1

v0.15.0

05 Oct 13:07
dea8724

What's Changed

  • ADD #1335: Improve MinIO support.
    • Add a progress bar for downloading MinIO files. Enable it by setting show_progress to true in either openml.config or the configuration file.
    • When using download_all_files, files are only downloaded if they do not yet exist in the cache.
  • FIX #1338: Read the configuration file without overwriting it.
  • MAINT #1340: Add Numpy 2.0 support. Update tests to work with scikit-learn <= 1.5.
  • ADD #1342: Add HTTP header to requests to indicate they are from openml-python.
  • ADD #1345: task.get_dataset now takes the same parameters as openml.datasets.get_dataset to allow fine-grained control over file downloads.
  • MAINT #1346: The ARFF file of a dataset is now only downloaded if parquet is not available.
  • MAINT #1349: Removed usage of the distutils module, which allows for Py3.12 compatibility.
  • MAINT #1351: Image archives are now automatically deleted after they have been downloaded and extracted.
  • MAINT #1352, 1354: When fetching tasks and datasets, file download parameters now default to not downloading the file.
    Files will be downloaded only when a user tries to access properties which require them (e.g., dataset.qualities or dataset.get_data).
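
The cache check described for download_all_files can be pictured with a small standard-library sketch. This is not openml-python's implementation: `download_if_missing`, the `fetch` callable, and the flat cache layout are hypothetical names chosen only for illustration.

```python
from pathlib import Path
from typing import Callable


def download_if_missing(name: str, cache_dir: Path, fetch: Callable[[str], bytes]) -> bool:
    """Store `name` in `cache_dir` unless it is already cached.

    Returns True if the file was fetched and False if the cached copy was reused.
    `fetch` is a hypothetical stand-in for the real network download.
    """
    target = cache_dir / name
    if target.exists():
        return False  # cache hit: skip the download entirely
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(fetch(name))
    return True
```

Calling the helper twice with the same name triggers `fetch` only on the first call, which is the behavior the note above describes.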

New Contributors

Full Changelog: v0.14.2...v0.15.0

Version 0.14.2

18 Jan 13:13
449f2cb

This is a minor release to support several hotfixes and technical debt.

  • MAINT #1280: Use the server-provided parquet_url instead of minio_url to determine the location of the parquet file.
  • ADD #716: add documentation for remaining attributes of classes and functions.
  • ADD #1261: more annotations for type hints.
  • MAINT #1294: update tests to new tag specification.
  • FIX #1314: Update fetching a bucket from MinIO.
  • FIX #1315: Make class label retrieval more lenient.
  • ADD #1316: add feature descriptions ontologies support.
  • MAINT #1310/#1307: switch to ruff and resolve all mypy errors.

Version 0.14

05 Jul 06:40
2791074

IMPORTANT: This release paves the way towards a breaking update of OpenML-Python. From version 0.15, functions that had the option to return a pandas DataFrame will return a pandas DataFrame by default. This version (0.14) emits a warning if you still use the old access functionality.

More concretely:

  • In 0.15 we will drop the ability to return dictionaries in listing calls and only provide pandas DataFrames. To disable warnings in 0.14 you have to request a pandas DataFrame (using output_format="dataframe").
  • In 0.15 we will drop the ability to return datasets as numpy arrays and only provide pandas DataFrames. To disable warnings in 0.14 you have to request a pandas DataFrame (using dataset_format="dataframe").

Furthermore, from version 0.15, OpenML-Python will no longer download datasets and dataset metadata by default. This version (0.14) emits a warning if you don't explicitly specify the desired behavior.

Please see the pull requests #1258 and #1260 for further information.
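
A minimal sketch of calls that state the desired behavior explicitly, which should avoid the 0.14 warnings. `output_format="dataframe"` and `dataset_format="dataframe"` come from the notes above. The three `download_*` keyword names are my understanding of the `get_dataset` signature in this release line, so verify them against your installed version. The helper name `load_explicitly` and the dataset ID 61 are arbitrary choices for illustration.

```python
def load_explicitly(dataset_id: int = 61):
    """Fetch a dataset listing and one dataset as pandas objects.

    Requires the openml package and network access. The import is kept
    inside the function so this snippet can be loaded without openml.
    """
    import openml

    # Listing call: request a DataFrame instead of a dictionary.
    listing = openml.datasets.list_datasets(output_format="dataframe")

    # State the download behavior explicitly rather than relying on defaults.
    # (Keyword names assumed from the 0.14/0.15 API; check your version.)
    dataset = openml.datasets.get_dataset(
        dataset_id,
        download_data=False,
        download_qualities=False,
        download_features_meta_data=False,
    )

    # Data access: request a DataFrame instead of a numpy array.
    X, y, categorical_indicator, attribute_names = dataset.get_data(
        dataset_format="dataframe"
    )
    return listing, X
```

With the download flags set to False, the data file should only be fetched when `get_data` is called.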

  • ADD #1081: New flag that allows disabling downloading dataset features.
  • ADD #1132: New flag that forces a redownload of cached data.
  • FIX #1244: Fixes a rare bug where task listing could fail when the server returned invalid data.
  • DOC #1229: Fixes a comment string for the main example.
  • DOC #1241: Fixes a comment in an example.
  • MAINT #1124: Improve naming of helper functions that govern the cache directories.
  • MAINT #1223, #1250: Update tools used in pre-commit to the latest versions (black==23.3.0, mypy==1.3.0, flake8==6.0.0).
  • MAINT #1253: Update the citation request to the JMLR paper.
  • MAINT #1246: Add a warning that checking for duplicate runs on the server cannot be done without an API key.

Version 0.13.1

22 Mar 15:27
3380bbb

  • ADD #1028: Add functions to delete runs, flows, datasets, and tasks (e.g., openml.datasets.delete_dataset).
  • ADD #1144: Add locally computed results to the OpenMLRun object's representation if the run was created locally and not downloaded from the server.
  • ADD #1180: Improve the error message when the checksum of a downloaded dataset does not match the checksum provided by the API.
  • ADD #1201: Make OpenMLTraceIteration a dataclass.
  • DOC #1069: Add argument documentation for the OpenMLRun class.
  • FIX #1197 #559 #1131: Fix the order of ground truth and predictions in the OpenMLRun object and in format_prediction.
  • FIX #1198: Support numpy 1.24 and higher.
  • FIX #1216: Allow unknown task types on the server. This is only relevant when new task types are added to the test server.
  • MAINT #1155: Add dependabot github action to automatically update other github actions.
  • MAINT #1199: Obtain pre-commit's flake8 from github.com instead of gitlab.com.
  • MAINT #1215: Support latest numpy version.
  • MAINT #1218: Test Python3.6 on Ubuntu 20.04 instead of the latest Ubuntu (which is 22.04).
  • MAINT #1221 #1212 #1206 #1211: Update github actions to the latest versions.
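
To illustrate the kind of check behind ADD #1180, here is a rough standard-library sketch of comparing a downloaded payload against an expected checksum and failing with a descriptive message. It assumes an MD5 hex digest. The function name `verify_checksum` and the message wording are hypothetical and not taken from openml-python.

```python
import hashlib


def verify_checksum(payload: bytes, expected_md5: str, source: str) -> None:
    """Raise ValueError with a descriptive message if the digest differs."""
    actual = hashlib.md5(payload).hexdigest()
    if actual != expected_md5.lower():
        raise ValueError(
            f"Checksum mismatch for {source}: expected {expected_md5}, got {actual}. "
            "The download may be corrupted or the cached copy may be stale."
        )
```

Including both digests and the source in the message lets a user tell at a glance which file failed and how.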

Version 0.13.0

23 Feb 16:02
5eb84ce

  • FIX #1030: pre-commit hooks should no longer issue a warning.
  • FIX #1058, #1100: Avoid NoneType error when printing task without class_labels attribute.
  • FIX #1110: Make arguments to create_study and create_suite that are defined as optional by the OpenML XSD actually optional.
  • FIX #1147: openml.flow.flow_exists no longer requires an API key.
  • FIX #1184: Automatically resolve proxies when downloading from minio. Turn this off by setting environment variable no_proxy="*".
  • MAINT #1088: Do CI for Windows on Github Actions instead of Appveyor.
  • MAINT #1104: Fix outdated docstring for list_task.
  • MAINT #1146: Update the pre-commit dependencies.
  • ADD #1103: Add a predictions property to OpenMLRun for easy accessibility of prediction data.
  • ADD #1188: EXPERIMENTAL. Allow downloading all files from a minio bucket with download_all_files=True for get_dataset.

Version 0.12.1

14 Apr 08:52
72576bd

  • ADD #895/#1038: Measure runtimes of scikit-learn runs also for models which are parallelized via joblib.
  • DOC #1050: Refer to the webpage instead of the XML file in the main example.
  • DOC #1051: Document existing extensions to OpenML-Python besides the shipped scikit-learn extension.
  • FIX #1035: Render class attributes and methods again.
  • FIX #1042: Fixes a rare concurrency issue with OpenML-Python and joblib which caused the joblib worker pool to fail.
  • FIX #1053: Fixes a bug which could prevent importing the package in a docker container.

Version 0.12.0

08 Apr 16:48
5511fa0

0.11.1

  • ADD #964: Validate ignore_attribute, default_target_attribute, row_id_attribute are set to attributes that exist on the dataset when calling create_dataset.
  • ADD #979: Dataset features and qualities are now also cached in pickle format.
  • ADD #982: Add helper functions for column transformers.
  • ADD #989: run_model_on_task will now warn the user that the model passed has already been fitted.
  • ADD #1009: Add the option to not download the dataset qualities. The cached version is used even if the download attribute is false.
  • ADD #1016: Add scikit-learn 0.24 support.
  • ADD #1020: Add option to parallelize evaluation of tasks with joblib.
  • ADD #1022: Allow minimum version of dependencies to be listed for a flow, use more accurate minimum versions for scikit-learn dependencies.
  • ADD #1023: Add admin-only calls for adding topics to datasets.
  • ADD #1029: Add support for fetching dataset from a minio server in parquet format.
  • ADD #1031: Generally improve runtime measurements, add them for some previously unsupported flows (e.g. BaseSearchCV derived flows).
  • DOC #973: Change the task used in the welcome page example so that it no longer fails when using a numerical dataset.
  • MAINT #671: Improved the performance of check_datasets_active by only querying the given list of datasets in contrast to querying all datasets. Modified the corresponding unit test.
  • MAINT #891: Changed the way that numerical features are stored. Numerical features that range from 0 to 255 are now stored as uint8, which reduces the storage space required as well as storing and loading times.
  • MAINT #975, #988: Add CI through Github Actions.
  • MAINT #977: Allow short and long scenarios for unit tests. Reduce the workload for some unit tests.
  • MAINT #985, #1000: Improve unit test stability and output readability, and add load balancing.
  • MAINT #1018: Refactor data loading and storage. Data is now compressed on the first call to get_data.
  • MAINT #1024: Remove flaky decorator for study unit test.
  • FIX #883 #884 #906 #972: Various improvements to the caching system.
  • FIX #980: Speed up check_datasets_active.
  • FIX #984: Add a retry mechanism when the server encounters a database issue.
  • FIX #1004: Fixed an issue that prevented installation on some systems (e.g. Ubuntu).
  • FIX #1013: Fixes a bug where OpenMLRun.setup_string was not uploaded to the server, and prepares for run_details being sent from the server.
  • FIX #1021: Fixes an issue that could occur when running unit tests and openml-python was not in PATH.
  • FIX #1037: Fixes a bug where a dataset could not be loaded if a categorical feature listed a nan-like value as a possible category.
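
As a rough illustration of the retry idea in FIX #984, the sketch below retries a call when a temporary server-side failure is reported. It is not the library's code: `TransientServerError`, `call_with_retries`, and the exponential backoff schedule are all assumptions made for the example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientServerError(Exception):
    """Hypothetical error type standing in for a temporary server-side failure."""


def call_with_retries(request: Callable[[], T], max_attempts: int = 3, base_delay: float = 1.0) -> T:
    """Call `request`, retrying on TransientServerError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except TransientServerError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

Only the designated transient error is retried, so genuine client errors still fail immediately.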

Version 0.11.0

25 Oct 19:19
bc87333

  • ADD #753: Allows uploading custom flows to OpenML via OpenML-Python.
  • ADD #777: Allows running a flow on pandas dataframes (in addition to numpy arrays).
  • ADD #888: Allow passing a task_id to run_model_on_task.
  • ADD #894: Support caching of datasets using feather format as an option.
  • ADD #929: Add edit_dataset and fork_dataset to allow editing and forking of uploaded datasets.
  • ADD #866, #943: Add support for scikit-learn's passthrough and drop when uploading flows to OpenML.
  • ADD #879: Add support for scikit-learn's MLP hyperparameter layer_sizes.
  • ADD #945: PEP 561 compliance for distributing Type information.
  • DOC #660: Remove nonexistent argument from docstring.
  • DOC #901: The API reference now documents the config file and its options.
  • DOC #912: API reference now shows create_task.
  • DOC #954: Remove TODO text from documentation.
  • DOC #960: document how to upload multiple ignore attributes.
  • FIX #873: Fixes an issue which resulted in incorrect URLs when printing OpenML objects after switching the server.
  • FIX #885: Logger no longer registered by default. Added utility functions to easily register logging to console and file.
  • FIX #890: Correct the scaling of data in the SVM example.
  • MAINT #371: list_evaluations default size changed from None to 10_000.
  • MAINT #767: Source distribution installation is now unit-tested.
  • MAINT #781: Add pre-commit and automated code formatting with black.
  • MAINT #804: Rename arguments of list_evaluations to indicate they expect lists of ids.
  • MAINT #836: OpenML supports only pandas version 1.0.0 or above.
  • MAINT #865: OpenML no longer bundles test files in the source distribution.
  • MAINT #881: Improve the error message for too-long URIs.
  • MAINT #897: Dropping support for Python 3.5.
  • MAINT #916: Adding support for Python 3.8.
  • MAINT #920: Improve error messages for dataset upload.
  • MAINT #921: Improve handling of the OpenML server URL in the config file.
  • MAINT #925: Improve error handling and error message when loading datasets.
  • MAINT #928: Restructures the contributing documentation.
  • MAINT #936: Adding support for scikit-learn 0.23.X.
  • MAINT #945: Make OpenML-Python PEP 561 compliant.
  • MAINT #951: Converts TaskType class to a TaskType enum.

Version 0.10.2

07 Nov 14:49

  • ADD #857: Adds task type ID to list_runs
  • DOC #862: Added license BSD 3-Clause to each of the source files.