v0.15.1

Will clean up release notes later, highlights:

Fix usage of environment variables for locating the default cache and configuration directories by @eddiebergman in #1359
Allow skipping attempts to download parquet files by setting the OPENML_SKIP_PARQUET variable to true by @PGijsbers in #1388
A lot of maintenance work by @eddiebergman and @LennartPurucker
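
As a rough illustration of the OPENML_SKIP_PARQUET highlight, the variable can be set from within a script before any dataset is fetched. This sketch assumes the variable is read from the process environment and compared case-insensitively with "true"; neither detail is stated in these notes. The helper `parquet_download_skipped` is hypothetical and only mirrors that assumed check.

```python
import os


def parquet_download_skipped(environ=os.environ) -> bool:
    """Hypothetical helper: True when OPENML_SKIP_PARQUET is set to "true".

    Assumption: the value is compared case-insensitively.
    """
    return environ.get("OPENML_SKIP_PARQUET", "false").casefold() == "true"


# Set the variable before fetching datasets so parquet downloads are not attempted.
os.environ["OPENML_SKIP_PARQUET"] = "true"
```

Setting the variable in the shell before starting Python should have the same effect, under the same assumption.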

Thanks to everyone who contributed in any way ❤️

What's Changed

[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #1329
Bump codecov/codecov-action from 3 to 4 by @dependabot in #1328
Disable docker release on PR by @LennartPurucker in #1360
fix(datasets): Add code 111 for dataset description not found error by @eddiebergman in #1356
Test Fixes for v0.15.1 by @LennartPurucker in #1358
fix: Avoid Random State and Other Test Bug by @LennartPurucker in #1362
fix/maint: Make Docs Work Again and Stop Progress.rst Usage by @LennartPurucker in #1365
doc: README Rework by @LennartPurucker in #1361
doc: make all examples use names instead of IDs as reference. by @LennartPurucker in #1367
fix: avoid stripping whitespaces for feature names by @LennartPurucker in #1368
fix: workaround for git test workflow for Python 3.8 by @LennartPurucker in #1369
add: test for dataset comparison and ignore fields by @LennartPurucker in #1370
fix: github workflows and pytest issue by @LennartPurucker in #1373
feat: support for loose init model from run by @LennartPurucker in #1371
fix/maint: avoid exit code (which kills the docs building) by @LennartPurucker in #1374
ux: Provide helpful link to documentation when error due to missing API token by @eddiebergman in #1364
ci: Docker/build-push-action from 5 to 6 by @dependabot in #1357
ci: Bump peter-evans/dockerhub-description from 3 to 4 by @dependabot in #1326
fix: resolve Sphinx style error by @LennartPurucker in #1375
docs: fix broken links after openml.org rework by @LennartPurucker in #1376
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #1380
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #1381
Mark test as production by @PGijsbers in #1384
Patch release bump by @PGijsbers in #1389

Full Changelog: v0.15.0...v0.15.1

v0.15.0

What's Changed

ADD #1335: Improve MinIO support.
- Add progress bar for downloading MinIO files. Enable it by setting show_progress to true in either openml.config or the configuration file.
- When using download_all_files, files are only downloaded if they do not yet exist in the cache.
FIX #1338: Read the configuration file without overwriting it.
MAINT #1340: Add Numpy 2.0 support. Update tests to work with scikit-learn <= 1.5.
ADD #1342: Add HTTP header to requests to indicate they are from openml-python.
ADD #1345: task.get_dataset now takes the same parameters as openml.datasets.get_dataset to allow fine-grained control over file downloads.
MAINT #1346: The ARFF file of a dataset is now only downloaded if parquet is not available.
MAINT #1349: Removed usage of the distutils module, which allows for Py3.12 compatibility.
MAINT #1351: Image archives are now automatically deleted after they have been downloaded and extracted.
MAINT #1352, #1354: When fetching tasks and datasets, file download parameters now default to not downloading the file.
Files will be downloaded only when a user tries to access properties which require them (e.g., dataset.qualities or dataset.get_data).
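
A minimal sketch of the lazy-download behaviour described above, with the download flags spelled out instead of relying on version-specific defaults. It assumes `openml.datasets.get_dataset` accepts the keyword arguments `download_data`, `download_qualities`, and `download_features_meta_data`. The wrapper `get_dataset_metadata_only` and its `get_dataset` injection parameter are hypothetical, added only so the example can run without network access.

```python
from typing import Any, Callable, Optional


def get_dataset_metadata_only(
    dataset_id: int,
    get_dataset: Optional[Callable[..., Any]] = None,
) -> Any:
    """Fetch a dataset description without downloading any data files."""
    if get_dataset is None:
        # Imported lazily so the sketch can be exercised without openml installed.
        import openml

        get_dataset = openml.datasets.get_dataset

    # Request no file downloads up front; files are fetched on first access.
    return get_dataset(
        dataset_id,
        download_data=False,
        download_qualities=False,
        download_features_meta_data=False,
    )
```

Per the note above, accessing something like `dataset.qualities` or calling `dataset.get_data` on the returned object is what later triggers the actual download.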

New Contributors

@BrunoBelucci made their first contribution in #1338
@knyazer made their first contribution in #1345

Full Changelog: v0.14.2...v0.15.0

Version 0.14.2

This is a minor release to support several hotfixes and technical debt.

MAINT #1280: Use the server-provided parquet_url instead of minio_url to determine the location of the parquet file.
ADD #716: add documentation for remaining attributes of classes and functions.
ADD #1261: more annotations for type hints.
MAINT #1294: update tests to new tag specification.
FIX #1314: Update fetching a bucket from MinIO.
FIX #1315: Make class label retrieval more lenient.
ADD #1316: add feature descriptions ontologies support.
MAINT #1310/#1307: switch to ruff and resolve all mypy errors.

Version 0.14

IMPORTANT: This release paves the way towards a breaking update of OpenML-Python. From version 0.15, functions that had the option to return a pandas DataFrame will return a pandas DataFrame by default. This version (0.14) emits a warning if you still use the old access functionality.

More concretely:

In 0.15 we will drop the ability to return dictionaries in listing calls and only provide pandas DataFrames. To disable warnings in 0.14 you have to request a pandas DataFrame (using output_format="dataframe").
In 0.15 we will drop the ability to return datasets as numpy arrays and only provide pandas DataFrames. To disable warnings in 0.14 you have to request a pandas DataFrame (using dataset_format="dataframe").

Furthermore, from version 0.15, OpenML-Python will no longer download datasets and dataset metadata by default. This version (0.14) emits a warning if you don't explicitly specify the desired behavior.

Please see the pull requests #1258 and #1260 for further information.
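
One way to find call sites that still rely on the old defaults is to escalate the warnings to errors, for example in a test suite. The sketch below assumes the deprecation notices are emitted as `FutureWarning`, which these notes do not state. `call_strict` is a hypothetical helper, and `old_style_listing` is a stand-in for a listing call, not part of OpenML-Python.

```python
import warnings
from typing import Any, Callable


def call_strict(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Call func, turning any FutureWarning it emits into an exception."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", category=FutureWarning)
        return func(*args, **kwargs)


def old_style_listing(output_format: str = "dict") -> str:
    """Stand-in for a listing call; warns unless a DataFrame is requested."""
    if output_format != "dataframe":
        warnings.warn("dict output is deprecated", FutureWarning, stacklevel=2)
    return output_format
```

Passing `output_format="dataframe"` (or `dataset_format="dataframe"` for data access) explicitly is what silences the warning, as described above.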

ADD #1081: New flag that allows disabling downloading dataset features.
ADD #1132: New flag that forces a redownload of cached data.
FIX #1244: Fixes a rare bug where task listing could fail when the server returned invalid data.
DOC #1229: Fixes a comment string for the main example.
DOC #1241: Fixes a comment in an example.
MAINT #1124: Improve naming of helper functions that govern the cache directories.
MAINT #1223, #1250: Update tools used in pre-commit to the latest versions (black==23.3.0, mypy==1.3.0, flake8==6.0.0).
MAINT #1253: Update the citation request to the JMLR paper.
MAINT #1246: Add a warning that warns the user that checking for duplicate runs on the server cannot be done without an API key.

Version 0.13.1

ADD #1028: Add functions to delete runs, flows, datasets, and tasks (e.g., openml.datasets.delete_dataset).
ADD #1144: Add locally computed results to the OpenMLRun object’s representation if the run was created locally and not downloaded from the server.
ADD #1180: Improve the error message when the checksum of a downloaded dataset does not match the checksum provided by the API.
ADD #1201: Make OpenMLTraceIteration a dataclass.
DOC #1069: Add argument documentation for the OpenMLRun class.
FIX #1197 #559 #1131: Fix the order of ground truth and predictions in the OpenMLRun object and in format_prediction.
FIX #1198: Support numpy 1.24 and higher.
FIX #1216: Allow unknown task types on the server. This is only relevant when new task types are added to the test server.
MAINT #1155: Add dependabot github action to automatically update other github actions.
MAINT #1199: Obtain pre-commit’s flake8 from github.com instead of gitlab.com.
MAINT #1215: Support latest numpy version.
MAINT #1218: Test Python3.6 on Ubuntu 20.04 instead of the latest Ubuntu (which is 22.04).
MAINT #1221 #1212 #1206 #1211: Update github actions to the latest versions.

Version 0.13.0

FIX #1030: pre-commit hooks should no longer issue a warning.
FIX #1058, #1100: Avoid NoneType error when printing task without class_labels attribute.
FIX #1110: Make arguments to create_study and create_suite that are defined as optional by the OpenML XSD actually optional.
FIX #1147: openml.flow.flow_exists no longer requires an API key.
FIX #1184: Automatically resolve proxies when downloading from minio. Turn this off by setting environment variable no_proxy="*".
MAINT #1088: Do CI for Windows on Github Actions instead of Appveyor.
MAINT #1104: Fix outdated docstring for list_task.
MAINT #1146: Update the pre-commit dependencies.
ADD #1103: Add a predictions property to OpenMLRun for easy accessibility of prediction data.
ADD #1188: EXPERIMENTAL. Allow downloading all files from a minio bucket with download_all_files=True for get_dataset.
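
The proxy switch from FIX #1184 can also be set from within a script. This sketch sets `no_proxy` for the current process and uses the standard library's `urllib.request.getproxies_environment` to confirm the value is visible. How OpenML-Python itself reads the variable is not described in these notes, so treat the check as an illustration only.

```python
import os
import urllib.request

# Turn off automatic proxy resolution for all hosts, as described in FIX #1184.
os.environ["no_proxy"] = "*"

# The standard library collects the *_proxy variables it sees, keyed by the
# part before "_proxy"; the no_proxy value is stored under "no".
proxy_settings = urllib.request.getproxies_environment()
no_proxy_value = proxy_settings.get("no")
```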

Version 0.12.1

ADD #895/#1038: Measure runtimes of scikit-learn runs also for models which are parallelized via joblib.
DOC #1050: Refer to the webpage instead of the XML file in the main example.
DOC #1051: Document existing extensions to OpenML-Python besides the shipped scikit-learn extension.
FIX #1035: Render class attributes and methods again.
FIX #1042: Fixes a rare concurrency issue with OpenML-Python and joblib which caused the joblib worker pool to fail.
FIX #1053: Fixes a bug which could prevent importing the package in a docker container.

0.11.1

ADD #964: Validate ignore_attribute, default_target_attribute, row_id_attribute are set to attributes that exist on the dataset when calling create_dataset.
ADD #979: Dataset features and qualities are now also cached in pickle format.
ADD #982: Add helper functions for column transformers.
ADD #989: run_model_on_task will now warn the user that the model passed has already been fitted.
ADD #1009: Add the option to not download the dataset qualities. The cached version is used even if the download attribute is false.
ADD #1016: Add scikit-learn 0.24 support.
ADD #1020: Add option to parallelize evaluation of tasks with joblib.
ADD #1022: Allow minimum version of dependencies to be listed for a flow, use more accurate minimum versions for scikit-learn dependencies.
ADD #1023: Add admin-only calls for adding topics to datasets.
ADD #1029: Add support for fetching dataset from a minio server in parquet format.
ADD #1031: Generally improve runtime measurements, add them for some previously unsupported flows (e.g. BaseSearchCV derived flows).
DOC #973: Change the task used in the welcome page example so it no longer fails using a numerical dataset.
MAINT #671: Improved the performance of check_datasets_active by only querying the given list of datasets in contrast to querying all datasets. Modified the corresponding unit test.
MAINT #891: Changed the way that numerical features are stored. Numerical features that range from 0 to 255 are now stored as uint8, which reduces the storage space required as well as storing and loading times.
MAINT #975, #988: Add CI through Github Actions.
MAINT #977: Allow short and long scenarios for unit tests. Reduce the workload for some unit tests.
MAINT #985, #1000: Improve unit test stability and output readability, and add load balancing.
MAINT #1018: Refactor data loading and storage. Data is now compressed on the first call to get_data.
MAINT #1024: Remove flaky decorator for study unit test.
FIX #883 #884 #906 #972: Various improvements to the caching system.
FIX #980: Speed up check_datasets_active.
FIX #984: Add a retry mechanism when the server encounters a database issue.
FIX #1004: Fixed an issue that prevented installation on some systems (e.g. Ubuntu).
FIX #1013: Fixes a bug where OpenMLRun.setup_string was not uploaded to the server, prepares for run_details being sent from the server.
FIX #1021: Fixes an issue that could occur when running unit tests and openml-python was not in PATH.
FIX #1037: Fixes a bug where a dataset could not be loaded if a categorical value had listed nan-like as a possible category.

Version 0.11.0

ADD #753: Allows uploading custom flows to OpenML via OpenML-Python.
ADD #777: Allows running a flow on pandas dataframes (in addition to numpy arrays).
ADD #888: Allow passing a task_id to run_model_on_task.
ADD #894: Support caching of datasets using feather format as an option.
ADD #929: Add edit_dataset and fork_dataset to allow editing and forking of uploaded datasets.
ADD #866, #943: Add support for scikit-learn's passthrough and drop when uploading flows to OpenML.
ADD #879: Add support for scikit-learn's MLP hyperparameter layer_sizes.
ADD #945: PEP 561 compliance for distributing Type information.
DOC #660: Remove nonexistent argument from docstring.
DOC #901: The API reference now documents the config file and its options.
DOC #912: API reference now shows create_task.
DOC #954: Remove TODO text from documentation.
DOC #960: document how to upload multiple ignore attributes.
FIX #873: Fixes an issue which resulted in incorrect URLs when printing OpenML objects after switching the server.
FIX #885: Logger no longer registered by default. Added utility functions to easily register logging to console and file.
FIX #890: Correct the scaling of data in the SVM example.
MAINT #371: list_evaluations default size changed from None to 10_000.
MAINT #767: Source distribution installation is now unit-tested.
MAINT #781: Add pre-commit and automated code formatting with black.
MAINT #804: Rename arguments of list_evaluations to indicate they expect lists of ids.
MAINT #836: OpenML supports only pandas version 1.0.0 or above.
MAINT #865: OpenML no longer bundles test files in the source distribution.
MAINT #881: Improve the error message for too-long URIs.
MAINT #897: Dropping support for Python 3.5.
MAINT #916: Adding support for Python 3.8.
MAINT #920: Improve error messages for dataset upload.
MAINT #921: Improve handling of the OpenML server URL in the config file.
MAINT #925: Improve error handling and error message when loading datasets.
MAINT #928: Restructures the contributing documentation.
MAINT #936: Adding support for scikit-learn 0.23.X.
MAINT #945: Make OpenML-Python PEP 561 compliant.
MAINT #951: Converts TaskType class to a TaskType enum.
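
Since FIX #885 means no logger is registered by default, log output has to be enabled explicitly. A minimal sketch using only the standard `logging` module follows. It assumes the package emits its records through a logger named "openml", and the format string is an arbitrary choice. The utility functions mentioned in the note are not named there, so they are not shown.

```python
import logging

# Assumption: the package emits records through a logger named "openml".
openml_logger = logging.getLogger("openml")

# Send records to the console with a simple, arbitrary format.
console_handler = logging.StreamHandler()
console_handler.setFormatter(
    logging.Formatter("[%(levelname)s] [%(asctime)s:%(name)s] %(message)s")
)

openml_logger.addHandler(console_handler)
openml_logger.setLevel(logging.INFO)
```

A `logging.FileHandler` could be attached the same way to write the records to a file.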

Version 0.10.2

ADD #857: Adds task type ID to list_runs.
DOC #862: Added license BSD 3-Clause to each of the source files.

Releases: openml/openml-python
