openml.datasets.OpenMLDataset
Dataset object.
Allows fetching and uploading datasets to OpenML.
Name of the dataset.
Description of the dataset.
Format of the dataset which can be either ‘arff’ or ‘sparse_arff’.
Format for caching the dataset which can be either ‘feather’ or ‘pickle’.
Id autogenerated by the server.
Version of this dataset. ‘1’ for original version. Auto-incremented by server.
The person who created the dataset.
People who contributed to the current version of the dataset.
The date the data was originally collected, given by the uploader.
The date-time when the dataset was uploaded, generated by server.
Language in which the data is represented. Starts with 1 upper case letter, rest lower case, e.g. ‘English’.
License of the data.
Valid URL, points to actual data file. The file can be on the OpenML server or another dataset repository.
The default target attribute, if it exists. Can have multiple values, comma separated.
The attribute that represents the row-id column, if present in the dataset.
Attributes that should be excluded in modelling, such as identifiers and indexes.
Version label provided by user. Can be a date, hash, or some other type of id.
Reference(s) that should be cited when building on this data.
Tags describing the dataset.
Who can see the dataset. Typical values: ‘Everyone’, ‘All my friends’, ‘Only me’. Can also be any of the user’s circles.
For derived data, the url to the original dataset.
Link to a paper describing the dataset.
An explanation of the changes, provided when the dataset is uploaded.
MD5 checksum used to verify that the dataset was downloaded without corruption.
Path to where the dataset is located.
A dictionary of dataset features, which maps a feature index to an OpenMLDataFeature.
A dictionary of dataset qualities, which maps a quality name to a quality value.
Serialized arff dataset string.
URL to the MinIO bucket containing the dataset files.
Path to the local parquet file.
Returns dataset content as dataframes or sparse matrices.
Name of target column to separate from the data. Splitting multiple columns is currently not supported.
Whether to include row ids in the returned dataset.
Whether to include columns that are marked as “ignore” on the server in the dataset.
The format of returned dataset.
If ‘array’, the returned dataset will be a NumPy array or a SciPy sparse matrix. Support for ‘array’ will be removed in version 0.15.
If ‘dataframe’, the returned dataset will be a pandas DataFrame.
Dataset
Target column
Mask that indicates which features are categorical.
List of attribute names.
Return indices of features of a given type, e.g. all nominal features. Optional parameters to exclude various features by index or ontology.
The data type to return (e.g., nominal, numeric, date, string).
List of indices to exclude (and adapt the return values as if these indices are not present).
Whether to exclude the defined ignore attributes (and adapt the return values as if these indices are not present)
Whether to exclude the defined row id attributes (and adapt the return values as if these indices are not present)
A list of indices that have the specified data type.
The ID of the entity; it is unique within its entity type.
Opens the OpenML web page corresponding to this object in your default browser.
The URL of the object on the server, if it was uploaded, else None.
Annotates this entity with a tag on the server.
Tag to attach to the dataset.
Removes a tag from this entity on the server.
Tag to remove from the dataset.
Reads the dataset’s ARFF file to determine the class labels.
If the task has no class labels (for example, a regression problem), it returns None. This is necessary because the data returned by get_data only contains the indices of the classes, while OpenML needs the actual class name when uploading the results of a run.
Name of the target attribute.
Return the OpenML URL for the entity of this class with the given ID.