"""
=========
Run Setup
=========
By: Jan N. van Rijn

One of the key features of the openml-python library is that it allows users
to reinstantiate flows with hyperparameter settings that were uploaded before.
This tutorial uses the concept of setups. Although setups are not extensively
described in the OpenML documentation (because most users will not use them
directly), they form an important concept within OpenML for distinguishing
between hyperparameter configurations.

A setup is the combination of a flow with all its hyperparameters set.

A key requirement for reinstantiating a flow is to have the same scikit-learn
version as the flow that was uploaded. However, this tutorial uploads the flow
(which will later be reinstantiated) itself, so it can be run with any
scikit-learn version that is supported by this library. In this case, the
requirement of matching scikit-learn versions is automatically met.

In this tutorial we will:

1) Create a flow and use it to solve a task;
2) Download the flow, reinstantiate the model with the same hyperparameters,
   and solve the same task again;
3) Verify that the obtained results are exactly the same.

.. warning:: This example uploads data. For that reason, it connects to the
    test server at test.openml.org. This prevents the main server from
    becoming crowded with example datasets, tasks, runs, and so on.
"""
import logging
import numpy as np
import openml
import sklearn.ensemble
import sklearn.impute
import sklearn.pipeline
root = logging.getLogger()
root.setLevel(logging.INFO)
openml.config.start_using_configuration_for_example()
###############################################################################
# 1) Create a flow and use it to solve a task
###############################################################################
# first, let's download the task that we are interested in
task = openml.tasks.get_task(6)
# we create a pipeline that combines a preprocessing component (an imputer)
# with a classifier, giving many potential hyperparameters. Of course, the
# model can be as simple or as complex as you want it to be
model_original = sklearn.pipeline.make_pipeline(
    sklearn.impute.SimpleImputer(),
    sklearn.ensemble.RandomForestClassifier(),
)
# Let's change some hyperparameters. Of course, in any good application we
# would tune them using, e.g., Random Search or Bayesian Optimization, but for
# the purpose of this tutorial we set them to some specific values that might
# or might not be optimal
hyperparameters_original = {
    'simpleimputer__strategy': 'median',
    'randomforestclassifier__criterion': 'entropy',
    'randomforestclassifier__max_features': 0.2,
    'randomforestclassifier__min_samples_leaf': 1,
    'randomforestclassifier__n_estimators': 16,
    'randomforestclassifier__random_state': 42,
}
model_original.set_params(**hyperparameters_original)
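###############################################################################
# A note on the parameter names used above: scikit-learn addresses the
# parameters of a pipeline step as ``<step name>__<parameter name>``, where
# ``make_pipeline`` derives each step name from the lowercased class name of
# the step. The helper below is a hypothetical, simplified sketch (it is not
# part of scikit-learn or openml-python) that groups such keys by step to
# illustrate the routing; scikit-learn's own ``set_params`` performs a
# similar split internally.


def group_params_by_step(params):
    """Split ``step__param`` keys into {step: {param: value}} (sketch only)."""
    grouped = {}
    for key, value in params.items():
        # split only on the first '__'; deeper nesting stays in the remainder
        step, _, param = key.partition('__')
        grouped.setdefault(step, {})[param] = value
    return grouped


# a small, self-contained example dictionary in the same naming style
example_params = {
    'simpleimputer__strategy': 'median',
    'randomforestclassifier__n_estimators': 16,
}
print(group_params_by_step(example_params))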
# solve the task and upload the result (this implicitly creates the flow)
run = openml.runs.run_model_on_task(
    model_original,
    task,
    avoid_duplicate_runs=False,
)
run_original = run.publish() # this implicitly uploads the flow
###############################################################################
# 2) Download the flow and solve the same task again.
###############################################################################
# obtain the setup id (note that the setup id is assigned by the OpenML
# server, so it was not yet available in our local copy of the run)
run_downloaded = openml.runs.get_run(run_original.run_id)
setup_id = run_downloaded.setup_id
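###############################################################################
# Conceptually, a setup is the pair (flow, hyperparameter values), and runs
# that use an identical pair share one setup id. The class below is a
# hypothetical, in-memory sketch of that bookkeeping. It is not the OpenML
# server implementation, and the flow ids used here are made up.


class SetupRegistry:
    """Assign one integer id per unique (flow id, hyperparameters) pair."""

    def __init__(self):
        self._ids = {}

    def get_or_create(self, flow_id, hyperparameters):
        # sort the items so that the key does not depend on dict order
        key = (flow_id, tuple(sorted(hyperparameters.items())))
        if key not in self._ids:
            self._ids[key] = len(self._ids) + 1
        return self._ids[key]


registry = SetupRegistry()
first = registry.get_or_create(1, {'n_estimators': 16, 'criterion': 'entropy'})
same = registry.get_or_create(1, {'criterion': 'entropy', 'n_estimators': 16})
other = registry.get_or_create(1, {'criterion': 'gini', 'n_estimators': 16})
print(first, same, other)  # 1 1 2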
# after this, we can easily reinstantiate the model
model_duplicate = openml.setups.initialize_model(setup_id)
# the reinstantiated model automatically has all the hyperparameters set;
# now we run the task again
run_duplicate = openml.runs.run_model_on_task(
    model_duplicate,
    task,
    avoid_duplicate_runs=False,
)
###############################################################################
# 3) We will verify that the obtained results are exactly the same.
###############################################################################
# each run stores all of its predictions in the field ``data_content``
np.testing.assert_array_equal(
    run_original.data_content,
    run_duplicate.data_content,
)
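###############################################################################
# The two runs are reproducible because ``random_state`` is fixed to 42 in
# the hyperparameters: the random forest then draws its bootstrap samples and
# candidate features from a seeded random number generator. The
# standard-library sketch below is only an analogy for that principle, using
# Python's ``random`` module (scikit-learn itself uses NumPy's generators).
# The function name ``sample_features`` is hypothetical.

import random


def sample_features(seed, n_features=10, k=2):
    """Pick ``k`` distinct feature indices with a generator seeded by ``seed``."""
    rng = random.Random(seed)
    return rng.sample(range(n_features), k)


# two independently created generators with the same seed give the same draw
assert sample_features(42) == sample_features(42)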
###############################################################################
openml.config.stop_using_configuration_for_example()