Description
This is a general question regarding the API of `as_frame` and `return_X_y` in our loaders/fetchers. We have an inconsistent behaviour which could be solved straight away.
`fetch_openml` introduced `as_frame`, which exposes a `frame` attribute in the `Bunch`; this attribute is a pandas `DataFrame`. In conjunction with `return_X_y=True`, we expose `data` and `target`. `data` will always be a `DataFrame`, while `target` is supposed to be a `DataFrame` or a `Series` depending on the number of columns in the target.
X, y = fetch_openml('iris', as_frame=True, return_X_y=True)
y
0 Iris-setosa
1 Iris-setosa
2 Iris-setosa
3 Iris-setosa
4 Iris-setosa
...
145 Iris-virginica
146 Iris-virginica
147 Iris-virginica
148 Iris-virginica
149 Iris-virginica
Name: class, Length: 150, dtype: category
Categories (3, object): [Iris-setosa, Iris-versicolor, Iris-virginica]
In #15950, we introduced `as_frame` to `fetch_california_housing`. The API is the same apart from the `target` output: the `target` is a `DataFrame` even with a single column.
X, y = fetch_california_housing(return_X_y=True, as_frame=True)
y
MedHouseVal
0 4.526
1 3.585
2 3.521
3 3.413
4 3.422
... ...
20635 0.781
20636 0.771
20637 0.923
20638 0.847
20639 0.894
[20640 rows x 1 columns]
So my question is: what type of `target` do we want when the target is 1D?
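To make the two options concrete, here is a minimal sketch of what aligning on the `fetch_openml` convention could look like, i.e. returning a `Series` when the target has a single column. The helper name `squeeze_single_column_target` is hypothetical and not part of scikit-learn, and the small frame is only a stand-in built from the first values shown above rather than the real dataset.

```python
import pandas as pd


def squeeze_single_column_target(target):
    """Return a Series when `target` is a one-column DataFrame.

    Hypothetical helper for illustration only; not a scikit-learn API.
    """
    if isinstance(target, pd.DataFrame) and target.shape[1] == 1:
        # Select the only column; the Series keeps the column name.
        return target.iloc[:, 0]
    # Multi-column targets (or existing Series) are returned unchanged.
    return target


# Stand-in for the one-column target printed above (first rows only).
y_frame = pd.DataFrame({"MedHouseVal": [4.526, 3.585, 3.521]})
y_series = squeeze_single_column_target(y_frame)

print(type(y_frame).__name__, y_frame.shape)    # DataFrame (3, 1)
print(type(y_series).__name__, y_series.shape)  # Series (3,)
```

Going the other way (always returning a `DataFrame`) would instead require changing what `fetch_openml` currently returns for single-column targets.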