Merged
samples/snippets/classification_boosted_tree_model_test.py (26 changes: 25 additions, 1 deletion)
@@ -14,7 +14,7 @@


 def test_boosted_tree_model(random_model_id: str) -> None:
-    # your_model_id = random_model_id
+    your_model_id = random_model_id
     # [START bigquery_dataframes_bqml_boosted_tree_prepare]
     import bigframes.pandas as bpd

@@ -39,4 +39,28 @@ def test_boosted_tree_model(random_model_id: str) -> None:
     )
     del input_data["functional_weight"]
     # [END bigquery_dataframes_bqml_boosted_tree_prepare]
+    # [START bigquery_dataframes_bqml_boosted_tree_create]
+    from bigframes.ml import ensemble

+    # input_data is defined in an earlier step.
+    training_data = input_data[input_data["dataframe"] == "training"]
Collaborator

No action needed, but something to consider for the future: it would be nice to update the prepare section above so that it works without referencing an index (e.g. when ordering_mode = "partial").

We have a few options, but the easiest is to start from a string column and add (True, "training") as the last entry in the list of cases.

Aside: we have an open issue (349926559) to allow selecting any column in the DataFrame (such as functional_weight, which would be a natural choice in this example) even if it has a different type, as long as a True (default) case is provided.
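The suggestion relies on ordered case-list semantics: cases are checked in order, the first matching case supplies the value, and a trailing (True, "training") case matches every row that no earlier case claimed, which is what would let the prepare section avoid referencing an index. Below is a minimal pure-Python sketch of those semantics, assuming rows are plain dicts; case_when here is a hypothetical helper and bucket a hypothetical column, neither taken from bigframes nor from the census sample.

```python
from typing import Callable, List, Optional, Sequence, Tuple, Union

# A condition is either a per-row predicate or the scalar True, which matches
# every row and therefore acts as the catch-all (default) case.
Condition = Union[Callable[[dict], bool], bool]


def case_when(
    rows: Sequence[dict],
    cases: Sequence[Tuple[Condition, str]],
) -> List[Optional[str]]:
    """Label each row with the value of the first case whose condition holds.

    Hypothetical helper that only mimics ordered, first-match-wins case
    semantics; it is not the bigframes or pandas implementation.
    """
    labels: List[Optional[str]] = []
    for row in rows:
        for condition, value in cases:
            # A bool condition applies to every row; otherwise call the predicate.
            matched = condition if isinstance(condition, bool) else condition(row)
            if matched:
                labels.append(value)
                break
        else:
            # No case matched (only possible without a trailing True case).
            labels.append(None)
    return labels


# Hypothetical rows: "bucket" is an illustrative column, not one from the census data.
rows = [{"bucket": 0}, {"bucket": 1}, {"bucket": 2}, {"bucket": 7}]

split = case_when(
    rows,
    [
        (lambda r: r["bucket"] == 0, "evaluation"),
        (lambda r: r["bucket"] == 1, "prediction"),
        (True, "training"),  # catch-all default, listed last
    ],
)
print(split)  # ['evaluation', 'prediction', 'training', 'training']
```

Because the True case is listed last, it only fills rows that no earlier case claimed; moving it to the front would label every row "training".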

+    X = training_data.drop(columns=["income_bracket", "dataframe"])
+    y = training_data["income_bracket"]
Collaborator

Presumably you ran this code sample and it worked OK? I remember we had some bugs in the past where y had to be a DataFrame, not a Series, so I'm just double-checking.

Contributor Author


+    # create and train the model
+    census_model = ensemble.XGBClassifier(
+        n_estimators=1,
+        booster="gbtree",
+        tree_method="hist",
+        max_iterations=1,  # For a more accurate model, try 50 iterations.
+        subsample=0.85,
+    )
+    census_model.fit(X, y)

+    census_model.to_gbq(
+        your_model_id,  # For example: "your-project.census.census_model"
+        replace=True,
+    )
+    # [END bigquery_dataframes_bqml_boosted_tree_create]
     assert input_data is not None
+    assert census_model is not None