Commit 083c3e1

busunkim96, sirtorry, merla18, emmby, and lwander authored
docs: add samples from tables/automl (#54)
* Tables Notebooks [(#2090)](GoogleCloudPlatform/python-docs-samples#2090)
* initial commit
* update census
* update notebooks
* remove the reference to a bug [(#2100)](GoogleCloudPlatform/python-docs-samples#2100), as the bug has been fixed in the public client lib
* delete this file. [(#2102)](GoogleCloudPlatform/python-docs-samples#2102)
* rename file name [(#2103)](GoogleCloudPlatform/python-docs-samples#2103)
* trying to fix images [(#2101)](GoogleCloudPlatform/python-docs-samples#2101)
* remove typo in installation [(#2110)](GoogleCloudPlatform/python-docs-samples#2110)
* Rename census_income_prediction.ipynb to getting_started_notebook.ipynb [(#2115)](GoogleCloudPlatform/python-docs-samples#2115): renaming the notebooks as Getting Started (will be in sync with the doc). It would be great if the folder could be renamed too.
* added back missing file package import [(#2150)](GoogleCloudPlatform/python-docs-samples#2150)
* added back missing file import [(#2145)](GoogleCloudPlatform/python-docs-samples#2145)
* remove incorrect reference to Iris dataset [(#2203)](GoogleCloudPlatform/python-docs-samples#2203)
* conversion to Jupyter/Colab [(#2340)](GoogleCloudPlatform/python-docs-samples#2340), plus bug fixes
* updated for the Jupyter support [(#2337)](GoogleCloudPlatform/python-docs-samples#2337)
* updated readme to support Jupyter [(#2336)](GoogleCloudPlatform/python-docs-samples#2336), to match the updated notebook supporting Jupyter
* conversion to Jupyter/Colab [(#2339)](GoogleCloudPlatform/python-docs-samples#2339), plus bug fixes
* conversion of notebook for Jupyter/Colab [(#2338)](GoogleCloudPlatform/python-docs-samples#2338): convert the notebook to support both Jupyter and Colab, plus bug fixes
* [BLOCKED] AutoML Tables: Docs samples updated to use new (pending) client [(#2276)](GoogleCloudPlatform/python-docs-samples#2276)
* AutoML Tables: Docs samples updated to use new (pending) client
* Linter warnings
* add product recommendation for automl tables notebook [(#2257)](GoogleCloudPlatform/python-docs-samples#2257)
* added colab filtering notebook
* update to tables client
* update readme
* tell user to restart kernel for automl
* AutoML Tables: Notebook samples updated to use new tables client [(#2424)](GoogleCloudPlatform/python-docs-samples#2424)
* fix users bug and emphasize kernel restart [(#2407)](GoogleCloudPlatform/python-docs-samples#2407)
* fix problems with automl docs [(#2501)](GoogleCloudPlatform/python-docs-samples#2501): following the docs, calling `batch_predict` fails with the error `the paramaters should be a pandas.Dataframe`. This happens because the first positional parameter of `batch_predict` is a pandas.DataFrame; the fix is to call it with Python's named (keyword) parameters, as in the sketch after this list.
* Fix typo in GCS URI parameter [(#2459)](GoogleCloudPlatform/python-docs-samples#2459)
* fix: fix tables notebook links and bugs [(#2601)](GoogleCloudPlatform/python-docs-samples#2601)
* feat(tables): update samples to show explainability [(#2523)](GoogleCloudPlatform/python-docs-samples#2523)
* show xai
* local feature importance
* use updated client
* use fixed library
* use new model
* Auto-update dependencies. [(#2005)](GoogleCloudPlatform/python-docs-samples#2005)
* Auto-update dependencies.
* Revert update of appengine/flexible/datastore.
* revert update of appengine/flexible/scipy
* revert update of bigquery/bqml
* revert update of bigquery/cloud-client
* revert update of bigquery/datalab-migration
* revert update of bigtable/quickstart
* revert update of compute/api
* revert update of container_registry/container_analysis
* revert update of dataflow/run_template
* revert update of datastore/cloud-ndb
* revert update of dialogflow/cloud-client
* revert update of dlp
* revert update of functions/imagemagick
* revert update of functions/ocr/app
* revert update of healthcare/api-client/fhir
* revert update of iam/api-client
* revert update of iot/api-client/gcs_file_to_device
* revert update of iot/api-client/mqtt_example
* revert update of language/automl
* revert update of run/image-processing
* revert update of vision/automl
* revert update of testing/requirements.txt
* revert update of vision/cloud-client/detect
* revert update of vision/cloud-client/product_search
* revert update of jobs/v2/api_client
* revert update of jobs/v3/api_client
* revert update of opencensus
* revert update of translate/cloud-client
* revert update to speech/cloud-client

Co-authored-by: Kurtis Van Gent <31518063+kurtisvg@users.noreply.github.com>
Co-authored-by: Doug Mahugh <dmahugh@gmail.com>

* Update dependency google-cloud-automl to v0.10.0 [(#3033)](GoogleCloudPlatform/python-docs-samples#3033)

Co-authored-by: Bu Sun Kim <8822365+busunkim96@users.noreply.github.com>
Co-authored-by: Leah E. Cole <6719667+leahecole@users.noreply.github.com>

* Simplify noxfile setup. [(#2806)](GoogleCloudPlatform/python-docs-samples#2806)
* chore(deps): update dependency requests to v2.23.0
* Simplify noxfile and add version control.
* Configure appengine/standard to only test Python 2.7.
* Update Kokoro configs to match noxfile.
* Add requirements-test to each folder.
* Remove Py2 versions from everything except appengine/standard.
* Remove conftest.py.
* Remove appengine/standard/conftest.py
* Remove 'no-sucess-flaky-report' from pytest.ini.
* Add GAE SDK back to appengine/standard tests.
* Fix typo.
* Roll pytest to python 2 version.
* Add a bunch of testing requirements.
* Remove typo.
* Add appengine lib directory back in.
* Add some additional requirements.
* Fix issue with flake8 args.
* Even more requirements.
* Re-add appengine conftest.py.
* Add a few more requirements.
* Even more App Engine requirements.
* Add webtest for appengine/standard/mailgun.
* Add some additional requirements.
* Add workaround for issue with mailjet-rest.
* Add responses for appengine/standard/mailjet.

Co-authored-by: Renovate Bot <bot@renovateapp.com>

* chore: some lint fixes [(#3750)](GoogleCloudPlatform/python-docs-samples#3750)
* automl: tables code sample clean-up [(#3571)](GoogleCloudPlatform/python-docs-samples#3571)
* delete unused tables_dataset samples
* delete args code associated with unused automl_tables samples
* delete tests associated with unused automl_tables samples
* restore get_dataset method/args without region tagging
* restore update_dataset methods without region tagging

Co-authored-by: Takashi Matsuo <tmatsuo@google.com>
Co-authored-by: Leah E. Cole <6719667+leahecole@users.noreply.github.com>

* add example of creating AutoML Tables client with non-default endpoint ('new' sdk) [(#3929)](GoogleCloudPlatform/python-docs-samples#3929); a hedged sketch of such a client follows this list
* add example of creating client with non-default endpoint
* more test file cleanup
* move connectivity print stmt out of test fn

Co-authored-by: Leah E. Cole <6719667+leahecole@users.noreply.github.com>
Co-authored-by: Torry Yang <sirtorry@users.noreply.github.com>

* Replace GCLOUD_PROJECT with GOOGLE_CLOUD_PROJECT. [(#4022)](GoogleCloudPlatform/python-docs-samples#4022)
* chore(deps): update dependency google-cloud-automl to v1 [(#4127)](GoogleCloudPlatform/python-docs-samples#4127). This PR contains the following updates:

  | Package | Update | Change |
  |---|---|---|
  | [google-cloud-automl](https://togithub.com/googleapis/python-automl) | major | `==0.10.0` -> `==1.0.1` |

  Release Notes: googleapis/python-automl [`v1.0.1`](https://togithub.com/googleapis/python-automl/blob/master/CHANGELOG.md#&#8203;101-httpswwwgithubcomgoogleapispython-automlcomparev100v101-2020-06-18) ([Compare Source](https://togithub.com/googleapis/python-automl/compare/v0.10.0...v1.0.1)).

  Renovate configuration: Schedule: at any time (no schedule defined). Automerge: disabled by config, please merge manually once you are satisfied. Rebasing: never, or you tick the rebase/retry checkbox. Ignore: close this PR and you won't be reminded about this update again. This PR has been generated by [WhiteSource Renovate](https://renovate.whitesourcesoftware.com); view the repository job log [here](https://app.renovatebot.com/dashboard#GoogleCloudPlatform/python-docs-samples).

* [tables/automl] fix: update the csv file and the dataset name [(#4188)](GoogleCloudPlatform/python-docs-samples#4188); fixes #4177, fixes #4178
* samples: Automl table batch test [(#4267)](GoogleCloudPlatform/python-docs-samples#4267)
* added rtest req.txt
* samples: added automl batch predict test
* added missing package
* Update tables/automl/batch_predict_test.py

Co-authored-by: Bu Sun Kim <8822365+busunkim96@users.noreply.github.com>

* samples: fixed wrong format on GCS input URI [(#4270)](GoogleCloudPlatform/python-docs-samples#4270): the current predict sample indicates that it can take multiple GCS URI inputs, but it should be singular
* chore(deps): update dependency pytest to v5.4.3 [(#4279)](GoogleCloudPlatform/python-docs-samples#4279)
* specify pytest for python 2 in appengine

Co-authored-by: Leah Cole <coleleah@google.com>

* Update automl_tables_predict.py with batch_predict_bq sample [(#4142)](GoogleCloudPlatform/python-docs-samples#4142): added a new method `batch_predict_bq` demonstrating batch prediction using BigQuery, and added notes in comments about the asynchronicity of the `batch_predict` method. The region `automl_tables_batch_predict_bq` will be used on cloud.google.com (currently both the GCS and BigQuery sections use the same sample code, which is incorrect). Fixes #4141. A sketch of the BigQuery variant also follows this list.
* Update dependency pytest to v6 [(#4390)](GoogleCloudPlatform/python-docs-samples#4390)
* chore: exclude notebooks
* chore: update templates
* chore: add codeowners and fix tests
* chore: ignore warnings from sphinx
* chore: fix tables client
* test: fix unit tests

Co-authored-by: Torry Yang <sirtorry@users.noreply.github.com>
Co-authored-by: florencep <florenceperot@google.com>
Co-authored-by: Mike Burton <mb-github@niskala.org>
Co-authored-by: Lars Wander <lwander@users.noreply.github.com>
Co-authored-by: Michael Hu <Michael.an.hu@gmail.com>
Co-authored-by: Michael Hu <michaelanhu@gmail.com>
Co-authored-by: Alefh Sousa <alefh.sousa@gmail.com>
Co-authored-by: DPEBot <dpebot@google.com>
Co-authored-by: Kurtis Van Gent <31518063+kurtisvg@users.noreply.github.com>
Co-authored-by: Doug Mahugh <dmahugh@gmail.com>
Co-authored-by: WhiteSource Renovate <bot@renovateapp.com>
Co-authored-by: Leah E. Cole <6719667+leahecole@users.noreply.github.com>
Co-authored-by: Takashi Matsuo <tmatsuo@google.com>
Co-authored-by: Anthony <wens.ajw@gmail.com>
Co-authored-by: Amy <amy@infosleuth.net>
Co-authored-by: Mike <45373284+munkhuushmgl@users.noreply.github.com>
Co-authored-by: Leah Cole <coleleah@google.com>
Co-authored-by: Sergei Dorogin <github@dorogin.com>
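The #2501, #4270, and #4142 items above all concern how `batch_predict` should be called. The following is a minimal sketch only, not code from this commit: the keyword parameter names (`gcs_input_uris`, `gcs_output_uri_prefix`, `bigquery_input_uri`, `bigquery_output_uri`, `model_display_name`) are assumed from the v1beta1 Tables client described in those PRs, and the project, region, model, and URI values are placeholders.

    # Hedged sketch; parameter names assumed from the v1beta1 TablesClient,
    # project/region/model/URI values are placeholders.
    from google.cloud import automl_v1beta1 as automl

    client = automl.TablesClient(project="my-project", region="us-central1")

    # Keyword arguments avoid the "parameters should be a pandas.DataFrame"
    # error, because the first positional parameter of batch_predict is a
    # pandas.DataFrame (#2501). The GCS input URI is singular (#4270).
    gcs_response = client.batch_predict(
        gcs_input_uris="gs://my-bucket/input/predict.csv",
        gcs_output_uri_prefix="gs://my-bucket/output/",
        model_display_name="my_model",
    )

    # BigQuery variant corresponding to the batch_predict_bq sample (#4142).
    bq_response = client.batch_predict(
        bigquery_input_uri="bq://my-project.my_dataset.my_table",
        bigquery_output_uri="bq://my-project",
        model_display_name="my_model",
    )

    # batch_predict is asynchronous (per the #4142 notes); result() blocks
    # until the long-running operation completes.
    gcs_response.result()
    bq_response.result()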
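The #3929 item references a sample that builds the Tables client against a non-default regional endpoint. A minimal sketch under the assumption that the client accepts a `client_options` mapping with an `api_endpoint` key; the endpoint and project id below are illustrative placeholders, not values from this commit.

    # Hedged sketch; the EU endpoint and project id are placeholders.
    from google.cloud import automl_v1beta1 as automl

    client_options = {"api_endpoint": "eu-automl.googleapis.com:443"}
    client = automl.TablesClient(
        project="my-project", region="eu", client_options=client_options
    )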

12 files changed (+1648, -0 lines)

automl_tables_dataset.py

+306 lines changed: 306 additions & 0 deletions
#!/usr/bin/env python

# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""This application demonstrates how to perform basic operations on a dataset
with the Google AutoML Tables API.

For more information, see the documentation at
https://cloud.google.com/automl-tables/docs.
"""

import argparse
import os


def create_dataset(project_id, compute_region, dataset_display_name):
    """Create a dataset."""
    # [START automl_tables_create_dataset]
    # TODO(developer): Uncomment and set the following variables
    # project_id = 'PROJECT_ID_HERE'
    # compute_region = 'COMPUTE_REGION_HERE'
    # dataset_display_name = 'DATASET_DISPLAY_NAME_HERE'

    from google.cloud import automl_v1beta1 as automl

    client = automl.TablesClient(project=project_id, region=compute_region)

    # Create a dataset with the given display name.
    dataset = client.create_dataset(dataset_display_name)

    # Display the dataset information.
    print("Dataset name: {}".format(dataset.name))
    print("Dataset id: {}".format(dataset.name.split("/")[-1]))
    print("Dataset display name: {}".format(dataset.display_name))
    print("Dataset metadata:")
    print("\t{}".format(dataset.tables_dataset_metadata))
    print("Dataset example count: {}".format(dataset.example_count))
    print("Dataset create time:")
    print("\tseconds: {}".format(dataset.create_time.seconds))
    print("\tnanos: {}".format(dataset.create_time.nanos))

    # [END automl_tables_create_dataset]

    return dataset


def list_datasets(project_id, compute_region, filter_=None):
    """List all datasets."""
    result = []
    # [START automl_tables_list_datasets]
    # TODO(developer): Uncomment and set the following variables
    # project_id = 'PROJECT_ID_HERE'
    # compute_region = 'COMPUTE_REGION_HERE'
    # filter_ = 'filter expression here'

    from google.cloud import automl_v1beta1 as automl

    client = automl.TablesClient(project=project_id, region=compute_region)

    # List all the datasets available in the region by applying the filter.
    response = client.list_datasets(filter_=filter_)

    print("List of datasets:")
    for dataset in response:
        # Display the dataset information.
        print("Dataset name: {}".format(dataset.name))
        print("Dataset id: {}".format(dataset.name.split("/")[-1]))
        print("Dataset display name: {}".format(dataset.display_name))
        metadata = dataset.tables_dataset_metadata
        print(
            "Dataset primary table spec id: {}".format(
                metadata.primary_table_spec_id
            )
        )
        print(
            "Dataset target column spec id: {}".format(
                metadata.target_column_spec_id
            )
        )
        print(
            "Dataset weight column spec id: {}".format(
                metadata.weight_column_spec_id
            )
        )
        print(
            "Dataset ml use column spec id: {}".format(
                metadata.ml_use_column_spec_id
            )
        )
        print("Dataset example count: {}".format(dataset.example_count))
        print("Dataset create time:")
        print("\tseconds: {}".format(dataset.create_time.seconds))
        print("\tnanos: {}".format(dataset.create_time.nanos))
        print("\n")

        # [END automl_tables_list_datasets]
        result.append(dataset)

    return result


def get_dataset(project_id, compute_region, dataset_display_name):
    """Get the dataset."""
    # TODO(developer): Uncomment and set the following variables
    # project_id = 'PROJECT_ID_HERE'
    # compute_region = 'COMPUTE_REGION_HERE'
    # dataset_display_name = 'DATASET_DISPLAY_NAME_HERE'

    from google.cloud import automl_v1beta1 as automl

    client = automl.TablesClient(project=project_id, region=compute_region)

    # Get the complete detail of the dataset.
    dataset = client.get_dataset(dataset_display_name=dataset_display_name)

    # Display the dataset information.
    print("Dataset name: {}".format(dataset.name))
    print("Dataset id: {}".format(dataset.name.split("/")[-1]))
    print("Dataset display name: {}".format(dataset.display_name))
    print("Dataset metadata:")
    print("\t{}".format(dataset.tables_dataset_metadata))
    print("Dataset example count: {}".format(dataset.example_count))
    print("Dataset create time:")
    print("\tseconds: {}".format(dataset.create_time.seconds))
    print("\tnanos: {}".format(dataset.create_time.nanos))

    return dataset


def import_data(project_id, compute_region, dataset_display_name, path):
    """Import structured data."""
    # [START automl_tables_import_data]
    # TODO(developer): Uncomment and set the following variables
    # project_id = 'PROJECT_ID_HERE'
    # compute_region = 'COMPUTE_REGION_HERE'
    # dataset_display_name = 'DATASET_DISPLAY_NAME'
    # path = 'gs://path/to/file.csv' or 'bq://project_id.dataset.table_id'

    from google.cloud import automl_v1beta1 as automl

    client = automl.TablesClient(project=project_id, region=compute_region)

    response = None
    if path.startswith("bq"):
        response = client.import_data(
            dataset_display_name=dataset_display_name, bigquery_input_uri=path
        )
    else:
        # Get the multiple Google Cloud Storage URIs.
        input_uris = path.split(",")
        response = client.import_data(
            dataset_display_name=dataset_display_name,
            gcs_input_uris=input_uris,
        )

    print("Processing import...")
    # Synchronous check of operation status.
    print("Data imported. {}".format(response.result()))

    # [END automl_tables_import_data]


def update_dataset(
    project_id,
    compute_region,
    dataset_display_name,
    target_column_spec_name=None,
    weight_column_spec_name=None,
    test_train_column_spec_name=None,
):
    """Update dataset."""
    # TODO(developer): Uncomment and set the following variables
    # project_id = 'PROJECT_ID_HERE'
    # compute_region = 'COMPUTE_REGION_HERE'
    # dataset_display_name = 'DATASET_DISPLAY_NAME_HERE'
    # target_column_spec_name = 'TARGET_COLUMN_SPEC_NAME_HERE' or None
    # weight_column_spec_name = 'WEIGHT_COLUMN_SPEC_NAME_HERE' or None
    # test_train_column_spec_name = 'TEST_TRAIN_COLUMN_SPEC_NAME_HERE' or None

    from google.cloud import automl_v1beta1 as automl

    client = automl.TablesClient(project=project_id, region=compute_region)

    if target_column_spec_name is not None:
        response = client.set_target_column(
            dataset_display_name=dataset_display_name,
            column_spec_display_name=target_column_spec_name,
        )
        print("Target column updated. {}".format(response))
    if weight_column_spec_name is not None:
        response = client.set_weight_column(
            dataset_display_name=dataset_display_name,
            column_spec_display_name=weight_column_spec_name,
        )
        print("Weight column updated. {}".format(response))
    if test_train_column_spec_name is not None:
        response = client.set_test_train_column(
            dataset_display_name=dataset_display_name,
            column_spec_display_name=test_train_column_spec_name,
        )
        print("Test/train column updated. {}".format(response))


def delete_dataset(project_id, compute_region, dataset_display_name):
    """Delete a dataset."""
    # [START automl_tables_delete_dataset]
    # TODO(developer): Uncomment and set the following variables
    # project_id = 'PROJECT_ID_HERE'
    # compute_region = 'COMPUTE_REGION_HERE'
    # dataset_display_name = 'DATASET_DISPLAY_NAME_HERE'

    from google.cloud import automl_v1beta1 as automl

    client = automl.TablesClient(project=project_id, region=compute_region)

    # Delete a dataset.
    response = client.delete_dataset(dataset_display_name=dataset_display_name)

    # Synchronous check of operation status.
    print("Dataset deleted. {}".format(response.result()))
    # [END automl_tables_delete_dataset]


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    subparsers = parser.add_subparsers(dest="command")

    create_dataset_parser = subparsers.add_parser(
        "create_dataset", help=create_dataset.__doc__
    )
    create_dataset_parser.add_argument("--dataset_name")

    list_datasets_parser = subparsers.add_parser(
        "list_datasets", help=list_datasets.__doc__
    )
    list_datasets_parser.add_argument("--filter_")

    get_dataset_parser = subparsers.add_parser(
        "get_dataset", help=get_dataset.__doc__
    )
    get_dataset_parser.add_argument("--dataset_display_name")

    import_data_parser = subparsers.add_parser(
        "import_data", help=import_data.__doc__
    )
    import_data_parser.add_argument("--dataset_display_name")
    import_data_parser.add_argument("--path")

    update_dataset_parser = subparsers.add_parser(
        "update_dataset", help=update_dataset.__doc__
    )
    update_dataset_parser.add_argument("--dataset_display_name")
    update_dataset_parser.add_argument("--target_column_spec_name")
    update_dataset_parser.add_argument("--weight_column_spec_name")
    update_dataset_parser.add_argument("--ml_use_column_spec_name")

    delete_dataset_parser = subparsers.add_parser(
        "delete_dataset", help=delete_dataset.__doc__
    )
    delete_dataset_parser.add_argument("--dataset_display_name")

    project_id = os.environ["PROJECT_ID"]
    compute_region = os.environ["REGION_NAME"]

    args = parser.parse_args()
    if args.command == "create_dataset":
        create_dataset(project_id, compute_region, args.dataset_name)
    if args.command == "list_datasets":
        list_datasets(project_id, compute_region, args.filter_)
    if args.command == "get_dataset":
        get_dataset(project_id, compute_region, args.dataset_display_name)
    if args.command == "import_data":
        import_data(
            project_id, compute_region, args.dataset_display_name, args.path
        )
    if args.command == "update_dataset":
        update_dataset(
            project_id,
            compute_region,
            args.dataset_display_name,
            args.target_column_spec_name,
            args.weight_column_spec_name,
            args.ml_use_column_spec_name,
        )
    if args.command == "delete_dataset":
        delete_dataset(project_id, compute_region, args.dataset_display_name)
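For orientation, here is a short illustration of driving the helpers above directly from Python instead of through the argparse CLI (which reads PROJECT_ID and REGION_NAME from the environment and dispatches on the subcommand name). The project id, region, dataset name, and CSV path below are placeholders, not values from this commit.

    # Hedged illustration only; all identifiers below are placeholders.
    import automl_tables_dataset as samples

    project_id = "my-project"
    compute_region = "us-central1"

    dataset = samples.create_dataset(project_id, compute_region, "my_dataset")
    samples.import_data(
        project_id, compute_region, "my_dataset", "gs://my-bucket/train.csv"
    )
    samples.delete_dataset(project_id, compute_region, "my_dataset")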
