Commit e16a295

aman-ebay authored and tswast committed

Create python-api-walkthrough.md (GoogleCloudPlatform#1966)

* Create python-api-walkthrough.md

  This Google Cloud Shell walkthrough is linked to Cloud Dataproc documentation
  to be published at:
  https://cloud.google.com/dataproc/docs/tutorials/python-library-example

* Update python-api-walkthrough.md

1 parent c611792 commit e16a295
1 file changed: dataproc/python-api-walkthrough.md (+165 lines, -0)
# Use the Python Client Library to call Cloud Dataproc APIs

Estimated completion time: <walkthrough-tutorial-duration duration="5"></walkthrough-tutorial-duration>

## Overview

This [Cloud Shell](https://cloud.google.com/shell/docs/) walkthrough leads you
through the steps to use the
[Google APIs Client Library for Python](http://code.google.com/p/google-api-python-client/)
to programmatically interact with [Cloud Dataproc](https://cloud.google.com/dataproc/docs/).

As you follow this walkthrough, you run Python code that calls
[Cloud Dataproc REST API](https://cloud.google.com/dataproc/docs/reference/rest/)
methods to:

* create a Cloud Dataproc cluster
* submit a small PySpark word sort job to run on the cluster
* get job status
* tear down the cluster after job completion
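The sketch below condenses these four REST calls, using the discovery-based client from the Google APIs Client Library for Python. It is illustrative only: the project, zone, bucket, and cluster names are placeholders, and the request bodies are trimmed to the minimum; `submit_job_to_cluster.py` in this directory is the complete, runnable version.

```python
# Hypothetical, condensed sketch of the four Cloud Dataproc REST calls above.
# project_id, region, zone, bucket, and cluster names are placeholders.
from googleapiclient import discovery

dataproc = discovery.build('dataproc', 'v1')
project_id = 'your-project-id'
region = 'global'
cluster_name = 'new-cluster-name'

# 1. Create a Cloud Dataproc cluster.
dataproc.projects().regions().clusters().create(
    projectId=project_id, region=region,
    body={
        'projectId': project_id,
        'clusterName': cluster_name,
        'config': {'gceClusterConfig': {'zoneUri': 'us-central1-a'}},
    }).execute()

# 2. Submit the PySpark word sort job to the cluster.
result = dataproc.projects().regions().jobs().submit(
    projectId=project_id, region=region,
    body={
        'projectId': project_id,
        'job': {
            'placement': {'clusterName': cluster_name},
            'pysparkJob': {
                'mainPythonFileUri': 'gs://your-bucket-name/pyspark_sort.py'},
        },
    }).execute()
job_id = result['reference']['jobId']

# 3. Get job status (in practice, poll until the state is terminal).
job = dataproc.projects().regions().jobs().get(
    projectId=project_id, region=region, jobId=job_id).execute()
print(job['status']['state'])

# 4. Tear down the cluster after job completion.
dataproc.projects().regions().clusters().delete(
    projectId=project_id, region=region, clusterName=cluster_name).execute()
```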
## Using the walkthrough

The `submit_job_to_cluster.py` file used in this walkthrough is opened in the
Cloud Shell editor when you launch the walkthrough. You can view
the code as you follow the walkthrough steps.

**For more information**: See [Cloud Dataproc&rarr;Use the Python Client Library](https://cloud.google.com/dataproc/docs/tutorials/python-library-example) for
an explanation of how the code works.

**To reload this walkthrough:** Run the following command from the
`~/python-docs-samples/dataproc` directory in Cloud Shell:

    cloudshell launch-tutorial python-api-walkthrough.md

**To copy and run commands**: Click the "Paste in Cloud Shell" button
(<walkthrough-cloud-shell-icon></walkthrough-cloud-shell-icon>)
on the side of a code box, then press `Enter` to run the command.
## Prerequisites (1)

1. Create or select a Google Cloud Platform project to use for this tutorial.
    * <walkthrough-project-billing-setup permissions=""></walkthrough-project-billing-setup>

1. Enable the Cloud Dataproc, Compute Engine, and Cloud Storage APIs in your project.
    * <walkthrough-enable-apis apis="dataproc,compute_component,storage-component.googleapis.com"></walkthrough-enable-apis>
## Prerequisites (2)

1. This walkthrough uploads a PySpark file (`pyspark_sort.py`) to a
   [Cloud Storage bucket](https://cloud.google.com/storage/docs/key-terms#buckets) in
   your project.
    * You can use the [Cloud Storage browser page](https://console.cloud.google.com/storage/browser)
      in the Google Cloud Platform Console to view existing buckets in your project.

      &nbsp;&nbsp;&nbsp;&nbsp;**OR**

    * To create a new bucket, run the following command. Your bucket name must be unique.
      ```bash
      gsutil mb -p {{project-id}} gs://your-bucket-name
      ```

1. Set environment variables.

    * Set the name of your bucket.
      ```bash
      BUCKET=your-bucket-name
      ```
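Behind the scenes, the walkthrough stages `pyspark_sort.py` in this bucket so the cluster can read it. Below is a minimal, hypothetical sketch of that upload step, assuming the `google-cloud-storage` library; the sample's `requirements.txt` determines the libraries it actually installs.

```python
# Hypothetical sketch: stage the PySpark driver file in a Cloud Storage
# bucket so Cloud Dataproc can read it. The project and bucket names are
# placeholders.
from google.cloud import storage

client = storage.Client(project='your-project-id')
bucket = client.get_bucket('your-bucket-name')

# Upload the local pyspark_sort.py as an object of the same name.
blob = bucket.blob('pyspark_sort.py')
blob.upload_from_filename('pyspark_sort.py')
print('Uploaded to gs://{}/{}'.format(bucket.name, blob.name))
```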
## Prerequisites (3)

1. Set up a Python
   [virtual environment](https://virtualenv.readthedocs.org/en/latest/)
   in Cloud Shell.

    * Create the virtual environment.
      ```bash
      virtualenv ENV
      ```
    * Activate the virtual environment.
      ```bash
      source ENV/bin/activate
      ```

1. Install library dependencies in Cloud Shell.
    ```bash
    pip install -r requirements.txt
    ```
## Create a cluster and submit a job

1. Set a name for your new cluster.
    ```bash
    CLUSTER=new-cluster-name
    ```

1. Set a [zone](https://cloud.google.com/compute/docs/regions-zones/#available)
   where your new cluster will be located. You can change the
   "us-central1-a" zone that is pre-set in the following command.
    ```bash
    ZONE=us-central1-a
    ```

1. Run `submit_job_to_cluster.py` with the `--create_new_cluster` flag
   to create a new cluster and submit the `pyspark_sort.py` job
   to the cluster.

    ```bash
    python submit_job_to_cluster.py \
        --project_id={{project-id}} \
        --cluster_name=$CLUSTER \
        --zone=$ZONE \
        --gcs_bucket=$BUCKET \
        --create_new_cluster
    ```
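After submitting the job, the script blocks until the job reaches a terminal state. A minimal sketch of such a wait loop is shown below, reusing the `dataproc` discovery client from the overview sketch; the helper name `wait_for_job` is illustrative, not necessarily the sample's own.

```python
# Illustrative polling loop: ask the Cloud Dataproc REST API for job status
# until the job reports DONE, failing fast on ERROR. The dataproc argument
# is the discovery client built earlier; wait_for_job is a hypothetical name.
import time

def wait_for_job(dataproc, project_id, region, job_id):
    print('Waiting for job to finish...')
    while True:
        job = dataproc.projects().regions().jobs().get(
            projectId=project_id, region=region, jobId=job_id).execute()
        state = job['status']['state']
        if state == 'ERROR':
            raise Exception(job['status'].get('details', 'Job failed.'))
        if state == 'DONE':
            print('Job finished.')
            return job
        time.sleep(5)  # Pause between status checks to avoid hammering the API.
```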
## Job Output

Job output in Cloud Shell shows cluster creation, job submission,
job completion, and then tear-down of the cluster.

```
...
Creating cluster...
Cluster created.
Uploading pyspark file to GCS
new-cluster-name - RUNNING
Submitted job ID ...
Waiting for job to finish...
Job finished.
Downloading output file
.....
['Hello,', 'dog', 'elephant', 'panther', 'world!']
...
Tearing down cluster
```
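The sorted word list in the output comes from the PySpark job itself. A sketch of what `pyspark_sort.py` does, consistent with the output above (the file shipped in this directory is the authoritative version):

```python
# Hypothetical sketch of the word sort job: build an RDD of words,
# collect them onto the driver, sort, and print the result.
import pyspark

sc = pyspark.SparkContext()
rdd = sc.parallelize(['Hello,', 'world!', 'dog', 'elephant', 'panther'])
words = sorted(rdd.collect())
print(words)  # ['Hello,', 'dog', 'elephant', 'panther', 'world!']
```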
## Congratulations on Completing the Walkthrough!
<walkthrough-conclusion-trophy></walkthrough-conclusion-trophy>

---

### Next Steps:

* **View job details from the Console.** View job details by selecting the
  PySpark job from the Cloud Dataproc
  [Jobs page](https://console.cloud.google.com/dataproc/jobs)
  in the Google Cloud Platform Console.

* **Delete resources used in the walkthrough.**
  The `submit_job_to_cluster.py` script deletes the cluster that it created for this
  walkthrough.

  If you created a bucket to use for this walkthrough,
  you can run the following command to delete the
  Cloud Storage bucket (the bucket must be empty):
  ```bash
  gsutil rb gs://$BUCKET
  ```
  You can run the following command to delete the bucket **and all
  objects within it** (note: the deleted objects cannot be recovered):
  ```bash
  gsutil rm -r gs://$BUCKET
  ```

* **For more information.** See the [Cloud Dataproc documentation](https://cloud.google.com/dataproc/docs/)
  for API reference and product feature information.
