Commit de9fd87

Added the sample for async image batch annotation (GoogleCloudPlatform#2045)
* Added the sample for async image batch annotation
* Fixed the wrong function name
* Changes based on Noah's comments.
1 parent 4c764f4 commit de9fd87

3 files changed: 127 additions and 29 deletions

‎vision/cloud-client/detect/README.rst

Lines changed: 14 additions & 6 deletions
@@ -165,7 +165,7 @@ To run this sample:
 $ python beta_snippets.py

 usage: beta_snippets.py [-h]
-{object-localization,object-localization-uri,handwritten-ocr,handwritten-ocr-uri,doc-features,doc-features-uri}
+{object-localization,object-localization-uri,handwritten-ocr,handwritten-ocr-uri,batch-annotate-files,batch-annotate-files-uri,batch-annotate-images-uri}
 ...

 Google Cloud Vision API Python Beta Snippets
@@ -176,14 +176,15 @@ To run this sample:
 python beta_snippets.py object-localization-uri gs://...
 python beta_snippets.py handwritten-ocr INPUT_IMAGE
 python beta_snippets.py handwritten-ocr-uri gs://...
-python beta_snippets.py doc-features INPUT_PDF
-python beta_snippets.py doc-features_uri gs://...
+python beta_snippets.py batch-annotate-files INPUT_PDF
+python beta_snippets.py batch-annotate-files-uri gs://...
+python beta_snippets.py batch-annotate-images-uri gs://... gs://...

 For more information, the documentation at
 https://cloud.google.com/vision/docs.

 positional arguments:
-{object-localization,object-localization-uri,handwritten-ocr,handwritten-ocr-uri,doc-features,doc-features-uri}
+{object-localization,object-localization-uri,handwritten-ocr,handwritten-ocr-uri,batch-annotate-files,batch-annotate-files-uri,batch-annotate-images-uri}
 object-localization
 Localize objects in the local image. Args: path: The
 path to the local file.
@@ -197,14 +198,21 @@ To run this sample:
 Detects handwritten characters in the file located in
 Google Cloud Storage. Args: uri: The path to the file
 in Google Cloud Storage (gs://...)
-doc-features Detects document features in a PDF/TIFF/GIF file.
+batch-annotate-files
+Detects document features in a PDF/TIFF/GIF file.
 While your PDF file may have several pages, this API
 can process up to 5 pages only. Args: path: The path
 to the local file.
-doc-features-uri Detects document features in a PDF/TIFF/GIF file.
+batch-annotate-files-uri
+Detects document features in a PDF/TIFF/GIF file.
 While your PDF file may have several pages, this API
 can process up to 5 pages only. Args: uri: The path to
 the file in Google Cloud Storage (gs://...)
+batch-annotate-images-uri
+Batch annotation of images on Google Cloud Storage
+asynchronously. Args: image_uri: The path to the image
+in Google Cloud Storage (gs://...) gcs_uri: The path
+to the output path in Google Cloud Storage (gs://...)

 optional arguments:
 -h, --help show this help message and exit

‎vision/cloud-client/detect/beta_snippets.py

Lines changed: 97 additions & 19 deletions
@@ -23,8 +23,9 @@
 python beta_snippets.py object-localization-uri gs://...
 python beta_snippets.py handwritten-ocr INPUT_IMAGE
 python beta_snippets.py handwritten-ocr-uri gs://...
-python beta_snippets.py doc-features INPUT_PDF
-python beta_snippets.py doc-features_uri gs://...
+python beta_snippets.py batch-annotate-files INPUT_PDF
+python beta_snippets.py batch-annotate-files-uri gs://...
+python beta_snippets.py batch-annotate-images-uri gs://... gs://...


 For more information, the documentation at
@@ -176,8 +177,8 @@ def detect_handwritten_ocr_uri(uri):
 # [END vision_handwritten_ocr_gcs_beta]


-# [START vision_fulltext_detection_pdf_beta]
-def detect_document_features(path):
+# [START vision_batch_annotate_files_beta]
+def detect_batch_annotate_files(path):
     """Detects document features in a PDF/TIFF/GIF file.

     While your PDF file may have several pages,
@@ -224,12 +225,12 @@ def detect_document_features(path):
                         for symbol in word.symbols:
                             print('\t\t\tSymbol: {} (confidence: {})'.format(
                                 symbol.text, symbol.confidence))
-# [END vision_fulltext_detection_pdf_beta]
+# [END vision_batch_annotate_files_beta]


-# [START vision_fulltext_detection_pdf_gcs_beta]
-def detect_document_features_uri(gcs_uri):
-    """Detects document features in a PDF/TIFF/GIF file.
+# [START vision_batch_annotate_files_gcs_beta]
+def detect_batch_annotate_files_uri(gcs_uri):
+    """Detects document features in a PDF/TIFF/GIF file.

     While your PDF file may have several pages,
     this API can process up to 5 pages only.
@@ -272,7 +273,75 @@ def detect_document_features_uri(gcs_uri):
                         for symbol in word.symbols:
                             print('\t\t\tSymbol: {} (confidence: {})'.format(
                                 symbol.text, symbol.confidence))
-# [END vision_fulltext_detection_pdf_gcs_beta]
+# [END vision_batch_annotate_files_gcs_beta]
+
+
+# [START vision_async_batch_annotate_images_beta]
+def async_batch_annotate_images_uri(input_image_uri, output_uri):
+    """Batch annotation of images on Google Cloud Storage asynchronously.
+
+    Args:
+    input_image_uri: The path to the image in Google Cloud Storage (gs://...)
+    output_uri: The path to the output path in Google Cloud Storage (gs://...)
+    """
+    import re
+
+    from google.cloud import storage
+    from google.protobuf import json_format
+    from google.cloud import vision_v1p4beta1 as vision
+    client = vision.ImageAnnotatorClient()
+
+    # Construct the request for the image(s) to be annotated:
+    image_source = vision.types.ImageSource(image_uri=input_image_uri)
+    image = vision.types.Image(source=image_source)
+    features = [
+        vision.types.Feature(type=vision.enums.Feature.Type.LABEL_DETECTION),
+        vision.types.Feature(type=vision.enums.Feature.Type.TEXT_DETECTION),
+        vision.types.Feature(type=vision.enums.Feature.Type.IMAGE_PROPERTIES),
+    ]
+    requests = [
+        vision.types.AnnotateImageRequest(image=image, features=features),
+    ]
+
+    gcs_destination = vision.types.GcsDestination(uri=output_uri)
+    output_config = vision.types.OutputConfig(
+        gcs_destination=gcs_destination, batch_size=2)
+
+    operation = client.async_batch_annotate_images(
+        requests=requests, output_config=output_config)
+
+    print('Waiting for the operation to finish.')
+    operation.result(timeout=10000)
+
+    # Once the request has completed and the output has been
+    # written to Google Cloud Storage, we can list all the output files.
+    storage_client = storage.Client()
+
+    match = re.match(r'gs://([^/]+)/(.+)', output_uri)
+    bucket_name = match.group(1)
+    prefix = match.group(2)
+
+    bucket = storage_client.get_bucket(bucket_name=bucket_name)
+
+    # Lists objects with the given prefix.
+    blob_list = list(bucket.list_blobs(prefix=prefix))
+    print('Output files:')
+    for blob in blob_list:
+        print(blob.name)
+
+    # Processes the first output file from Google Cloud Storage.
+    # Since we specified batch_size=2, the first response contains
+    # annotations for the first two annotate image requests.
+    output = blob_list[0]
+
+    json_string = output.download_as_string()
+    response = json_format.Parse(json_string,
+                                 vision.types.BatchAnnotateImagesResponse())
+
+    # Prints the actual response for the first annotate image request.
+    print(u'The annotation response for the first request: {}'.format(
+        response.responses[0]))
+# [END vision_async_batch_annotate_images_beta]


 if __name__ == '__main__':
@@ -297,13 +366,20 @@ def detect_document_features_uri(gcs_uri):
         'handwritten-ocr-uri', help=detect_handwritten_ocr_uri.__doc__)
     handwritten_uri_parser.add_argument('uri')

-    doc_features_parser = subparsers.add_parser(
-        'doc-features', help=detect_document_features.__doc__)
-    doc_features_parser.add_argument('path')
+    batch_annotate_parser = subparsers.add_parser(
+        'batch-annotate-files', help=detect_batch_annotate_files.__doc__)
+    batch_annotate_parser.add_argument('path')
+
+    batch_annotate_uri_parser = subparsers.add_parser(
+        'batch-annotate-files-uri',
+        help=detect_batch_annotate_files_uri.__doc__)
+    batch_annotate_uri_parser.add_argument('uri')

-    doc_features_uri_parser = subparsers.add_parser(
-        'doc-features-uri', help=detect_document_features_uri.__doc__)
-    doc_features_uri_parser.add_argument('uri')
+    batch_annotate__image_uri_parser = subparsers.add_parser(
+        'batch-annotate-images-uri',
+        help=async_batch_annotate_images_uri.__doc__)
+    batch_annotate__image_uri_parser.add_argument('uri')
+    batch_annotate__image_uri_parser.add_argument('output')

     args = parser.parse_args()

@@ -312,12 +388,14 @@ def detect_document_features_uri(gcs_uri):
             localize_objects_uri(args.uri)
         elif 'handwritten-ocr-uri' in args.command:
             detect_handwritten_ocr_uri(args.uri)
-        elif 'doc-features' in args.command:
-            detect_handwritten_ocr_uri(args.uri)
+        elif 'batch-annotate-files' in args.command:
+            detect_batch_annotate_files_uri(args.uri)
+        elif 'batch-annotate-images' in args.command:
+            async_batch_annotate_images_uri(args.uri, args.output)
     else:
         if 'object-localization' in args.command:
             localize_objects(args.path)
         elif 'handwritten-ocr' in args.command:
             detect_handwritten_ocr(args.path)
-        elif 'doc-features' in args.command:
-            detect_handwritten_ocr(args.path)
+        elif 'batch-annotate-files' in args.command:
+            detect_batch_annotate_files(args.path)
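
Review note: the dispatch in the hunk above picks a handler by testing substrings of `args.command`, which is how the removed `doc-features` branches ended up calling the handwritten-OCR functions without anyone noticing. One possible alternative, offered only as a sketch and not as what this commit does, is to attach the handler to each subparser with `set_defaults`. The functions `annotate_files` and `annotate_images_uri` below are hypothetical stand-ins for the real sample functions, and the `handler` attribute name is an assumption.

```python
import argparse


def annotate_files(path):
    # Hypothetical stand-in for detect_batch_annotate_files.
    return 'files:{}'.format(path)


def annotate_images_uri(uri, output):
    # Hypothetical stand-in for async_batch_annotate_images_uri.
    return 'images:{}->{}'.format(uri, output)


def build_parser():
    parser = argparse.ArgumentParser()
    subparsers = parser.add_subparsers(dest='command')
    subparsers.required = True

    files_parser = subparsers.add_parser('batch-annotate-files')
    files_parser.add_argument('path')
    # Each subparser records its own handler, so no string matching
    # on the command name is needed after parsing.
    files_parser.set_defaults(handler=lambda a: annotate_files(a.path))

    images_parser = subparsers.add_parser('batch-annotate-images-uri')
    images_parser.add_argument('uri')
    images_parser.add_argument('output')
    images_parser.set_defaults(
        handler=lambda a: annotate_images_uri(a.uri, a.output))
    return parser


def run(argv):
    args = build_parser().parse_args(argv)
    return args.handler(args)
```

With this shape, adding a subcommand means adding one subparser and one `set_defaults` call, and a wrong handler is visible right next to the command name.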

‎vision/cloud-client/detect/beta_snippets_test.py

Lines changed: 16 additions & 4 deletions
@@ -18,6 +18,10 @@
 RESOURCES = os.path.join(os.path.dirname(__file__), 'resources')
 GCS_ROOT = 'gs://cloud-samples-data/vision/'

+BUCKET = os.environ['CLOUD_STORAGE_BUCKET']
+OUTPUT_PREFIX = 'OCR_PDF_TEST_OUTPUT'
+GCS_DESTINATION_URI = 'gs://{}/{}/'.format(BUCKET, OUTPUT_PREFIX)
+

 def test_localize_objects(capsys):
     path = os.path.join(RESOURCES, 'puppies.jpg')
@@ -55,17 +59,25 @@ def test_handwritten_ocr_uri(capsys):
     assert 'Cloud Vision API' in out


-def test_detect_pdf_document(capsys):
+def test_detect_batch_annotate_files(capsys):
     file_name = os.path.join(RESOURCES, 'kafka.pdf')
-    beta_snippets.detect_document_features(file_name)
+    beta_snippets.detect_batch_annotate_files(file_name)
     out, _ = capsys.readouterr()
     assert 'Symbol: a' in out
     assert 'Word text: evenings' in out


-def test_detect_pdf_document_from_gcs(capsys):
+def test_detect_batch_annotate_files_uri(capsys):
     gcs_uri = GCS_ROOT + 'document_understanding/kafka.pdf'
-    beta_snippets.detect_document_features_uri(gcs_uri)
+    beta_snippets.detect_batch_annotate_files_uri(gcs_uri)
     out, _ = capsys.readouterr()
     assert 'Symbol' in out
     assert 'Word text' in out
+
+
+def test_async_batch_annotate_images(capsys):
+    gcs_uri = GCS_ROOT + 'landmark/eiffel_tower.jpg'
+    beta_snippets.async_batch_annotate_images_uri(gcs_uri, GCS_DESTINATION_URI)
+    out, _ = capsys.readouterr()
+    assert 'language_code: "en"' in out
+    assert 'description: "Tower"' in out
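
Review note: the new test passes `GCS_DESTINATION_URI`, built as `'gs://{}/{}/'.format(BUCKET, OUTPUT_PREFIX)`, into a sample that splits it with `re.match(r'gs://([^/]+)/(.+)', output_uri)` and then calls `.group()` without checking the match. The sketch below shows how that pattern divides such a URI and what a guarded version might look like. The helper name `split_gcs_uri` and the bucket name `my-bucket` are assumptions made for illustration; neither appears in the commit.

```python
import re


def split_gcs_uri(uri):
    """Split a gs://bucket/prefix URI into (bucket, prefix).

    Uses the same pattern as the sample, but raises ValueError
    instead of failing on a None match.
    """
    match = re.match(r'gs://([^/]+)/(.+)', uri)
    if match is None:
        raise ValueError('Not a gs://bucket/prefix URI: {!r}'.format(uri))
    return match.group(1), match.group(2)


# 'my-bucket' is a placeholder for the CLOUD_STORAGE_BUCKET value.
destination = 'gs://{}/{}/'.format('my-bucket', 'OCR_PDF_TEST_OUTPUT')
bucket_name, prefix = split_gcs_uri(destination)
print(bucket_name, prefix)
```

Because the pattern requires at least one character after the bucket's slash, a bare `gs://my-bucket` or `gs://my-bucket/` does not match, so an output URI without a prefix would need separate handling.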
