
Commit 50c3bcd

dizcology authored and chenyumic committed
Speech API: enhanced model and recognition metadata (GoogleCloudPlatform#1436)
* enhanced model and recognition metadata
* flake, update tests
* readme
* client library version update
1 parent 7405c00 commit 50c3bcd

6 files changed, +202 -21 lines changed

speech/cloud-client/README.rst

Lines changed: 48 additions & 20 deletions
@@ -16,7 +16,7 @@ This directory contains samples for Google Cloud Speech API. The `Google Cloud S
 
 
 
-.. _Google Cloud Speech API: https://cloud.google.com/speech/docs/
+.. _Google Cloud Speech API: https://cloud.google.com/speech/docs/
 
 Setup
 -------------------------------------------------------------------------------
@@ -91,22 +91,21 @@ To run this sample:
     $ python transcribe.py
 
     usage: transcribe.py [-h] path
-
+
     Google Cloud Speech API sample application using the REST API for batch
     processing.
-
+
     Example usage:
         python transcribe.py resources/audio.raw
         python transcribe.py gs://cloud-samples-tests/speech/brooklyn.flac
-
+
     positional arguments:
       path        File or GCS path for audio file to be recognized
-
+
     optional arguments:
       -h, --help  show this help message and exit
 
 
-
 Transcribe async
 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 
@@ -123,22 +122,21 @@ To run this sample:
     $ python transcribe_async.py
 
     usage: transcribe_async.py [-h] path
-
+
     Google Cloud Speech API sample application using the REST API for async
     batch processing.
-
+
     Example usage:
         python transcribe_async.py resources/audio.raw
         python transcribe_async.py gs://cloud-samples-tests/speech/vr.flac
-
+
     positional arguments:
       path        File or GCS path for audio file to be recognized
-
+
     optional arguments:
       -h, --help  show this help message and exit
 
 
-
 Transcribe with word time offsets
 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 
@@ -155,21 +153,20 @@ To run this sample:
     $ python transcribe_word_time_offsets.py
 
     usage: transcribe_word_time_offsets.py [-h] path
-
+
     Google Cloud Speech API sample that demonstrates word time offsets.
-
+
     Example usage:
         python transcribe_word_time_offsets.py resources/audio.raw
         python transcribe_word_time_offsets.py gs://cloud-samples-tests/speech/vr.flac
-
+
     positional arguments:
       path        File or GCS path for audio file to be recognized
-
+
     optional arguments:
       -h, --help  show this help message and exit
 
 
-
 Transcribe Streaming
 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 
@@ -186,19 +183,50 @@ To run this sample:
     $ python transcribe_streaming.py
 
     usage: transcribe_streaming.py [-h] stream
-
+
     Google Cloud Speech API sample application using the streaming API.
-
+
     Example usage:
         python transcribe_streaming.py resources/audio.raw
-
+
     positional arguments:
       stream      File to stream to the API
-
+
     optional arguments:
       -h, --help  show this help message and exit
 
 
+Beta Samples
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+
+.. image:: https://gstatic.com/cloudssh/images/open-btn.png
+   :target: https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/GoogleCloudPlatform/python-docs-samples&page=editor&open_in_editor=speech/cloud-client/beta_snippets.py;speech/cloud-client/README.rst
+
+
+
+
+To run this sample:
+
+.. code-block:: bash
+
+    $ python beta_snippets.py
+
+    usage: beta_snippets.py [-h] command path
+
+    Google Cloud Speech API sample that demonstrates enhanced models
+    and recognition metadata.
+
+    Example usage:
+        python beta_snippets.py enhanced-model resources/commercial_mono.wav
+        python beta_snippets.py metadata resources/commercial_mono.wav
+
+    positional arguments:
+      command
+      path        File for audio file to be recognized
+
+    optional arguments:
+      -h, --help  show this help message and exit
+
 
 
 
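In the help output for beta_snippets.py, `command` is listed with no description and no accepted values, and the script's `__main__` block silently does nothing when the command is neither `enhanced-model` nor `metadata`. A minimal sketch of one alternative, assuming argparse `choices` plus a dispatch table were acceptable in the sample; `enhanced_model` and `metadata` here are hypothetical stand-ins for the two sample functions so the sketch runs without the client library:

```python
import argparse


def enhanced_model(path):
    # Hypothetical stand-in for transcribe_file_with_enhanced_model().
    return 'enhanced-model:' + path


def metadata(path):
    # Hypothetical stand-in for transcribe_file_with_metadata().
    return 'metadata:' + path


# Dispatch table: each CLI command maps to the function that handles it.
COMMANDS = {
    'enhanced-model': enhanced_model,
    'metadata': metadata,
}


def build_parser():
    parser = argparse.ArgumentParser(
        description='Enhanced model and recognition metadata samples.')
    # `choices` makes argparse reject unknown commands and list the
    # valid ones in the usage and help text.
    parser.add_argument('command', choices=sorted(COMMANDS))
    parser.add_argument('path', help='Audio file to be recognized')
    return parser


def main(argv=None):
    # argv=None makes parse_args read sys.argv[1:], as the sample does.
    args = build_parser().parse_args(argv)
    return COMMANDS[args.command](args.path)
```

With `choices`, a call such as `main(['bogus', 'a.wav'])` makes argparse print an error naming the valid commands and exit with status 2. The help line for the positional would also change from a bare `command` to `{enhanced-model,metadata}`, so the generated README text would change with it.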

speech/cloud-client/README.rst.in

Lines changed: 3 additions & 0 deletions
@@ -34,6 +34,9 @@ samples:
 - name: Transcribe Streaming
   file: transcribe_streaming.py
   show_help: true
+- name: Beta Samples
+  file: beta_snippets.py
+  show_help: true
 
 cloud_client_library: true
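Each `samples:` entry in README.rst.in corresponds to one section of the generated README.rst, as the new "Beta Samples" entry and the new "Beta Samples" README section in this commit illustrate. The generator itself is not part of this commit, so the following is only a hypothetical sketch of that mapping: `render_sample_section` and its `help_text` parameter (standing in for captured `--help` output when `show_help` is true) are invented for illustration, the 79-character underline width is an assumption, and the Cloud Shell button lines are left out.

```python
import textwrap


def render_sample_section(sample, help_text='', underline_width=79):
    """Render one `samples:` entry as an RST section (hypothetical sketch).

    `sample` mirrors a README.rst.in entry, e.g.
    {'name': 'Beta Samples', 'file': 'beta_snippets.py', 'show_help': True}.
    `help_text` stands in for captured `python <file> --help` output.
    """
    lines = [
        sample['name'],
        '+' * underline_width,  # heading underline; width is an assumption
        '',
        'To run this sample:',
        '',
        '.. code-block:: bash',
        '',
        '    $ python {}'.format(sample['file']),
    ]
    if sample.get('show_help') and help_text:
        lines.append('')
        # Indent help output into the code block. textwrap.indent skips
        # whitespace-only lines by default, so blank lines stay empty.
        lines.append(textwrap.indent(help_text.rstrip('\n'), '    '))
    return '\n'.join(lines) + '\n'
```

Because `textwrap.indent` only prefixes lines that contain non-whitespace, blank lines inside the pasted help output come out truly empty rather than as indentation-only lines.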

speech/cloud-client/beta_snippets.py

Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
+#!/usr/bin/env python
+
+# Copyright 2018 Google Inc. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Google Cloud Speech API sample that demonstrates enhanced models
+and recognition metadata.
+
+Example usage:
+    python beta_snippets.py enhanced-model resources/commercial_mono.wav
+    python beta_snippets.py metadata resources/commercial_mono.wav
+"""
+
+import argparse
+import io
+
+from google.cloud import speech_v1p1beta1 as speech
+
+
+# [START speech_transcribe_file_with_enhanced_model]
+def transcribe_file_with_enhanced_model(path):
+    """Transcribe the given audio file using an enhanced model."""
+    client = speech.SpeechClient()
+
+    with io.open(path, 'rb') as audio_file:
+        content = audio_file.read()
+
+    audio = speech.types.RecognitionAudio(content=content)
+    config = speech.types.RecognitionConfig(
+        encoding=speech.enums.RecognitionConfig.AudioEncoding.LINEAR16,
+        sample_rate_hertz=8000,
+        language_code='en-US',
+        # Enhanced models are only available to projects that
+        # opt in for audio data collection.
+        use_enhanced=True,
+        # A model must be specified to use enhanced model.
+        model='phone_call')
+
+    response = client.recognize(config, audio)
+
+    for i, result in enumerate(response.results):
+        alternative = result.alternatives[0]
+        print('-' * 20)
+        print('First alternative of result {}'.format(i))
+        print('Transcript: {}'.format(alternative.transcript))
+# [END speech_transcribe_file_with_enhanced_model]
+
+
+# [START speech_transcribe_file_with_metadata]
+def transcribe_file_with_metadata(path):
+    """Send a request that includes recognition metadata."""
+    client = speech.SpeechClient()
+
+    with io.open(path, 'rb') as audio_file:
+        content = audio_file.read()
+
+    # Here we construct a recognition metadata object.
+    # Most metadata fields are specified as enums that can be found
+    # in speech.enums.RecognitionMetadata
+    metadata = speech.types.RecognitionMetadata()
+    metadata.interaction_type = (
+        speech.enums.RecognitionMetadata.InteractionType.DISCUSSION)
+    metadata.microphone_distance = (
+        speech.enums.RecognitionMetadata.MicrophoneDistance.NEARFIELD)
+    metadata.recording_device_type = (
+        speech.enums.RecognitionMetadata.RecordingDeviceType.SMARTPHONE)
+    # Some metadata fields are free form strings
+    metadata.recording_device_name = "Pixel 2 XL"
+    # And some are integers, for instance the 6 digit NAICS code
+    # https://www.naics.com/search/
+    metadata.industry_naics_code_of_audio = 519190
+
+    audio = speech.types.RecognitionAudio(content=content)
+    config = speech.types.RecognitionConfig(
+        encoding=speech.enums.RecognitionConfig.AudioEncoding.LINEAR16,
+        sample_rate_hertz=8000,
+        language_code='en-US',
+        # Add this in the request to send metadata.
+        metadata=metadata)
+
+    response = client.recognize(config, audio)
+
+    for i, result in enumerate(response.results):
+        alternative = result.alternatives[0]
+        print('-' * 20)
+        print('First alternative of result {}'.format(i))
+        print('Transcript: {}'.format(alternative.transcript))
+# [END speech_transcribe_file_with_metadata]
+
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser(
+        description=__doc__,
+        formatter_class=argparse.RawDescriptionHelpFormatter)
+    parser.add_argument('command')
+    parser.add_argument(
+        'path', help='File for audio file to be recognized')
+
+    args = parser.parse_args()
+
+    if args.command == 'enhanced-model':
+        transcribe_file_with_enhanced_model(args.path)
+    elif args.command == 'metadata':
+        transcribe_file_with_metadata(args.path)
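Both samples set their options through the `speech_v1p1beta1` client library types. As a sketch of what the same options could look like as a raw request body, assuming the v1p1beta1 REST `speech:recognize` method accepts the standard proto3 JSON mapping of these fields (lowerCamelCase names, enum values as strings), a helper could assemble the body without sending anything; `build_recognize_body` and `EXAMPLE_METADATA` are hypothetical names, not part of the library.

```python
import base64
import json


def build_recognize_body(audio_bytes, use_enhanced=False, model=None,
                         metadata=None):
    """Assemble a JSON-ready body for a v1p1beta1 speech:recognize call.

    Sketch only: field names assume the proto3 JSON mapping of the
    RecognitionConfig fields used in beta_snippets.py. Nothing is sent.
    """
    config = {
        'encoding': 'LINEAR16',
        'sampleRateHertz': 8000,
        'languageCode': 'en-US',
    }
    if use_enhanced:
        config['useEnhanced'] = True
    if model:
        # e.g. 'phone_call', as in transcribe_file_with_enhanced_model().
        config['model'] = model
    if metadata:
        config['metadata'] = metadata
    return {
        'config': config,
        # Inline audio travels as a base64 string in JSON.
        'audio': {'content': base64.b64encode(audio_bytes).decode('ascii')},
    }


# The values transcribe_file_with_metadata() sets, as JSON enum strings.
EXAMPLE_METADATA = {
    'interactionType': 'DISCUSSION',
    'microphoneDistance': 'NEARFIELD',
    'recordingDeviceType': 'SMARTPHONE',
    'recordingDeviceName': 'Pixel 2 XL',
    'industryNaicsCodeOfAudio': 519190,
}

if __name__ == '__main__':
    body = build_recognize_body(b'\x00\x00', use_enhanced=True,
                                model='phone_call', metadata=EXAMPLE_METADATA)
    print(json.dumps(body, indent=2))
```

Authentication and the HTTP call itself are deliberately left out; the sketch only shows how the enhanced-model flags and the metadata object nest inside `config`.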

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+# Copyright 2018, Google, Inc.
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+
+from beta_snippets import (
+    transcribe_file_with_enhanced_model, transcribe_file_with_metadata)
+
+RESOURCES = os.path.join(os.path.dirname(__file__), 'resources')
+
+
+def test_transcribe_file_with_enhanced_model(capsys):
+    transcribe_file_with_enhanced_model(
+        os.path.join(RESOURCES, 'commercial_mono.wav'))
+    out, _ = capsys.readouterr()
+
+    assert 'Chrome' in out
+
+
+def test_transcribe_file_with_metadata(capsys):
+    transcribe_file_with_metadata(
+        os.path.join(RESOURCES, 'commercial_mono.wav'))
+    out, _ = capsys.readouterr()
+
+    assert 'Chrome' in out
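These tests call the live API, so they need credentials and network access, and they rely on the transcript of commercial_mono.wav containing "Chrome". To check only the printing loop offline, one option, assuming the loop shared by both samples were factored out into a hypothetical `print_results` helper, is to feed it stand-in objects with the same attribute shape as a `recognize()` response (`results[i].alternatives[0].transcript`):

```python
import io
from contextlib import redirect_stdout
from types import SimpleNamespace


def print_results(response):
    """Hypothetical helper holding the loop both samples repeat."""
    for i, result in enumerate(response.results):
        alternative = result.alternatives[0]
        print('-' * 20)
        print('First alternative of result {}'.format(i))
        print('Transcript: {}'.format(alternative.transcript))


def fake_response(*transcripts):
    # Stand-in with the attribute shape of a recognize() response:
    # response.results[i].alternatives[0].transcript
    return SimpleNamespace(results=[
        SimpleNamespace(alternatives=[SimpleNamespace(transcript=t)])
        for t in transcripts
    ])


def test_print_results():
    buf = io.StringIO()
    with redirect_stdout(buf):
        print_results(fake_response('hello', 'Chrome'))
    out = buf.getvalue()
    assert 'First alternative of result 1' in out
    assert 'Transcript: Chrome' in out
```

Under pytest, `capsys.readouterr()` could replace `redirect_stdout`; the standard-library version keeps the sketch runnable on its own.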

speech/cloud-client/requirements.txt

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-google-cloud-speech==0.32.1
+google-cloud-speech==0.33.0
Binary file not shown.
