Goals in this project:
- Explore some of Python's tools for audio manipulation and speech-to-text transcription.
- Transcribe audio files from phone calls and apply sentiment and topic analysis to them.
During the tool exploration phase, audio files recorded under different conditions are used, such as:
- Different speaker languages (English and Dutch)
- Multiple speakers with multiple channels
- Presence of noise
│
├── README.md               <- The top-level README for developers using this project.
│
├── data
│   ├── audio_call_friend   <- Audio files downloaded from CallFriend.
│   ├── audio_call_home     <- Audio files downloaded from CallHome.
│   ├── audio_common_voices <- Files downloaded from Common Voices.
│   │   └── nl              <- Dutch audio and .tsv files
│   │       └── raw         <- mp3 files.
│   ├── audio_openslr       <- Audio files downloaded from LibriSpeech.
│   │   └── dev-clean       <- English audio files
│   │       └── 1272        <- flac files.
│   ├── interim             <- Intermediate data that has been transformed.
│   └── processed           <- The final, canonical data sets for modeling.
│
├── notebooks               <- Jupyter notebooks. Naming convention is a number (for ordering),
│                              the creator's initials, and a short `-` delimited description, e.g.
│                              `1.0-jqp-initial-data-exploration`.
│
├── images                  <- Images used in the project.
│
├── .gitignore              <- Contains entries of files or folders to ignore in the project.
│
└── requirements.txt        <- The requirements file for reproducing the analysis environment, e.g.
                               generated with `pip freeze > requirements.txt`
The tools explored in this project are:
- SpeechRecognition: This Python library provides an easy way to interact with many speech-to-text APIs.
- Google Speech API: Of the speech APIs available within SpeechRecognition, this project uses the free version of the Google Speech API (a minimal usage sketch follows this list). The free version does not support speaker diarization, which is the process of separating more than one speaker in a single audio file, and it cannot detect punctuation. It supports different languages, and usage limits currently apply.
- PyDub: Allows different types of audio manipulation.
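As an illustration of how SpeechRecognition and the free Google Speech API fit together, here is a minimal sketch. The file name `sample.wav` and the language code `nl-NL` are placeholder assumptions, not files or settings from this repository.

```python
import speech_recognition as sr

recognizer = sr.Recognizer()

# Load the whole audio file into an AudioData object.
with sr.AudioFile("sample.wav") as source:
    audio = recognizer.record(source)

# Send the audio to the free Google Speech API. Swap the language code
# for "en-US" when transcribing English recordings.
try:
    print(recognizer.recognize_google(audio, language="nl-NL"))
except sr.UnknownValueError:
    print("Google Speech API could not understand the audio")
except sr.RequestError as error:
    print(f"Could not request results from Google Speech API: {error}")
```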
The datasets used in this project are:
- Dutch and English datasets: Common Voice is an initiative from Mozilla that offers an open-source, multi-language dataset of voices that anyone can use to train speech-enabled applications. At the moment 18 different languages are available. You can download not only the audio files but also .tsv files with information about them (see the sketch after this list). Really great resource!
- LibriSpeech: LibriSpeech is a carefully segmented and aligned corpus of approximately 1000 hours of 16kHz read English speech, derived from read audiobooks. I downloaded only dev-clean.tar.gz.
- Multi-speaker audio files: There are two datasets, CallFriend and CallHome. Both cover several languages (unfortunately not Dutch), and the audio is available in both wav and mp3 format. Transcriptions are also available.
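As a small illustration of how the Common Voice .tsv metadata can be paired with the downloaded clips, the sketch below loads one of the .tsv files with pandas. The file name `validated.tsv` and the exact path are assumptions based on the directory layout above, not guaranteed names.

```python
import pandas as pd

# Common Voice ships tab-separated metadata files alongside the mp3 clips.
# "validated.tsv" and the path below are assumptions for illustration.
metadata = pd.read_csv("data/audio_common_voices/nl/validated.tsv", sep="\t")

# Each row describes one clip: the audio file name, its transcript,
# and extra information such as reviewer votes.
print(metadata.columns.tolist())
print(metadata.head())
```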
01-Speech Transcription using Speech Recognition and PyDub:
Uses PyDub to access audio file information and to modify audio files before transcribing them with SpeechRecognition and the Google Speech API (sketched below).
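A minimal sketch of the PyDub side of that workflow, assuming a hypothetical `call.mp3` input and that ffmpeg is installed (PyDub relies on it to decode mp3):

```python
from pydub import AudioSegment

# Load an mp3 file and inspect its basic attributes.
audio = AudioSegment.from_mp3("call.mp3")
print(audio.channels, audio.frame_rate, audio.duration_seconds)

# Example manipulations: downmix to mono, boost volume by 6 dB,
# and keep only the first 30 seconds (PyDub slices in milliseconds).
clip = (audio.set_channels(1) + 6)[:30 * 1000]

# Export as wav so SpeechRecognition's AudioFile reader can consume it.
clip.export("call_clip.wav", format="wav")
```

Exporting to wav matters here because SpeechRecognition's `AudioFile` reader does not accept mp3 input directly.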
02-Phone calls Analysis:
The goal of this notebook is to transcribe some phone calls and perform sentiment and topic analysis on them. For now, data is retrieved from CallFriend and some analysis of audio attributes has been performed on the retrieved files, so this notebook is still in development.
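Sentiment scoring of the transcripts could, for instance, be done with NLTK's VADER analyzer. This is only one possible approach, sketched here on a made-up transcript snippet; it is not necessarily what the notebook will end up using.

```python
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

# Download the VADER lexicon once (no-op if already present).
nltk.download("vader_lexicon", quiet=True)

analyzer = SentimentIntensityAnalyzer()

# A hypothetical transcript snippet; in the notebook this would come from
# the transcription of a CallFriend recording.
transcript = "Thanks so much for calling back, it was great to hear from you."
print(analyzer.polarity_scores(transcript))  # neg / neu / pos / compound scores
```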
- conda version: 4.8.3
- Install requirements using `pip install -r requirements.txt`.
- Make sure you use Python 3 (I used 3.6.7).
- You may want to use a virtual environment for this.
Project based on the cookiecutter data science project template. #cookiecutterdatascience