Goals in this project:
- Explore some of Python's tools for audio manipulation and speech-to-text transcription.
- Transcribe audio files from phone calls and apply sentiment and topic analysis to them.
During the tool exploration phase, audio files recorded under different conditions are used, such as:
- Different speaker languages (English and Dutch)
- Multiple speakers with multiple channels
- Presence of noise
│
├── README.md               <- The top-level README for developers using this project.
│
├── data
│   ├── audio_call_friend   <- Audio files downloaded from CallFriend.
│   ├── audio_call_home     <- Audio files downloaded from CallHome.
│   ├── audio_common_voices <- Files downloaded from Common Voices.
│   │   └── nl              <- Dutch audio and .tsv files
│   │       └── raw         <- mp3 files.
│   ├── audio_openslr       <- Audio files downloaded from LibriSpeech.
│   │   └── dev-clean       <- English audio files
│   │       └── 1272        <- flac files.
│   ├── interim             <- Intermediate data that has been transformed.
│   └── processed           <- The final, canonical data sets for modeling.
│
├── notebooks               <- Jupyter notebooks. Naming convention is a number (for ordering),
│                              the creator's initials, and a short `-` delimited description, e.g.
│                              `1.0-jqp-initial-data-exploration`.
│
├── images                  <- Images used in the project.
│
├── .gitignore              <- Contains entries of files or folders to ignore in the project.
│
└── requirements.txt        <- The requirements file for reproducing the analysis environment, e.g.
                               generated with `pip freeze > requirements.txt`
The tools explored in this project are:
- SpeechRecognition: This Python library provides an easy way to interact with many speech-to-text APIs.
- Google Speech API: Of the speech APIs available within SpeechRecognition, this project uses the free version of the Google Speech API (a minimal usage sketch follows this list). The free version does not support speaker diarization, which is the process of separating more than one speaker in a single audio file, and it cannot detect punctuation. It supports different languages, and usage limits currently apply.
- PyDub: Allows different types of audio manipulation.
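As an illustration of how SpeechRecognition and the free Google Speech API fit together, here is a minimal sketch. The file name `sample.wav` and the language code `nl-NL` are placeholder assumptions, not files or settings from this repository.

```python
import speech_recognition as sr

recognizer = sr.Recognizer()

# Load the whole audio file into an AudioData object.
with sr.AudioFile("sample.wav") as source:
    audio = recognizer.record(source)

# Send the audio to the free Google Speech API. Swap the language code
# for "en-US" when transcribing English recordings.
try:
    print(recognizer.recognize_google(audio, language="nl-NL"))
except sr.UnknownValueError:
    print("Google Speech API could not understand the audio")
except sr.RequestError as error:
    print(f"Could not request results from Google Speech API: {error}")
```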
The datasets used in this project are:
- Dutch and English datasets: Common Voice is an initiative from Mozilla that offers an open-source, multi-language dataset of voices that anyone can use to train speech-enabled applications. At the moment 18 different languages are available. You can download not only the audio files but also .tsv files with information about them (see the sketch after this list). Really great resource!
- LibriSpeech: LibriSpeech is a carefully segmented and aligned corpus of approximately 1000 hours of 16kHz read English speech, derived from read audiobooks. I downloaded only dev-clean.tar.gz.
- Multi-speaker audio files: There are two datasets, CallFriend and CallHome. Both cover several languages (unfortunately not Dutch), and the audio is available in both wav and mp3 format. Transcriptions are also available.
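As a small illustration of how the Common Voice .tsv metadata can be paired with the downloaded clips, the sketch below loads one of the .tsv files with pandas. The file name `validated.tsv` and the exact path are assumptions based on the directory layout above, not guaranteed names.

```python
import pandas as pd

# Common Voice ships tab-separated metadata files alongside the mp3 clips.
# "validated.tsv" and the path below are assumptions for illustration.
metadata = pd.read_csv("data/audio_common_voices/nl/validated.tsv", sep="\t")

# Each row describes one clip: the audio file name, its transcript,
# and extra information such as reviewer votes.
print(metadata.columns.tolist())
print(metadata.head())
```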
01-Speech Transcription using Speech Recognition and PyDub:
Uses PyDub to access audio file information and to modify audio files before transcribing them with SpeechRecognition and the Google Speech API (sketched below).
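A minimal sketch of the PyDub side of that workflow, assuming a hypothetical `call.mp3` input and that ffmpeg is installed (PyDub relies on it to decode mp3):

```python
from pydub import AudioSegment

# Load an mp3 file and inspect its basic attributes.
audio = AudioSegment.from_mp3("call.mp3")
print(audio.channels, audio.frame_rate, audio.duration_seconds)

# Example manipulations: downmix to mono, boost volume by 6 dB,
# and keep only the first 30 seconds (PyDub slices in milliseconds).
clip = (audio.set_channels(1) + 6)[:30 * 1000]

# Export as wav so SpeechRecognition's AudioFile reader can consume it.
clip.export("call_clip.wav", format="wav")
```

Exporting to wav matters here because SpeechRecognition's `AudioFile` reader does not accept mp3 input directly.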
02-Phone calls Analysis:
The goal of this notebook is to transcribe some phone calls and perform sentiment and topic analysis on them. For now, data is retrieved from CallFriend and some analysis of audio attributes has been performed on the retrieved files, so this notebook is still in development.
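Sentiment scoring of the transcripts could, for instance, be done with NLTK's VADER analyzer. This is only one possible approach, sketched here on a made-up transcript snippet; it is not necessarily what the notebook will end up using.

```python
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

# Download the VADER lexicon once (no-op if already present).
nltk.download("vader_lexicon", quiet=True)

analyzer = SentimentIntensityAnalyzer()

# A hypothetical transcript snippet; in the notebook this would come from
# the transcription of a CallFriend recording.
transcript = "Thanks so much for calling back, it was great to hear from you."
print(analyzer.polarity_scores(transcript))  # neg / neu / pos / compound scores
```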
- conda version: 4.8.3
- Install requirements using `pip install -r requirements.txt`.
- Make sure you use Python 3 (I used 3.6.7).
- You may want to use a virtual environment for this.
Project based on the cookiecutter data science project template. #cookiecutterdatascience