The goal of this project is to explore some of Python's tools for audio analysis and transcription.


dpbac/speech_to_text_with_python


🗣️ Speech to Text in Python 📜

Goals in this project:

  1. Explore some of Python's tools for audio manipulation and speech-to-text transcription.

  2. Transcribe audio files of telephone calls and apply sentiment and topic analysis to the transcripts.

During the tool exploration phase, audio files recorded under different conditions are used, such as:

  • Different speaker languages (English and Dutch)

  • Multiple speakers on multiple channels

  • Presence of noise
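
When each speaker is recorded on a separate channel, splitting the channels lets each side of the conversation be handled on its own. PyDub offers `AudioSegment.split_to_mono()` for this. The sketch below is not the project's code: it uses only the standard-library `wave` and `array` modules and assumes 16-bit stereo WAV input. The helper names `split_stereo` and `make_stereo_wav` are made up for illustration.

```python
import array
import io
import sys
import wave


def split_stereo(wav_source):
    """Return (left, right) sample arrays from a 16-bit stereo WAV file or file object."""
    with wave.open(wav_source, "rb") as wav:
        if wav.getnchannels() != 2 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo audio")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":  # WAV data is little-endian
        samples.byteswap()
    # Stereo frames are interleaved: left, right, left, right, ...
    return samples[0::2], samples[1::2]


def make_stereo_wav(left, right, frame_rate=8000):
    """Build a small in-memory stereo WAV (hypothetical helper for trying the function)."""
    interleaved = array.array("h")
    for left_sample, right_sample in zip(left, right):
        interleaved.extend((left_sample, right_sample))
    if sys.byteorder == "big":
        interleaved.byteswap()
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)
        wav.setframerate(frame_rate)
        wav.writeframes(interleaved.tobytes())
    buffer.seek(0)
    return buffer
```

Calling `split_stereo(make_stereo_wav([1, 2, 3], [-1, -2, -3]))` returns the two channels as separate arrays.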

Project Organization

│
├── README.md          <- The top-level README for developers using this project.
│
├── data
│   ├── audio_call_friend       <- Audio files downloaded from CallFriend.
│   ├── audio_call_home         <- Audio files downloaded from CallHome.
│   ├── audio_common_voices     <- Files downloaded from Common Voice.
│   │   └── nl                  <- Dutch audio and .tsv files
│   │       └── raw             <- mp3 files.
│   ├── audio_openslr           <- Audio files downloaded from LibriSpeech.
│   │   └── dev-clean           <- English read speech (dev-clean subset).
│   │       └── 1272            <- flac files.
│   ├── interim        <- Intermediate data that has been transformed.
│   └── processed      <- The final, canonical data sets for modeling.
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                        `1.0-jqp-initial-data-exploration`.
│
├── images             <- Images used in the project.
│
├── .gitignore         <- Contains entries of files or folders to ignore in a project.
│
└── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
                         generated with `pip freeze > requirements.txt`

The tools explored in this project are:

  • SpeechRecognition: This Python library provides an easy way to interact with many speech-to-text APIs.

  • Google Speech API: Of the speech APIs available through SpeechRecognition, this project uses the free version of the Google Speech API. The free version supports different languages, but it does not support speaker diarization (separating the individual speakers in a single audio recording) and it does not detect punctuation. Currently the following limits are applied:

  • PyDub: Allows different types of audio manipulation
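
As a rough illustration of how these tools fit together, here is a minimal sketch of transcribing one file with SpeechRecognition and the free Google recognizer. It is not taken from the notebooks. The `LANGUAGE_CODES` mapping and the helper names `language_code` and `transcribe` are assumptions made for this example. `speech_recognition` is imported inside the function so that the language helper works even when the package is not installed.

```python
# Language tags for the two languages explored in this project (assumed mapping).
LANGUAGE_CODES = {"english": "en-US", "dutch": "nl-NL"}


def language_code(name):
    """Map a language name to the tag passed to the recognizer."""
    try:
        return LANGUAGE_CODES[name.strip().lower()]
    except KeyError:
        raise ValueError(f"unsupported language: {name!r}") from None


def transcribe(path, language="english"):
    """Transcribe a WAV, AIFF, or FLAC file; return None if the speech is unintelligible."""
    import speech_recognition as sr  # third-party: pip install SpeechRecognition

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the whole file into memory
    try:
        return recognizer.recognize_google(audio, language=language_code(language))
    except sr.UnknownValueError:
        # The service could not make out any speech in the audio.
        return None
```

A call such as `transcribe("sample.wav", "dutch")` (with a hypothetical file name) needs network access. An `sr.RequestError` raised when the service is unreachable is deliberately left for the caller to handle.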

Datasets

  • Dutch and English datasets: Common Voice is an initiative from Mozilla that offers an open-source, multi-language dataset of voices that anyone can use to train speech-enabled applications. At this moment 18 different languages are available. The download includes not only the audio files but also .tsv files with information about those audio files. Really great resource!

  • LibriSpeech: LibriSpeech is a carefully segmented and aligned corpus of approximately 1000 hours of 16kHz read English speech, derived from read audiobooks. I downloaded only dev-clean.tar.gz.

  • Multiuser audio files: There are two datasets:

Both cover several languages (unfortunately not Dutch), and the audio is available in both wav and mp3. Transcriptions are also available.
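
The .tsv files that come with Common Voice can be read with the standard-library `csv` module, for example to look up the reference sentence for each clip. The sketch below assumes the file has `path` and `sentence` columns; check the header of the downloaded file before relying on those names. The helper name `load_sentences` and the sample row are made up.

```python
import csv
import io


def load_sentences(tsv_file):
    """Map each clip file name to its reference sentence."""
    # QUOTE_NONE keeps quotation marks inside sentences from being parsed as CSV quoting.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {row["path"]: row["sentence"] for row in reader}


# Tiny in-memory stand-in for a real .tsv file (made-up content).
sample = io.StringIO(
    "client_id\tpath\tsentence\n"
    "abc123\tclip_1.mp3\tHallo wereld\n"
)
sentences = load_sentences(sample)
```

For a file on disk, open it with `open(tsv_path, encoding="utf-8", newline="")` and pass the file object to `load_sentences`.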

Notebooks

01-Speech Transcription using Speech Recognition and PyDub: Use PyDub to access audio file information and modify audio files before transcribing them with SpeechRecognition and the Google Speech API.

02-Phone calls Analysis: The goal of this notebook is to transcribe some phone calls and perform sentiment and topic analysis on them. So far, data has been retrieved from CallFriend and the audio attributes of the retrieved files have been analyzed. This notebook is still in development.
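
Both notebooks start by looking at basic audio attributes. In PyDub these are available on an `AudioSegment` as `channels`, `sample_width`, `frame_rate`, and `duration_seconds`. The sketch below is not the notebooks' code: it reads the same attributes from a WAV file with only the standard-library `wave` module. The helper names `describe_wav` and `make_silence` are made up for illustration.

```python
import io
import wave


def describe_wav(wav_source):
    """Return basic attributes of a WAV file or file object."""
    with wave.open(wav_source, "rb") as wav:
        frame_rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "frame_rate_hz": frame_rate,
            "duration_seconds": wav.getnframes() / frame_rate,
        }


def make_silence(n_frames, frame_rate=8000):
    """Build an in-memory mono 16-bit WAV of silence (hypothetical test input)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(frame_rate)
        wav.writeframes(b"\x00\x00" * n_frames)
    buffer.seek(0)
    return buffer


info = describe_wav(make_silence(4000))
```

Here `info` describes half a second of mono audio at 8000 Hz. For mp3 or flac input, a library such as PyDub is needed, because `wave` only reads WAV.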

Install/Technical requirements

  • conda version: 4.8.3
  • Install requirements using `pip install -r requirements.txt`.
    • Make sure you use Python 3 (I used 3.6.7).
    • You may want to use a virtual environment for this.

Project based on the cookiecutter data science project template. #cookiecutterdatascience
