Top 23 Python speech-recognition Projects
-
transformers
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
HuggingFace Transformers - Library for building custom detectors
-
-
Project mention: A beginner's guide to the Whisperx-A40-Large model by Victor-Upmeet on Replicate | dev.to | 2026-01-04
The whisperx-a40-large model is an accelerated version of the popular Whisper automatic speech recognition (ASR) model. Developed by Victor Upmeet, it provides fast transcription with word-level timestamps and speaker diarization. This model builds upon the capabilities of Whisper, which was originally created by OpenAI, and incorporates optimizations from the WhisperX project for improved performance.
-
FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Project mention: CosyVoice 2025 Complete Guide: The Ultimate Multi-lingual Text-to-Speech Solution | dev.to | 2025-12-15
FunASR - Automatic Speech Recognition
-
PaddleSpeech
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
-
Star the Speech Brain repository ⭐
-
-
SpeechRecognition
Speech recognition module for Python, supporting several engines and APIs, online and offline.
-
-
voice-pro
Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal isolation, and multilingual translation.
Project mention: Show HN: Likes/day as fake profile → built my own dating app in 100 days | news.ycombinator.com | 2025-12-16
Project mention: CosyVoice 2025 Complete Guide: The Ultimate Multi-lingual Text-to-Speech Solution | dev.to | 2025-12-15
WeNet - Speech Recognition Toolkit
-
Project mention: Show HN: Shoggoth Mini – A weird tentacle robot powered by GPT-4o and RL | news.ycombinator.com | 2025-07-15
> also, "GPT-4o continuously listens to speech through the audio stream," is going to be problematic
Seems like openWakeWord or Porcupine could solve this by adding a wake word detection layer before the prompt is sent off.
I wonder if latency would be any better with a local model cached in a 16GB or 24GB graphics card. It would have to be a quantized/distilled model, but maybe performance would still be acceptable.
https://github.com/dscripka/openWakeWord
https://github.com/Picovoice/porcupine
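The wake-word idea in the comment above can be sketched roughly as follows. This is a minimal illustration, not the API of openWakeWord or Porcupine: `gate_on_wake_word`, `wake_score`, `threshold`, and `capture_frames` are hypothetical names, and the detector is passed in as a plain callable so that any local model could be plugged in. The point it shows is that audio frames are dropped locally until the detector fires, and only the frames that follow are forwarded to the remote model.

```python
from typing import Callable, Iterable, Iterator, List

Frame = bytes  # one chunk of raw audio; the exact format is left abstract


def gate_on_wake_word(
    frames: Iterable[Frame],
    wake_score: Callable[[Frame], float],  # hypothetical local detector
    threshold: float = 0.5,                # assumed score cutoff
    capture_frames: int = 3,               # assumed utterance length in frames
) -> Iterator[List[Frame]]:
    """Yield one utterance (a list of frames) after each wake-word hit.

    Frames seen while idle are discarded, so nothing is forwarded
    until the detector score reaches the threshold.
    """
    utterance: List[Frame] = []
    remaining = 0  # frames still to capture for the current utterance
    for frame in frames:
        if remaining > 0:
            utterance.append(frame)
            remaining -= 1
            if remaining == 0:
                yield utterance
                utterance = []
        elif wake_score(frame) >= threshold:
            # The wake frame itself is not forwarded, only what follows it.
            remaining = capture_frames


def fake_score(frame: Frame) -> float:
    """Stand-in detector for the demo: fires only on the literal b"WAKE"."""
    return 1.0 if frame == b"WAKE" else 0.0


stream = [b"noise", b"WAKE", b"turn", b"on", b"lights", b"noise"]
utterances = list(gate_on_wake_word(stream, fake_score))
```

In a real setup the fixed `capture_frames` count would likely be replaced by voice activity detection to find the end of the utterance; the fixed count just keeps the sketch short.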
-
Well, no ... For a start any "AI" course 20 years ago probably wouldn't have even mentioned neural nets, and certainly not as a mainstream technique.
A 20-year-old "AI" curriculum would have looked more like the 3rd edition of Russell & Norvig's "Artificial Intelligence - A Modern Approach".
https://github.com/yanshengjia/ml-road/blob/master/resources...
Karpathy's videos aren't an AI course (except in the modern sense of AI=LLMs), or even a machine learning course, or even a neural network course for that matter (despite the title) - they're really just "From Zero to LLMs".
-
distil-whisper
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
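For readers unfamiliar with the metric in the description above, word error rate is conventionally computed as (substitutions + deletions + insertions) divided by the number of words in the reference transcript. The sketch below is a plain word-level edit distance written for illustration; `word_error_rate` is a hypothetical helper, not part of distil-whisper, and real evaluations usually normalize casing and punctuation first, which is skipped here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.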
-
-
-
whisper-standalone-win
Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.
Project mention: Show HN: Python Audio Transcription: Convert Speech to Text Locally | news.ycombinator.com | 2025-09-22
I like this version of Whisper which has diarization built in: https://github.com/Purfview/whisper-standalone-win
-
whisper-timestamped
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
-
lip-reading-deeplearning
:unlock: Lip Reading - Cross Audio-Visual Recognition using 3D Architectures
-
-
-
-
Python speech-recognition discussion
Python speech-recognition related posts
-
A beginner's guide to the Whisperx-A40-Large model by Victor-Upmeet on Replicate
-
CosyVoice 2025 Complete Guide: The Ultimate Multi-lingual Text-to-Speech Solution
-
Video to Text AI: The [2025 Guide] to Unlocking Revenue from Content
-
Making AI Models Faster, Cheaper, and Greener — Here’s How
-
5 must know open-source repositories to build cool AI apps
-
Kitten TTS: 25MB CPU-Only, Open-Source Voice Model
-
Ask HN: What Speaker Diarization tools should I look into?
-
A note from our sponsor - Stream
getstream.io | 5 Jan 2026
Index
What are some of the best open-source speech-recognition projects in Python? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | transformers | 154,507 |
| 2 | faster-whisper | 19,699 |
| 3 | whisperX | 19,392 |
| 4 | FunASR | 14,261 |
| 5 | PaddleSpeech | 12,477 |
| 6 | speechbrain | 11,003 |
| 7 | espnet | 9,666 |
| 8 | SpeechRecognition | 8,919 |
| 9 | SenseVoice | 7,280 |
| 10 | voice-pro | 5,220 |
| 11 | wenet | 4,983 |
| 12 | Porcupine | 4,573 |
| 13 | ml-road | 4,562 |
| 14 | distil-whisper | 4,010 |
| 15 | whisper-asr-webservice | 3,090 |
| 16 | lingvo | 2,855 |
| 17 | whisper-standalone-win | 2,768 |
| 18 | whisper-timestamped | 2,713 |
| 19 | lip-reading-deeplearning | 1,892 |
| 20 | kalliope | 1,753 |
| 21 | Dragonfire | 1,398 |
| 22 | SpeechT5 | 1,398 |
| 23 | SALMONN | 1,373 |