Python speech-recognition

Open-source Python projects categorized as speech-recognition

Top 23 Python speech-recognition Projects

  1. transformers

    🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal domains, for both inference and training.

    Project mention: Detecting AI Slop: Techniques & Red Flags | dev.to | 2025-12-28

    HuggingFace Transformers - Library for building custom detectors
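    A minimal sketch of speech-to-text with Transformers, assuming the `transformers` package, an audio backend such as ffmpeg, and the `openai/whisper-small` checkpoint (our choice; other Whisper-style ASR checkpoints should work the same way). It uses the library's `pipeline("automatic-speech-recognition", ...)` entry point; `chunks_to_lines` and `transcribe` are hypothetical helpers. With `return_timestamps=True`, Whisper pipelines return a dict with a `"text"` field and a `"chunks"` list whose items carry a `(start, end)` `"timestamp"` tuple, which the helper reformats.

```python
from typing import Any


def chunks_to_lines(output: dict[str, Any]) -> list[str]:
    """Turn a timestamped ASR pipeline output into "[start - end] text" lines."""
    lines = []
    for chunk in output.get("chunks", []):
        start, end = chunk["timestamp"]
        # The last chunk's end time can be None when no closing timestamp was predicted.
        end_str = f"{end:.2f}" if end is not None else "?"
        lines.append(f"[{start:.2f} - {end_str}] {chunk['text'].strip()}")
    return lines


def transcribe(path: str, model_id: str = "openai/whisper-small") -> list[str]:
    # Imported lazily so the formatting helper above works without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    return chunks_to_lines(asr(path, return_timestamps=True))
```

    Keeping the model import inside `transcribe` means the formatting logic can be checked on a hand-written output dict without downloading a model.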

  3. faster-whisper

    Faster Whisper transcription with CTranslate2
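    A hedged sketch of turning faster-whisper output into SRT subtitles. `WhisperModel(...)` and `model.transcribe(path, beam_size=5)`, which returns a lazy generator of segments plus transcription info, follow the project's README; the `"small"` model size, the CPU/int8 settings, and the helper names `srt_timestamp`, `to_srt`, and `transcribe_to_srt` are assumptions for illustration.

```python
from typing import Iterable


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[tuple[float, float, str]]) -> str:
    """Render (start, end, text) tuples as numbered SRT cues separated by blank lines."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n")
    return "\n".join(cues)


def transcribe_to_srt(path: str, model_size: str = "small") -> str:
    # Imported lazily so the formatting helpers work without faster-whisper installed.
    from faster_whisper import WhisperModel

    # int8 on CPU keeps memory use low; on a GPU, device="cuda", compute_type="float16" is typical.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(path, beam_size=5)
    # Each faster-whisper Segment exposes .start and .end (seconds) and .text.
    return to_srt((seg.start, seg.end, seg.text) for seg in segments)
```

    Because `segments` is a generator, transcription actually runs while `to_srt` iterates over it.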

  4. whisperX

    WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

    Project mention: A beginner's guide to the Whisperx-A40-Large model by Victor-Upmeet on Replicate | dev.to | 2026-01-04

    The whisperx-a40-large model is an accelerated version of the popular Whisper automatic speech recognition (ASR) model. Developed by Victor Upmeet, it provides fast transcription with word-level timestamps and speaker diarization. This model builds upon the capabilities of Whisper, which was originally created by OpenAI, and incorporates optimizations from the WhisperX project for improved performance.
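    The transcribe-then-align flow behind WhisperX's word-level timestamps can be sketched as below, assuming the `whisperx` package and a CUDA GPU. The `load_model`, `load_audio`, `transcribe`, `load_align_model`, and `align` calls mirror the WhisperX README, while the `large-v2` model, `batch_size=16`, and the helpers `aligned_words` and `transcribe_with_word_timestamps` are illustrative choices. Speaker diarization is a separate step and is not shown.

```python
from typing import Any


def aligned_words(result: dict[str, Any]) -> list[tuple[str, float, float]]:
    """Flatten WhisperX-style aligned output into (word, start, end) tuples.

    Words the aligner could not place have no "start"/"end" keys and are skipped.
    """
    words = []
    for segment in result.get("segments", []):
        for w in segment.get("words", []):
            if "start" in w and "end" in w:
                words.append((w["word"], w["start"], w["end"]))
    return words


def transcribe_with_word_timestamps(path: str, device: str = "cuda") -> list[tuple[str, float, float]]:
    # Imported lazily so aligned_words can be used without whisperx installed.
    import whisperx

    # float16 assumes a GPU; on CPU, compute_type="int8" is the usual choice.
    model = whisperx.load_model("large-v2", device, compute_type="float16")
    audio = whisperx.load_audio(path)
    result = model.transcribe(audio, batch_size=16)
    # Second pass: a language-specific alignment model refines timings to word level.
    align_model, metadata = whisperx.load_align_model(language_code=result["language"], device=device)
    result = whisperx.align(result["segments"], align_model, metadata, audio, device, return_char_alignments=False)
    return aligned_words(result)
```

    Words containing characters outside the alignment model's dictionary (numerals or currency amounts, for example) come back without timing keys, so the helper drops them rather than guessing.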

  5. FunASR

    A fundamental end-to-end speech recognition toolkit with open-source SOTA pretrained models, supporting speech recognition, voice activity detection, text post-processing, and more.

    Project mention: CosyVoice 2025 Complete Guide: The Ultimate Multi-lingual Text-to-Speech Solution | dev.to | 2025-12-15

    FunASR - Automatic Speech Recognition

  6. PaddleSpeech

    Easy-to-use speech toolkit including self-supervised learning models, SOTA/streaming ASR with punctuation, streaming TTS with a text frontend, a speaker verification system, end-to-end speech translation, and keyword spotting. Won the NAACL 2022 Best Demo Award.

  7. speechbrain

    A PyTorch-based Speech Toolkit

    Project mention: 5 must know open-source repositories to build cool AI apps | dev.to | 2025-10-29

    Star the Speech Brain repository ⭐

  8. espnet

    End-to-End Speech Processing Toolkit

  10. SpeechRecognition

    Speech recognition module for Python, supporting several engines and APIs, online and offline.
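    Because the SpeechRecognition module wraps several engines behind one `Recognizer`, a common pattern is to try an online engine first and fall back to an offline one. The sketch below assumes the package (imported as `speech_recognition`), a WAV/AIFF/FLAC input file, and, for the offline path, PocketSphinx. `first_successful` and `transcribe_file` are hypothetical helpers; `Recognizer`, `AudioFile`, `record`, `recognize_google`, `recognize_sphinx`, `UnknownValueError`, and `RequestError` come from the library.

```python
from typing import Callable, Sequence


def first_successful(engines: Sequence[tuple[str, Callable[[], str]]],
                     recoverable: tuple[type[BaseException], ...]) -> tuple[str, str]:
    """Try each (name, recognize) pair in order; return (name, text) from the first success."""
    failures = []
    for name, recognize in engines:
        try:
            return name, recognize()
        except recoverable as exc:
            failures.append(f"{name}: {exc!r}")
    raise RuntimeError("all engines failed: " + "; ".join(failures))


def transcribe_file(path: str) -> tuple[str, str]:
    # Imported lazily; the package is installed as SpeechRecognition.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the whole file into an AudioData object
    engines = [
        ("google", lambda: recognizer.recognize_google(audio)),  # online web API
        ("sphinx", lambda: recognizer.recognize_sphinx(audio)),  # offline, needs pocketsphinx
    ]
    return first_successful(engines, (sr.UnknownValueError, sr.RequestError))
```

    Only `UnknownValueError` (speech not understood) and `RequestError` (the engine request failed) trigger the fallback; any other exception still propagates.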

  11. SenseVoice

    Multilingual Voice Understanding Model

  12. voice-pro

    Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal isolation, and multilingual translation.

    Project mention: Show HN: Likes/day as fake profile → built my own dating app in 100 days | news.ycombinator.com | 2025-12-16
  13. wenet

    Production First and Production Ready End-to-End Speech Recognition Toolkit

    Project mention: CosyVoice 2025 Complete Guide: The Ultimate Multi-lingual Text-to-Speech Solution | dev.to | 2025-12-15

    WeNet - Speech Recognition Toolkit

  14. Porcupine  

    On-device wake word detection powered by deep learning

    Project mention: Show HN: Shoggoth Mini – A weird tentacle robot powered by GPT-4o and RL | news.ycombinator.com | 2025-07-15

    > also, "GPT-4o continuously listens to speech through the audio stream," is going to be problematic

    Seems like openWakeWord or Porcupine could solve this by adding a wake-word detection layer before sending the prompt off.

    I wonder if latency would be any better with a local model cached in a 16GB or 24GB graphics card. It would have to be a quantized/distilled model, but maybe performance would still be acceptable.

    https://github.com/dscripka/openWakeWord

    https://github.com/Picovoice/porcupine
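    The comment's suggestion, a local wake-word layer in front of an always-listening stream, can be sketched as a gate that drops audio until a keyword fires. This is a hypothetical sketch: `gate_on_wake_word`, `follow_frames`, and `porcupine_detector` are our names, microphone capture and the call to the remote model are omitted, and only `pvporcupine.create(access_key=..., keywords=[...])`, `process`, `frame_length`, `sample_rate`, and `delete` are taken from Porcupine's Python binding.

```python
from typing import Callable, Iterable, Iterator, Sequence

Frame = Sequence[int]  # one frame of 16-bit PCM samples


def gate_on_wake_word(frames: Iterable[Frame],
                      detect: Callable[[Frame], int],
                      follow_frames: int) -> Iterator[list[Frame]]:
    """Yield batches of up to `follow_frames` frames that follow each wake-word detection.

    `detect` mirrors Porcupine's process(pcm) convention: it returns the index of the
    detected keyword, or -1 when nothing was heard. Frames before a detection are dropped.
    """
    if follow_frames < 1:
        raise ValueError("follow_frames must be at least 1")
    it = iter(frames)
    for frame in it:
        if detect(frame) >= 0:
            batch = []
            for nxt in it:  # keep consuming the same stream after the keyword
                batch.append(nxt)
                if len(batch) == follow_frames:
                    break
            yield batch


def porcupine_detector(access_key: str, keyword: str = "porcupine"):
    # Imported lazily; pvporcupine requires a Picovoice AccessKey.
    import pvporcupine

    handle = pvporcupine.create(access_key=access_key, keywords=[keyword])
    # Callers must feed frames of handle.frame_length samples at handle.sample_rate
    # and call handle.delete() when finished.
    return handle, handle.process
```

    With Porcupine, each frame must contain exactly `frame_length` 16-bit samples at `sample_rate`; only the batches yielded after a detection would be forwarded to the model.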

  15. ml-road

    Machine Learning and Agentic AI Resources, Practice and Research

    Project mention: Neural Networks: Zero to Hero | news.ycombinator.com | 2026-01-04

    Well, no ... For a start any "AI" course 20 years ago probably wouldn't have even mentioned neural nets, and certainly not as a mainstream technique.

    A 20yr old "AI" curriculum would have looked more like the 3rd edition of Russell & Norvig's "Artificial Intelligence: A Modern Approach".

    https://github.com/yanshengjia/ml-road/blob/master/resources...

    Karpathy's videos aren't an AI (except in modern sense of AI=LLMs) course, or even a machine learning course, or even a neural network course for that matter (despite the title) - it's really just "From Zero to LLMs".

  16. distil-whisper

    Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
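    The "word error rate" in this claim is the standard ASR metric: the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words, i.e. (S + D + I) / N for substitutions, deletions, and insertions. A minimal sketch, with `word_error_rate` as an illustrative name; it does not reproduce the project's benchmarks.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Computed as the word-level Levenshtein distance between the two transcripts.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j]: edit distance between the reference words processed so far
    # and the first j hypothesis words (initially zero reference words).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,         # insertion: extra hypothesis word
                prev[j - 1] + (r != h),  # substitution, or a match at no cost
            )
        prev = curr
    return prev[-1] / len(ref)
```

    For example, against the reference "the cat sat on the mat", the hypothesis "the cat sit on mat" has one substitution and one deletion, so WER = 2/6 ≈ 0.33. Insertions can push WER above 1.0.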

  17. whisper-asr-webservice

    OpenAI Whisper ASR Webservice API

  18. lingvo

    Lingvo

  19. whisper-standalone-win

    Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.

    Project mention: Show HN: Python Audio Transcription: Convert Speech to Text Locally | news.ycombinator.com | 2025-09-22

    I like this version of Whisper which has diarization built in: https://github.com/Purfview/whisper-standalone-win

  20. whisper-timestamped

    Multilingual Automatic Speech Recognition with word-level timestamps and confidence
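    whisper-timestamped attaches a `confidence` score to each word, which makes it easy to flag passages worth a manual check. A minimal sketch, assuming the `whisper_timestamped` package, whose `load_audio`, `load_model`, and `transcribe(model, audio)` calls follow its README and whose output lists words under each segment's `"words"` key; the `"small"` model, the 0.5 threshold, and the helper names are assumptions.

```python
from typing import Any


def low_confidence_words(result: dict[str, Any],
                         threshold: float = 0.5) -> list[tuple[str, float, float, float]]:
    """Return (text, start, end, confidence) for every word scored below `threshold`."""
    flagged = []
    for segment in result.get("segments", []):
        for word in segment.get("words", []):
            if word["confidence"] < threshold:
                flagged.append((word["text"], word["start"], word["end"], word["confidence"]))
    return flagged


def transcribe(path: str, model_name: str = "small") -> dict[str, Any]:
    # Imported lazily; whisper_timestamped mirrors openai-whisper's load_model/load_audio.
    import whisper_timestamped as whisper

    audio = whisper.load_audio(path)
    model = whisper.load_model(model_name, device="cpu")
    return whisper.transcribe(model, audio)
```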

  21. lip-reading-deeplearning

    🔓 Lip Reading - Cross Audio-Visual Recognition using 3D Architectures

  22. kalliope

    Kalliope is a framework that will help you to create your own personal assistant.

  23. Dragonfire

    The open-source virtual assistant for Ubuntu-based Linux distributions.

  24. SpeechT5

    Unified-Modal Speech-Text Pre-Training for Spoken Language Processing

  25. SALMONN

    SALMONN family: A suite of advanced multi-modal LLMs

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).


Python speech-recognition related posts

  • A beginner's guide to the Whisperx-A40-Large model by Victor-Upmeet on Replicate

    1 project | dev.to | 4 Jan 2026
  • CosyVoice 2025 Complete Guide: The Ultimate Multi-lingual Text-to-Speech Solution

    5 projects | dev.to | 15 Dec 2025
  • Video to Text AI: The [2025 Guide] to Unlocking Revenue from Content

    1 project | dev.to | 10 Dec 2025
  • Making AI Models Faster, Cheaper, and Greener — Here’s How

    5 projects | dev.to | 3 Nov 2025
  • 5 must know open-source repositories to build cool AI apps

    6 projects | dev.to | 29 Oct 2025
  • Kitten TTS: 25MB CPU-Only, Open-Source Voice Model

    19 projects | news.ycombinator.com | 5 Aug 2025
  • Ask HN: What Speaker Diarization tools should I look into?

    1 project | news.ycombinator.com | 23 Jul 2025

Index

What are some of the best open-source speech-recognition projects in Python? This list will help you:

#   Project                     Stars
1   transformers              154,507
2   faster-whisper             19,699
3   whisperX                   19,392
4   FunASR                     14,261
5   PaddleSpeech               12,477
6   speechbrain                11,003
7   espnet                      9,666
8   SpeechRecognition           8,919
9   SenseVoice                  7,280
10  voice-pro                   5,220
11  wenet                       4,983
12  Porcupine                   4,573
13  ml-road                     4,562
14  distil-whisper              4,010
15  whisper-asr-webservice      3,090
16  lingvo                      2,855
17  whisper-standalone-win      2,768
18  whisper-timestamped         2,713
19  lip-reading-deeplearning    1,892
20  kalliope                    1,753
21  Dragonfire                  1,398
22  SpeechT5                    1,398
23  SALMONN                     1,373


