@syntaxsdev syntaxsdev commented Jul 14, 2025

Description

closes #320

This adds support for async jobs: you can submit an audio file for processing and receive a job_id, which you can use later to check on the job and retrieve its result.

  • Job data is stored in a SQLite DB
  • Transcription results ARE NOT stored in the SQLite DB; they are kept in a set of jobs instead
  • Jobs are processed in the background, in order, using a queue
  • Data is stored only temporarily and is cleaned up after a configurable period
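As a rough illustration of the design in the list above, here is a minimal sketch of job metadata in SQLite, results held outside the database, and a single worker draining a FIFO queue. Every name in it (`submit`, `worker`, `results`, the `jobs` table, and the `queued`/`processing`/`completed` status values) is an assumption made for illustration and not taken from this PR; only the `failed` status is mentioned in the description.

```python
import queue
import sqlite3
import threading
import uuid

# Hypothetical schema and names; the PR's actual implementation may differ.
db = sqlite3.connect(":memory:", check_same_thread=False)
db.execute("CREATE TABLE jobs (job_id TEXT PRIMARY KEY, status TEXT NOT NULL)")
db_lock = threading.Lock()   # serialize DB access across threads
results = {}                 # transcription output is kept outside SQLite
pending = queue.Queue()      # FIFO, so jobs run in submission order


def set_status(job_id, status):
    with db_lock:
        db.execute("UPDATE jobs SET status = ? WHERE job_id = ?", (status, job_id))
        db.commit()


def get_status(job_id):
    with db_lock:
        row = db.execute(
            "SELECT status FROM jobs WHERE job_id = ?", (job_id,)
        ).fetchone()
    return row[0] if row else None


def submit(audio, transcribe):
    """Record the job, enqueue it, and return its id without waiting."""
    job_id = uuid.uuid4().hex
    with db_lock:
        db.execute("INSERT INTO jobs VALUES (?, ?)", (job_id, "queued"))
        db.commit()
    pending.put((job_id, audio, transcribe))
    return job_id


def worker():
    """Process queued jobs one at a time, in order."""
    while True:
        job_id, audio, transcribe = pending.get()
        try:
            set_status(job_id, "processing")
            results[job_id] = transcribe(audio)
            set_status(job_id, "completed")
        except Exception:
            set_status(job_id, "failed")
        finally:
            pending.task_done()


threading.Thread(target=worker, daemon=True).start()
```

In this sketch, `submit(audio_bytes, some_transcribe_function)` returns immediately with an id, and `get_status(job_id)` can be polled until the worker has finished.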

Jobs are cleaned up on two occasions:
https://github.com/syntaxsdev/whisper-asr-webservice/blob/3678d2aff95aff673fd4496e5aa8c2b11c1a6ae6/app/config.py#L56-L59

# How long to keep a batch process after its value has been read - Default is 30 minutes
JOB_CLEANUP_AFTER_READ = int(os.getenv("JOB_CLEANUP_AFTER_READ", 1800))
# How long to keep a batch process after it has been abandoned (its value never read) - Default is 24 hours
JOB_CLEANUP_ABANDONED = int(os.getenv("JOB_CLEANUP_ABANDONED", 86400))

This means that once a job has been processed, you have:

  • 24 hours (default) to read the value; otherwise the job is considered abandoned and deleted
  • 30 minutes (default) after you read it before it is deleted
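The two retention windows can be expressed as a single expiry check. The sketch below is an assumption about how such a check might look, not the PR's code: `is_expired`, `finished_at`, and `read_at` are hypothetical names, timestamps are epoch seconds, and the after-read timer is assumed to start at whatever `read_at` the caller records.

```python
import os
import time

# Same settings and defaults as quoted from app/config.py above.
JOB_CLEANUP_AFTER_READ = int(os.getenv("JOB_CLEANUP_AFTER_READ", 1800))
JOB_CLEANUP_ABANDONED = int(os.getenv("JOB_CLEANUP_ABANDONED", 86400))


def is_expired(finished_at, read_at=None, now=None):
    """Return True if a finished job is due for cleanup.

    finished_at and read_at are hypothetical epoch-second fields;
    read_at stays None until the result has been fetched.
    """
    now = time.time() if now is None else now
    if read_at is not None:
        # The result was read: keep it for the shorter after-read window.
        return now - read_at >= JOB_CLEANUP_AFTER_READ
    # Never read: treat the job as abandoned once the longer window passes.
    return now - finished_at >= JOB_CLEANUP_ABANDONED
```

With the defaults, a job that finished at time 0 and was never read expires at 86400 seconds, while one read at time 100 expires at 1900 seconds.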

Usage

POST asr/ with the async_job param set to true
Example response:
[image: example response]

To retrieve the result:
GET asr/{job_id} (new endpoint)
Example response:
[image: example response]
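For reference, the two requests above could be addressed as follows. This is only a sketch: it assumes async_job is passed as a query parameter, the base URL `http://localhost:9000` is a placeholder, and the helper names `submit_url` and `result_url` are invented here. The audio upload itself is left to whatever HTTP client you use.

```python
from urllib.parse import quote, urlencode


def submit_url(base_url, **params):
    """URL for POST asr/ with async_job=true (assumed to be a query parameter)."""
    query = urlencode({"async_job": "true", **params})
    return f"{base_url.rstrip('/')}/asr?{query}"


def result_url(base_url, job_id):
    """URL for GET asr/{job_id}, the retrieval endpoint added by this PR."""
    return f"{base_url.rstrip('/')}/asr/{quote(job_id, safe='')}"
```

For example, `result_url("http://localhost:9000", "abc123")` yields `http://localhost:9000/asr/abc123`.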

In this particular example, I also used diarization with the WhisperX model.

If a failure occurs, the job's status is reported as failed, and the job is cleaned up after the JOB_CLEANUP_AFTER_READ period.
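On the client side, retrieval usually means polling until the job finishes or fails. The helper below is a hedged sketch: `wait_for_job` and its `fetch` callback are hypothetical, and the `status` key and `completed` value are assumptions about the response body, since the example responses in this PR are screenshots. Only the `failed` status comes from the description above.

```python
import time


def wait_for_job(fetch, job_id, interval=2.0, timeout=600.0, sleep=time.sleep):
    """Poll a job until it completes, fails, or the timeout elapses.

    fetch(job_id) is supplied by the caller; it should perform
    GET asr/{job_id} and return the decoded JSON body as a dict.
    """
    deadline = time.monotonic() + timeout
    while True:
        body = fetch(job_id)
        status = body.get("status")  # assumed field name
        if status == "completed":    # assumed success value
            return body
        if status == "failed":       # status named in the PR description
            raise RuntimeError(f"job {job_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {status!r} after {timeout}s")
        sleep(interval)
```

Passing `fetch` in as a callback keeps the helper independent of any particular HTTP library and makes it easy to exercise with canned responses.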

Other Notes

I've built it so that it can eventually be expanded to support async batch jobs, where you upload multiple files at once (or add multiple files to a job) and then kick off the job. That is why the output is structured the way it is.

I think a separate PR would be warranted for that.

Testing

  • Tested both locally and containerized (CPU/GPU).
  • Verified that it works on Kubernetes (OpenShift).
  • Works with a 921 MB (nearly 1 GB) audio file; tested on GPU, it took 2 minutes.
image

Test containers:
docker.io/syntaxsdev/whisper-asr-webservice:latest
docker.io/syntaxsdev/whisper-asr-webservice:latest-gpu

@syntaxsdev
Author

bump @ahmetoner :)

Development

Successfully merging this pull request may close these issues.

[Feature] Support async jobs and later retrieval
