rnabioco/sracha-rs

🌶️ sracha 🌶️

Fast SRA downloader and FASTQ converter, written in pure Rust.

Features

  • Fast -- 4-11x faster than fasterq-dump on typical SRA files
  • One command -- download, convert to FASTQ, and compress
  • Batch input -- accessions, BioProjects (PRJNA), studies (SRP), or a file via --accession-list
  • gzip or zstd output -- parallel compression, or plain FASTQ
  • FASTA output -- --fasta drops quality scores
  • SRA and SRA-lite -- full or simplified quality scores
  • Split modes -- split-3, split-files, split-spot, interleaved
  • Resumable downloads -- picks up where it left off
  • Stdout streaming -- -Z pipes FASTQ straight into downstream tools
  • Integrity checks -- MD5 verification on download and decode
  • Platform support -- Illumina, BGISEQ/DNBSEQ, Element, Ultima, PacBio, Nanopore (legacy 454 and Ion Torrent are not supported)
  • Single static binary -- no Python, no C dependencies
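The stdout streaming feature above means a downstream tool only has to read FASTQ from a pipe. As a minimal sketch, the hypothetical helper `count_reads` below counts records on stdin, assuming the stream uses the usual four lines per record. The demo feeds it two synthetic reads so it runs without sracha installed. Which subcommand accepts `-Z` is an assumption here, so check the CLI reference before relying on the commented pipeline.

```shell
# Hypothetical helper: count FASTQ records arriving on stdin.
# Assumes unwrapped FASTQ, i.e. exactly four lines per record.
count_reads() {
  awk 'END { print NR / 4 }'
}

# Self-contained demo with two synthetic reads (prints 2).
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' | count_reads

# With sracha, the helper would sit at the end of a pipe, for example:
#   sracha fastq -Z SRR28588231.sra | count_reads
# (assumes the fastq subcommand accepts -Z; not confirmed by this README)
```

Any tool that reads FASTQ from stdin could take the place of `count_reads` in the same position.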

Quick start

# Download, convert, and compress
sracha get SRR28588231

# Download all runs from a BioProject
sracha get PRJNA675068

# Batch download from an accession list
sracha get --accession-list SRR_Acc_List.txt

# Just download
sracha fetch SRR28588231

# Convert a local .sra file
sracha fastq SRR28588231.sra

# Show accession info
sracha info SRR28588231

# Validate a downloaded file
sracha validate SRR28588231.sra
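
For batch downloads, it can help to tidy the accession list before handing it to `--accession-list`. The sketch below assumes the file holds one run accession per line (the README does not spell out the format) and uses a hypothetical helper, `clean_accessions`, that keeps only lines shaped like SRR, ERR, or DRR run accessions. It would drop BioProject or study identifiers, so adjust the pattern if the list contains those.

```shell
# Hypothetical helper: strip stray whitespace and keep only lines that
# look like run accessions (SRR/ERR/DRR followed by digits).
clean_accessions() {
  tr -d ' \t\r' | grep -E '^[SED]RR[0-9]+$'
}

# Demo input with a blank line, trailing whitespace, and a bad entry.
printf 'SRR28588231\n\nERR1018173 \nnot-an-accession\n' | clean_accessions

# Typical use (file names are placeholders):
#   clean_accessions < raw_list.txt > SRR_Acc_List.txt
#   sracha get --accession-list SRR_Acc_List.txt
```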

Benchmarks

Local decode (SRA file on disk → FASTQ)

Uncompressed output, measured with hyperfine.

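| File        | Size     | sracha | fasterq-dump | fastq-dump | Speedup vs fasterq-dump |
|-------------|----------|--------|--------------|------------|-------------------------|
| SRR28588231 | 23 MiB   | 0.17 s | 1.86 s       | 2.09 s     | 10.9x                   |
| SRR2584863  | 288 MiB  | 1.51 s | 5.80 s       | 13.30 s    | 3.8x                    |
| ERR1018173  | 1.94 GiB | 9.40 s | 34.35 s      | --         | 3.7x                    |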
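
The speedup column is the fasterq-dump time divided by the sracha time. A small sketch of that arithmetic, using a hypothetical `speedup` helper and the timings from the table:

```shell
# Hypothetical helper: ratio of baseline time to new time, one decimal.
speedup() {
  awk -v base="$1" -v new="$2" 'BEGIN { printf "%.1fx\n", base / new }'
}

speedup 1.86 0.17    # SRR28588231 -> 10.9x
speedup 5.80 1.51    # SRR2584863  -> 3.8x
speedup 34.35 9.40   # ERR1018173  -> 3.7x
```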

sracha produces gzipped FASTQ by default, at compression level 1. Thanks to parallel block compression, this takes about 1.4× the uncompressed time on small files, so the integrated pipeline (sracha get) writes ready-to-use .fastq.gz without a separate gzip step.

Full hyperfine output

SRR28588231 (23 MiB, 66K spots, Illumina paired)

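| Command      | Mean [ms]     | Min [ms] | Max [ms] | Relative     |
|--------------|---------------|----------|----------|--------------|
| sracha       | 170.9 ± 1.8   | 168.2    | 175.4    | 1.00         |
| fasterq-dump | 1856.4 ± 14.2 | 1838.3   | 1871.6   | 10.86 ± 0.14 |
| fastq-dump   | 2090.5 ± 33.3 | 2052.5   | 2125.0   | 12.23 ± 0.23 |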

SRR2584863 (288 MiB, Illumina paired)

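| Command      | Mean [s]       | Min [s] | Max [s] | Relative    |
|--------------|----------------|---------|---------|-------------|
| sracha       | 1.512 ± 0.018  | 1.496   | 1.532   | 1.00        |
| fasterq-dump | 5.799 ± 0.130  | 5.667   | 5.927   | 3.83 ± 0.10 |
| fastq-dump   | 13.297 ± 0.157 | 13.192  | 13.478  | 8.79 ± 0.15 |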

ERR1018173 (1.94 GiB, 15.6M spots, Illumina paired, single run)

| Command      | Time [s] |
|--------------|----------|
| sracha       | 9.40     |
| fasterq-dump | 34.35    |

sracha gzip overhead (SRR28588231, default --gzip-level 1)

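| Command                 | Mean [ms]   | Min [ms] | Max [ms] | Relative    |
|-------------------------|-------------|----------|----------|-------------|
| sracha (no compression) | 172.1 ± 5.6 | 165.1    | 185.6    | 1.00        |
| sracha (gzip)           | 239.5 ± 5.9 | 230.9    | 249.4    | 1.39 ± 0.06 |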

Benchmarks were run with sracha v0.3.5 and sra-tools v3.4.1 on Linux (8 CPUs). Install the reference toolkit with pixi run install-sratools and reproduce the results with validation/benchmark.sh.

Installation

Install via Bioconda:

pixi add --channel bioconda sracha

Or download pre-built binaries from the releases page, or install from source:

cargo install --git https://github.com/rnabioco/sracha-rs sracha

Documentation

Full CLI reference and usage guide: https://rnabioco.github.io/sracha-rs/

Acknowledgments

sracha builds on the Sequence Read Archive, maintained by the National Center for Biotechnology Information at the National Library of Medicine. The SRA and its toolchain are public-domain software developed by U.S. government employees — our tax dollars at work. Special thanks to Kenneth Durbrow (@durbrow) and the SRA Toolkit team for building and maintaining the infrastructure that makes projects like this possible.

This project wouldn't exist without NCBI's open infrastructure: the VDB/KAR format, the SDL locate API, EUtils, and public S3 hosting of sequencing data. sracha aims to make it easier for the community to build on that foundation.

License

MIT
