speculative-decoding

121 public repositories match this topic; 20 of them are listed below.

SafeAILab / EAGLE

Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).

large-language-models llm-inference speculative-decoding

Updated Feb 20, 2026
Python

intel / intel-extension-for-transformers

⚡ Build your chatbot within minutes on your favorite device; apply SOTA compression techniques to LLMs; run LLMs efficiently on Intel platforms ⚡

retrieval chatbot rag habana large-language-model chatpdf llm-inference 4-bits speculative-decoding llm-cpu streamingllm intel-optimized-llamacpp neural-chat neural-chat-7b autoround gaudi3

Updated Oct 8, 2024
Python

Luce-Org / lucebox-hub

Lucebox optimization hub: hand-tuned LLM inference, built for specific consumer hardware.

kernel cuda cuda-kernels nvidia-cuda luce rtx3090 llama-cpp local-ai qwen speculative-decoding dflash megakernel speculative-prefill pflash lucebox

Updated May 8, 2026
C++

aphrodite-engine / aphrodite-engine

Large-scale LLM inference engine

machine-learning cuda intel api-rest lora rocm inference-engine tpu inferentia speculative-decoding

Updated May 8, 2026
C++

Tencent / AngelSlim

Model compression toolkit engineered for enhanced usability, comprehensiveness, and efficiency.

audio eagle quantization diffusion vlm llm qwen speculative-decoding llm-compression hunyuan deepseek fp4 dflash

Updated May 7, 2026
Python

Infini-AI-Lab / Sequoia

Scalable and robust tree-based speculative decoding algorithm.

efficiency inference llm speculative-decoding

Updated Jan 28, 2025
Python

facebookresearch / LayerSkip

Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024

optimization transformers early-exit llm speculative-decoding layer-drop

Updated Apr 13, 2026
Python

Infini-AI-Lab / TriForce

[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding

acceleration efficiency inference llm long-context llm-inference speculative-decoding

Updated Aug 31, 2024
Python

youssofal / MTPLX

Native MTP speculative decoding on Apple Silicon: 2x to 2.5x higher decode throughput (tokens/s) at temperature 0.6; MLX-native, with OpenAI- and Anthropic-compatible API serving and no external drafter model.

metal mtp mlx inference-engine apple-silicon local-ai qwen speculative-decoding speculative-sampling openai-compatible qwen3-next anthropic-compatible native-mtp mtplx

Updated May 9, 2026
Python

FasterDecoding / REST

REST: Retrieval-Based Speculative Decoding, NAACL 2024

retrieval llm-inference speculative-decoding

Updated Mar 5, 2026
C

psmarter / mini-infer

LLM inference engine built from scratch: paged KV cache, continuous batching, chunked prefill, prefix caching, speculative decoding, CUDA graphs, tensor parallelism, and OpenAI-compatible serving.

machine-learning cuda inference pytorch transformer triton moe quantization language-model inference-engine kv-cache tensor-parallelism llm speculative-decoding pagedattention continuous-batching

Updated Apr 24, 2026
Python

Avarok-Cybersecurity / atlas

Pure Rust inference engine

rust cuda transformers ssm mamba dgx openai-api llm-inference speculative-decoding gb10 nvfp4 dgx-spark

Updated May 8, 2026
Rust

humanrouter / ddtree-mlx

Tree-based speculative decoding for Apple Silicon (MLX): ~10-15% faster than DFlash on code and ~1.5x faster than standard autoregressive decoding. The first MLX port with custom Metal kernels for hybrid model support.

inference mlx apple-silicon llm speculative-decoding

Updated Apr 15, 2026
Python

Infini-AI-Lab / UMbreLLa

LLM inference on consumer devices

offloading llm-inference speculative-decoding

Updated Mar 17, 2025
Python

bigai-nlco / TokenSwift

[ICML 2025] TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation

inference transformer llms llm-serving llm-inference qwen speculative-decoding deepseek

Updated May 19, 2025
Python

romsto / Speculative-Decoding

Implementation of the paper "Fast Inference from Transformers via Speculative Decoding" (Leviathan et al., 2023).

fast-inference llm llm-inference speculative-decoding llm-optimization

Updated Dec 2, 2024
Python

kssteven418 / BigLittleDecoder

[NeurIPS'23] Speculative Decoding with Big Little Decoder

decoding efficient-inference speculative-execution fast-inference llm speculative-decoding

Updated Feb 6, 2024
Python

AtomicBot-ai / atomic-llama-cpp-turboquant

llama.cpp fork with TurboQuant WHT-rotated KV-cache and weight compression, plus Gemma 4 MTP speculative decoding, for ~30-50% throughput gains.

Updated May 8, 2026
C++

Sandermage / genesis-vllm-patches

vLLM patcher for Qwen3.6 on consumer NVIDIA GPUs: Qwen3.6-35B-A3B-FP8 (192 tok/s, +68% over stock), Qwen3.6-27B-int4-AutoRound, and 256K context. 126 patches: TurboQuant k8v4 KV, MTP/DFlash spec-decode, FULL cudagraph, hybrid GDN streaming, structured boot summary, one-command installer, 1958 tests. v7.72.2.

Updated May 5, 2026
Python

hemingkx / SWIFT

[ICLR 2025] SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration

speculative-decoding

Updated Feb 21, 2025
Python
