kv-cache-quantization

Here are 6 public repositories matching this topic...

Zefan-Cai / Awesome-LLM-KV-Cache

Awesome-LLM-KV-Cache: A curated list of 📙Awesome LLM KV Cache Papers with Codes.

kv-cache llm kv-cache-quantization kv-cache-compression

Updated Mar 3, 2025

shadowpa0327 / Palu

[ICLR 2025] Palu: Compressing KV-Cache with Low-Rank Projection

mla kv-cache-quantization deepseek kv-cache-compression

Updated Feb 20, 2025
Python

jaylfc / tinyagentos

Self-hosted, auto-clustering AI agent OS for low-cost consumer hardware such as the computer you already have, an Orange Pi, a Raspberry Pi, or a Mac. Desktop shell, app store, agent deployment, distributed compute cluster. Memory by taOSmd.

raspberry-pi distributed-computing self-hosted orange-pi ai-agents ai-platform agent-framework apple-silicon llm vllm local-llm llm-inference kv-cache-quantization rockchip-npu turboquant

Updated May 6, 2026
Python

suraj-ranganath / kv-quant-longhorizon

Empirical study of KV-cache quantization in self-forcing video generation

quantization kv-cache-quantization autoregressive-video-generation

Updated Mar 14, 2026
Python

Henvezz95 / VAR-Compressor

W4A4 and INT8 KV-cache quantization for Infinity VAR models. Optimized for high-fidelity generative AI deployment on edge GPUs (e.g. NVIDIA Jetson).

computer-vision pytorch gpu-acceleration quantization model-compression nvidia-jetson inference-optimization edge-ai on-device-ml weight-quantization post-training-quantization autoregressive-models generative-ai kv-cache-quantization activation-quantization visual-autoregressive-model svdquant infinity-var

Updated Apr 29, 2026
Python

thupalo / llama-on-dgx-spark

Deploy Nemotron 3 Nano 30B with 1M context window on NVIDIA DGX Spark using llama.cpp (Blackwell sm_121, Q4_0 KV cache quantization)

cuda inference aarch64 mamba mixture-of-experts blackwell long-context llama-cpp local-llm gguf kv-cache-quantization nemotron nvidia-dgx-spark 1m-context-window

Updated Mar 22, 2026
Shell
