aivrar / vllm-windows-build (Python, 12 stars, updated Apr 26, 2026)
Native Windows build of vLLM 0.19.1 with no WSL and no Docker: pre-built wheels, a 34-file Windows patch, and Multi-TurboQuant KV cache compression (6 methods, 2x cache capacity). Built against PyTorch 2.10, CUDA 12.6, Triton, and Flash-Attention 2.
Topics: windows, gpu, cuda, pytorch, nvidia, triton, msvc, quantization, kv-cache, awq, llm, llm-serving, vllm, llm-inference, flash-attention, qwen, kv-cache-compression, turboquant, vllm-windows, multi-turboquant
palatalised-chancellorsville108 / turboquant-pytorch (0 stars, updated May 6, 2026)
Accelerates LLM KV cache compression with a PyTorch implementation of TurboQuant for efficient, high-quality vector quantization.
Topics: windows, machine-learning, deep-learning, cpp, retrieval, gpu, cuda, transformers, pytorch, triton, attention, mlx, libtorch, kv-cache, apple-silicon, llm-inference, kv-cache-compression, vector-compression, multi-turboquant
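Both repositories center on KV cache quantization. As a rough illustration of where the "2x cache capacity" figure comes from, the sketch below shows plain per-token int8 quantization of a key/value slab in PyTorch: storing int8 values plus per-token scales instead of fp16 roughly halves the memory per cached token. This is a minimal, hypothetical example, not code from either repository, and the function names (quantize_kv, dequantize_kv) are invented for illustration; TurboQuant's actual methods are more sophisticated.

```python
# Illustrative sketch only -- not code from either repository above.
import torch

def quantize_kv(x: torch.Tensor):
    """Per-token symmetric int8 quantization of a [tokens, head_dim] tensor."""
    scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(x / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize_kv(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Reconstruct an approximate fp tensor from int8 values and per-token scales."""
    return q.to(scale.dtype) * scale

if __name__ == "__main__":
    kv = torch.randn(1024, 128, dtype=torch.float16)      # a fake K or V slab
    q, scale = quantize_kv(kv.float())
    recon = dequantize_kv(q, scale).to(torch.float16)

    fp16_bytes = kv.numel() * kv.element_size()
    int8_bytes = q.numel() * q.element_size() + scale.numel() * scale.element_size()
    print(f"fp16: {fp16_bytes} B, int8 + scales: {int8_bytes} B "
          f"(~{fp16_bytes / int8_bytes:.1f}x smaller), "
          f"max abs error: {(kv.float() - recon.float()).abs().max():.4f}")
```

Under these assumptions the int8 representation comes out close to half the fp16 size (the per-token fp32 scales add a small overhead), which is the intuition behind fitting roughly twice as many tokens in the same KV cache budget.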