GPTQModel v2.2.0
What's Changed
✨ New Qwen 2.5 VL model support. Preliminary Qwen 3 model support.
✨ New samples log column during quantization to track module activation in MoE models.
✨ Loss log column is now color-coded to highlight modules that are friendly or resistant to quantization.
✨ Progress (per-step) stats during quantization are now streamed to the log file.
✨ Automatic bfloat16 dtype loading based on the model config.
✨ Fixed kernel compile for PyTorch/ROCm.
✨ Slightly faster quantization, with automatic resolution of some low-level OOM issues on GPUs with less VRAM.
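The auto-dtype feature above reads the dtype declared in the model's config. A minimal sketch of that idea is below; the helper name and fallback logic are illustrative assumptions, not GPTQModel's actual implementation:

```python
# Illustrative sketch (NOT GPTQModel's actual code): choose a load dtype
# from a Hugging Face-style config.json, honoring a declared bfloat16 and
# falling back to float16 when the config is silent.
import json

def resolve_load_dtype(config: dict) -> str:
    """Return the dtype name to load weights in, based on model config."""
    declared = config.get("torch_dtype")
    if declared in ("bfloat16", "float16", "float32"):
        return declared
    return "float16"  # hypothetical conservative fallback

# Usage: config as parsed from a model's config.json
cfg = json.loads('{"model_type": "qwen2", "torch_dtype": "bfloat16"}')
print(resolve_load_dtype(cfg))  # bfloat16
```

In practice GPTQModel resolves this automatically at load time, so no dtype argument is needed when the config already declares bfloat16.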
- Enable ipex tests for CPU/XPU by @jiqing-feng in #1460
- test kernel accuracies with more shapes on cuda by @Qubitium in #1461
- Fix rocm flags by @Qubitium in #1467
- use table like logging format by @Qubitium in #1471
- stream process log entries to persistent file by @Qubitium in #1472
- fix some models need trust-remote-code arg by @Qubitium in #1474
- Fix wq dtype by @Qubitium in #1475
- add colors to quant loss column by @Qubitium in #1477
- add prelim qwen3 support by @Qubitium in #1478
- Update eora.py for further optimization by @nbasyl in #1488
- faster cholesky inverse and avoid oom when possible by @Qubitium in #1494
- [MODEL] supports qwen2_5_vl by @ZX-ModelCloud in #1493
Full Changelog: v2.1.0...v2.2.0