GPT-QModel v4.2.0
Notable Changes
- Add Qwen3-Next by @Qubitium and @LRL-ModelCloud in #1787
- Add Apertus support by @LRL-ModelCloud in #1767
- Add Kimi k2 support by @LRL-ModelCloud in #1768
- Add Klear support by @LRL-ModelCloud in #1769
- Add FastLLM support by @LRL-ModelCloud in #1771
- Add Nemotron H support by @LRL-ModelCloud in #1773
- Add `fail_safe` option by @LRL-ModelCloud in #1775
- Use threading lock to protect unsafe tensor moves in multi-GPU by @Qubitium in #1778
- Avoid building experimental extensions to reduce wheel size by @Qubitium in #1763
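The threading-lock change above (#1778) serializes cross-device tensor moves so concurrent workers cannot interleave them. A minimal sketch of that pattern, using a stand-in `FakeTensor` class and hypothetical names rather than GPT-QModel's actual API:

```python
import threading

# Sketch only: a single global lock serializes device moves, mirroring the
# idea of #1778. FakeTensor stands in for a torch tensor; in a real
# framework .to() is a non-atomic allocate + copy + swap.
_move_lock = threading.Lock()

class FakeTensor:
    def __init__(self, data, device="cpu"):
        self.data = data
        self.device = device

    def to(self, device):
        self.device = device
        return self

def safe_move(tensor, device):
    """Move a tensor while holding the global lock."""
    with _move_lock:
        return tensor.to(device)

def worker(tensors, device):
    for t in tensors:
        safe_move(t, device)

tensors = [FakeTensor(i) for i in range(100)]
threads = [
    threading.Thread(target=worker, args=(tensors, f"cuda:{i}"))
    for i in range(4)
]
for th in threads:
    th.start()
for th in threads:
    th.join()

# With the lock held for each move, every tensor ends up on exactly one
# of the requested devices, never in a half-moved state.
assert all(t.device.startswith("cuda:") for t in tensors)
```

The lock trades some parallelism for correctness; a per-module lock would be finer-grained, but a single lock is the simplest fix for a rare race.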
What's Changed
- Fix LlavaQwen2GPTQ by @LRL-ModelCloud in #1772
- Fix `Q.to` on multi-GPU GPTQ when processing is fast and there are many experts and GPUs by @avtc in #1774
- Bump actions/setup-python from 5 to 6 in the github-actions group by @dependabot[bot] in #1758
- [CI] fix release jobs were skipped by @CSY-ModelCloud in #1759
- Ignore compile warnings about variables declared but not used by @Qubitium in #1760
- Allow prebuilt wheel path to be customized via env by @Qubitium in #1761
- Add build toggles for all cpp kernels by @Qubitium in #1764
- Fix multi-GPU inference by @LRL-ModelCloud in #1762
- [CI] Reduce wheel download size by @CSY-ModelCloud in #1765
- Start 4.2.0-dev cycle by @Qubitium in #1766
- Fix Klear by @LRL-ModelCloud in #1770
- Fix transformers >= 4.56.1 force-changing `torch.default_dtype` by @Qubitium in #1779
- Fix multi-GPU `fail_safe` by @LRL-ModelCloud in #1780
- Fix device instance by @LRL-ModelCloud in #1783
- Prepare for 4.2 release by @Qubitium in #1785
Full Changelog: v4.1.0...v4.2.0