GPT-QModel v4.2.0
Notable Changes
- Add Qwen3-Next by @Qubitium and @LRL-ModelCloud in #1787
- Add Apertus support by @LRL-ModelCloud in #1767
- Add Kimi k2 support by @LRL-ModelCloud in #1768
- Add Klear support by @LRL-ModelCloud in #1769
- Add FastLLM support by @LRL-ModelCloud in #1771
- Add Nemotron H support by @LRL-ModelCloud in #1773
- Add `fail_safe` option by @LRL-ModelCloud in #1775
- Use threading lock to protect unsafe tensor moves in multi-GPU by @Qubitium in #1778
- Avoid building experimental extensions to reduce wheel size by @Qubitium in #1763
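The threading-lock change above (#1778) serializes cross-device tensor moves so concurrent workers cannot interleave them. A minimal sketch of that pattern, using a stand-in `FakeTensor` class and hypothetical names rather than GPT-QModel's actual API:

```python
import threading

# Sketch only: a single global lock serializes device moves, mirroring the
# idea of #1778. FakeTensor stands in for a torch tensor; in a real
# framework .to() is a non-atomic allocate + copy + swap.
_move_lock = threading.Lock()

class FakeTensor:
    def __init__(self, data, device="cpu"):
        self.data = data
        self.device = device

    def to(self, device):
        self.device = device
        return self

def safe_move(tensor, device):
    """Move a tensor while holding the global lock."""
    with _move_lock:
        return tensor.to(device)

def worker(tensors, device):
    for t in tensors:
        safe_move(t, device)

tensors = [FakeTensor(i) for i in range(100)]
threads = [
    threading.Thread(target=worker, args=(tensors, f"cuda:{i}"))
    for i in range(4)
]
for th in threads:
    th.start()
for th in threads:
    th.join()

# With the lock held for each move, every tensor ends up on exactly one
# of the requested devices, never in a half-moved state.
assert all(t.device.startswith("cuda:") for t in tensors)
```

The lock trades some parallelism for correctness; a per-module lock would be finer-grained, but a single lock is the simplest fix for a rare race.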
What's Changed
- Fix LlavaQwen2GPTQ by @LRL-ModelCloud in #1772
- Fix `Q.to` on multi-GPU GPTQ when processing is fast and there are many experts and GPUs by @avtc in #1774
- Bump actions/setup-python from 5 to 6 in the github-actions group by @dependabot[bot] in #1758
- [CI] fix release jobs were skipped by @CSY-ModelCloud in #1759
- Ignore compile warnings about variables declared but not used by @Qubitium in #1760
- Allow prebuilt wheel path to be customized via env by @Qubitium in #1761
- Add build toggles for all cpp kernels by @Qubitium in #1764
- Fix multi-GPU inference by @LRL-ModelCloud in #1762
- [CI] Reduce wheel download size by @CSY-ModelCloud in #1765
- Start 4.2.0-dev cycle by @Qubitium in #1766
- Fix Klear by @LRL-ModelCloud in #1770
- Fix transformers >= 4.56.1 force-changing `torch.default_dtype` by @Qubitium in #1779
- Fix multi-GPU `fail_safe` by @LRL-ModelCloud in #1780
- Fix device instance by @LRL-ModelCloud in #1783
- Prepare for 4.2 release by @Qubitium in #1785
Full Changelog: v4.1.0...v4.2.0