
GPTQModel v2.1.0

@Qubitium Qubitium released this 13 Mar 14:30
· 430 commits to main since this release
37d4b2b

What's Changed

✨ New QQQ quantization method and inference support!
✨ New Google Gemma 3 day-zero model support.
✨ New Alibaba Ovis 2 VL model support.
✨ New AMD Instella day-zero model support.
✨ New GSM8K Platinum and MMLU-Pro benchmarking support.
✨ PEFT LoRA training with GPTQModel is now 30%+ faster on all GPU and IPEX devices.
✨ Auto-detect MoE modules not activated during quantization due to insufficient calibration data.
✨ ROCm setup.py compat fixes.
✨ Optimum and PEFT compat fixes.
✨ Fixed PEFT bfloat16 training.
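For context on what quantization methods like QQQ do conceptually, the sketch below shows plain symmetric group-wise 4-bit weight quantization in dependency-free Python. This is an illustrative simplification of the general idea, not GPTQModel's or QQQ's actual implementation; the function names are hypothetical.

```python
# Illustrative sketch only: symmetric 4-bit group-wise weight
# quantization, the basic idea underlying GPTQ-style methods.
# Not GPTQModel's API -- function names are hypothetical.

def quantize_group(weights, bits=4):
    """Quantize one group of float weights to signed ints sharing one scale."""
    qmax = 2 ** (bits - 1) - 1                    # 7 for 4-bit signed
    scale = max(abs(w) for w in weights) / qmax or 1.0
    # Round to nearest int, clamp to the signed range [-qmax-1, qmax].
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize_group(q, scale):
    """Reconstruct approximate float weights from ints and the group scale."""
    return [v * scale for v in q]

w = [0.12, -0.53, 0.97, -0.08]
q, s = quantize_group(w)
w_hat = dequantize_group(q, s)
```

Each group stores only 4-bit integers plus one float scale, which is where the memory savings come from; per-group scales keep the rounding error bounded by half a quantization step within each group.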

New Contributors

Full Changelog: v2.0.0...v2.1.0
