Comparing changes
base repository: ModelCloud/GPTQModel
base: v0.9.4
head repository: ModelCloud/GPTQModel
compare: v0.9.5
- 13 commits
- 64 files changed
- 5 contributors
Commits on Jul 4, 2024
- fb388f3
- d5c1024 [CI] FIX test perplexity fail (#160)
  - fix an undefined-name error
  - fix the test_perplexity failure
  - filter the dataset by text length
  - assert on the difference in perplexity
- 6f1eb58 [REFRACTOR] Remove Backend.CUDA and Backend.CUDA_OLD (#165)
  - remove Backend.CUDA and Backend.CUDA_OLD
  - fix the unit test
  - remove the cuda_64/ and cuda_256/ directories
Commits on Jul 5, 2024
- b250a76
- b39fa13 [BACKEND] Add QBits support (#137)
  - support the QBits kernel for the CPU device
  - add a warning when falling back to CPU; remove unneeded vars
  - fix a wrong torch_dtype that prevented QBits from working; check module type as on main
  - check bit support against BITS_DTYPE_MAPPING; QBits supports 2, 3, 4, and 8 bits
  - rename asym back to sym; load QBits only as needed
  - QBits must be explicit: device=cpu does not auto-switch the backend to QBits; instead, selecting the QBits backend forces device=cpu
  - CI: add QBits tests, move test_qbits to test_cpu, set Python to 3.10, update runners, set --durations=0; delete test_qbits_kernel.py (could not pass all 4-bit tests)
  - add protobuf to requirements; remove buggy artifact download by run id (actions/download-artifact#295)
  Signed-off-by: Cheng Penghui <penghui.cheng@intel.com>
  Co-authored-by: Cheng Penghui <penghui.cheng@intel.com>, Qubitium-ModelCloud <qubitium@modelcloud.ai>
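The backend/device rule described in the #137 commit message (QBits is never auto-selected for device=cpu; choosing QBits explicitly forces the device to cpu) can be sketched roughly as below. This is a minimal standalone illustration, not GPTQModel's actual code: the `Backend` members other than `QBITS` and the `resolve_device` helper are hypothetical names.

```python
from enum import Enum

class Backend(Enum):
    AUTO = "auto"            # hypothetical default member
    QBITS = "qbits"          # Intel QBits CPU kernel, added in #137
    EXLLAMA_V2 = "exllama_v2"  # hypothetical GPU backend member

def resolve_device(backend: Backend, device: str) -> str:
    """Sketch of the rule from #137: selecting the QBits backend
    forces device=cpu, but device=cpu alone never switches the
    backend to QBits."""
    if backend is Backend.QBITS:
        return "cpu"  # QBits only runs on CPU
    return device     # all other backends keep the requested device
```

The asymmetry is deliberate per the commit message: an explicit opt-in avoids silently picking a slower CPU kernel for users who merely loaded a model on CPU.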
- efb77a2 [FIX] Delete 8 bits test (#169)
  - revert a comment
  - remove the 8-bit test
- 03bd744 [MODEL] Add 2 & 3 bits support for QBits (#170)
  - add 2- and 3-bit support
  - update SUPPORT_BITS
- 87ef93f
- 61191d5 [CI] [FIX] used wrong tokenizer get dataset (#171)
  - fix use of the wrong tokenizer when building the dataset
  - simplify the code
- 6c35fd8 [FEATURE] BaseQuantLinear add SUPPORTED_DEVICES (#174)
  - check the QuantLinear device
  - refactor check_cuda by introducing SUPPORTED_DEVICES into BaseQuantLinear
  - make the device type (cuda/cpu) an enum
  - cleanup
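The SUPPORTED_DEVICES feature in #174 (each kernel class declares where it can run, with the device type as an enum) can be sketched as follows. `BaseQuantLinear`, `SUPPORTED_DEVICES`, and the cpu/cuda enum come from the commit message; the `DEVICE` member spellings, `validate_device`, and `QBitsQuantLinear` are illustrative names, not the library's exact API.

```python
from enum import Enum

class DEVICE(Enum):
    CPU = "cpu"
    CUDA = "cuda"

class BaseQuantLinear:
    # Each kernel subclass overrides this with the devices it supports.
    SUPPORTED_DEVICES = [DEVICE.CUDA]

    @classmethod
    def validate_device(cls, device: DEVICE) -> None:
        """Replace scattered check_cuda calls with one declarative check."""
        if device not in cls.SUPPORTED_DEVICES:
            raise NotImplementedError(
                f"{cls.__name__} does not support device '{device.value}'; "
                f"supported: {[d.value for d in cls.SUPPORTED_DEVICES]}"
            )

class QBitsQuantLinear(BaseQuantLinear):
    SUPPORTED_DEVICES = [DEVICE.CPU]  # QBits is CPU-only (#137)
```

Declaring supported devices as class data lets the loader pick a kernel by filtering on the requested device instead of hard-coding per-backend CUDA checks.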
- 8b3c1d3 [MODEL] Add quant support for Qbits (#173)
  - add quantization support for QBits
  - test quantization with QBits
  - set the real sym value back on quantize_config
  Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
Configuration menu - View commit details
-
Copy full SHA for 50aa90a - Browse repository at this point
Copy the full SHA 50aa90aView commit details -
Configuration menu - View commit details
-
Copy full SHA for f0a1ee8 - Browse repository at this point
Copy the full SHA f0a1ee8View commit details
To view the full comparison locally:
git diff v0.9.4...v0.9.5