Bumped llama cpp and updated deprecated functions #3

Closed · wants to merge 339 commits

Commits (339) · diff below shows changes from 1 commit
53ebcc8
feat(server): Provide ability to dynamically allocate all threads if …
sean-bailey Apr 23, 2024
611781f
ci: Build arm64 wheels. Closes #1342
abetlen Apr 23, 2024
c50d330
chore: Bump version
abetlen Apr 23, 2024
2a9979f
feat: Update llama.cpp
abetlen Apr 25, 2024
de37420
fix(ci): Fix python macos test runners issue
abetlen Apr 25, 2024
266abfc
fix(ci): Fix metal tests as well
abetlen Apr 25, 2024
7f52335
feat: Update llama.cpp
abetlen Apr 26, 2024
fcfea66
fix: pydantic deprecation warning
abetlen Apr 26, 2024
f6ed21f
feat: Allow for possibly non-pooled embeddings (#1380)
iamlemec Apr 26, 2024
173ebc7
fix: Remove duplicate pooling_type definition and add missing n_vocab…
abetlen Apr 26, 2024
65edc90
chore: Bump version
abetlen Apr 26, 2024
9e7f738
ci: Update dependabot.yml (#1391)
Smartappli Apr 28, 2024
c58b561
ci: Update action versions in build-wheels-metal.yaml (#1390)
Smartappli Apr 28, 2024
e6bbfb8
examples: fix quantize example (#1387)
iyubondyrev Apr 28, 2024
f178636
fix: Functionary bug fixes (#1385)
jeffrey-fong Apr 28, 2024
17bdfc8
chore(deps): bump conda-incubator/setup-miniconda from 2.2.0 to 3.0.4…
dependabot[bot] Apr 28, 2024
27038db
chore(deps): bump actions/cache from 3.3.2 to 4.0.2 (#1398)
dependabot[bot] Apr 28, 2024
79318ba
chore(deps): bump docker/login-action from 2 to 3 (#1399)
dependabot[bot] Apr 28, 2024
7074c4d
chore(deps): bump docker/build-push-action from 4 to 5 (#1400)
dependabot[bot] Apr 28, 2024
c07db99
chore(deps): bump pypa/cibuildwheel from 2.16.5 to 2.17.0 (#1401)
dependabot[bot] Apr 28, 2024
c9b85bf
feat: Update llama.cpp
abetlen Apr 28, 2024
a411612
feat: Add support for str type kv_overrides
abetlen Apr 28, 2024
2355ce2
ci: Add support for pre-built cuda 12.4.1 wheels (#1388)
Smartappli Apr 28, 2024
0c3bc4b
fix(ci): Update generate wheel index script to include cu12.3 and cu1…
abetlen Apr 29, 2024
03c654a
ci(fix): Workflow actions updates and fix arm64 wheels not included i…
Smartappli Apr 30, 2024
32c000f
chore(deps): bump softprops/action-gh-release from 1 to 2 (#1408)
dependabot[bot] Apr 30, 2024
be43018
chore(deps): bump actions/configure-pages from 4 to 5 (#1411)
dependabot[bot] Apr 30, 2024
df2b5b5
chore(deps): bump actions/upload-artifact from 3 to 4 (#1412)
dependabot[bot] Apr 30, 2024
97fb860
feat: Update llama.cpp
abetlen Apr 30, 2024
fe2da09
feat: Generic Chat Formats, Tool Calling, and Huggingface Pull Suppor…
abetlen Apr 30, 2024
26c7876
chore: Bump version
abetlen Apr 30, 2024
d03f15b
fix(ci): Fix bug in use of upload-artifact failing to merge multiple …
abetlen Apr 30, 2024
3489ef0
fix: Ensure image renders before text in chat formats regardless of m…
abetlen Apr 30, 2024
f417cce
chore: Bump version
abetlen Apr 30, 2024
c8cd8c1
docs: Update README to include CUDA 12.4 wheels
abetlen Apr 30, 2024
6332527
fix(ci): Fix build-and-release.yaml (#1413)
Smartappli Apr 30, 2024
8c2b24d
feat: Update llama.cpp
abetlen Apr 30, 2024
22d77ee
feat: Add option to enable `flash_attn` to Llama params and ModelSet…
abetlen Apr 30, 2024
29b6e9a
fix: wrong parameter for flash attention in pickle __getstate__
abetlen Apr 30, 2024
b14dd98
chore: Bump version
abetlen Apr 30, 2024
26478ab
docs: Update README.md
abetlen Apr 30, 2024
945c62c
docs: Change all examples from interpreter style to script style.
abetlen Apr 30, 2024
3226b3c
fix: UTF-8 handling with grammars (#1415)
jsoma Apr 30, 2024
f116175
fix: Suppress all logs when verbose=False, use hardcoded fileno's to …
abetlen Apr 30, 2024
9286b5c
Merge branch 'main' of github.com:abetlen/llama_cpp_python into main
abetlen Apr 30, 2024
946156f
feat: Update llama.cpp
abetlen Apr 30, 2024
4f01c45
fix: Change default verbose value of verbose in image chat format han…
abetlen Apr 30, 2024
31b1d95
feat: Add llama-3-vision-alpha chat format
abetlen May 2, 2024
d75dea1
feat: Update llama.cpp
abetlen May 2, 2024
2117122
chore: Bump version
abetlen May 2, 2024
2138561
fix(server): Propagate `flash_attn` to model load. (#1424)
dthuerck May 3, 2024
0a454be
feat(server): Remove temperature bounds checks for server. Closes #1384
abetlen May 3, 2024
9f7a855
fix: Use memmove to copy str_value kv_override. Closes #1417
abetlen May 3, 2024
f9b7221
Merge branch 'main' of github.com:abetlen/llama_cpp_python into main
abetlen May 3, 2024
1f56c64
feat: Implement streaming for Functionary v2 + Bug fixes (#1419)
jeffrey-fong May 4, 2024
e0d7674
fix: detokenization case where first token does not start with a lead…
noamgat May 4, 2024
3e2597e
feat: Update llama.cpp
abetlen May 5, 2024
3666833
feat(ci): Add docker checks and check deps more frequently (#1426)
Smartappli May 5, 2024
0318702
feat(server): Add support for setting root_path. Closes #1420
abetlen May 5, 2024
a50d24e
fix: chat_format log where auto-detected format prints `None` (#1434)
balvisio May 8, 2024
07966b9
docs: update README.md (#1432)
eltociear May 8, 2024
903b28a
fix: adding missing args in create_completion for functionary chat ha…
skalade May 8, 2024
228949c
feat: Update llama.cpp
abetlen May 8, 2024
4a7122d
feat: fill-in-middle support (#1386)
CISC May 8, 2024
9ce5cb3
chore: Bump version
abetlen May 8, 2024
2a39b99
feat: Update llama.cpp
abetlen May 8, 2024
7712263
fix: Make leading bos_token optional for image chat formats, fix nano…
abetlen May 8, 2024
3757328
fix: free last image embed in llava chat handler
abetlen May 9, 2024
bf66a28
chore: Bump version
abetlen May 9, 2024
5ab40e6
feat: Support multiple chat templates - step 1 (#1396)
CISC May 9, 2024
b454f40
Merge pull request from GHSA-56xg-wfcc-g829
retr0reg May 10, 2024
561e880
fix(security): Render all jinja templates in immutable sandbox (#1441)
CISC May 10, 2024
4badac3
chore: Bump version
abetlen May 10, 2024
ac55d0a
fix: Clear kv cache to avoid kv bug when image is evaluated first
abetlen May 10, 2024
eafb6ec
feat: Update llama.cpp
abetlen May 10, 2024
7316502
chore: Bump version
abetlen May 10, 2024
7f59856
fix: Enable CUDA backend for llava. Closes #1324
abetlen May 10, 2024
1547202
docs: Fix typo in README.md (#1444)
yupbank May 10, 2024
9dc5e20
feat: Update llama.cpp
abetlen May 12, 2024
3fe8e9a
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python in…
abetlen May 12, 2024
3c19faa
chore: Bump version
abetlen May 12, 2024
3f8e17a
fix(ci): Use version without extra platform tag in pep503 index
abetlen May 12, 2024
43ba152
feat: Update llama.cpp
abetlen May 13, 2024
50f5c74
Update llama.cpp
abetlen May 14, 2024
4b54f79
chore(deps): bump pypa/cibuildwheel from 2.17.0 to 2.18.0 (#1453)
dependabot[bot] May 14, 2024
389e09c
misc: Remove unnecessary metadata lookups (#1448)
CISC May 14, 2024
5212fb0
feat: add MinTokensLogitProcessor and min_tokens argument to server (…
twaka May 14, 2024
ca8e3c9
feat: Update llama.cpp
abetlen May 16, 2024
e811a81
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python in…
abetlen May 16, 2024
d99a6ba
fix: segfault for models without eos / bos tokens. Closes #1463
abetlen May 16, 2024
b564d05
chore: Bump version
abetlen May 16, 2024
03f171e
example: LLM inference with Ray Serve (#1465)
rgerganov May 17, 2024
d8a3b01
feat: Update llama.cpp
abetlen May 18, 2024
3dbfec7
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python in…
abetlen May 18, 2024
5a595f0
feat: Update llama.cpp
abetlen May 22, 2024
087cc0b
feat: Update llama.cpp
abetlen May 24, 2024
5cae104
feat: Improve Llama.eval performance by avoiding list conversion (#1476)
thoughtp0lice May 24, 2024
a4c9ab8
chore: Bump version
abetlen May 24, 2024
ec43e89
docs: Update multi-modal model section
abetlen May 24, 2024
9e8d7d5
fix(docs): Fix link typo
abetlen May 24, 2024
2d89964
docs: Fix table formatting
abetlen May 24, 2024
454c9bb
feat: Update llama.cpp
abetlen May 27, 2024
c564007
chore(deps): bump pypa/cibuildwheel from 2.18.0 to 2.18.1 (#1472)
dependabot[bot] May 27, 2024
c26004b
feat: Update llama.cpp
abetlen May 29, 2024
2907c26
misc: Update debug build to keep all debug symbols for easier gdb deb…
abetlen May 29, 2024
10b7c50
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python in…
abetlen May 29, 2024
df45a4b
fix: fix string value kv_overrides. Closes #1487
abetlen May 29, 2024
91d05ab
fix: adjust kv_override member names to match llama.cpp
abetlen May 29, 2024
165b4dc
fix: Fix typo in Llama3VisionAlphaChatHandler. Closes #1488
abetlen May 29, 2024
af3ed50
fix: Use numpy recarray for candidates data, fixes bug with temp < 0
abetlen Jun 1, 2024
a6457ba
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python in…
abetlen Jun 1, 2024
6b018e0
misc: Improve llava error messages
abetlen Jun 3, 2024
cd3f1bb
feat: Update llama.cpp
abetlen Jun 4, 2024
ae5682f
fix: Disable Windows+CUDA workaround when compiling for HIPBLAS (#1493)
Engininja2 Jun 4, 2024
c3ef41b
chore: Bump version
abetlen Jun 4, 2024
951e39c
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python in…
abetlen Jun 4, 2024
027f7bc
fix: Avoid duplicate special tokens in chat formats (#1439)
CISC Jun 4, 2024
6e0642c
fix: fix logprobs when BOS is not present (#1471)
a-ghorbani Jun 4, 2024
d634efc
feat: adding `rpc_servers` parameter to `Llama` class (#1477)
chraac Jun 4, 2024
255e1b4
feat: Update llama.cpp
abetlen Jun 7, 2024
83d6b26
feat: Update llama.cpp
abetlen Jun 9, 2024
1615eb9
feat: Update llama.cpp
abetlen Jun 10, 2024
86a38ad
chore: Bump version
abetlen Jun 10, 2024
e342161
feat: Update llama.cpp
abetlen Jun 13, 2024
dbcf64c
feat: Support SPM infill (#1492)
CISC Jun 13, 2024
320a5d7
feat: Add `.close()` method to `Llama` class to explicitly free model…
jkawamoto Jun 13, 2024
5af8163
chore(deps): bump pypa/cibuildwheel from 2.18.1 to 2.19.0 (#1522)
dependabot[bot] Jun 13, 2024
9e396b3
feat: Update workflows and pre-built wheels (#1416)
Smartappli Jun 13, 2024
8401c6f
feat: Update llama.cpp
abetlen Jun 13, 2024
f4491c4
feat: Update llama.cpp
abetlen Jun 17, 2024
4c1d74c
fix: Make destructor to automatically call .close() method on Llama c…
abetlen Jun 19, 2024
554fd08
feat: Update llama.cpp
abetlen Jun 19, 2024
6c33190
chore: Bump version
abetlen Jun 19, 2024
d98a24a
docs: Remove references to deprecated opencl backend. Closes #1512
abetlen Jun 20, 2024
a3a0340
Merge branch 'main' into disk_cache_server
jonperry-dev Jun 20, 2024
5beec1a
feat: Update llama.cpp
abetlen Jun 21, 2024
27d5358
docs: Update readme examples to use newer Qwen2 model (#1544)
jncraton Jun 21, 2024
398fe81
chore(deps): bump docker/build-push-action from 5 to 6 (#1539)
dependabot[bot] Jun 21, 2024
35c980e
chore(deps): bump pypa/cibuildwheel from 2.18.1 to 2.19.1 (#1527)
dependabot[bot] Jun 21, 2024
04959f1
feat: Update llama_cpp.py bindings
abetlen Jun 21, 2024
c0fb1e5
Bump llama.cpp w/ serialization changes
tc-wolf Jun 24, 2024
25c42b9
Pin numpy, add_bos for LlamaStaticCache
tc-wolf Jun 26, 2024
117cbb2
feat: Update llama.cpp
abetlen Jul 2, 2024
bf5e0bb
fix(server): Update `embeddings=False` by default. Embeddings should …
abetlen Jul 2, 2024
73ddf29
fix(ci): Fix the CUDA workflow (#1551)
oobabooga Jul 2, 2024
c546c94
misc: Install shared libraries to lib subdirectory
abetlen Jul 2, 2024
92bad6e
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python in…
abetlen Jul 2, 2024
139774b
fix: Update shared library rpath
abetlen Jul 2, 2024
d5f6a15
fix: force $ORIGIN rpath for shared library files
abetlen Jul 2, 2024
e51f200
fix: Fix installation location for shared libraries
abetlen Jul 2, 2024
73fe013
fix: Fix RPATH so it works on macos
abetlen Jul 2, 2024
dc20e8c
fix: Copy dependencies for windows
abetlen Jul 2, 2024
296304b
fix(server): Fix bug in FastAPI streaming response where dependency w…
abetlen Jul 2, 2024
bd5d17b
feat: Update llama.cpp
abetlen Jul 2, 2024
b4cc923
chore: Bump version
abetlen Jul 2, 2024
4fb6fc1
fix(ci): Use LLAMA_CUDA for cuda wheels
abetlen Jul 2, 2024
387d01d
fix(misc): Fix type errors
abetlen Jul 2, 2024
8992a1a
feat: Update llama.cpp
abetlen Jul 2, 2024
3a551eb
fix(ci): Update macos image (macos-11 is removed)
abetlen Jul 2, 2024
01bddd6
chore: Bump version
abetlen Jul 2, 2024
7e20e34
feat: Update llama.cpp
abetlen Jul 4, 2024
62804ee
feat: Update llama.cpp
abetlen Jul 6, 2024
157d913
fix: update token_to_piece
abetlen Jul 6, 2024
218d361
feat: Update llama.cpp
abetlen Jul 9, 2024
1a55417
fix: Update LLAMA_ flags to GGML_ flags
abetlen Jul 9, 2024
09a4f78
fix(ci): Update LLAMA_ flags to GGML_
abetlen Jul 9, 2024
0481a3a
fix(docs): Update LLAMA_ flags to GGML_ flags
abetlen Jul 9, 2024
fccff80
fix(docs): Remove kompute backend references
abetlen Jul 9, 2024
276ea28
fix(misc): Update LLAMA_ flags to GGML_
abetlen Jul 9, 2024
aaf4cbe
chore: Bump version
abetlen Jul 9, 2024
14760c6
chore(deps): bump pypa/cibuildwheel from 2.19.1 to 2.19.2 (#1568)
dependabot[bot] Jul 9, 2024
e31f096
chore(deps): bump microsoft/setup-msbuild from 1.1 to 1.3 (#1569)
dependabot[bot] Jul 9, 2024
b77e507
feat(ci): Dockerfile update base images and post-install cleanup (#1530)
Smartappli Jul 9, 2024
c1ae815
fix(misc): Format
abetlen Jul 9, 2024
08f2bb3
fix(minor): Minor ruff fixes
abetlen Jul 9, 2024
f7f4fa8
feat(ci): Update simple Dockerfile (#1459)
yentur Jul 9, 2024
7e9f994
Update Dockerfile for deployment
tc-wolf Jul 17, 2024
3c00f61
Merge tag 'v0.2.82' into bumped_llama_cpp_with_disk_cache
tc-wolf Jul 17, 2024
7613d23
feat: Update llama.cpp
abetlen Jul 17, 2024
66d5cdd
fix(server): Use split_mode from model settings (#1594)
grider-withourai Jul 17, 2024
797f54c
fix(docs): Update README.md typo (#1589)
ericcurtin Jul 17, 2024
0700476
fix: Change repeat_penalty to 1.0 to match llama.cpp defaults (#1590)
ddh0 Jul 18, 2024
3638f73
feat: Add 'required' literal to ChatCompletionToolChoiceOption (#1597)
mjschock Jul 18, 2024
f95057a
chore(deps): bump microsoft/setup-msbuild from 1.3 to 2 (#1585)
dependabot[bot] Jul 20, 2024
5105f40
feat: Update llama.cpp
abetlen Jul 22, 2024
816d491
chore: Bump version
abetlen Jul 22, 2024
3d729a3
Build options / Dockerfile
tc-wolf Jul 22, 2024
a14b49d
feat: Update llama.cpp
abetlen Jul 24, 2024
ff17b58
More OpenBLAS path wrangling
tc-wolf Jul 26, 2024
dccb148
feat: Update llama.cpp
abetlen Jul 28, 2024
9ed6b27
fix: Correcting run.sh filepath in Simple Docker implementation (#1626)
mashuk999 Jul 28, 2024
4bf3b43
chore: Bump version
abetlen Jul 28, 2024
cffb4ec
feat: Update llama.cpp
abetlen Jul 31, 2024
53c6f32
feat: Update llama.cpp
abetlen Jul 31, 2024
0b1a8d8
feat: FreeBSD compatibility (#1635)
yurivict Jul 31, 2024
8297a0d
fix(docker): Update Dockerfile build options from `LLAMA_` to `GGML_`…
Smartappli Jul 31, 2024
ac02174
fix(docker): Fix GGML_CUDA param (#1633)
Smartappli Jul 31, 2024
8a12c9f
fix(docker): Update Dockerfile BLAS options (#1632)
Smartappli Jul 31, 2024
1f0b9a2
fix : Missing LoRA adapter after API change (#1630)
shamitv Jul 31, 2024
f7b9e6d
chore: Bump version
abetlen Jul 31, 2024
5575fed
fix: llama_grammar_accept_token arg order (#1649)
tc-wolf Aug 4, 2024
dff186c
feat: Ported back new grammar changes from C++ to Python implementati…
ExtReMLapin Aug 7, 2024
18f58fe
feat: Update llama.cpp
abetlen Aug 7, 2024
ce6466f
chore: Bump version
abetlen Aug 7, 2024
198f47d
feat(ci): Re-build wheel index automatically when releases are created
abetlen Aug 7, 2024
a07b337
feat: Update llama.cpp
abetlen Aug 7, 2024
9cad571
fix: Include all llama.cpp source files and subdirectories
abetlen Aug 7, 2024
8432116
chore: Bump version
abetlen Aug 7, 2024
e966f3b
feat: Add more detailed log for prefix-match (#1659)
xu-song Aug 7, 2024
131db40
chore(deps): bump pypa/cibuildwheel from 2.19.2 to 2.20.0 (#1657)
dependabot[bot] Aug 7, 2024
5e39a85
feat: Enable recursive search of HFFS.ls when using `from_pretrained`…
benHeid Aug 7, 2024
c5de5d3
feat: Update llama.cpp
abetlen Aug 8, 2024
bfb42b7
Merge branch 'main' of github.com:abetlen/llama-cpp-python into main
abetlen Aug 8, 2024
0998ea0
fix: grammar prints on each call. Closes #1666
abetlen Aug 8, 2024
7aaf701
fix: typo
abetlen Aug 8, 2024
45de9d5
feat: Update llama.cpp
abetlen Aug 10, 2024
4244151
feat: Update llama.cpp
abetlen Aug 12, 2024
95a1533
fix: Added back from_file method to LlamaGrammar (#1673)
ExtReMLapin Aug 12, 2024
9bab46f
fix: only print 'cache saved' in verbose mode (#1668)
lsorber Aug 12, 2024
8ed663b
feat: Update llama.cpp
abetlen Aug 12, 2024
fc19cc7
chore: Bump version
abetlen Aug 13, 2024
63d65ac
feat: Update llama.cpp
abetlen Aug 15, 2024
78e35c4
fix: missing dependencies for test (#1680)
jkawamoto Aug 15, 2024
3c7501b
fix: Llama.close didn't free lora adapter (#1679)
jkawamoto Aug 15, 2024
7bf07ec
feat: Update llama.cpp
abetlen Aug 16, 2024
658b244
Merge branch 'main' of github.com:abetlen/llama-cpp-python into main
abetlen Aug 16, 2024
a2ba731
feat: Update llama.cpp
abetlen Aug 19, 2024
d7328ef
chore: Bump version
abetlen Aug 19, 2024
fc5bbcb
Merge tag 'v0.2.89' into bumped_llama_cpp_with_disk_cache
tc-wolf Aug 19, 2024
93f8c88
Make llama.cpp point to fork
tc-wolf Aug 19, 2024
0c41b7f
Single-file Deployment
tc-wolf Aug 19, 2024
87e02e3
Update Dockerfile build mcpu/march
tc-wolf Sep 6, 2024
b196925
Compilation changes
tc-wolf Sep 6, 2024
da9f15a
Update Dockerfile
tc-wolf Sep 10, 2024
121eaaa
Check length of match before reload
tc-wolf Oct 8, 2024
7ca7f28
Allow reloading without scores
tc-wolf Oct 14, 2024
ed6c354
Fix bug
tc-wolf Oct 14, 2024
90d42c3
Catch StateReloadError
tc-wolf Oct 15, 2024
46718c9
Add tests
tc-wolf Oct 15, 2024
8362cfa
Skip saving logits
tc-wolf Oct 15, 2024
de7f862
Add check + note
tc-wolf Oct 15, 2024
b4e2156
Finalize llama cache changes
tc-wolf Oct 21, 2024
ca5d1a4
Finalize tests
tc-wolf Oct 21, 2024
0967eda
Better variable name
tc-wolf Oct 21, 2024
6634017
Merge pull request #1 from tc-wolf/optimize_kv_cache_size
tc-wolf Oct 21, 2024
75466a3
Simplify Dockerfile
tc-wolf Dec 6, 2024
9e19903
Add deploy target for mac server bundle
tc-wolf Dec 18, 2024
ec10a80
Merge pull request #2 from tc-wolf/standalone_server_mac
tc-wolf Dec 18, 2024
524ae21
Update Dockerfile to build w/ older Ubuntu
tc-wolf Jan 3, 2025
832636c
Update build args
tc-wolf Jan 3, 2025
examples: fix quantize example (abetlen#1387)
@iyubondyrev thank you!
iyubondyrev authored Apr 28, 2024
commit e6bbfb863c38e7575e4fe87a823ac6ce2e15c27c
13 changes: 8 additions & 5 deletions in examples/low_level_api/quantize.py

@@ -4,14 +4,16 @@
 def main(args):
+    fname_inp = args.fname_inp.encode("utf-8")
+    fname_out = args.fname_out.encode("utf-8")
     if not os.path.exists(fname_inp):
         raise RuntimeError(f"Input file does not exist ({fname_inp})")
     if os.path.exists(fname_out):
         raise RuntimeError(f"Output file already exists ({fname_out})")
-    fname_inp = args.fname_inp.encode("utf-8")
-    fname_out = args.fname_out.encode("utf-8")
-    itype = args.itype
-    return_code = llama_cpp.llama_model_quantize(fname_inp, fname_out, itype)
+    ftype = args.type
+    args = llama_cpp.llama_model_quantize_default_params()
+    args.ftype = ftype
+    return_code = llama_cpp.llama_model_quantize(fname_inp, fname_out, args)
     if return_code != 0:
         raise RuntimeError("Failed to quantize model")

@@ -20,6 +22,7 @@
     parser = argparse.ArgumentParser()
     parser.add_argument("fname_inp", type=str, help="Path to input model")
     parser.add_argument("fname_out", type=str, help="Path to output model")
-    parser.add_argument("type", type=int, help="Type of quantization (2: q4_0, 3: q4_1)")
+    parser.add_argument("type", type=int, help="Type of quantization (2: q4_0, 3: q4_1), see llama_cpp.py for enum")
     args = parser.parse_args()
     main(args)
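For context, here is the updated call sequence as a standalone script. This is a minimal sketch based only on the diff above; the model paths and the ftype value are illustrative placeholders, not part of the original commit.

```python
# Sketch of the post-change quantize API in llama_cpp's low-level bindings.
import llama_cpp

# Hypothetical model paths; llama_model_quantize expects bytes.
fname_inp = b"ggml-model-f16.gguf"
fname_out = b"ggml-model-q4_0.gguf"

# The deprecated API took an integer type argument directly; the updated API
# fills a params struct obtained from llama_model_quantize_default_params().
params = llama_cpp.llama_model_quantize_default_params()
params.ftype = 2  # 2: q4_0, 3: q4_1 (see the enum in llama_cpp.py)

return_code = llama_cpp.llama_model_quantize(fname_inp, fname_out, params)
if return_code != 0:
    raise RuntimeError("Failed to quantize model")
```

The fixed example itself would be invoked as, e.g., `python quantize.py ggml-model-f16.gguf ggml-model-q4_0.gguf 2`, with the paths again being placeholders.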
