Bump llama.cpp #4

tc-wolf · Mar 18, 2025

Bump llama.cpp
Update deployment script(s) / build for Q4_0 models
Update deprecated functions for state saving / loading
Update state reloading logic re: setting model.scores
Update tests

…#1596)

* enable detokenizing special tokens
* enable skipping_special_tokens in hf_tokenizer detokenize()
* process prev_tokens
* fix doc strings
* Revert changes to LlamaTokenizer prev_tokens and set special to False by default

---------

Co-authored-by: Andrei <abetlen@gmail.com>

… wheels * Update build-wheels-cuda.yaml * Update build-wheels-cuda.yaml * revert * Bump pyhton from 3.8 to 3.9 * Remove python 3.8 * Remove Python 3.7 and 3.8 deprecated * Bump python from 3.8 to 3.9 * Add python 3.9 * Add python 3.9, remove macos-11 deprecated, add macos-14 * Bump python 3.8 to 3.9 * Add python 3.13 * Add python 3.13 * python 3.13 remove * remove python 3.13 * remove python 3.8 * Bump macos-13 to macos-14 * Update build-wheels-metal.yaml * Update build-wheels-metal.yaml * Update build-and-release.yaml * Update build-and-release.yaml * Update build-and-release.yaml * Update build-and-release.yaml * Update build-and-release.yaml * Update build-and-release.yaml * Update build-and-release.yaml * Update build-and-release.yaml * Update build-wheels-metal.yaml * Update generate-index-from-release.yaml Add avx, avx2 and avx512 * Update test.yaml * Update test-pypi.yaml * Update publish.yaml * Update publish-to-test.yaml * Update build-wheels-cuda.yaml Cuda with AVX2 by default * Update build-wheels-cuda.yaml * remove DEPRECATED 32 bits * Update build-and-release.yaml * Update build-and-release.yaml * Update build-and-release.yaml * Update build-and-release.yaml * Update build-and-release.yaml * Update build-and-release.yaml * Update build-and-release.yaml * Update build-and-release.yaml * Update build-and-release.yaml * Update build-and-release.yaml * Update build-and-release.yaml * Update build-and-release.yaml * Update build-and-release.yaml Upgrade matrix os to latest version * Update build-wheels-metal.yaml * Update build-wheels-cuda.yaml * Update test.yaml * Update test-pypi.yaml * Update test.yaml Add cache: 'pip' * Update publish-to-test.yaml * Update build-wheels-metal.yaml Add cache: 'pip' * Update build-wheels-cuda.yaml * Update build-and-release.yaml * Update build-and-release.yaml * Update build-wheels-metal.yaml remove x86_64 * Update build-wheels-metal.yaml * Update build-and-release.yaml * Update build-wheels-metal.yaml * Update 
build-wheels-metal.yaml * Update build-and-release.yaml * Update build-and-release.yaml * Update build-and-release.yaml * Update build-wheels-metal.yaml * Update build-and-release.yaml * Update build-and-release.yaml * Update build-and-release.yaml * Update build-and-release.yaml * Update build-wheels-metal.yaml * revert * Remove cpu variants * Update build-wheels-metal.yaml * Update build-and-release.yaml * Update publish-to-test.yaml * Update build-and-release.yaml * Update build-wheels-metal.yaml * Update publish.yaml * Update test-pypi.yaml * Update test.yaml * Update build-and-release.yaml * Update build-wheels-metal.yaml * Update publish.yaml * Update test-pypi.yaml * Update publish-to-test.yaml * Update test.yaml * Update build-and-release.yaml * Update build-wheels-metal.yaml * Update publish-to-test.yaml * Update publish.yaml * Update test-pypi.yaml * Update test.yaml * Update test.yaml * Update build-and-release.yaml * Update publish-to-test.yaml * Update build-wheels-metal.yaml * Update test-pypi.yaml * Update test.yaml * Update build-and-release.yaml * Update build-wheels-metal.yaml * Update build-wheels-metal.yaml * Update publish.yaml * Update publish-to-test.yaml * Update test-pypi.yaml * Update test.yaml * Update build-wheels-cuda.yaml * Update generate-index-from-release.yaml * Update README.md * Update README.md * Update test.yaml --------- Co-authored-by: Andrei Betlen <abetlen@gmail.com>

Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel) from 2.20.0 to 2.21.1. - [Release notes](https://github.com/pypa/cibuildwheel/releases) - [Changelog](https://github.com/pypa/cibuildwheel/blob/main/docs/changelog.md) - [Commits](pypa/cibuildwheel@v2.20.0...v2.21.1) --- updated-dependencies: - dependency-name: pypa/cibuildwheel dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Andrei <abetlen@gmail.com>

* Initial sampling api update
* Fix logger
* Update tests
* Update
* Remove seed
* Add sampling chain
* Remove unused test
* Use Qwen2 0.5B for ci tests
* Fix typo
* Fix typo
* Update cache version
* Use real model for tests
* Add huggingface-hub as a test dependency
* Remove RUST_LOG=trace
* Add actual logit processor test

* fix: correct issue with handling lock during streaming
  move locking for streaming into get_event_publisher call so it is locked and unlocked in the correct task for the streaming response
* fix: simplify exit stack management for create_chat_completion and create_completion
* fix: correct missing `async with` and format code
* fix: remove unnecessary explicit use of AsyncExitStack
  fix: correct type hints for body_model

---------

Co-authored-by: Andrei <abetlen@gmail.com>

* feat: Sync with llama.cpp
  Add `no_perf` field to `llama_context_params` to optionally disable performance timing measurements.
* fix: Display performance metrics by default

---------

Co-authored-by: Andrei <abetlen@gmail.com>

abetlen and others added 30 commits August 21, 2024 10:11

feat: Update llama.cpp

a20f13f

feat: Update llama.cpp

259ee15

feat: Update llama.cpp

82ae7f9

feat: Add MiniCPMv26 chat handler.

f70df82

fix: Update name to MiniCPMv26ChatHandler

e251a0b

fix: pull all gh releases for self-hosted python index

c68e7fb

feat: Add server chat_format minicpm-v-2.6 for MiniCPMv26ChatHandler

97d527e

docs: Add project icon courtesy of 🤗

b570fd3

docs: center icon and resize

cbbfad4

docs: Add MiniCPM-V-2.6 to multi-modal model list

ad2deaf

feat: Update llama.cpp

332720d

chore: Bump version

077ecb6

misc(fix): Update CHANGELOG

45001ac

docs: Update README

4b1e364

docs: Update README

8b853c0

docs: Update README

9cba3b8

fix: Use system message in og qwen format. Closes abetlen#1697

98eb092

feat: Update llama.cpp

dcb0d0c

feat: Update llama.cpp

9769e57

feat: Update llama.cpp

c3fc80a

feat: Update llama.cpp

9497bcd

feat: Update llama.cpp

c032fc6

feat: Update llama.cpp

1e64664

misc: Format

9b64bb5

fix: Fix memory allocation of ndarray (abetlen#1704)

22cedad

* Fix memory allocation of ndarray
* Add basic LlamaState tests
* Improve LlamaState test and fix rng / seed

---------

Co-authored-by: Andrei <abetlen@gmail.com>

fix: Don't store scores internally unless logits_all=True. Reduces memory requirements for large context. Closes abetlen#1542

29afcfd
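
To give a sense of scale for the memory saving this commit describes, here is a rough back-of-the-envelope sketch. It assumes the scores buffer is a dense float32 array with one row per stored position and one column per vocabulary entry, and that only `n_batch` rows are needed when `logits_all` is false. The helper name and the example sizes are illustrative assumptions, not values taken from the library or from any specific model.

```python
def scores_buffer_bytes(n_rows: int, n_vocab: int, bytes_per_logit: int = 4) -> int:
    """Size in bytes of a dense (n_rows, n_vocab) float32 logits buffer."""
    return n_rows * n_vocab * bytes_per_logit


# Illustrative sizes only (assumptions, not measured from a real model).
n_ctx, n_batch, n_vocab = 32_768, 512, 128_256

full = scores_buffer_bytes(n_ctx, n_vocab)       # one row per context position
reduced = scores_buffer_bytes(n_batch, n_vocab)  # one row per batch position

print(f"one row per context position: {full / 2**30:.2f} GiB")
print(f"one row per batch position:   {reduced / 2**20:.2f} MiB")
```

Under these assumed sizes the buffer shrinks by a factor of `n_ctx / n_batch`, which is why the saving matters most for large contexts.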

abetlen and others added 29 commits December 6, 2024 07:37

misc: Update run server command

b9b50e5

feat: Update llama.cpp

5585f8a

Add CUDA 12.5 and 12.6 to generated output wheels

61508c2

chore: Bump version

a9fe0f8

fix(ci): hotfix for wheels

ca80802

chore: Bump version

002f583

fix(ci): update macos runner image to non-deprecated version

ea4d86a

fix: add missing await statements for async exit_stack handling (abetlen#1858)

afedfc8

feat: Update llama.cpp

801a73a

chore: Bump version

803924b

feat: Update llama.cpp

2bc1d97

feat: Update llama.cpp

c9dfad4

feat: Update llama.cpp

1d5f534

chore: Bump version

0580cf2

feat: Update llama.cpp

80be68a

feat: Update llama.cpp

0b89fe4

fix(ci): Fix the CUDA workflow (abetlen#1894)

14879c7

chore: Bump version

710e19a

Merge branch 'main' into experiment_bump_llama_cpp

0a8f97d

Fix for List typehint

70d1048

Update state functions + formatting

15bf3e8

- Remove deprecated functions
- Formatting (auto)

Fixup reloading

e5cccf4

- Don't add additional test in load_state for size, keep doing what upstream is doing there.
- Don't reload numpy logits (scores) at all if not required
- Comment out block for setting last logits from the llama.cpp data

Fix some logic

c9bf03a

- Raise error if save_logits but model will not produce them (`logits_all` False)
- Get rid of extraneous comma that was causing bytes set to fail equality check
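
The comma issue is a common Python pitfall, sketched below in isolation. This is not the code from the commit; the variable names and the byte string are hypothetical stand-ins. It only shows how a stray trailing comma turns a `bytes` value into a one-element tuple, which then never compares equal to the original bytes.

```python
# Hypothetical stand-in for a serialized state blob.
saved = b"\x00\x01\x02"

with_comma = saved,     # trailing comma: this is the tuple (b"\x00\x01\x02",)
without_comma = saved   # plain bytes, as intended

print(type(with_comma).__name__, with_comma == saved)        # tuple False
print(type(without_comma).__name__, without_comma == saved)  # bytes True
```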

Update tests for cache

5de50b9

- Helper to get logits from llama.cpp context to numpy
- Model w/ and w/out logits_all
- Update tests as needed now that `model.scores` not set on reload
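
A helper of the kind mentioned in the first bullet could look roughly like the sketch below. It is not the helper from this commit: `logits_to_numpy` is a hypothetical name, and a plain `ctypes` array stands in for the float buffer that a C library would own. The only real APIs used are `ctypes.cast` and `numpy.ctypeslib.as_array`.

```python
import ctypes

import numpy as np


def logits_to_numpy(ptr, n_rows: int, n_vocab: int) -> np.ndarray:
    """Copy n_rows * n_vocab floats from a C float pointer into a new array."""
    view = np.ctypeslib.as_array(ptr, shape=(n_rows, n_vocab))
    return view.copy()  # copy so the result does not alias the C buffer


# Stand-in for a buffer owned by a C library (hypothetical data).
n_rows, n_vocab = 2, 4
c_buffer = (ctypes.c_float * (n_rows * n_vocab))(*range(n_rows * n_vocab))
ptr = ctypes.cast(c_buffer, ctypes.POINTER(ctypes.c_float))

logits = logits_to_numpy(ptr, n_rows, n_vocab)
print(logits.shape, logits.dtype)
```

Copying rather than returning the view is a deliberate choice in this sketch: the array stays valid even if the underlying buffer is later overwritten or freed.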

Update Dockerfile

68d081d

- Updated build options
- Needs to work w/ aarch64 SIMD support, so have to set some new build flags
- Q4_0 w/ weight repacking
- Add ggml-base and ggml-cpu to exported libraries

Update Makefile

aff151d

- Needed for building the llama-cpp-python server for Mac OS
- Include all libs in `libggml_base_path`

Remove unnecessary (wrong) check

6235674

Saved state can be larger than `n_ctx` / `n_batch`.

tc-wolf merged commit 9b631db into bumped_llama_cpp_with_disk_cache Mar 31, 2025
