Bumped llama cpp and updated deprecated functions #3

tc-wolf · Mar 18, 2025

No description provided.

* fix completion tokens tracking, prompt forming * fix 'function_call' and 'tool_calls' depending on 'functions' and 'tools', incompatibility with python 3.8 * Updated README * fix for openai server compatibility --------- Co-authored-by: Andrei <abetlen@gmail.com>

…abetlen#1397) Bumps [conda-incubator/setup-miniconda](https://github.com/conda-incubator/setup-miniconda) from 2.2.0 to 3.0.4. - [Release notes](https://github.com/conda-incubator/setup-miniconda/releases) - [Changelog](https://github.com/conda-incubator/setup-miniconda/blob/main/CHANGELOG.md) - [Commits](conda-incubator/setup-miniconda@v2.2.0...v3.0.4) --- updated-dependencies: - dependency-name: conda-incubator/setup-miniconda dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Bumps [actions/cache](https://github.com/actions/cache) from 3.3.2 to 4.0.2. - [Release notes](https://github.com/actions/cache/releases) - [Changelog](https://github.com/actions/cache/blob/main/RELEASES.md) - [Commits](actions/cache@v3.3.2...v4.0.2) --- updated-dependencies: - dependency-name: actions/cache dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Bumps [docker/login-action](https://github.com/docker/login-action) from 2 to 3. - [Release notes](https://github.com/docker/login-action/releases) - [Commits](docker/login-action@v2...v3) --- updated-dependencies: - dependency-name: docker/login-action dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 4 to 5. - [Release notes](https://github.com/docker/build-push-action/releases) - [Commits](docker/build-push-action@v4...v5) --- updated-dependencies: - dependency-name: docker/build-push-action dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel) from 2.16.5 to 2.17.0. - [Release notes](https://github.com/pypa/cibuildwheel/releases) - [Changelog](https://github.com/pypa/cibuildwheel/blob/main/docs/changelog.md) - [Commits](pypa/cibuildwheel@v2.16.5...v2.17.0) --- updated-dependencies: - dependency-name: pypa/cibuildwheel dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Add support for cuda 12.4.1 * Update build-wheels-cuda.yaml * Update build-wheels-cuda.yaml * Update build-wheels-cuda.yaml * Update build-wheels-cuda.yaml * Update build-wheels-cuda.yaml * Update build-wheels-cuda.yaml * Update build-wheels-cuda.yaml * Update build-wheels-cuda.yaml * Update build-wheels-cuda.yaml Revert

…n release (abetlen#1392)
* Update test.yaml: bump actions/checkout@v3 to v4, bump actions/setup-python@v4 to v5
* Update test-pypi.yaml: bump actions/setup-python@v4 to v5
* Update build-and-release.yaml: bump softprops/action-gh-release@v1 to v2, bump actions/checkout@v3 to v4, bump actions/setup-python@v3 to v5
* Update publish.yaml: bump actions/checkout@v3 to v4, bump actions/setup-python@v4 to v5
* Update publish-to-test.yaml: bump actions/checkout@v3 to v4, bump actions/setup-python@v4 to v5
* Update test-pypi.yaml: add Python 3.12
* Update build-and-release.yaml
* Update build-docker.yaml: bump docker/setup-qemu-action@v2 to v3, bump docker/setup-buildx-action@v2 to v3
* Update build-and-release.yaml
* Update build-and-release.yaml

Bumps [softprops/action-gh-release](https://github.com/softprops/action-gh-release) from 1 to 2. - [Release notes](https://github.com/softprops/action-gh-release/releases) - [Changelog](https://github.com/softprops/action-gh-release/blob/master/CHANGELOG.md) - [Commits](softprops/action-gh-release@v1...v2) --- updated-dependencies: - dependency-name: softprops/action-gh-release dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Bumps [actions/configure-pages](https://github.com/actions/configure-pages) from 4 to 5. - [Release notes](https://github.com/actions/configure-pages/releases) - [Commits](actions/configure-pages@v4...v5) --- updated-dependencies: - dependency-name: actions/configure-pages dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 3 to 4. - [Release notes](https://github.com/actions/upload-artifact/releases) - [Commits](actions/upload-artifact@v3...v4) --- updated-dependencies: - dependency-name: actions/upload-artifact dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

…t for Multimodal Models (Obsidian, LLaVA1.6, Moondream) (abetlen#1147) * Test dummy image tags in chat templates * Format and improve types for llava_cpp.py * Add from_pretrained support to llava chat format. * Refactor llava chat format to use a jinja2 * Revert chat format test * Add moondream support (wip) * Update moondream chat format * Update moondream chat format * Update moondream prompt * Add function calling support * Cache last image embed * Add Llava1.6 support * Add nanollava support * Add Obsidian support * Remove unnecessary import * Re-order multimodal chat formats * Logits all no longer required for multi-modal models * Update README.md * Update docs * Update README * Fix typo * Update README * Fix typo

Add libggml.so and libllama.so under `llama_cpp/lib` (the path expected by `_load_shared_library` in `llama_cpp.py`), so that the libraries can be located once bundled. Also avoid adding the *full* OpenBLAS dir (unnecessary) and add only `libopenblas.so` once built. This shrinks the binary size from 200 MB to 40 MB.

- Verbose logging / output when compiling
- Save buildlog (though currently don't export) so that can inspect build afterward
- Build changes:
  - Disable LLAMAFILE (needed for Q4_0_4_4)
  - Set march/mcpu/mtune for C/C++
  - Make pip verbose (so that get cmake compile output)

- Add logging statements (seed, timing, match length, etc.)
- Change logic:
  - Use `find_longest_prefix_key` to find length of longest key in cache
  - If cache_prefix_len > eval_prefix_len, load from disk
  - Otherwise, skip loading.
- Change logging to decode to utf-8 (helpful for JP prompts)
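The load-or-skip decision described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `longest_token_prefix` and `should_load_from_disk` are hypothetical standalone helpers, and cache keys are assumed to be token sequences.

```python
from typing import Iterable, Sequence


def longest_token_prefix(a: Sequence[int], b: Sequence[int]) -> int:
    """Count how many leading tokens two sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def should_load_from_disk(
    cache_keys: Iterable[Sequence[int]],
    prompt_tokens: Sequence[int],
    evaluated_tokens: Sequence[int],
) -> bool:
    """Load from disk only if the cache covers more of the prompt
    than the tokens already evaluated in memory (assumed semantics)."""
    cache_prefix_len = max(
        (longest_token_prefix(key, prompt_tokens) for key in cache_keys),
        default=0,
    )
    eval_prefix_len = longest_token_prefix(evaluated_tokens, prompt_tokens)
    return cache_prefix_len > eval_prefix_len
```

With this shape, an in-memory context that already matches the prompt at least as far as the best cache entry skips the disk read entirely.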

- Create `reload_from_cache_state` method
  - Still using LlamaState as container
  - Use low level `ctx.get_logits_ith` to get last calculated logits.
- Add StateReloadError so that can be fallible.
- Change Llama class to use this instead of `load_state` directly.
  - Default implementation still uses `load_state`.
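A fallible reload of this kind might be handled along these lines. The sketch below is illustrative only: the `StateReloadError` class is a local stand-in for the one named in the commit, and `try_reload` and `restore_or_reset` with their parameters are hypothetical and do not reflect the fork's actual signatures.

```python
import logging

logger = logging.getLogger(__name__)


class StateReloadError(Exception):
    """Stand-in for the error raised when a cached state cannot be reloaded."""


def try_reload(cached_n_tokens: int, n_ctx: int) -> int:
    """Hypothetical reload step: reject a state that does not fit the context."""
    if cached_n_tokens > n_ctx:
        raise StateReloadError(
            f"cached state has {cached_n_tokens} tokens but n_ctx is {n_ctx}"
        )
    return cached_n_tokens


def restore_or_reset(cached_n_tokens: int, n_ctx: int) -> int:
    """Return the number of tokens restored, or 0 if the reload failed."""
    try:
        return try_reload(cached_n_tokens, n_ctx)
    except StateReloadError as exc:
        # Log and fall back to evaluating the prompt from scratch.
        logger.warning("State reload failed, starting fresh: %s", exc)
        return 0
```

Using a dedicated exception type lets the caller distinguish an unusable cache entry from other failures and continue without it.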

sean-bailey and others added 30 commits April 23, 2024 02:35

feat(server): Provide ability to dynamically allocate all threads if …

53ebcc8

…desired using `-1` (abetlen#1364)

ci: Build arm64 wheels. Closes abetlen#1342

611781f

chore: Bump version

c50d330

feat: Update llama.cpp

2a9979f

fix(ci): Fix python macos test runners issue

de37420

fix(ci): Fix metal tests as well

266abfc

feat: Update llama.cpp

7f52335

fix: pydantic deprecation warning

fcfea66

feat: Allow for possibly non-pooled embeddings (abetlen#1380)

f6ed21f

* allow for possibly non-pooled embeddings * add more to embeddings section in README.md --------- Co-authored-by: Andrei <abetlen@gmail.com>

fix: Remove duplicate pooling_type definition and add missing n_vocab…

173ebc7

… definition in bindings

chore: Bump version

65edc90

ci: Update dependabot.yml (abetlen#1391)

9e7f738

Add github-actions update

ci: Update action versions in build-wheels-metal.yaml (abetlen#1390)

c58b561

* Bump actions/setup-python@v4 to v5 * Update build-wheels-metal.yaml * Update build-wheels-metal.yaml * Update build-wheels-metal.yaml

examples: fix quantize example (abetlen#1387)

e6bbfb8

@iyubondyrev thank you!

feat: Update llama.cpp

c9b85bf

feat: Add support for str type kv_overrides

a411612

fix(ci): Update generate wheel index script to include cu12.3 and cu1…

0c3bc4b

…2.4 Closes abetlen#1406

feat: Update llama.cpp

97fb860

abetlen and others added 29 commits August 15, 2024 14:46

feat: Update llama.cpp

63d65ac

fix: missing dependencies for test (abetlen#1680)

78e35c4

fix: Llama.close didn't free lora adapter (abetlen#1679)

3c7501b

feat: Update llama.cpp

7bf07ec

Merge branch 'main' of github.com:abetlen/llama-cpp-python into main

658b244

feat: Update llama.cpp

a2ba731

chore: Bump version

d7328ef

Merge tag 'v0.2.89' into bumped_llama_cpp_with_disk_cache

fc5bbcb

Make llama.cpp point to fork

93f8c88

Has to skip deserializing RNG state but based on new branch.

Update Dockerfile build mcpu/march

87e02e3

Passing mcpu/march to cmake

Update Dockerfile

da9f15a

- scikit-build -> scikit-build-core - Remove -DGGML_BLAS=ON, OpenBLAS cmake tags from build - Needed (see ggml-org/llama.cpp#5780 (review)) to get this to work properly - Built, deployed, and tested with llama3.1 with q4_0_4_4 quantization

Fix bug

ed6c354

- Use ptr.contents, not ptr in `np.array` - Get dtype from return type on annotated signature - Explicitly set copy=True and dtype on np.array - Should not strictly be necessary since pointer is typed
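As a rough illustration of the pattern in this commit, the snippet below builds a NumPy array from a typed ctypes pointer by dereferencing it with `.contents` and requesting an explicit copy and dtype. The buffer, its length, and the helper name `array_from_pointer` are made up for the example; NumPy is assumed to be installed.

```python
import ctypes

import numpy as np

# A small float buffer standing in for data owned by native code.
values = (ctypes.c_float * 4)(1.0, 2.0, 3.0, 4.0)
ptr = ctypes.pointer(values)


def array_from_pointer(p, dtype=np.float32) -> np.ndarray:
    """Copy the pointed-to ctypes array into a new NumPy array."""
    # p.contents is the ctypes array itself, which exposes the buffer protocol.
    return np.array(p.contents, copy=True, dtype=dtype)


logits = array_from_pointer(ptr)

# Because copy=True was requested, later writes to the native buffer
# do not affect the NumPy array.
values[0] = 9.0
```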

Catch StateReloadError

90d42c3

Catch StateReloadError and add logging if runs into this when running.

Add tests

46718c9

- Fix loading state (from_buffer -> from_buffer_copy since bytes aren't mutable) - Add tests (E2E, errors when should, reloads successfully, logits correct, etc.) Have to set LLAMA_TEST_MODEL to point to model path in order to get this to run.
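The `from_buffer` versus `from_buffer_copy` distinction can be reproduced with the standard library alone. In this sketch the helper names are hypothetical; the behaviour shown is that of `ctypes`, which requires a writable buffer for `from_buffer` and therefore rejects immutable `bytes`.

```python
import ctypes


def load_state_bytes(data: bytes):
    """Copy immutable bytes into a new ctypes uint8 array."""
    array_type = ctypes.c_uint8 * len(data)
    return array_type.from_buffer_copy(data)


def from_buffer_rejects_bytes(data: bytes) -> bool:
    """Return True if from_buffer refuses the read-only bytes object."""
    array_type = ctypes.c_uint8 * len(data)
    try:
        array_type.from_buffer(data)
    except TypeError:
        # bytes objects are not writable, so sharing the buffer is not allowed.
        return True
    return False


state = load_state_bytes(b"abc")
```

A `bytearray` would work with `from_buffer`, but copying is the simpler choice when the saved state arrives as `bytes`.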

Skip saving logits

8362cfa

D'oh

Add check + note

de7f862

- Check when saving that model doesn't need logits - Add note in `reload_from_cache_state` to revisit

Finalize llama cache changes

b4e2156

- Make default to *not* save logits - Error if needed and save_logits False in build_cache - Handle reloading with/without scores if needed + available

Finalize tests

ca5d1a4

- Add more tests - Make llama_state / small_model module scope (so don't need to reload for each test) - Setting env var in `.env` file

Better variable name

0967eda

Merge pull request #1 from tc-wolf/optimize_kv_cache_size

6634017

Optimize KV cache size

Simplify Dockerfile

75466a3

Take out references to OpenBLAS since no longer used

Add deploy target for mac server bundle

9e19903

Merge pull request #2 from tc-wolf/standalone_server_mac

ec10a80

Add deploy target for mac server bundle

Update Dockerfile to build w/ older Ubuntu

524ae21

- Needs glibc <= 2.31 - But have to use GCC-11+ to build with specific march/mcpu/mtune.

Update build args

832636c

- Set CC / CXX for cmake build - Make sure python3.9-dev installed (needed for linking w/ pyinstaller into standalone) - Set CMAKE_BUILD_TYPE as env var instead of as cmake var (based on build logs)

tc-wolf closed this Mar 18, 2025
