forked from abetlen/llama-cpp-python
-
Notifications
You must be signed in to change notification settings - Fork 2
Bump llama.cpp #4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
tc-wolf
merged 113 commits into
bumped_llama_cpp_with_disk_cache
from
experiment_bump_llama_cpp
Mar 31, 2025
Merged
Bump llama.cpp #4
tc-wolf
merged 113 commits into
bumped_llama_cpp_with_disk_cache
from
experiment_bump_llama_cpp
Mar 31, 2025
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…#1596) * enable detokenizing special tokens * enable skipping_special_tokens in hf_tokenizer detokenize() * process prev_tokens * fix doc strings * Revert changes to LlamaTokenizer prev_tokens and set special to False by default --------- Co-authored-by: Andrei <abetlen@gmail.com>
… wheels * Update build-wheels-cuda.yaml * Update build-wheels-cuda.yaml * revert * Bump pyhton from 3.8 to 3.9 * Remove python 3.8 * Remove Python 3.7 and 3.8 deprecated * Bump python from 3.8 to 3.9 * Add python 3.9 * Add python 3.9, remove macos-11 deprecated, add macos-14 * Bump python 3.8 to 3.9 * Add python 3.13 * Add python 3.13 * python 3.13 remove * remove python 3.13 * remove python 3.8 * Bump macos-13 to macos-14 * Update build-wheels-metal.yaml * Update build-wheels-metal.yaml * Update build-and-release.yaml * Update build-and-release.yaml * Update build-and-release.yaml * Update build-and-release.yaml * Update build-and-release.yaml * Update build-and-release.yaml * Update build-and-release.yaml * Update build-and-release.yaml * Update build-wheels-metal.yaml * Update generate-index-from-release.yaml Add avx, avx2 and avx512 * Update test.yaml * Update test-pypi.yaml * Update publish.yaml * Update publish-to-test.yaml * Update build-wheels-cuda.yaml Cuda with AVX2 by default * Update build-wheels-cuda.yaml * remove DEPRECATED 32 bits * Update build-and-release.yaml * Update build-and-release.yaml * Update build-and-release.yaml * Update build-and-release.yaml * Update build-and-release.yaml * Update build-and-release.yaml * Update build-and-release.yaml * Update build-and-release.yaml * Update build-and-release.yaml * Update build-and-release.yaml * Update build-and-release.yaml * Update build-and-release.yaml * Update build-and-release.yaml Upgrade matrix os to latest version * Update build-wheels-metal.yaml * Update build-wheels-cuda.yaml * Update test.yaml * Update test-pypi.yaml * Update test.yaml Add cache: 'pip' * Update publish-to-test.yaml * Update build-wheels-metal.yaml Add cache: 'pip' * Update build-wheels-cuda.yaml * Update build-and-release.yaml * Update build-and-release.yaml * Update build-wheels-metal.yaml remove x86_64 * Update build-wheels-metal.yaml * Update build-and-release.yaml * Update build-wheels-metal.yaml * Update build-wheels-metal.yaml * Update build-and-release.yaml * Update build-and-release.yaml * Update build-and-release.yaml * Update build-wheels-metal.yaml * Update build-and-release.yaml * Update build-and-release.yaml * Update build-and-release.yaml * Update build-and-release.yaml * Update build-wheels-metal.yaml * revert * Remove cpu variants * Update build-wheels-metal.yaml * Update build-and-release.yaml * Update publish-to-test.yaml * Update build-and-release.yaml * Update build-wheels-metal.yaml * Update publish.yaml * Update test-pypi.yaml * Update test.yaml * Update build-and-release.yaml * Update build-wheels-metal.yaml * Update publish.yaml * Update test-pypi.yaml * Update publish-to-test.yaml * Update test.yaml * Update build-and-release.yaml * Update build-wheels-metal.yaml * Update publish-to-test.yaml * Update publish.yaml * Update test-pypi.yaml * Update test.yaml * Update test.yaml * Update build-and-release.yaml * Update publish-to-test.yaml * Update build-wheels-metal.yaml * Update test-pypi.yaml * Update test.yaml * Update build-and-release.yaml * Update build-wheels-metal.yaml * Update build-wheels-metal.yaml * Update publish.yaml * Update publish-to-test.yaml * Update test-pypi.yaml * Update test.yaml * Update build-wheels-cuda.yaml * Update generate-index-from-release.yaml * Update README.md * Update README.md * Update test.yaml --------- Co-authored-by: Andrei Betlen <abetlen@gmail.com>
Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel) from 2.20.0 to 2.21.1. - [Release notes](https://github.com/pypa/cibuildwheel/releases) - [Changelog](https://github.com/pypa/cibuildwheel/blob/main/docs/changelog.md) - [Commits](pypa/cibuildwheel@v2.20.0...v2.21.1) --- updated-dependencies: - dependency-name: pypa/cibuildwheel dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Andrei <abetlen@gmail.com>
* Initial samplng api update * Fix logger * Update tests * Update * Remove seed * Add sampling chain * Remove unnused test * Use Qwen2 0.5B for ci tests * Fix typo * Fix typo * Update cache version * Use real model for tests * Add huggingface-hub as a test dependency * Remove RUST_LOG=trace * Add actual logit processor test
* Fix memory allocation of ndarray * Add basic LlamaState tests * Improve LlamaState test and fix rng / seed --------- Co-authored-by: Andrei <abetlen@gmail.com>
…mory requirements for large context. Closes abetlen#1542
* fix: correct issue with handling lock during streaming move locking for streaming into get_event_publisher call so it is locked and unlocked in the correct task for the streaming reponse * fix: simplify exit stack management for create_chat_completion and create_completion * fix: correct missing `async with` and format code * fix: remove unnecessary explicit use of AsyncExitStack fix: correct type hints for body_model --------- Co-authored-by: Andrei <abetlen@gmail.com>
* feat: Sync with llama.cpp Add `no_perf` field to `llama_context_params` to optionally disable performance timing measurements. * fix: Display performance metrics by default --------- Co-authored-by: Andrei <abetlen@gmail.com>
- Remove deprecated functions - Formatting (auto)
- Don't add additional test in load_state for size, keep doing what upstream is doing there. - Don't reload numpy logits (scores) at all if not required - Comment out block for setting last logits from the lllama.cpp data
- Raise error if save_logits but model will not produce - `logits_all` False - Get rid of extraneous comma that was causing bytes set to fail equality check
- Helper to get logits from llama.cpp context to numpy - Model w/ and w/out logits_all - Update tests as needed now that `model.scores` not set on reload
- Updated build options - Needs to work w/ aarch64 SIMD support, so have to set some new build flags - Q4_0 w/ weight repacking - Add ggml-base and ggml-cpu to exported libraries
- Needed for building the llama-cpp-python server for Mac OS - Include all libs in `libggml_base_path`
Saved state can be larger than `n_ctx` / `n_batch`.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.