Bump llama.cpp #4


Merged

merged 113 commits into bumped_llama_cpp_with_disk_cache on Mar 31, 2025

Conversation

tc-wolf (Owner) commented on Mar 18, 2025

  • Bump llama.cpp
  • Update deployment script(s) / build for Q4_0 models
  • Update deprecated functions for state saving / loading (see the sketch after this list)
  • Update state reloading logic re: setting model.scores
  • Update tests
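
A minimal sketch of the state save / load flow this PR touches, assuming llama-cpp-python's high-level `Llama.save_state()` / `Llama.load_state()` API; the model path is a placeholder, not the PR's exact code:

```python
from llama_cpp import Llama

# Placeholder model; any GGUF file works for this sketch.
llm = Llama(model_path="./model-q4_0.gguf", n_ctx=2048)

llm.eval(llm.tokenize(b"The quick brown fox"))
state = llm.save_state()  # snapshot tokens + KV cache

llm.eval(llm.tokenize(b" jumps over", add_bos=False))  # diverge from snapshot

llm.load_state(state)  # restore; per this PR, model.scores is no longer
                       # repopulated on reload unless logits are required
```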

abetlen and others added 30 commits August 21, 2024 10:11
…#1596)

* enable detokenizing special tokens

* enable skipping_special_tokens in hf_tokenizer detokenize()

* process prev_tokens

* fix doc strings

* Revert changes to LlamaTokenizer prev_tokens and set special to False by default

---------

Co-authored-by: Andrei <abetlen@gmail.com>
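
A hedged sketch of the behavior this commit enables, assuming a `special` flag on llama-cpp-python's `Llama.tokenize()` / `Llama.detokenize()` (exact signatures may differ from the PR):

```python
from llama_cpp import Llama

llm = Llama(model_path="./model-q4_0.gguf")  # placeholder path

tokens = llm.tokenize(b"Hello", special=True)  # may include BOS / special tokens
plain = llm.detokenize(tokens)                 # default: special tokens skipped
full = llm.detokenize(tokens, special=True)    # render special tokens verbatim
```
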
… wheels

* Update build-wheels-cuda.yaml

* Update build-wheels-cuda.yaml

* revert

* Bump python from 3.8 to 3.9

* Remove python 3.8

* Remove Python 3.7 and 3.8 deprecated

* Bump python from 3.8 to 3.9

* Add python 3.9

* Add python 3.9, remove macos-11 deprecated, add macos-14

* Bump python 3.8 to 3.9

* Add python 3.13

* Add python 3.13

* python 3.13 remove

* remove python 3.13

* remove python 3.8

* Bump macos-13 to macos-14

* Update build-wheels-metal.yaml

* Update build-wheels-metal.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-wheels-metal.yaml

* Update generate-index-from-release.yaml

Add AVX, AVX2 and AVX512

* Update test.yaml

* Update test-pypi.yaml

* Update publish.yaml

* Update publish-to-test.yaml

* Update build-wheels-cuda.yaml

CUDA with AVX2 by default

* Update build-wheels-cuda.yaml

* Remove deprecated 32-bit builds

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

Upgrade matrix OS to latest version

* Update build-wheels-metal.yaml

* Update build-wheels-cuda.yaml

* Update test.yaml

* Update test-pypi.yaml

* Update test.yaml

Add cache: 'pip'

* Update publish-to-test.yaml

* Update build-wheels-metal.yaml

Add cache: 'pip'

* Update build-wheels-cuda.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-wheels-metal.yaml

remove x86_64

* Update build-wheels-metal.yaml

* Update build-and-release.yaml

* Update build-wheels-metal.yaml

* Update build-wheels-metal.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-wheels-metal.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-and-release.yaml

* Update build-wheels-metal.yaml

* revert

* Remove cpu variants

* Update build-wheels-metal.yaml

* Update build-and-release.yaml

* Update publish-to-test.yaml

* Update build-and-release.yaml

* Update build-wheels-metal.yaml

* Update publish.yaml

* Update test-pypi.yaml

* Update test.yaml

* Update build-and-release.yaml

* Update build-wheels-metal.yaml

* Update publish.yaml

* Update test-pypi.yaml

* Update publish-to-test.yaml

* Update test.yaml

* Update build-and-release.yaml

* Update build-wheels-metal.yaml

* Update publish-to-test.yaml

* Update publish.yaml

* Update test-pypi.yaml

* Update test.yaml

* Update test.yaml

* Update build-and-release.yaml

* Update publish-to-test.yaml

* Update build-wheels-metal.yaml

* Update test-pypi.yaml

* Update test.yaml

* Update build-and-release.yaml

* Update build-wheels-metal.yaml

* Update build-wheels-metal.yaml

* Update publish.yaml

* Update publish-to-test.yaml

* Update test-pypi.yaml

* Update test.yaml

* Update build-wheels-cuda.yaml

* Update generate-index-from-release.yaml

* Update README.md

* Update README.md

* Update test.yaml

---------

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel) from 2.20.0 to 2.21.1.
- [Release notes](https://github.com/pypa/cibuildwheel/releases)
- [Changelog](https://github.com/pypa/cibuildwheel/blob/main/docs/changelog.md)
- [Commits](pypa/cibuildwheel@v2.20.0...v2.21.1)

---
updated-dependencies:
- dependency-name: pypa/cibuildwheel
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Andrei <abetlen@gmail.com>
* Initial sampling API update

* Fix logger

* Update tests

* Update

* Remove seed

* Add sampling chain

* Remove unused test

* Use Qwen2 0.5B for ci tests

* Fix typo

* Fix typo

* Update cache version

* Use real model for tests

* Add huggingface-hub as a test dependency

* Remove RUST_LOG=trace

* Add actual logit processor test

* Fix memory allocation of ndarray

* Add basic LlamaState tests

* Improve LlamaState test and fix rng / seed

---------

Co-authored-by: Andrei <abetlen@gmail.com>
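
The sampling rewrite above tracks llama.cpp's move to a sampler-chain API. A hedged sketch using llama-cpp-python's low-level bindings (names mirror upstream llama.cpp; illustrative, not the PR's exact code):

```python
import llama_cpp

# Build a chain; samplers are applied in the order they are added.
chain = llama_cpp.llama_sampler_chain_init(
    llama_cpp.llama_sampler_chain_default_params()
)
llama_cpp.llama_sampler_chain_add(chain, llama_cpp.llama_sampler_init_top_k(40))
llama_cpp.llama_sampler_chain_add(chain, llama_cpp.llama_sampler_init_temp(0.8))
llama_cpp.llama_sampler_chain_add(chain, llama_cpp.llama_sampler_init_dist(1234))  # seeded final pick

# ... sample with llama_sampler_sample(chain, ctx, -1), then clean up:
llama_cpp.llama_sampler_free(chain)
```
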
abetlen and others added 29 commits December 6, 2024 07:37
* fix: correct issue with handling lock during streaming

Move locking for streaming into the get_event_publisher call so the lock is acquired and released in the correct task for the streaming response (sketched below).

* fix: simplify exit stack management for create_chat_completion and create_completion

* fix: correct missing `async with` and format code

* fix: remove unnecessary explicit use of AsyncExitStack

fix: correct type hints for body_model

---------

Co-authored-by: Andrei <abetlen@gmail.com>
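
A hedged sketch of the locking fix described above: the lock is acquired inside the publisher coroutine, so it is held and released by the same task that streams the response. Names (`get_event_publisher`, `llama_lock`) are illustrative, not the server's verbatim code:

```python
import anyio

llama_lock = anyio.Lock()

async def get_event_publisher(request, inner_send_chan, iterator):
    # Acquire the model lock in the streaming task itself, so it is
    # released by that same task even if the client disconnects mid-stream.
    async with llama_lock:
        async with inner_send_chan:
            async for chunk in iterator:
                await inner_send_chan.send(dict(data=chunk))
                if await request.is_disconnected():
                    raise anyio.get_cancelled_exc_class()()
```
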
* feat: Sync with llama.cpp

Add `no_perf` field to `llama_context_params` to optionally disable performance timing measurements.

* fix: Display performance metrics by default

---------

Co-authored-by: Andrei <abetlen@gmail.com>
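
A small sketch of the new field via the low-level API (the `no_perf` name comes from this commit; the surrounding call is the standard llama-cpp-python binding):

```python
import llama_cpp

params = llama_cpp.llama_context_default_params()
params.no_perf = True  # disable performance timing measurements for this context
```
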
- Remove deprecated functions
- Formatting (auto)
- Don't add an additional size check in load_state; keep doing what
  upstream does there.
- Don't reload numpy logits (scores) at all if not required
- Comment out the block for setting last logits from the llama.cpp data
- Raise an error if save_logits is set but the model will not produce
  logits (`logits_all` is False)
- Get rid of the extraneous comma that was causing a bytes set to fail
  an equality check
- Helper to get logits from the llama.cpp context into numpy (see the
  sketch below)
- Model w/ and w/out logits_all
- Update tests as needed now that `model.scores` is not set on reload
- Updated build options
  - Needs to work w/ aarch64 SIMD support, so have to set some new build flags
  - Q4_0 w/ weight repacking
- Add ggml-base and ggml-cpu to exported libraries
  - Needed for building the llama-cpp-python server on macOS
- Include all libs in `libggml_base_path`
Saved state can be larger than `n_ctx` / `n_batch`.
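
A hedged sketch of the logits helper mentioned above: copy the context's most recent logits into a numpy array via the low-level `llama_get_logits` binding (the helper name and the `n_vocab` plumbing are illustrative):

```python
import numpy as np
import llama_cpp

def context_logits_to_numpy(ctx, n_vocab: int) -> np.ndarray:
    """Copy the most recent logits out of a llama.cpp context."""
    logits_ptr = llama_cpp.llama_get_logits(ctx)  # float* of length n_vocab
    return np.ctypeslib.as_array(logits_ptr, shape=(n_vocab,)).copy()
```
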
tc-wolf merged commit 9b631db into bumped_llama_cpp_with_disk_cache on Mar 31, 2025