feat: support llama-cpp-python v0.3.2 #2825
cdoern merged 1 commit into instructlab:main
Conversation
Force-pushed 67880cf to e1a4a14
Force-pushed e1a4a14 to 31f9a46
Force-pushed f6b726e to 565ce5e
Force-pushed 565ce5e to 5e77c56
Force-pushed 5e77c56 to e782906
Update here: as of llama_cpp_python 0.3.z we need to keep track of the max_ctx_size being used by the active server, and then make sure, before we pass the message list to the OpenAI endpoint, that we remove the most recent message if the length of the content in the list is greater than max_ctx_size.
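A minimal sketch of that trimming step, assuming a plain list of chat messages and a character-based length check (the function name and the use of characters rather than tokens are assumptions, not the actual instructlab code):

```python
# Hypothetical sketch of the trimming described above -- not the actual
# instructlab implementation. Drops the most recent messages until the
# combined content length fits under max_ctx_size (counted in characters
# here for simplicity; a real implementation might count tokens instead).
def trim_messages(messages: list[dict], max_ctx_size: int) -> list[dict]:
    trimmed = list(messages)
    while trimmed and sum(len(m.get("content", "")) for m in trimmed) > max_ctx_size:
        trimmed.pop()  # remove the most recent message
    return trimmed


# Example: with a tiny context budget, only the short opening message survives.
history = [
    {"role": "user", "content": "short question"},
    {"role": "assistant", "content": "a very long answer " * 50},
]
print(trim_messages(history, max_ctx_size=40))
```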
Force-pushed e782906 to e50f7e0
Force-pushed e50f7e0 to 283d846
Force-pushed 283d846 to 380bc82
e2e workflow failed on this PR: View run, please investigate.
We'll need #2863 to merge to use the large test here.
E2E (NVIDIA L40S x4) workflow launched on this PR: View run
e2e workflow failed on this PR: View run, please investigate.
E2E (NVIDIA L40S x4) workflow launched on this PR: View run
llama-cpp-python 0.3.5 has a known issue (abetlen/llama-cpp-python#1861); version 0.3.2 has granite 3.0 support and does not have this issue, so bump to that version.

This required some additions to how we handle chat exceptions. As of these newer 0.3.z llama-cpp-python versions, a bad request causes the server to die. This requires us to know the max_ctx_size of the server before sending a completions request, so that we can keep the existing behavior of trimming messages until the request fits.

In order to do this, the config now contains a `current_max_ctx_size` field that we update when spinning up a server. In the case that a user implicitly starts a llama-cpp-python server when calling `ilab model chat`, we set max_tokens to the current `max_ctx_size` in the serve config.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
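A rough illustration of how that field might be wired up, assuming a dataclass-shaped serve config (only the `current_max_ctx_size` field name comes from the description above; the class and helper functions are hypothetical):

```python
# Rough illustration only -- the class and helpers below are hypothetical;
# only the field name current_max_ctx_size comes from the description above.
from dataclasses import dataclass


@dataclass
class ServeConfig:
    # Context size requested in the serve section of the config file.
    max_ctx_size: int = 4096
    # Context size of the server that is actually running; updated
    # whenever a llama-cpp-python server is spun up.
    current_max_ctx_size: int = 4096


def record_server_ctx_size(cfg: ServeConfig, launched_ctx_size: int) -> None:
    """Remember the context size the active llama-cpp-python server was started with."""
    cfg.current_max_ctx_size = launched_ctx_size


def chat_max_tokens(cfg: ServeConfig) -> int:
    """When `ilab model chat` implicitly starts a llama-cpp-python server,
    cap max_tokens at the context size recorded in the serve config."""
    return cfg.current_max_ctx_size
```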
E2E (NVIDIA L40S x4) workflow launched on this PR: View run
e2e workflow succeeded on this PR: View run, congrats!
This PR needs to be manually merged. Given it has two approvals and passes the S, M, and L E2E CI jobs, I will be merging it manually. The container test has been failing and was only triggered because I needed to change build arguments in the various container files. Will merge once all CI except for the container build passes.
@Mergifyio backport release-v0.22 |
✅ Backports have been created
version 0.3.5 of llama-cpp-python has a known issue (abetlen/llama-cpp-python#1861); version 0.3.2 has granite 3.0 support and does not have this issue. Bump to this version.

This required some additions to how we handle chat exceptions. As of these newer 0.3.z llama-cpp-python versions, a bad request causes the server to die. This requires us to know the max_ctx_size of the server before sending a completions request, so that we can keep the existing behavior of trimming messages until the request fits.

In order to do this, the config now contains a `current_max_ctx_size` field that we update when spinning up a server. In the case that a user implicitly starts a llama-cpp-python server when calling `ilab model chat`, we set max_tokens to the current `max_ctx_size` in the serve config.

**Checklist:**

- [ ] **Commit Message Formatting**: Commit titles and messages follow the [conventional commits](https://www.conventionalcommits.org/en/v1.0.0/#summary) guidelines.
- [ ] [Changelog](https://github.com/instructlab/instructlab/blob/main/CHANGELOG.md) updated with breaking and/or notable changes for the next minor release.
- [ ] Documentation has been updated, if necessary.
- [ ] Unit tests have been added, if necessary.
- [ ] Functional tests have been added, if necessary.
- [ ] E2E Workflow tests have been added, if necessary.

---

This is an automatic backport of pull request #2825 done by [Mergify](https://mergify.com).

Approved-by: cdoern
Approved-by: alinaryan