feat: support llama-cpp-python v0.3.2 (backport #2825) #2883

Merged

mergify[bot] merged 1 commit into instructlab/instructlab:release-v0.22 from mergify/bp/release-v0.22/pr-2825 on Jan 10, 2025
Conversation
cdoern approved these changes on Jan 9, 2025

alinaryan approved these changes on Jan 9, 2025
Force-pushed from 57c4cd3 to 68b31b0
Contributor:
@Mergifyio rebase

Author (mergify[bot]):
☑️ Nothing to do
Version 0.3.5 of llama-cpp-python has a known issue (abetlen/llama-cpp-python#1861); version 0.3.2 has Granite 3.0 support and does not have that issue, so bump to it. The bump required some additions to how we handle chat exceptions: as of these newer 0.3.z llama-cpp-python versions, a bad request causes the server to die. We therefore need to know the server's max_ctx_size before sending a completions request, so we can keep the existing behavior of trimming messages until one fits. To do this, the config now contains a `current_max_ctx_size` field that is updated when a server is spun up. When a user implicitly starts a llama-cpp-python server by calling `ilab model chat`, we set max_tokens to the current `max_ctx_size` in the serve config.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
(cherry picked from commit b29efdd)
Signed-off-by: Charlie Doern <cdoern@redhat.com>
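To illustrate the trimming behavior described in the commit message, here is a minimal, hypothetical sketch of a client-side helper that drops the oldest messages until the estimated prompt fits within a known context size before a completions request is sent. The function names and the token heuristic are assumptions for illustration, not the actual instructlab implementation.

```python
# Hypothetical sketch only: trims chat history client-side so the request fits
# within a known context window. Names and the token heuristic are assumptions,
# not the actual instructlab code.

def approx_tokens(text: str) -> int:
    # Rough heuristic: assume ~4 characters per token.
    return max(1, len(text) // 4)


def trim_messages(messages: list[dict], max_ctx_size: int) -> list[dict]:
    """Drop the oldest non-system messages until the estimated prompt size
    fits within max_ctx_size, then return the trimmed history."""
    trimmed = list(messages)
    while len(trimmed) > 1:
        total = sum(approx_tokens(m.get("content", "")) for m in trimmed)
        if total <= max_ctx_size:
            break
        # Preserve a leading system prompt; otherwise drop the oldest message.
        trimmed.pop(1 if trimmed[0].get("role") == "system" else 0)
    return trimmed
```

The trimmed history would then be passed to the completions request, rather than letting an oversized request reach a server that (on 0.3.z) would crash instead of rejecting it.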
Force-pushed from 68b31b0 to 75d854c
E2E (NVIDIA L40S x4) workflow launched on this PR: View run

e2e workflow succeeded on this PR: View run, congrats!
Version 0.3.5 of llama-cpp-python has a known issue (abetlen/llama-cpp-python#1861). Version 0.3.2 has Granite 3.0 support and does not have that issue, so bump to it.

The bump required some additions to how we handle chat exceptions. As of these newer 0.3.z llama-cpp-python versions, a bad request causes the server to die. We therefore need to know the server's max_ctx_size before sending a completions request, so we can keep the existing behavior of trimming messages until one fits.

To do this, the config now contains a `current_max_ctx_size` field that is updated when a server is spun up. When a user implicitly starts a llama-cpp-python server by calling `ilab model chat`, we set max_tokens to the current `max_ctx_size` in the serve config.
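The following is a minimal, hypothetical sketch of the config idea described above: record the context size whenever a server is spun up, and derive max_tokens from it when `ilab model chat` implicitly starts the server. The class and field names mirror the description but are assumptions, not the actual instructlab config schema.

```python
# Hypothetical sketch: field names mirror the description above but are not the
# real instructlab config schema.
from dataclasses import dataclass


@dataclass
class ServeConfig:
    max_ctx_size: int = 4096        # context window requested for llama-cpp-python
    current_max_ctx_size: int = 0   # updated each time a server is spun up


def spin_up_server(cfg: ServeConfig) -> None:
    # ...launch llama-cpp-python with n_ctx=cfg.max_ctx_size...
    # Record the context size so chat can bound its requests later.
    cfg.current_max_ctx_size = cfg.max_ctx_size


def chat_completion_kwargs(cfg: ServeConfig) -> dict:
    # When chat implicitly started the server, cap max_tokens using the known
    # context size so an oversized request cannot take the server down.
    return {"max_tokens": cfg.current_max_ctx_size}
```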
Checklist:

- conventional commits
This is an automatic backport of pull request #2825 done by [Mergify](https://mergify.com).