feat: support llama-cpp-python v0.3.2 (backport #2825) #2883

Merged

mergify[bot] merged 1 commit into instructlab/instructlab:release-v0.22 from instructlab/instructlab:mergify/bp/release-v0.22/pr-2825 on Jan 10, 2025

Conversation

@mergify
Contributor

@mergify mergify bot commented Jan 8, 2025

Version 0.3.5 of llama-cpp-python has a known issue (abetlen/llama-cpp-python#1861). Version 0.3.2 has Granite 3.0 support and does not have this issue, so bump to that version.

This required some additions to how we handle chat exceptions. As of these newer 0.3.z llama-cpp-python versions, a bad request causes the server to die,
so we need to know the server's max_ctx_size before sending a completions request in order to keep the existing behavior of
trimming messages until the request fits.
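
For context, the trimming idea described above looks roughly like the sketch below; this is a minimal illustration with made-up helper names and a crude character-based token estimate, not the code from this PR:

```python
# Illustrative sketch only -- assumes the client already knows the server's
# context window (max_ctx_size) and trims the oldest non-system messages until
# an estimated token count fits, instead of letting the server reject (and now
# die on) an oversized request.
def trim_messages(messages: list[dict], max_ctx_size: int) -> list[dict]:
    """Drop the oldest non-system messages until the estimated size fits."""

    def estimated_tokens(msgs: list[dict]) -> int:
        # Rough heuristic: roughly four characters per token.
        return sum(len(m["content"]) for m in msgs) // 4

    trimmed = list(messages)
    while len(trimmed) > 1 and estimated_tokens(trimmed) >= max_ctx_size:
        # Keep the system prompt at index 0 and drop the oldest chat turn.
        trimmed.pop(1)
    return trimmed
```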

To make this possible, the config now contains a `current_max_ctx_size` field that we update when spinning up a server.
In the case where a user implicitly starts a llama-cpp-python server by calling `ilab model chat`, we set `max_tokens` to the
current `max_ctx_size` in the serve config.
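
To make the config interplay concrete, a hedged sketch follows; the `ServeConfig` dataclass, function names, and defaults here are assumptions for illustration, not the actual InstructLab serve/chat modules:

```python
# Sketch of the relationship between the serve config, the spun-up server, and
# the chat client's max_tokens; names and defaults are illustrative only.
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServeConfig:
    max_ctx_size: int = 4096                    # context size requested for the server
    current_max_ctx_size: Optional[int] = None  # recorded when a server is spun up


def start_llama_cpp_server(cfg: ServeConfig) -> None:
    # ... launch llama-cpp-python with cfg.max_ctx_size here ...
    cfg.current_max_ctx_size = cfg.max_ctx_size  # remember what the live server accepts


def chat_max_tokens(cfg: ServeConfig) -> int:
    # When `ilab model chat` implicitly starts a llama-cpp-python server,
    # cap max_tokens at the context size recorded in the serve config.
    return cfg.current_max_ctx_size or cfg.max_ctx_size
```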

Checklist:

  • Commit Message Formatting: Commit titles and messages follow the
    Conventional Commits guidelines.
  • Changelog updated with breaking and/or notable changes for the next minor release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Functional tests have been added, if necessary.
  • E2E Workflow tests have been added, if necessary.

This is an automatic backport of pull request #2825 done by [Mergify](https://mergify.com).

@mergify mergify bot mentioned this pull request Jan 8, 2025
@mergify mergify bot added the CI/CD, container, documentation, testing, release-branch, dependencies, and ci-failure labels Jan 8, 2025
@cdoern cdoern added the hold label Jan 9, 2025
@mergify mergify bot added the one-approval label Jan 9, 2025
@mergify mergify bot removed the one-approval label Jan 9, 2025
@cdoern cdoern force-pushed the mergify/bp/release-v0.22/pr-2825 branch 2 times, most recently from 57c4cd3 to 68b31b0 on January 9, 2025 at 20:55
@mergify mergify bot added and then removed the ci-failure label Jan 9, 2025
@cdoern
Contributor

cdoern commented Jan 10, 2025

@Mergifyio rebase

@mergify
Contributor Author

mergify bot commented Jan 10, 2025

rebase

☑️ Nothing to do

Details
  • -conflict [📌 rebase requirement]
  • -closed [📌 rebase requirement]
  • queue-position = -1 [📌 rebase requirement]
  • any of:
    • #commits-behind > 0 [📌 rebase requirement]
    • #commits > 1 [📌 rebase requirement]
    • -linear-history [📌 rebase requirement]

Version 0.3.2 has Granite 3.0 support and does not have the known issue present in 0.3.5 (abetlen/llama-cpp-python#1861); bump to that version.

This required some additions to how we handle chat exceptions. As of these newer 0.3.z llama-cpp-python versions, a bad request causes the server to die,
so we need to know the server's max_ctx_size before sending a completions request in order to keep the existing behavior of
trimming messages until the request fits.

To make this possible, the config now contains a `current_max_ctx_size` field that we update when spinning up a server.
In the case where a user implicitly starts a llama-cpp-python server by calling `ilab model chat`, we set `max_tokens` to the
current `max_ctx_size` in the serve config.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
(cherry picked from commit b29efdd)
Signed-off-by: Charlie Doern <cdoern@redhat.com>
@cdoern cdoern force-pushed the mergify/bp/release-v0.22/pr-2825 branch from 68b31b0 to 75d854c on January 10, 2025 at 03:34
@mergify mergify bot removed the ci-failure PR has at least one CI failure label Jan 10, 2025
@github-actions

E2E (NVIDIA L40S x4) workflow launched on this PR: View run

@github-actions

e2e workflow succeeded on this PR: View run, congrats!

@nathan-weinberg nathan-weinberg removed the hold label Jan 10, 2025
@mergify mergify bot merged commit 2268719 into release-v0.22 Jan 10, 2025
31 checks passed
@mergify mergify bot deleted the mergify/bp/release-v0.22/pr-2825 branch January 10, 2025 15:02

Labels

  • CI/CD: Affects CI/CD configuration
  • container: Affects containerization aspects
  • dependencies: Relates to dependencies
  • documentation: Improvements or additions to documentation
  • release-branch: Pull Request directly to a release branch
  • testing: Relates to testing

