
Update llama_cpp_python version to 0.2.75 #1161

Merged: mergify[bot] merged 1 commit into instructlab:main from alimaredia:update-llama-cpp-python-ver on May 23, 2024

Conversation

@alimaredia (Contributor):

Changes

Which issue is resolved by this Pull Request:
Resolves https://github.com/instructlab/instructlab/security/dependabot/1

Description of your changes:

Testing needs to be done on M3 Macs to ensure abetlen/llama-cpp-python#1286 doesn't still occur.

@nathan-weinberg (Member):

@alimaredia can we fix the lint error in this as well? Not sure how that got in

@alimaredia (Contributor, Author):

@nathan-weinberg Wouldn't that impact the backport if we just want to backport requirements.txt? If so, the linting should be addressed in a separate PR.

@alimaredia (Contributor, Author):

@nathan-weinberg #1162 fixes the linting issues. Once that is merged I can rebase my PR and the linting test should pass.

@nathan-weinberg (Member):

@Mergifyio rebase

mergify[bot] commented May 15, 2024:

rebase

✅ Branch has been successfully rebased

nathan-weinberg force-pushed the update-llama-cpp-python-ver branch from 1f595e5 to 9c85588 on May 15, 2024, 12:26
@nathan-weinberg (Member):

@Mergifyio rebase

mergify[bot] commented May 15, 2024:

rebase

✅ Branch has been successfully rebased

nathan-weinberg force-pushed the update-llama-cpp-python-ver branch from 9c85588 to f114766 on May 15, 2024, 13:38

@russellb (Contributor):

the functional test failure looks like a real problem

@tiran (Contributor) commented May 15, 2024:

Could you try with a larger max ctx size? 4096 might be too small.

nathan-weinberg requested a review from a team on May 15, 2024, 17:11
mergify bot added the testing and ci-failure labels and removed the ci-failure label on May 15, 2024
alimaredia force-pushed the update-llama-cpp-python-ver branch from edf937e to f114766 on May 16, 2024, 14:13
mergify bot added and removed the ci-failure label on May 16, 2024
alimaredia force-pushed the update-llama-cpp-python-ver branch from 09c15a0 to b299786 on May 17, 2024, 15:05
mergify bot added the ci-failure label on May 17, 2024
nathan-weinberg added this to the Release - 5/30 milestone on May 17, 2024
@@ -391,6 +392,10 @@ def start_prompt(self, logger, content=None, box=True):
)
self.info["messages"].pop()
@markstur (Contributor):
this is trimming the newest message and also recent llama-cpp versions don't seem to tolerate changing the message list size

Follow-up review comment (Contributor):
Ignore the confusion about "newest" vs "latest" -- I read a comment out-of-place.

The main thing here though is that we still get InternalError happening when we try to shorten messages this way and I think we're going to make that a separate issue.
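
For context, here is a minimal sketch of the trim-and-retry pattern being discussed, assuming an openai>=1.0 client. The exception type is the real openai class; the function, model handling, and control flow are illustrative only and are not the actual instructlab chat code:

```python
import openai


def chat_with_trimming(client: openai.OpenAI, messages: list[dict], model: str):
    """Hypothetical trim-and-retry loop; only the exception type is real."""
    while True:
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except openai.BadRequestError:
            if len(messages) <= 1:
                # The single remaining message still exceeds the context
                # window; per the discussion, the chat code trims it and
                # bails out (raising KeyboardInterrupt) at this point.
                messages.pop()
                raise
            # Drop a message and retry. Note that list.pop() with no index
            # removes the *newest* message; pop(0) would drop the oldest.
            messages.pop()
```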

alimaredia force-pushed the update-llama-cpp-python-ver branch from b0beb2b to da93591 on May 20, 2024, 21:13
mergify bot added the needs-rebase label on May 20, 2024

mergify[bot] commented May 20, 2024:

This pull request has merge conflicts that must be resolved before it can be
merged. @alimaredia please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

alimaredia force-pushed the update-llama-cpp-python-ver branch from da93591 to 90feeb3 on May 21, 2024, 02:17
mergify bot removed the needs-rebase label on May 21, 2024
alimaredia removed the hold label on May 21, 2024
mergify bot added the ci-failure label on May 21, 2024
- Adjust test_ctx_size()
- Handle openai.InternalServerError when chatting

Signed-off-by: Ali Maredia <amaredia@redhat.com>
alimaredia force-pushed the update-llama-cpp-python-ver branch from 90feeb3 to f24d0d7 on May 21, 2024, 02:26
mergify bot removed the ci-failure label on May 21, 2024
@alimaredia (Contributor, Author):

@russellb @markstur @tiran @nathan-weinberg

What started as just bumping the version of llama_cpp_python (because of https://github.com/instructlab/instructlab/security/dependabot/1) turned into realizing that our trimming code in chat no longer trims the way we'd expect it to.

When max-ctx-size goes from 25 to 55, certain prompts throw an openai.InternalServerError even though both max-ctx-sizes get rounded up to 64 in llama_cpp. Below is a breakdown of when the trimming code in chat is executed with different prompts and two small max-ctx-sizes.

For now I think just handling openai.InternalServerError is a good starting point (a rough sketch follows the breakdown below), and a follow-up issue should be created. What do you all think?

M1 = "Hello"
M2 = "Hello, joe was born in 2000. How old is joe"
M3 = "Hello, I am a ci message that should not finish because I am too long for the context window, tell me about your day please?"
M4 = "Hello, I am a ci message that should not finish because I am too long for the context window, tell me about your day please? How many tokens could you take today. Could you tell me about the time you could only take 55 tokens"
M5 = "I was born in 2000. How old am I?"

25 max context window (ilab serve --max-ctx-size 25):
ilab chat -qq M1 
    - BadRequestError: Trims message
    - Responds back fine
ilab chat -qq M2
    - BadRequestError: Trims message
    - Responds back fine
ilab chat -qq M3
    - BadRequestError: Trims message
    - BadRequestError: Final message too large for context size
ilab chat -qq M4
    - BadRequestError: Trims message
    - BadRequestError: Final message too large for context size
ilab chat -qq M5
    - BadRequestError: Trims message
    - Responds back fine

55 max context window (ilab serve --max-ctx-size 55):
ilab chat -qq M1
    - Responds back fine
ilab chat -qq M2
    - BadRequestError: Trims message
    - Responds back fine
ilab chat -qq M3
    - BadRequestError: Trims message
    - Responds back fine
ilab chat -qq M4
    - BadRequestError: Trims message
    - InternalServerError
ilab chat -qq M5
    - InternalServerError
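
A rough sketch of the "just handle openai.InternalServerError" starting point, assuming the openai>=1.0 client. The exception classes are real openai exceptions, but the surrounding function and the ChatException type are illustrative only, not the instructlab code:

```python
import openai


class ChatException(Exception):
    """Illustrative error type surfaced to the chat UI instead of a traceback."""


def send_prompt(client: openai.OpenAI, messages: list[dict], model: str):
    try:
        return client.chat.completions.create(model=model, messages=messages)
    except openai.BadRequestError:
        # Prompt too large for the context window: the caller trims and retries.
        raise
    except openai.InternalServerError as exc:
        # Observed with llama_cpp_python 0.2.75 and very small --max-ctx-size
        # values (see the breakdown above); report it instead of crashing.
        raise ChatException(f"server failed to complete the request: {exc}") from exc
```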

@alimaredia (Contributor, Author):

> this is trimming the newest message and also recent llama-cpp versions don't seem to tolerate changing the message list size

@markstur could you go into more detail or send me a reproducer for what you're describing? That line should be trimming the last remaining message and then raising a KeyboardInterrupt.

@russellb (Contributor) left a review comment:

Thank you for your diligent efforts on this!

As discussed, there's more to dig into here to figure out why it's responding with an internal server error. Something unexpected is still happening on the server side. Please file an issue to track down the source of the internal server error at some point.

mergify bot added the one-approval label on May 21, 2024
@markstur (Contributor):

I think this works reasonably well when the context is 512 (maybe even 256). It's the real short context tests where the internal error is such a problem. I suspect llama-cpp-python needs to fix something with the batch size vs context size, but haven't been able to figure out why I only reproduce the problem with small contexts (so far).

We probably should merge this soon as the lesser of evils. Wondering about a few things though:

  1. Can/should we just enforce a reasonable minimum max-ctx-size?
  2. Shouldn't we always keep the first message (the prompt) when trimming? That needs a reasonable ctx size too. (A rough sketch of both ideas follows below.)
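
A rough sketch of both ideas. The minimum value, the helper names, and the assumption that messages[0] is the system prompt are all hypothetical, not existing instructlab behavior:

```python
MIN_CTX_SIZE = 512  # assumed floor; the right value would need measurement


def validate_max_ctx_size(value: int) -> int:
    """Idea 1: reject context sizes that are too small to be usable."""
    if value < MIN_CTX_SIZE:
        raise ValueError(
            f"--max-ctx-size {value} is below the assumed minimum of {MIN_CTX_SIZE}"
        )
    return value


def trim_keep_first(messages: list[dict]) -> list[dict]:
    """Idea 2: keep the first message (the prompt) and drop the oldest
    exchange after it instead of trimming the newest message."""
    if len(messages) > 2:
        return [messages[0]] + messages[2:]
    return messages
```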

mergify bot removed the one-approval label on May 22, 2024
@leseb (Contributor) commented May 23, 2024:

Can we move forward with this, or are we still waiting for all the requested reviews? Thanks!

@russellb (Contributor):

I'm going to let this merge. If anyone has follow-ups, let's file an issue to make sure it doesn't get lost.

mergify bot merged commit 4cdb6b4 into instructlab:main on May 23, 2024
@nathan-weinberg (Member):

@Mergifyio backport release-v0.15

mergify[bot] commented May 23, 2024:

backport release-v0.15

✅ Backports have been created


Labels: testing (Relates to testing)

6 participants
