
Update llama_cpp_python version to 0.2.75 #1161

Merged: mergify[bot] merged 1 commit into instructlab:main from alimaredia:update-llama-cpp-python-ver on May 23, 2024

Conversation

@alimaredia (Contributor):

Changes

Which issue is resolved by this Pull Request:
Resolves https://github.com/instructlab/instructlab/security/dependabot/1

Description of your changes:

Testing needs to be done on M3 Macs to ensure abetlen/llama-cpp-python#1286 doesn't still occur.

@nathan-weinberg (Member):

@alimaredia can we fix the lint error in this as well? Not sure how that got in

@alimaredia (Contributor, Author):

@nathan-weinberg Wouldn't that impact the backport if we just want to backport requirements.txt? If so, the linting should be addressed in a separate PR.

@alimaredia (Contributor, Author):

@nathan-weinberg #1162 fixes the linting issues. Once that is merged I can rebase my PR and the linting test should pass.

@nathan-weinberg (Member):

@Mergifyio rebase

mergify[bot] commented May 15, 2024:

rebase

✅ Branch has been successfully rebased

nathan-weinberg force-pushed the update-llama-cpp-python-ver branch from 1f595e5 to 9c85588 on May 15, 2024, 12:26
@nathan-weinberg (Member):

@Mergifyio rebase

mergify[bot] commented May 15, 2024:

rebase

✅ Branch has been successfully rebased

nathan-weinberg force-pushed the update-llama-cpp-python-ver branch from 9c85588 to f114766 on May 15, 2024, 13:38

@russellb (Contributor):

the functional test failure looks like a real problem

@tiran (Contributor) commented May 15, 2024:

Could you try with a larger max ctx size? 4096 might be too small.

nathan-weinberg requested a review from a team on May 15, 2024, 17:11
mergify bot added the testing and ci-failure labels and removed the ci-failure label on May 15, 2024
alimaredia force-pushed the update-llama-cpp-python-ver branch from edf937e to f114766 on May 16, 2024, 14:13
mergify bot added and removed the ci-failure label on May 16, 2024
alimaredia force-pushed the update-llama-cpp-python-ver branch from 09c15a0 to b299786 on May 17, 2024, 15:05
mergify bot added the ci-failure label on May 17, 2024
nathan-weinberg added this to the Release - 5/30 milestone on May 17, 2024
@@ -391,6 +392,10 @@ def start_prompt(self, logger, content=None, box=True):
)
self.info["messages"].pop()
@markstur (Contributor):
this is trimming the newest message and also recent llama-cpp versions don't seem to tolerate changing the message list size

Follow-up review comment (Contributor):
Ignore the confusion about "newest" vs "latest" -- I read a comment out-of-place.

The main thing here though is that we still get InternalError happening when we try to shorten messages this way and I think we're going to make that a separate issue.
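
For context, here is a minimal sketch of the trim-and-retry pattern being discussed, assuming an openai>=1.0 client. The exception type is the real openai class; the function, model handling, and control flow are illustrative only and are not the actual instructlab chat code:

```python
import openai


def chat_with_trimming(client: openai.OpenAI, messages: list[dict], model: str):
    """Hypothetical trim-and-retry loop; only the exception type is real."""
    while True:
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except openai.BadRequestError:
            if len(messages) <= 1:
                # The single remaining message still exceeds the context
                # window; per the discussion, the chat code trims it and
                # bails out (raising KeyboardInterrupt) at this point.
                messages.pop()
                raise
            # Drop a message and retry. Note that list.pop() with no index
            # removes the *newest* message; pop(0) would drop the oldest.
            messages.pop()
```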

alimaredia force-pushed the update-llama-cpp-python-ver branch from b0beb2b to da93591 on May 20, 2024, 21:13
mergify bot added the needs-rebase label on May 20, 2024

mergify[bot] commented May 20, 2024:

This pull request has merge conflicts that must be resolved before it can be
merged. @alimaredia please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

alimaredia force-pushed the update-llama-cpp-python-ver branch from da93591 to 90feeb3 on May 21, 2024, 02:17
mergify bot removed the needs-rebase label on May 21, 2024
alimaredia removed the hold label on May 21, 2024
mergify bot added the ci-failure label on May 21, 2024
- Adjust test_ctx_size()
- Handle openai.InternalServerError when chatting

Signed-off-by: Ali Maredia <amaredia@redhat.com>
alimaredia force-pushed the update-llama-cpp-python-ver branch from 90feeb3 to f24d0d7 on May 21, 2024, 02:26
mergify bot removed the ci-failure label on May 21, 2024
@alimaredia (Contributor, Author):

@russellb @markstur @tiran @nathan-weinberg

What started as just bumping the version of llama_cpp_python (because of https://github.com/instructlab/instructlab/security/dependabot/1) turned into realizing that our trimming code in chat no longer trims the way we'd expect it to.

When max-ctx-size goes from 25 to 55, certain prompts throw an openai.InternalServerError even though both max-ctx-sizes get rounded up to 64 in llama_cpp. Below is a breakdown of when the trimming code in chat is executed with different prompts and two small max-ctx-sizes.

For now I think just handling openai.InternalServerError is a good starting point (a rough sketch follows the breakdown below), and a follow-up issue should be created. What do you all think?

M1 = "Hello"
M2 = "Hello, joe was born in 2000. How old is joe"
M3 = "Hello, I am a ci message that should not finish because I am too long for the context window, tell me about your day please?"
M4 = "Hello, I am a ci message that should not finish because I am too long for the context window, tell me about your day please? How many tokens could you take today. Could you tell me about the time you could only take 55 tokens"
M5 = "I was born in 2000. How old am I?"

25 max context window (ilab serve --max-ctx-size 25):
ilab chat -qq M1 
    - BadRequestError: Trims message
    - Responds back fine
ilab chat -qq M2
    - BadRequestError: Trims message
    - Responds back fine
ilab chat -qq M3
    - BadRequestError: Trims message
    - BadRequestError: Final message too large for context size
ilab chat -qq M4
    - BadRequestError: Trims message
    - BadRequestError: Final message too large for context size
ilab chat -qq M5
    - BadRequestError: Trims message
    - Responds back fine

55 max context window (ilab serve --max-ctx-size 55):
ilab chat -qq M1
    - Responds back fine
ilab chat -qq M2
    - BadRequestError: Trims message
    - Responds back fine
ilab chat -qq M3
    - BadRequestError: Trims message
    - Responds back fine
ilab chat -qq M4
    - BadRequestError: Trims message
    - InternalServerError
ilab chat -qq M5
    - InternalServerError
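
A rough sketch of the "just handle openai.InternalServerError" starting point, assuming the openai>=1.0 client. The exception classes are real openai exceptions, but the surrounding function and the ChatException type are illustrative only, not the instructlab code:

```python
import openai


class ChatException(Exception):
    """Illustrative error type surfaced to the chat UI instead of a traceback."""


def send_prompt(client: openai.OpenAI, messages: list[dict], model: str):
    try:
        return client.chat.completions.create(model=model, messages=messages)
    except openai.BadRequestError:
        # Prompt too large for the context window: the caller trims and retries.
        raise
    except openai.InternalServerError as exc:
        # Observed with llama_cpp_python 0.2.75 and very small --max-ctx-size
        # values (see the breakdown above); report it instead of crashing.
        raise ChatException(f"server failed to complete the request: {exc}") from exc
```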

@alimaredia (Contributor, Author):

> this is trimming the newest message and also recent llama-cpp versions don't seem to tolerate changing the message list size

@markstur could you go into more detail or send me a reproducer for what you're describing? That line should be trimming the last remaining message and then raising a KeyboardInterrupt.

@russellb (Contributor) left a review comment:

Thank you for your diligent efforts on this!

As discussed, there's more to dig into here to figure out why it's responding with an internal server error. Something unexpected is still happening on the server side. Please file an issue to track down the source of the internal server error at some point.

mergify bot added the one-approval label on May 21, 2024
@markstur (Contributor):

I think this works reasonably well when the context is 512 (maybe even 256). It's the real short context tests where the internal error is such a problem. I suspect llama-cpp-python needs to fix something with the batch size vs context size, but haven't been able to figure out why I only reproduce the problem with small contexts (so far).

We probably should merge this soon as the lesser of evils. Wondering about a few things though:

  1. Can/should we just enforce a reasonable minimum max-ctx-size?
  2. Shouldn't we always keep the first message (the prompt) when trimming? That needs a reasonable ctx size too. (A rough sketch of both ideas follows below.)
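
A rough sketch of both ideas. The minimum value, the helper names, and the assumption that messages[0] is the system prompt are all hypothetical, not existing instructlab behavior:

```python
MIN_CTX_SIZE = 512  # assumed floor; the right value would need measurement


def validate_max_ctx_size(value: int) -> int:
    """Idea 1: reject context sizes that are too small to be usable."""
    if value < MIN_CTX_SIZE:
        raise ValueError(
            f"--max-ctx-size {value} is below the assumed minimum of {MIN_CTX_SIZE}"
        )
    return value


def trim_keep_first(messages: list[dict]) -> list[dict]:
    """Idea 2: keep the first message (the prompt) and drop the oldest
    exchange after it instead of trimming the newest message."""
    if len(messages) > 2:
        return [messages[0]] + messages[2:]
    return messages
```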

mergify bot removed the one-approval label on May 22, 2024
@leseb (Contributor) commented May 23, 2024:

Can we move forward with this, or are we still waiting for all the requested reviews? Thanks!

@russellb (Contributor):

I'm going to let this merge. If anyone has follow-ups, let's file an issue to make sure it doesn't get lost.

mergify bot merged commit 4cdb6b4 into instructlab:main on May 23, 2024
@nathan-weinberg (Member):

@Mergifyio backport release-v0.15

mergify[bot] commented May 23, 2024:

backport release-v0.15

✅ Backports have been created


Labels: testing (Relates to testing)

6 participants
