Trimming in chat results in openai.InternalServerError under certain conditions #1206

@alimaredia

Description


Describe the bug

From #1161:

What started as an attempt to bump the version of llama_cpp_python (because of "llama-cpp-python vulnerable to Remote Code Execution by Server-Side Template Injection in Model M...") turned into realizing that our trimming code in chat no longer trims the way we'd expect it to.

When max-ctx-size goes from 25 to 55, certain prompts throw an openai.InternalServerError even though both max-ctx-sizes get rounded up to 64 in llama_cpp. Below is a breakdown of when the trimming code in chat is executed with different prompts and two small max-ctx-sizes.
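The rounding mentioned above can be sketched as follows. This is a minimal illustration, not llama_cpp's actual code; the assumption (taken from the report) is that the requested context size is padded up to the next multiple of 64, so 25 and 55 both become 64:

```python
def round_ctx_size(requested: int, multiple: int = 64) -> int:
    """Round a requested context size up to the next multiple.

    Hypothetical helper for illustration only; the multiple of 64 is
    an assumption based on the behavior described in this report.
    """
    return -(-requested // multiple) * multiple  # ceiling division

print(round_ctx_size(25))  # both requested sizes land on 64
print(round_ctx_size(55))
```

This is why the two configurations below should behave identically at the server level, which makes the differing error behavior surprising.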

M1 = "Hello"
M2 = "Hello, joe was born in 2000. How old is joe"
M3 = "Hello, I am a ci message that should not finish because I am too long for the context window, tell me about your day please?"
M4 = "Hello, I am a ci message that should not finish because I am too long for the context window, tell me about your day please? How many tokens could you take today. Could you tell me about the time you could only take 55 tokens"
M5 = "I was born in 2000. How old am I?"

25 max context window (ilab serve --max-ctx-size 25):
ilab chat -qq M1 
    - BadRequestError: Trims message
    - Responds back fine
ilab chat -qq M2
    - BadRequestError: Trims message
    - Responds back fine
ilab chat -qq M3
    - BadRequestError: Trims message
    - BadRequestError: Final message too large for context size
ilab chat -qq M4
    - BadRequestError: Trims message
    - BadRequestError: Final message too large for context size
ilab chat -qq M5
    - BadRequestError: Trims message
    - Responds back fine

55 max context window (ilab serve --max-ctx-size 55):
ilab chat -qq M1
    - Responds back fine
ilab chat -qq M2
    - BadRequestError: Trims message
    - Responds back fine
ilab chat -qq M3
    - BadRequestError: Trims message
    - Responds back fine
ilab chat -qq M4
    - BadRequestError: Trims message
    - InternalServerError
ilab chat -qq M5
    - InternalServerError
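For reference, the expected trimming behavior can be sketched as below. This is not the actual instructlab chat code; `trim_messages` and the word-count tokenizer are hypothetical stand-ins to show the intended logic: drop the oldest messages until the conversation fits the context budget, and fail loudly only when the final message alone is too large.

```python
def trim_messages(messages, max_tokens, count_tokens=lambda m: len(m.split())):
    """Drop oldest messages until the total fits max_tokens.

    Illustrative sketch only. count_tokens is a naive word-count
    tokenizer standing in for a real one.
    """
    msgs = list(messages)
    while msgs and sum(count_tokens(m) for m in msgs) > max_tokens:
        if len(msgs) == 1:
            # Mirrors the "Final message too large for context size" case
            raise ValueError("Final message too large for context size")
        msgs.pop(0)  # trim the oldest message first
    return msgs
```

Under this sketch, each trim corresponds to the BadRequestError path above; only when the last message by itself exceeds the budget should the request fail outright, and an InternalServerError should never be part of either path.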

To Reproduce
See the description above.

Expected behavior

With a max-ctx-size of 55, M4 and M5 should throw a BadRequestError every time a trim happens. An InternalServerError should never be thrown when trimming occurs.

Device Info (please complete the following information):

  • OS Version: Fedora Linux 39
  • Python Version: Python 3.12.2
  • InstructLab Version: ilab, version 0.14.1.dev235

Labels

bug (Something isn't working)
