Describe the bug
From #1161:
What started as just trying to bump the version of llama_cpp_python (because of the advisory "llama-cpp-python vulnerable to Remote Code Execution by Server-Side Template Injection in Model M...") turned into realizing that our trimming code in chat no longer trims the way we'd expect it to.
When max-ctx-size goes from 25 to 55, certain prompts throw an openai.InternalServerError even though both max-ctx-sizes get rounded up to 64 in llama_cpp. Below is a breakdown of when the trimming code in chat is executed with different prompts and two small max-ctx-sizes.

Messages:
- M1 = "Hello"
- M2 = "Hello, joe was born in 2000. How old is joe"
- M3 = "Hello, I am a ci message that should not finish because I am too long for the context window, tell me about your day please?"
- M4 = "Hello, I am a ci message that should not finish because I am too long for the context window, tell me about your day please? How many tokens could you take today. Could you tell me about the time you could only take 55 tokens"
- M5 = "I was born in 2000. How old am I?"

25 max context window (ilab serve --max-ctx-size 25):
- ilab chat -qq M1 - BadRequestError: Trims message - Responds back fine
- ilab chat -qq M2 - BadRequestError: Trims message - Responds back fine
- ilab chat -qq M3 - BadRequestError: Trims message - BadRequestError: Final message too large for context size
- ilab chat -qq M4 - BadRequestError: Trims message - BadRequestError: Final message too large for context size
- ilab chat -qq M5 - BadRequestError: Trims message - Responds back fine

55 max context window (ilab serve --max-ctx-size 55):
- ilab chat -qq M1 - Responds back fine
- ilab chat -qq M2 - BadRequestError: Trims message - Responds back fine
- ilab chat -qq M3 - BadRequestError: Trims message - Responds back fine
- ilab chat -qq M4 - BadRequestError: Trims message - InternalServerError
- ilab chat -qq M5 - InternalServerError
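To make the expected flow concrete, here is a minimal sketch of the trim-and-retry pattern the chat relies on. This is not the actual ilab chat implementation; the base_url, api_key, model name, and the trim_messages helper are placeholder assumptions, and it assumes an OpenAI-compatible server started with something like ilab serve --max-ctx-size 55.

```python
# Minimal sketch (assumptions noted above), NOT the real ilab chat code.
import openai

client = openai.OpenAI(base_url="http://localhost:8000/v1", api_key="no-key-required")


def trim_messages(messages):
    # Placeholder trimming strategy: drop the oldest message so the prompt shrinks.
    return messages[1:] if len(messages) > 1 else messages


def chat_with_trimming(messages, model="served-model", max_retries=3):
    for _ in range(max_retries):
        try:
            resp = client.chat.completions.create(model=model, messages=messages)
            return resp.choices[0].message.content
        except openai.BadRequestError:
            # Expected signal that the prompt is too large: trim and retry.
            messages = trim_messages(messages)
        except openai.InternalServerError:
            # Per this report, M4/M5 at --max-ctx-size 55 surface here instead of
            # raising BadRequestError, which breaks the trimming loop.
            raise
    raise RuntimeError("Final message too large for context size after trimming")


print(chat_with_trimming([{"role": "user", "content": "I was born in 2000. How old am I?"}]))
```

The point of the sketch is that the trimming loop only works if an over-long prompt consistently comes back as a BadRequestError; an InternalServerError gives the client nothing actionable to trim on.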
To Reproduce
1. Start the server with a small context window, e.g. ilab serve --max-ctx-size 25 or ilab serve --max-ctx-size 55.
2. Send one of the messages above with ilab chat -qq <message>.
3. Compare the errors raised against the breakdown in the description above.
Expected behavior
BadRequestErrors should be thrown every time a trim happens in the situation where max-ctx-size is 55 for M4 and M5. InternalServerErrors should not be thrown when trimming occurs.
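The expectation could be expressed as a test along these lines; this is only a sketch, with the server URL, api_key, and model name assumed, run against a server started with ilab serve --max-ctx-size 55.

```python
# Sketch of a test for the expected behavior: an over-long prompt should surface
# as openai.BadRequestError (so chat can trim), never as openai.InternalServerError.
import openai
import pytest

client = openai.OpenAI(base_url="http://localhost:8000/v1", api_key="no-key-required")

TOO_LONG = (
    "Hello, I am a ci message that should not finish because I am too long for "
    "the context window, tell me about your day please? How many tokens could "
    "you take today. Could you tell me about the time you could only take 55 tokens"
)


def test_overlong_prompt_raises_bad_request():
    with pytest.raises(openai.BadRequestError):
        client.chat.completions.create(
            model="served-model",
            messages=[{"role": "user", "content": TOO_LONG}],
        )
```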
Device Info (please complete the following information):
- OS Version: Fedora Linux 39
- Python Version: Python 3.12.2
- InstructLab Version: ilab, version 0.14.1.dev235