Support defaulting to infinity or -1 for completions #111


Merged
5 commits merged into abetlen:main on Dec 22, 2023

Conversation

@swg (Contributor) commented Apr 25, 2023

Hello,

As per llama.cpp help:

-n N, --n_predict N number of tokens to predict (default: 128, -1 = infinity)

The OpenAI docs state that max_tokens for the /completions endpoint defaults to 16 but is optional:
max_tokens integer Optional Defaults to 16
https://platform.openai.com/docs/api-reference/completions/create#completions/create-max_tokens

However, for /chat/completions it defaults to infinity and is again optional:
max_tokens integer Optional Defaults to inf
https://platform.openai.com/docs/api-reference/chat/create#chat/create-max_tokens

This pull request modifies the fast_api example, the main server, and the supporting Python library to allow passing null for max_tokens, which defaults it to infinity, mimicking the OpenAI API.
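For illustration, a minimal sketch of what this enables against the OpenAI-compatible server (the localhost URL and port are assumptions, and the prompt is placeholder text): sending null for max_tokens lets generation run until the model emits an end-of-sequence token or hits the context limit.

    # Illustrative request only; assumes a llama-cpp-python server is
    # running locally and exposes the OpenAI-compatible chat endpoint.
    import requests

    resp = requests.post(
        "http://localhost:8000/v1/chat/completions",
        json={
            "messages": [{"role": "user", "content": "Tell me a long story."}],
            "max_tokens": None,  # serialized as null -> "infinity" (-1)
        },
    )
    print(resp.json()["choices"][0]["message"]["content"])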

@@ -156,7 +156,7 @@ class CreateChatCompletionRequest(BaseModel):
     top_p: float = 0.95
     stream: bool = False
     stop: List[str] = []
-    max_tokens: int = 128
+    max_tokens: Optional[Union[int, None]] = -1


The type signature max_tokens: Optional[Union[int, None]] doesn't make sense: Optional[T] is an alias for Union[T, None], so Optional[int] already expresses the same thing.
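As a sketch of what the reviewer is suggesting (field name follows the diff above; the surrounding model is trimmed for brevity), the annotation can simply be Optional[int]:

    from typing import Optional
    from pydantic import BaseModel

    class CreateChatCompletionRequest(BaseModel):
        # Optional[int] already means "int or None"; None signals "no limit".
        max_tokens: Optional[int] = None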

@K-Mistele (Contributor)

Hey guys, it seems there's a related error in #983: if you don't specify max_tokens and then send a prompt larger than n_ctx, you run into a TypeError because the server tries to add the default max_tokens value, which is None, to the number of tokens in the prompt.
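A sketch of the failure mode (the numbers and variable names below are made up for illustration, not the server's actual code): once max_tokens defaults to None, any arithmetic that mixes it with the prompt length raises a TypeError, so the value has to be resolved first.

    # Stand-ins for a tokenized prompt that exceeds the context window.
    prompt_tokens = list(range(600))
    n_ctx = 512
    max_tokens = None

    # len(prompt_tokens) + max_tokens would raise:
    # TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'

    # One possible guard (illustrative, not the exact patch): resolve None or -1
    # to the remaining context before doing any arithmetic.
    if max_tokens is None or max_tokens < 0:
        max_tokens = max(n_ctx - len(prompt_tokens), 0)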

@abetlen changed the title from "Support defaulting to infinity or -1 for chat completions" to "Support defaulting to infinity or -1 for completions" on Dec 22, 2023
@abetlen (Owner) commented Dec 22, 2023

@K-Mistele yup, I added a few commits to hopefully fix that in the server error handler.

@swg thanks for the contribution! Sorry it took so long to merge.

@abetlen merged commit 4b01a87 into abetlen:main on Dec 22, 2023
@K-Mistele (Contributor)

With these fixes, what is now the proper way to request an unlimited number of response tokens?

@abetlen (Owner) commented Dec 22, 2023

@K-Mistele passing None / null for max_tokens should work everywhere now and is recommended as it matches the OpenAI spec.
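For example, with the Python API this would look roughly like the following (the model path is a placeholder):

    from llama_cpp import Llama

    llm = Llama(model_path="./models/your-model.gguf")
    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Summarize the plot of Hamlet."}],
        max_tokens=None,  # None / null -> no fixed limit, as in the OpenAI spec
    )
    print(out["choices"][0]["message"]["content"])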

Labels: enhancement (New feature or request)