Support defaulting to infinity or -1 for completions #111
Conversation
Commits 8c93cf8 to cc0fe43
@@ -156,7 +156,7 @@ class CreateChatCompletionRequest(BaseModel):
     top_p: float = 0.95
     stream: bool = False
     stop: List[str] = []
-    max_tokens: int = 128
+    max_tokens: Optional[Union[int, None]] = -1
The type signature max_tokens: Optional[Union[int, None]] doesn't make any sense: Optional[T] is already an alias for Union[T, None], so wrapping the Union in Optional is redundant. Optional[int] expresses the same type.
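The equivalence can be checked directly with the typing module; both spellings flatten to the same two-member union:

```python
from typing import Optional, Union, get_args

# Optional[T] is just an alias for Union[T, None], so wrapping a Union
# that already contains None in Optional adds nothing.
assert Optional[int] == Union[int, None]
assert Optional[Union[int, None]] == Optional[int]

# The redundant annotation flattens to the same two-member union.
assert get_args(Optional[Union[int, None]]) == (int, type(None))
```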
Hey guys, seems like there's a related error in #983 where if you don't specify the
@K-Mistele yup, I added a few commits to hopefully fix that in the server error handler. @swg thanks for the contribution! Sorry it took so long to merge.
With these fixes, what is now the proper way to set an unlimited number of response tokens?
@K-Mistele passing None / null for
Hello,
As per llama.cpp help:
-n N, --n_predict N number of tokens to predict (default: 128, -1 = infinity)
The OpenAI docs state that for the /completions endpoint, max_tokens is optional and defaults to 16:
max_tokens (integer, optional): defaults to 16
https://platform.openai.com/docs/api-reference/completions/create#completions/create-max_tokens
For /chat/completions, however, max_tokens is again optional but defaults to infinity:
max_tokens (integer, optional): defaults to inf
https://platform.openai.com/docs/api-reference/chat/create#chat/create-max_tokens
This pull request modifies the FastAPI example, the main server, and the supporting Python library to allow passing null for max_tokens, which then defaults to infinity, mimicking the OpenAI API.
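The null-to-infinity mapping described above can be sketched as a small helper (the function name is hypothetical, not the actual code in this PR; -1 is the value llama.cpp's --n_predict interprets as "predict until end of stream"):

```python
from typing import Optional

def resolve_max_tokens(max_tokens: Optional[int]) -> int:
    """Map a client-supplied max_tokens to llama.cpp's n_predict.

    A missing/null value becomes -1, which llama.cpp treats as
    unlimited generation, matching OpenAI's default of infinity
    for /chat/completions.
    """
    if max_tokens is None:
        return -1  # -1 = infinity for llama.cpp's n_predict
    return max_tokens

# Null from the client -> unlimited; an explicit value passes through.
assert resolve_max_tokens(None) == -1
assert resolve_max_tokens(128) == 128
```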