Support defaulting to infinity or -1 for completions #111


Merged
5 commits merged into abetlen:main on Dec 22, 2023

Conversation

@swg (Contributor) commented Apr 25, 2023

Hello,

As per llama.cpp help:

-n N, --n_predict N number of tokens to predict (default: 128, -1 = infinity)

The OpenAI docs state that max_tokens for the /completions endpoint defaults to 16 but is optional:
max_tokens integer Optional Defaults to 16
https://platform.openai.com/docs/api-reference/completions/create#completions/create-max_tokens

However, for /chat/completions it defaults to infinity and is again optional:
max_tokens integer Optional Defaults to inf
https://platform.openai.com/docs/api-reference/chat/create#chat/create-max_tokens

This pull request modifies the fast_api example, the main server, and the supporting Python library to allow passing null for max_tokens, which defaults it to infinity, mimicking the OpenAI API.
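For illustration, a minimal sketch of what this enables against the OpenAI-compatible server (the localhost URL and port are assumptions, and the prompt is placeholder text): sending null for max_tokens lets generation run until the model emits an end-of-sequence token or hits the context limit.

    # Illustrative request only; assumes a llama-cpp-python server is
    # running locally and exposes the OpenAI-compatible chat endpoint.
    import requests

    resp = requests.post(
        "http://localhost:8000/v1/chat/completions",
        json={
            "messages": [{"role": "user", "content": "Tell me a long story."}],
            "max_tokens": None,  # serialized as null -> "infinity" (-1)
        },
    )
    print(resp.json()["choices"][0]["message"]["content"])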

@@ -156,7 +156,7 @@ class CreateChatCompletionRequest(BaseModel):
     top_p: float = 0.95
     stream: bool = False
     stop: List[str] = []
-    max_tokens: int = 128
+    max_tokens: Optional[Union[int, None]] = -1


The type signature max_tokens: Optional[Union[int, None]] doesn't make sense: Optional[T] is an alias for Union[T, None], so Optional[int] already expresses the same thing.
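As a sketch of what the reviewer is suggesting (field name follows the diff above; the surrounding model is trimmed for brevity), the annotation can simply be Optional[int]:

    from typing import Optional
    from pydantic import BaseModel

    class CreateChatCompletionRequest(BaseModel):
        # Optional[int] already means "int or None"; None signals "no limit".
        max_tokens: Optional[int] = None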

@K-Mistele (Contributor)

Hey guys, it seems there's a related error in #983: if you don't specify max_tokens and then send a prompt larger than n_ctx, you run into a TypeError because the server tries to add the default max_tokens value, which is None, to the number of tokens in the prompt.
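A sketch of the failure mode (the numbers and variable names below are made up for illustration, not the server's actual code): once max_tokens defaults to None, any arithmetic that mixes it with the prompt length raises a TypeError, so the value has to be resolved first.

    # Stand-ins for a tokenized prompt that exceeds the context window.
    prompt_tokens = list(range(600))
    n_ctx = 512
    max_tokens = None

    # len(prompt_tokens) + max_tokens would raise:
    # TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'

    # One possible guard (illustrative, not the exact patch): resolve None or -1
    # to the remaining context before doing any arithmetic.
    if max_tokens is None or max_tokens < 0:
        max_tokens = max(n_ctx - len(prompt_tokens), 0)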

@abetlen changed the title from "Support defaulting to infinity or -1 for chat completions" to "Support defaulting to infinity or -1 for completions" on Dec 22, 2023
@abetlen (Owner) commented Dec 22, 2023

@K-Mistele yup, I added a few commits to hopefully fix that in the server error handler.

@swg thanks for the contribution! Sorry it took so long to merge.

@abetlen merged commit 4b01a87 into abetlen:main on Dec 22, 2023
@K-Mistele (Contributor)

With these fixes, what is now the proper way to request an unlimited number of response tokens?

@abetlen (Owner) commented Dec 22, 2023

@K-Mistele passing None / null for max_tokens should work everywhere now and is recommended as it matches the OpenAI spec.
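For example, with the Python API this would look roughly like the following (the model path is a placeholder):

    from llama_cpp import Llama

    llm = Llama(model_path="./models/your-model.gguf")
    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Summarize the plot of Hamlet."}],
        max_tokens=None,  # None / null -> no fixed limit, as in the OpenAI spec
    )
    print(out["choices"][0]["message"]["content"])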

Labels: enhancement (New feature or request)