Add parameter to skip saving to cache when caching is enabled #594


Open
wants to merge 4 commits into
base: main

Conversation

shaunabanana

For some use cases, especially apps that use LLMs only for specific tasks, there may be a known list of few-shot examples in which only the last one changes according to user input. In such cases, caching the completion on every call adds little to inference speed, and can actually slow inference down when the cache lives in slow memory or on a slow disk. It is therefore ideal to cache those pre-defined prompts once at the start of the app, and then skip saving to the cache in subsequent calls.

This PR adds a parameter (save_cache) to create_completion() and other related functions, as well as to the completions API in the server. It defaults to True so that the existing behavior is unchanged, and it only takes effect when caching is enabled.
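To illustrate the intended behavior, here is a minimal toy sketch. It does not use the real library: `ToyLlama`, its prefix-matching cache, and the `evaluated` counter are hypothetical stand-ins, and only the `save_cache` parameter name and its `True` default come from this PR. The actual cache implementation may work differently.

```python
from typing import Dict, Optional


class ToyLlama:
    """Hypothetical stand-in for a model wrapper with an optional prompt cache."""

    def __init__(self, cache_enabled: bool = True) -> None:
        # None means caching is disabled; otherwise maps prompt -> saved state.
        self.cache: Optional[Dict[str, bool]] = {} if cache_enabled else None
        # Characters "evaluated" so far, a rough stand-in for tokens processed.
        self.evaluated = 0

    def _longest_cached_prefix(self, prompt: str) -> str:
        if self.cache is None:
            return ""
        matches = [key for key in self.cache if prompt.startswith(key)]
        return max(matches, key=len, default="")

    def create_completion(self, prompt: str, save_cache: bool = True) -> str:
        # Lookup always happens when caching is enabled, so a warmed-up
        # few-shot prefix is reused even when save_cache is False.
        prefix = self._longest_cached_prefix(prompt)
        self.evaluated += len(prompt) - len(prefix)
        # save_cache only has an effect when caching is enabled.
        if self.cache is not None and save_cache:
            self.cache[prompt] = True
        return f"<completion for {len(prompt)} chars>"


FEW_SHOT = "Q: 1+1\nA: 2\nQ: 2+2\nA: 4\n"
USER_PART = "Q: 3+3\nA:"

llm = ToyLlama()
# Warm the cache once at startup with the fixed few-shot prompt.
llm.create_completion(FEW_SHOT)
# Later calls reuse the cached prefix but skip the (possibly slow) save step.
llm.create_completion(FEW_SHOT + USER_PART, save_cache=False)
```

In this sketch the second call only "evaluates" the user-specific suffix, and the cache still holds just the few-shot prompt afterwards, which is the pattern the description above has in mind.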

@abetlen abetlen force-pushed the main branch 2 times, most recently from 8c93cf8 to cc0fe43 Compare November 14, 2023 20:24