Commits
Commit History
Commits on Jul 3, 2025
Commits on Jan 8, 2025
e8f14ce fix: streaming resource lock (#1879)
Commits on Dec 9, 2024
Commits on Dec 6, 2024
9bd0c95 fix: Avoid thread starvation on many concurrent requests by making use of asyncio to lock llama_proxy context (#1798)
073b7e4 fix: added missing exit_stack.close() to /v1/chat/completions (#1796)
Commits on Sep 20, 2024
Commits on Aug 29, 2024
Commits on Jul 17, 2024
66d5cdd fix(server): Use split_mode from model settings (#1594)
Commits on Jul 9, 2024
Commits on Jul 2, 2024
Commits on Jun 13, 2024
320a5d7 feat: Add `.close()` method to `Llama` class to explicitly free model from memory (#1513)
Commits on Jun 4, 2024
d634efc feat: adding `rpc_servers` parameter to `Llama` class (#1477)
Commits on May 29, 2024
Commits on May 14, 2024
5212fb0 feat: add MinTokensLogitProcessor and min_tokens argument to server (#1333)
Commits on May 5, 2024
Commits on May 2, 2024
Commits on Apr 30, 2024
Commits on Apr 26, 2024
Commits on Apr 23, 2024
Commits on Apr 17, 2024
b73c73c feat: add `disable_ping_events` flag (#1257)
Commits on Apr 10, 2024
Commits on Apr 1, 2024
f165048 feat: add support for KV cache quantization options (#1307)
Commits on Mar 31, 2024
aa9f1ae feat: Add logprobs support to chat completions (#1311)
Commits on Mar 23, 2024
Commits on Mar 19, 2024
Commits on Mar 9, 2024
c139f8b feat: Add endpoints for tokenize, detokenize and count tokens (#1136)
Commits on Feb 28, 2024