Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars #9639
Merged
Commits
375 commits
ec9f3b1
nits
9a86ea7
`tool-call`: slow tool call integration tests
c88095e
space nits
7fde6d0
`tool_call`: test no tool call on a real model + rename scenarios
dd6d024
`tool-call`: script to prefetch models used in server tests
168add7
Update tool_call.feature
ec547e4
`tool-call`: add tests: tool_call=none, parallel_tool_calls=true
b51c71c
`tool-call`: remove duplicate script to fetch templates
74d71a6
`agent`: simplify syntax (default tools to local w/ default port)
b825440
`tool-call`: use Q4_K_M models
aefac1e
`tool-call`: update scripts/fetch_server_test_models.py
64287a3
`tool-call`: test Hermes-3-Llama-3.1-8B
fa4c111
`tool-call`: use functionary-small-v3.2-Q8_0.gguf in test (Q4_K_M too…
773ff91
`tool-call`: force printing of lazy grammar trigger tokens to regular…
92c384a
nits
3ebdb2b
`tool-call`: support tool_use variant in llama_chat_template_from_mod…
35ac17f
`tool-call`: fix missing initializer errors
5227321
`tool-call`: when slow server tests fail, hint to run `python scripts…
e4d5449
`tool-calls`: test Qwen2.5-7B-Instruct-Q4_K_M.gguf
61655b9
Merge remote-tracking branch 'origin/master' into tool-call
be9de3e
Update llama-sampling.cpp
542853b
`tool-call`: greedy sampling in server tests + tweak prompt
7d9c90f
`tool-call`: nemo tweak (accept raw sql again)
e8d9d71
Update tool_call.feature
c395d48
`tool-call`: behaviour-based detection of template features
f5b7825
`tool-call`: code_interpreter & system + tool call support for all ji…
c773516
`tool-call`: don't use -fa w/ Mistral-Nemo (hard crashes?)
b35aa4a
`tool-call`: add LLAMA_UPDATE_GOLDENS env for test-chat-template
9477c54
`tool-call`: functionary-small-v3.2 test now green
c4a8050
Update README.md
f5f7475
nits
fe967b6
Update README.md
479c152
`tool-call`: fix qwen template test
bc52c0a
`agent`: add missing tool name in response!
c059aec
`agent`: memorize, search_memory (sqlite-vec + sqlite-lembed), fetch …
5789f69
`minja`: don't explode upon referencing a field on an array (fixes He…
f9b1969
Update README.md
adc673c
agent: add --think "tool", default to local tools endpoint, support -…
1afa312
Merge remote-tracking branch 'origin/master' into tool-call
30fbcb2
agent: more robust squid config
a469f53
agent: update readme
cbe395d
minja: remove tests (now in https://github.com/google/minja)
1fd5f1a
Update README.md
5d0033f
minja: sync @ https://github.com/google/minja/commit/916c181c0d4a6f96…
1f0b157
tool-call: add firefunction-v2 style
93a5245
tool-calls: migrate tests to pytest
055053c
Merge remote-tracking branch 'origin/master' into tool-call
1e2115f
tool-calls: shorter name: grammar_triggers
7bfcd0a
Merge remote-tracking branch 'origin/master' into tool-call
7e3feff
tool-call: stabilize server tests
e70ce3f
Merge remote-tracking branch 'origin/master' into tool-call
f0bd693
Update test-tool-call.cpp
f645887
Update minja.hpp https://github.com/google/minja/commit/202aa2f3de21b…
0e87ae2
rm trailing spaces
0a5d527
Update fetch_server_test_models.py
a2fe8a4
Fix tool-call server tests
523ebf8
Simplify tool call grammars when there's only 1 tool
abd274a
Copy minja from https://github.com/google/minja/commit/58f0ca6dd74bcb…
e5113e8
Add --jinja and --chat-template-file flags
80138d9
Add missing <optional> include
06b5159
Avoid print in get_hf_chat_template.py
ce48584
No designated initializers yet
389d79b
Try and work around msvc++ non-macro max resolution quirk
238b968
Update test_chat_completion.py
cb72cf1
Merge remote-tracking branch 'origin/master' into jinja
78861a3
Wire LLM_KV_TOKENIZER_CHAT_TEMPLATE_N in llama_model_chat_template
1aac99a
Refactor test-chat-template
7c84ebc
Test templates w/ minja
18f257b
Fix deprecation
8dd4f33
Add --jinja to llama-run
c04c50e
Merge remote-tracking branch 'origin/master' into jinja
a6afb27
Update common_chat_format_example to use minja template wrapper
b4083e4
Test chat_template in e2e test
b7e2171
Update utils.py
a57bb94
Update test_chat_completion.py
4daae0b
Update run.cpp
1b3bb7e
Update arg.cpp
ochafik e7ff6ec
Merge branch 'jinja' into tool-call
7a7d6f6
Fix merge
e183fa9
Update test-chat-template.cpp
010726c
Merge remote-tracking branch 'origin/master' into tool-call
d47f40c
Update test-chat-template.cpp
3ed670b
Merge remote-tracking branch 'origin/master' into jinja
3c7784c
Refactor common_chat_* functions to accept minja template + use_jinja…
b75d062
Refactor common_chat_* functions to accept minja template + use_jinja…
40db789
Merge remote-tracking branch 'origin/master' into jinja
81c0d43
Attempt to fix linkage of LLAMA_CHATML_TEMPLATE
138a4ba
Merge branch 'jinja' into tool-call
d5fa351
Revert LLAMA_CHATML_TEMPLATE refactor
045edd1
Merge branch 'jinja' into tool-call
2ceabee
Fix fetch_server_test_models.py (avoid conv trap)
259d9e4
tools: greedy sampling in tests
acf7c24
tools: run tool call slow tests when SLOW_TESTS=1 (+ prefetch models)
ee1e10e
Normalize newlines in test-chat-templates for windows tests
e63520f
Forward decl minja::chat_template to avoid eager json dep
33322e8
Flush stdout in chat template before potential crash
5074e6f
Fix copy elision warning
76893f5
Merge branch 'jinja' into tool-call
fc60802
Rm unused optional include
0e74c9d
Add missing optional include to server.cpp
d6f058d
Merge branch 'jinja' into tool-call
e3c475c
Disable jinja test that has a cryptic windows failure
cc50356
minja: fix vigogne (https://github.com/google/minja/pull/22)
c207fdc
Merge branch 'jinja' into tool-call
0401a83
agent: add --greedy, --top-p, --top-k options
153e852
Apply suggestions from code review
ochafik db9dd0c
Finish suggested renamings
c9e8fdd
Move chat_templates inside server_context + remove mutex
8c84aef
Update --chat-template-file w/ recent change to --chat-template
154bfaa
Refactor chat template validation
099f983
Merge remote-tracking branch 'origin/master' into jinja
54a669e
Guard against missing eos/bos tokens (null token otherwise throws in …
8348c60
Warn against missing eos / bos tokens when jinja template references …
ee475d2
rename: common_chat_template[s]
8a7c89e
reinstate assert on chat_templates.template_default
9bab693
Merge branch 'jinja' into tool-call
b110374
apply renames from jinja branch
8347da9
Update minja to https://github.com/google/minja/commit/b8437df626ac6c…
7ea6a06
Merge branch 'jinja' into tool-call
56aa93c
fix std imports for gcc build
ff2cce5
Update minja to https://github.com/google/minja/pull/25
ba8dd66
Merge branch 'jinja' into tool-call
9d8ebd6
Update minja from https://github.com/google/minja/pull/27
c606255
Merge branch 'jinja' into tool-call
fec0260
Merge remote-tracking branch 'origin/master' into tool-call
b49d052
rm tests/test-minja from makefile
f6e73da
Remove examples/agent (moved to https://gist.github.com/ochafik/9246d…
77f4098
Delete update_jinja_goldens.py
dbf841b
Push laziness down to grammar impl
ef61a4c
minimize diffs
3972945
common_tool_call rename
d77fecc
shrink diff in json conversion code
5268ec8
Refactor string helpers into common
9e8b43f
follow enum naming style for tool call styles
9a5acbb
Factor string_join, string_split, string_repeat into common
4de5cf8
json: refactor to surface a versatile builder
03fe80f
drop unused fs_list_files
41a613b
Merge branch 'string_utils' into tool-call
5140d7a
Update common.cpp
e211629
Merge branch 'string_utils' into tool-call
28cac49
drop llama_sampler_accept_str
2dd09c7
more cleanups
01b345b
Merge remote-tracking branch 'origin/master' into tool-call
82b6e9a
merge common_tool_calls into common_chat_msg
63387c6
smaller diff
a422636
nits
cce1166
Update tool-call.cpp
c6a22ed
Greedy sampling in tool call tests
30d33d9
Update test_chat_completion.py
9ccc62b
Sync minja after https://github.com/google/minja/pull/29
d186721
Merge remote-tracking branch 'origin/master' into tool-call
f0231a5
fix common_chat_msg invocations
5e358ad
fix msg init warning
cdfa8b9
Update chat-template.hpp
a46de6a
Add grammar options + rename builder to common_grammar_builder
c2d836f
Update real tool call tests (use less models)
46415d7
Fix lazy trigger handling
36ed106
WIP chat handlers
c479d39
tool-call: allow special tokens that are grammar triggers
0208b20
Update test_chat_completion.py
a6463c1
jinja: don't add bos when jinja enabled
51b7aab
Update test_chat_completion.py
3f3fc03
nit: trailing spaces
1159455
Merge branch 'tool-call' into tool-call-handler
43385b2
sync: minja
5ec4c5e
reshuffle chat handlers
f7078ca
tool-call: fix functionary v3.1 required test
ca0c837
nits
bddc1be
tool-call: fix special handling of special trigger tokens (Nemo)
da606d8
tool-call: remove nonsensical code_interpreter code
15ec01e
jinja: only add special tokens if template doesn't seem to handle them
2efa0c2
tool-call: add weather tool e2e tests
57f40e3
tool-call: fix lazy grammar & mixed content + tool calls parsing
6770955
tool-call: compact json output to cap # tokens generated
09971e6
Update test_chat_completion.py
92ac336
Prepare DeepSeek-R1-Distill-Llama-8B support
118f799
DeepSeek-R1: implement grammar constraints
add9124
fix test-chat-handler grammar tests
fa065eb
Rehabilitate test_format_detection
ad22978
updated tool call example to be less ambiguous (deepseek likes to ran…
90effb8
Pass grammar laziness all the way down to sampler (need to print spec…
cafea60
Split e2e test_tool_call from test_chat_completion
b565ab2
comment out broken tests in test_tool_call.py
2d607f1
Update test-chat-handler.cpp
ef9efc9
Fix Llama 3.1 (incl. constrained builtin tools e.g. `<|python_tag|>fo…
6271714
Allow tool use + streaming
6d56829
Cleanup dead code in llama_3_1 tool call code
2f99236
Tool-call: do last partial parse upon limit stop
0a51e51
Update test-chat-handler.cpp
d274ffc
build: Add missing optional include for gcc
62d45a5
Disable slow tests where appropriate, + nits
ec4aeaf
Revert "Allow tool use + streaming"
b5a74d1
Simplify parser defs (incremental parsing for streaming will need mor…
ba10b47
Add missing link dep for windows build
cd63ba4
beef up test-chat-handler w/ delta expectations
cad1448
Disable test-chat-handler on win32 like the other grammar-related tests
4f25755
minja: sync on https://github.com/google/minja/pull/33
d603d06
sync: minja
6426391
Fix firefunction w/ jinja: requires two variables, use the chat handl…
4cdbb8c
Revert breaking minja change
47be437
Text fireworks v2 template
18d5a1b
nits
4a1e8e9
refactor test-chat-handler
923c805
rm dead code + nits
384f54a
Split bulk of tool call tests to slow lane
40cc3f2
Merge branch 'tool-call' of github.com:ochafik/llama.cpp into tool-call
41eec46
rm unused templates, rename one
76f6ab1
Update test_tool_call.py
77dd67c
tool-calls: disable crashing tests
0f8af53
nits
babdefc
Merge remote-tracking branch 'origin/master' into tool-call
682026f
Create meta-llama-Llama-3.1-8B-Instruct.jinja
7b5e080
Move templates/ under models/
ba27e98
Unify llama 3.x chat handling again (allow `{"type": "function", "nam…
6e676c8
sync: minja
ed7c622
Rename: common/chat.*, common_chat_{inputs -> params}
36c776f
Finish renaming of chat inputs vs. params [skip ci]
bc8a611
nits
84bc083
Remove server tests LLAMA_CACHE override (tests are serial, and the c…
2b24569
Add cli mode to test-chat to generate template summaries markdown
64545ac
Somehow /* bad inside block comments, ok fine.
cbecb35
Add tool call to hot topics
a810c37
Partial revert of LLAMA_CACHE=tmp (unless set explicitly in env)
77c60e6
Avoid passing tools twice in generic handler (now that minja passes t…
d86a1ae
Unify content + message in server_task_result_cmpl_final (+ avoid str…
774557c
llama 3.1: allow `{name:` & `{function:` syntax even w/ builtin tools…
590c979
Update tests readme + add raw output to verbose log
f8e14bf
split chat handler vs. parser around enum again
81547e6
nits
18450e6
debug logs are back
b831a6e
rm unused llama_param
7635912
llama 3.2 1b now fails the weather tool call?
9591af1
increase http timeout to 12
8ef37a3
Merge remote-tracking branch 'origin/master' into tool-call
2d51c45
code style changes on test
ngxson c88f4a7
simplify handle_apply_template
ngxson 3dcde9e
Fix debug + verbose
06c4ca5
Update test_chat_completion.py
0c171f5
Update test_chat_completion.py
9685043
Update scripts/fetch_server_test_models.py to new compact hf_repo syn…
2bb3fed
nit: fix py import
7d59bf4
deprecate llama_sampler_init_grammar -> llama_sampler_grammar_init
5a64af6
add llama_sampler_init_grammar_lazy instead of renaming the non-lazy
f223df0
Format test-chat.cpp
8205246
log prompt + nits
5add261
test: leave model_hf_file blank
ngxson 1029ff9
force printing </tool_call> on hermes 2 model if/as it's a special token
3bd6abe
try and avoid weird server test failure (spillage / parallelism betwe…
729d2d3
Disable chat_completion tests of non-tool jinja mode
34f54dd
Fix typo