Add special token modification capability #6778

Closed · 142 commits · changes shown from 1 commit
9e4968c
Add special token modification capability
CISC Apr 20, 2024
8d36967
improve help text
CISC Apr 20, 2024
c4e6f6f
flake--
CISC Apr 20, 2024
a2410b6
fix multiple tokens warning
CISC Apr 20, 2024
aed82f6
common : try to fix Android CI (#6780)
ggerganov Apr 20, 2024
b8109bc
doc : server tests require llama to be built with curl enabled (#6788)
kaetemi Apr 20, 2024
e5956f5
make script executable
CISC Apr 21, 2024
ff5d21e
switch to namedtuple, no need to dataclass
CISC Apr 21, 2024
b97bc39
llama : support Llama 3 HF conversion (#6745)
pcuenca Apr 21, 2024
89b0bf0
llava : use logger in llava-cli (#6797)
jart Apr 21, 2024
2cca09d
readme : add Fedora instructions (#6783)
Man2Dev Apr 21, 2024
e8d35f4
doc : add link to falcon (#6789)
kaetemi Apr 21, 2024
c1386c9
gguf-py : add IQ1_M to GGML_QUANT_SIZES (#6761)
pmysl Apr 21, 2024
7dbdba5
llama : add llama-3 chat template (#6751)
DifferentialityDevelopment Apr 21, 2024
b9cc76d
ggml : fix ggml_backend_cpu_supports_op() for CPY (#0)
ggerganov Apr 21, 2024
40f74e4
llama : add option to render special/control tokens (#6807)
ggerganov Apr 21, 2024
5cf5e7d
`build`: generate hex dump of server assets during build (#6661)
ochafik Apr 21, 2024
e9b4a1b
flake.lock: Update
github-actions[bot] Apr 21, 2024
c0956b0
ci: fix job are cancelling each other (#6781)
phymbert Apr 22, 2024
8960fe8
llama : fix typo in <|im_end|> token text (#6745)
ggerganov Apr 22, 2024
e931888
ggml : fix calloc argument ordering. (#6820)
airlied Apr 22, 2024
192090b
llamafile : improve sgemm.cpp (#6796)
jart Apr 22, 2024
4e96a81
[SYCL] Windows default build instructions without -DLLAMA_SYCL_F16 fl…
aahouzi Apr 23, 2024
c8297c6
llama : add phi3 support (#6852)
liuwei-git Apr 24, 2024
3fec68b
convert : add support of codeqwen due to tokenizer (#6707)
JustinLin610 Apr 24, 2024
abd3314
llama : add phi 3 chat template (#6857)
tristandruyen Apr 24, 2024
c0d1b3e
ggml : move 32-bit arm compat in ggml-impl.h (#6865)
ggerganov Apr 24, 2024
28103f4
Server: fix seed for multiple slots (#6835)
JohannesGaessler Apr 24, 2024
37246b1
common : revert showing control tokens by default for server (#6860)
K-Mistele Apr 24, 2024
3fe847b
server : do not apply Markdown formatting in code sections (#6850)
mgroeber9110 Apr 24, 2024
b4e4b8a
llama : add llama_get_pooling_type function (#6862)
iamlemec Apr 24, 2024
784e11d
README: add graphic for matrix multiplication (#6881)
JohannesGaessler Apr 24, 2024
1966eb2
quantize : add '--keep-split' to quantize model into shards (#6688)
zj040045 Apr 25, 2024
aa750c1
tests : minor bash stuff (#6902)
ggerganov Apr 25, 2024
5477041
ggml : fix MIN / MAX macros (#6904)
ggerganov Apr 25, 2024
4ab99d8
clip : rename lerp function to avoid conflict (#6894)
danbev Apr 25, 2024
5154372
ggml : fix redefinition of vaddvq_f32 for 32-bit ARM (#6906)
ggerganov Apr 25, 2024
0ead1f1
llama : check that all the tensor data is in the model file (#6885)
slaren Apr 25, 2024
3fe0596
readme : update model list (#6908)
BarfingLemurs Apr 25, 2024
853d06f
ci : tmp disable slow tests
ggerganov Apr 25, 2024
d6e1d44
llama : synchronize before get/set session data (#6911)
slaren Apr 25, 2024
fa0b4ad
cmake : remove obsolete ANDROID check
ggerganov Apr 25, 2024
dba497e
cmake : restore LLAMA_LLAMAFILE_DEFAULT
ggerganov Apr 25, 2024
46e12c4
llava : add support for moondream vision language model (#6899)
vikhyat Apr 25, 2024
5790c8d
bench: server add stop word for PHI-2 (#6916)
phymbert Apr 26, 2024
7d641c2
ci: fix concurrency for pull_request_target (#6917)
phymbert Apr 26, 2024
d4a9afc
ci: server: fix python installation (#6918)
phymbert Apr 26, 2024
83b72cb
Merge pull request from GHSA-p5mv-gjc5-mwqv
ggerganov Apr 26, 2024
9e4e077
ci: server: fix python installation (#6922)
phymbert Apr 26, 2024
7f5ff55
server: stop generation at `n_ctx_train` if `n_predict` is not set (#…
phymbert Apr 26, 2024
bbe3c6e
ci: server: fix python installation (#6925)
phymbert Apr 26, 2024
4b1c3c9
llamafile : use 64-bit integers in sgemm (#6928)
jart Apr 26, 2024
e2764cd
gguf : fix mismatch between alloc and free functions (#6929)
slaren Apr 26, 2024
017e699
add basic tensor data validation function (#6884)
slaren Apr 26, 2024
0c4d489
quantize: add imatrix and dataset metadata in GGUF (#6658)
phymbert Apr 26, 2024
928e0b7
Reset schedule earlier to allow overlap with ggml graph computation o…
agray3 Apr 26, 2024
b736833
ci: server: tests python env on github container ubuntu latest / fix …
phymbert Apr 27, 2024
4dba7e8
Replace "alternative" boolean operator in conditional compilation dir…
mgroeber9110 Apr 27, 2024
6e472f5
flake.lock: Update
github-actions[bot] Apr 28, 2024
ce023f6
add device version in device list (#6959)
arthw Apr 28, 2024
7bb36cc
gguf : enforce that tensor names are unique (#6905)
ngxson Apr 28, 2024
e00b4a8
Fix more int overflow during quant (PPL/CUDA). (#6563)
dranger003 Apr 28, 2024
c4f708a
llama : fix typo LAMMAFILE -> LLAMAFILE (#6974)
JohannesGaessler Apr 29, 2024
ca7f29f
ci : add building in MSYS2 environments (Windows) (#6967)
przemoc Apr 29, 2024
577277f
make : change GNU make default CXX from g++ to c++ (#6966)
przemoc Apr 29, 2024
3055a41
convert : fix conversion of some BERT embedding models (#6937)
christianazinn Apr 29, 2024
3f16747
sampling : use std::random_device{}() for default random seed (#6962)
dwrensha Apr 29, 2024
f4ab2a4
llama : fix BPE pre-tokenization (#6920)
ggerganov Apr 29, 2024
24affa7
readme : update hot topics
ggerganov Apr 29, 2024
ffe6665
llava-cli : multiple images (#6969)
cpumaxx Apr 29, 2024
544f1f1
ggml : fix __MSC_VER -> _MSC_VER (#6977)
ggerganov Apr 29, 2024
d2c898f
ci : tmp disable gguf-split (#6983)
ggerganov Apr 29, 2024
b8a7a5a
build(cmake): simplify instructions (`cmake -B build && cmake --build…
ochafik Apr 29, 2024
5539e6f
main : fix typo in comment in main.cpp (#6985)
danbev Apr 29, 2024
b8c1476
Extending grammar integration tests (#6644)
HanClinto Apr 29, 2024
8843a98
Improve usability of --model-url & related flags (#6930)
ochafik Apr 29, 2024
952d03d
convert : use utf8 encoding (#7000)
ggerganov Apr 30, 2024
9c67c27
ggml : add Flash Attention (#5021)
ggerganov Apr 30, 2024
a68a1e7
metal : log more info on error (#6987)
bakkot Apr 30, 2024
77e15be
metal : remove deprecated error code (#7008)
ggerganov Apr 30, 2024
f364eb6
switch to using localizedDescription (#7010)
bakkot Apr 30, 2024
a8f9b07
perplexity: more statistics, added documentation (#6936)
JohannesGaessler Apr 30, 2024
c4ec9c0
ci : exempt confirmed bugs from being tagged as stale (#7014)
slaren May 1, 2024
1613ef8
CUDA: CUDART < 11.7 workaround for __hmax, __hmax2 (#7019)
JohannesGaessler May 1, 2024
3ea0d36
Server: add tests for batch size, different seeds (#6950)
JohannesGaessler May 1, 2024
8d608a8
main : fix off by one error for context shift (#6921)
l3utterfly May 1, 2024
b0d943d
Update LOG_IMPL and LOG_TEE_IMPL (#7029)
a-downing May 1, 2024
6ecf318
chore: fix typo in llama.cpp (#7032)
alwqx May 2, 2024
60325fa
Remove .attention from skipped tensors to match more accurately (#7051)
bartowski1182 May 2, 2024
433def2
llama : rename ctx to user_data in progress_callback (#7045)
danbev May 3, 2024
a2ac89d
convert.py : add python logging instead of print() (#6511)
mofosyne May 3, 2024
92139b9
tests : add test-tokenizer-0.sh + fix some tokenizers (#7036)
ggerganov May 4, 2024
03fb8a0
If first token generated from the server is the stop word the server …
maor-ps May 4, 2024
fcd84a0
Fix Linux /sys cpu path to guess number of cores (#7064)
viric May 4, 2024
cf768b7
Tidy Android Instructions README.md (#7016)
Jeximo May 4, 2024
8425001
gguf-split: add --no-tensor-first-split (#7072)
ngxson May 4, 2024
d39f203
typing++
CISC May 4, 2024
158215c
add progress bar
CISC May 4, 2024
6fbd432
py : logging and flake8 suppression refactoring (#7081)
mofosyne May 5, 2024
889bdd7
command-r : add BPE pre-tokenization (#7063)
dranger003 May 5, 2024
ca36326
readme : add note that LLaMA 3 is not supported with convert.py (#7065)
lyledean1 May 5, 2024
8f8acc8
Disable benchmark on forked repo (#7034)
CISC May 5, 2024
628b299
Adding support for the --numa argument for llama-bench. (#7080)
kunnis May 5, 2024
bcdee0d
minor : fix trailing whitespace
ggerganov May 6, 2024
b3a995b
flake.lock: Update (#7079)
ggerganov May 6, 2024
858f6b7
Add an option to build without CUDA VMM (#7067)
WilliamTambellini May 6, 2024
947d3ad
ci : add GG_BUILD_EXTRA_TESTS_0 env (#7098)
ggerganov May 7, 2024
04976db
docs: fix typos (#7124)
omahs May 7, 2024
3af34c1
main : update log text (EOS to EOG) (#7104)
RhinoDevel May 7, 2024
53d6c52
readme : update hot topics
ggerganov May 7, 2024
260b7c6
server : update readme with undocumented options (#7013)
K-Mistele May 7, 2024
b6aa670
Fix OLMo HF to GGUF conversion (#6910)
nopperl May 7, 2024
af0a5b6
server: fix incorrectly reported token probabilities (#7125)
JohannesGaessler May 7, 2024
48b2f9c
Fixed save_imatrix to match old behaviour for MoE (#7099)
jukofyork May 8, 2024
c780e75
Further tidy on Android instructions README.md (#7077)
Jeximo May 8, 2024
c0e6fbf
metal : fix unused warning
ggerganov May 8, 2024
3855416
ggml : introduce bfloat16 support (#6412)
jart May 8, 2024
acdce3c
compare-llama-bench.py: add missing basicConfig (#7138)
mofosyne May 8, 2024
7e0b6a7
py : also print the normalizers
ggerganov May 8, 2024
4cd621c
convert : add BPE pre-tokenization for DBRX (#7132)
dranger003 May 8, 2024
1fd9c17
clean up json_value & server_log (#7142)
ngxson May 8, 2024
229ffff
llama : add BPE pre-tokenization for Qwen2 (#7114)
jklj077 May 8, 2024
ad211ed
convert.py : --vocab-only generates false but valid params (#7027)
20kdc May 8, 2024
911b390
server : add_special option for tokenize endpoint (#7059)
JohanAR May 8, 2024
465263d
sgemm : AVX Q4_0 and Q8_0 (#6891)
netrunnereve May 8, 2024
83330d8
main : add --conversation / -cnv flag (#7108)
May 8, 2024
26458af
metal : use `vm_allocate` instead of `posix_memalign` on macOS (#7078)
giladgd May 8, 2024
bd1871f
server : add themes + favicon (#6848)
jboero May 8, 2024
9da243b
Revert "llava : add support for moondream vision language model (#6899)"
ggerganov May 8, 2024
c12452c
JSON: [key] -> .at(key), assert() -> GGML_ASSERT (#7143)
JohannesGaessler May 8, 2024
bc4bba3
Introduction of CUDA Graphs to LLama.cpp (#6766)
agray3 May 8, 2024
f98eb31
convert-hf : save memory with lazy evaluation (#7075)
compilade May 8, 2024
4426e29
cmake : fix typo (#7151)
cebtenzzre May 8, 2024
ed72533
Add special token modification capability
CISC Apr 20, 2024
27caf19
improve help text
CISC Apr 20, 2024
8737ca1
flake--
CISC Apr 20, 2024
3e3e7c3
fix multiple tokens warning
CISC Apr 20, 2024
bc92f65
make script executable
CISC Apr 21, 2024
87e2d73
switch to namedtuple, no need to dataclass
CISC Apr 21, 2024
981bd44
typing++
CISC May 4, 2024
609df3c
add progress bar
CISC May 4, 2024
144d99a
Merge branch 'modify-special-tokens-metadata' of github.com:CISC/llam…
CISC May 9, 2024
llama : support Llama 3 HF conversion (#6745)
* Support Llama 3 conversion

The tokenizer is BPE.

* style

* Accept suggestion

Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>

* llama : add llama_token_is_eog()

ggml-ci

* llama : auto-detect more EOT tokens when missing in KV data

* convert : replacing EOS token is a hack

* llama : fix codegemma EOT token + add TODOs

* llama : fix model type string for 8B model

---------

Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
3 people authored Apr 21, 2024
commit b97bc3966e852adb626c90be64fd48282800f504
convert-hf-to-gguf.py (34 additions, 15 deletions)
@@ -1301,15 +1301,23 @@ def set_vocab(self):
         try:
             self._set_vocab_sentencepiece()
         except FileNotFoundError:
-            self._set_vocab_llama_hf()
-
-        special_vocab = gguf.SpecialVocab(self.dir_model, load_merges=False,
-                                          special_token_types = ['prefix', 'suffix', 'middle', 'eot'])
-        special_vocab._set_special_token("prefix", 32007)
-        special_vocab._set_special_token("suffix", 32008)
-        special_vocab._set_special_token("middle", 32009)
-        special_vocab._set_special_token("eot", 32010)
-        special_vocab.add_to_gguf(self.gguf_writer)
+            try:
+                self._set_vocab_llama_hf()
+            except (FileNotFoundError, TypeError):
+                # Llama 3
+                self._set_vocab_gpt2()
+
+        # Apply to CodeLlama only (and ignore for Llama 3 with a vocab size of 128256)
+        if self.hparams.get("vocab_size", 32000) == 32016:
+            special_vocab = gguf.SpecialVocab(
+                self.dir_model, load_merges=False,
+                special_token_types = ['prefix', 'suffix', 'middle', 'eot']
+            )
+            special_vocab._set_special_token("prefix", 32007)
+            special_vocab._set_special_token("suffix", 32008)
+            special_vocab._set_special_token("middle", 32009)
+            special_vocab._set_special_token("eot", 32010)
+            special_vocab.add_to_gguf(self.gguf_writer)
 
     def set_gguf_parameters(self):
         super().set_gguf_parameters()
@@ -2194,6 +2202,8 @@ def set_vocab(self):
         old_eos = special_vocab.special_token_ids["eos"]
         if "chat" in os.path.basename(self.dir_model.absolute()):
             # For the chat model, we replace the eos with '<|im_end|>'.
+            # TODO: this is a hack, should be fixed
+            # https://github.com/ggerganov/llama.cpp/pull/6745#issuecomment-2067687048
             special_vocab.special_token_ids["eos"] = self._try_get_sft_eos(tokenizer)
             print(f"Replace eos:{old_eos} with a special token:{special_vocab.special_token_ids['eos']} \
 in chat mode so that the conversation can end normally.")
@@ -2429,12 +2439,15 @@ class GemmaModel(Model):
 
     def set_vocab(self):
         self._set_vocab_sentencepiece()
+
+        # TODO: these special tokens should be exported only for the CodeGemma family
         special_vocab = gguf.SpecialVocab(self.dir_model, load_merges=False,
-                                          special_token_types = ['prefix', 'suffix', 'middle', 'eot'])
+                                          special_token_types = ['prefix', 'suffix', 'middle', 'fsep', 'eot'])
         special_vocab._set_special_token("prefix", 67)
         special_vocab._set_special_token("suffix", 69)
         special_vocab._set_special_token("middle", 68)
-        special_vocab._set_special_token("eot", 70)
+        special_vocab._set_special_token("fsep", 70)
+        special_vocab._set_special_token("eot", 107)
         special_vocab.add_to_gguf(self.gguf_writer)
 
     def set_gguf_parameters(self):
@@ -2523,28 +2536,34 @@ def set_vocab(self):
 
         field = neox_reader.get_field(gguf.Keys.Tokenizer.MODEL)
         self.gguf_writer.add_tokenizer_model(bytes(field.parts[-1]))
+
         field = neox_reader.get_field(gguf.Keys.Tokenizer.LIST)
         self.gguf_writer.add_token_list([bytes(field.parts[i]) for i in field.data][:vocab_size])
+
         field = neox_reader.get_field(gguf.Keys.Tokenizer.TOKEN_TYPE)
         self.gguf_writer.add_token_types([field.parts[i].tolist()[0] for i in field.data][:vocab_size])
+
         field = neox_reader.get_field(gguf.Keys.Tokenizer.MERGES)
         self.gguf_writer.add_token_merges([bytes(field.parts[i]) for i in field.data])
+
         field = neox_reader.get_field(gguf.Keys.Tokenizer.BOS_ID)
         self.gguf_writer.add_bos_token_id(field.parts[-1].tolist()[0])
+
         field = neox_reader.get_field(gguf.Keys.Tokenizer.EOS_ID)
         self.gguf_writer.add_eos_token_id(field.parts[-1].tolist()[0])
+
         field = neox_reader.get_field(gguf.Keys.Tokenizer.UNK_ID)
         self.gguf_writer.add_unk_token_id(field.parts[-1].tolist()[0])
 
     def set_gguf_parameters(self):
-        d_model = self.find_hparam(["hidden_size", "d_model"])
-        d_conv = self.find_hparam(["conv_kernel", "d_conv"], optional=True) or 4
+        d_model = self.find_hparam(["hidden_size",       "d_model"])
+        d_conv  = self.find_hparam(["conv_kernel",       "d_conv"],  optional=True) or 4
         d_inner = self.find_hparam(["intermediate_size", "d_inner"], optional=True) or 2 * d_model
-        d_state = self.find_hparam(["state_size", "d_state"], optional=True) or 16
+        d_state = self.find_hparam(["state_size",        "d_state"], optional=True) or 16
         # ceiling division
         # ref: https://stackoverflow.com/a/17511341/22827863
         # ref: https://github.com/state-spaces/mamba/blob/ce59daea3a090d011d6476c6e5b97f6d58ddad8b/mamba_ssm/modules/mamba_simple.py#L58
-        dt_rank = self.find_hparam(["time_step_rank", "dt_rank"], optional=True) or -(d_model // -16)
+        dt_rank      = self.find_hparam(["time_step_rank",     "dt_rank"],      optional=True) or -(d_model // -16)
         rms_norm_eps = self.find_hparam(["layer_norm_epsilon", "rms_norm_eps"], optional=True) or 1e-5
 
         # Fail early for models which don't have a block expansion factor of 2
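Read outside of the diff markup, the LlamaModel.set_vocab hunk above boils down to a fallback order: try SentencePiece, fall back to the HF "slow" Llama tokenizer, and finally to GPT-2-style BPE for Llama 3, with the CodeLlama fill-in-the-middle tokens written only for the 32016-token vocab. A minimal sketch of that control flow, using hypothetical stand-in callables rather than the converter's real helper methods:

```python
# Sketch of the fallback order from the hunk above; the loader callables and
# set_special_token are hypothetical stand-ins, not the converter's actual API.
def set_llama_vocab(load_sentencepiece, load_llama_hf, load_gpt2,
                    vocab_size, set_special_token):
    try:
        load_sentencepiece()                 # classic Llama 1/2 checkpoints
    except FileNotFoundError:
        try:
            load_llama_hf()                  # HF "slow" tokenizer fallback
        except (FileNotFoundError, TypeError):
            load_gpt2()                      # Llama 3 ships a BPE tokenizer.json only

    # CodeLlama FIM tokens; skipped for Llama 3 (vocab size 128256)
    if vocab_size == 32016:
        for name, tok_id in (("prefix", 32007), ("suffix", 32008),
                             ("middle", 32009), ("eot", 32010)):
            set_special_token(name, tok_id)
```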
convert.py (8 additions, 1 deletion)
@@ -525,7 +525,14 @@ def __init__(self, base_path: Path):
 
         # pre-check so we know if we need transformers
         tokenizer_model: dict[str, Any] = tokenizer_json['model']
-        if (
+        is_llama3 = (
+            tokenizer_model['type'] == 'BPE' and tokenizer_model.get('ignore_merges', False)
+            and not tokenizer_model.get('byte_fallback', True)
+        )
+        if is_llama3:
+            raise TypeError('Llama 3 must be converted with BpeVocab')
+
+        if not is_llama3 and (
             tokenizer_model['type'] != 'BPE' or not tokenizer_model.get('byte_fallback', False)
             or tokenizer_json['decoder']['type'] != 'Sequence'
         ):
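The convert.py pre-check above is a small heuristic over tokenizer.json: a BPE tokenizer with ignore_merges set and byte_fallback disabled is treated as Llama 3 and rejected early so that BpeVocab is used instead. A self-contained sketch of that check, assuming the standard tokenizer.json layout (function names here are illustrative, not the script's actual API):

```python
import json
from pathlib import Path
from typing import Any


def looks_like_llama3(tokenizer_json: dict[str, Any]) -> bool:
    # Mirrors the condition added in the hunk above
    model = tokenizer_json['model']
    return (
        model['type'] == 'BPE' and model.get('ignore_merges', False)
        and not model.get('byte_fallback', True)
    )


def precheck_vocab(base_path: Path) -> None:
    tokenizer_json = json.loads((base_path / 'tokenizer.json').read_text(encoding='utf-8'))
    if looks_like_llama3(tokenizer_json):
        raise TypeError('Llama 3 must be converted with BpeVocab')
```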
examples/batched.swift/Sources/main.swift (1 addition, 1 deletion)
@@ -153,7 +153,7 @@ while n_cur <= n_len {
         // const llama_token new_token_id = llama_sample_token_greedy(ctx, &candidates_p);
 
         // is it an end of stream? -> mark the stream as finished
-        if new_token_id == llama_token_eos(model) || n_cur == n_len {
+        if llama_token_is_eog(model, new_token_id) || n_cur == n_len {
             i_batch[i] = -1
             // print("")
             if n_parallel > 1 {
examples/batched/batched.cpp (2 additions, 2 deletions)
@@ -191,8 +191,8 @@ int main(int argc, char ** argv) {
 
             //const llama_token new_token_id = llama_sample_token_greedy(ctx, &candidates_p);
 
-            // is it an end of stream? -> mark the stream as finished
-            if (new_token_id == llama_token_eos(model) || n_cur == n_len) {
+            // is it an end of generation? -> mark the stream as finished
+            if (llama_token_is_eog(model, new_token_id) || n_cur == n_len) {
                 i_batch[i] = -1;
                 LOG_TEE("\n");
                 if (n_parallel > 1) {
examples/beam-search/beam-search.cpp (1 addition, 1 deletion)
@@ -47,7 +47,7 @@ struct beam_search_callback_data {
 // In this case, end-of-beam (eob) is equivalent to end-of-sentence (eos) but this need not always be the same.
 // For example, eob can be flagged due to maximum token length, stop words, etc.
 static bool is_at_eob(const beam_search_callback_data & callback_data, const llama_token * tokens, size_t n_tokens) {
-    return n_tokens && tokens[n_tokens-1] == llama_token_eos(llama_get_model(callback_data.ctx));
+    return n_tokens && llama_token_is_eog(llama_get_model(callback_data.ctx), tokens[n_tokens-1]);
 }
 
 // Function matching type llama_beam_search_callback_fn_t.
examples/infill/infill.cpp (5 additions, 5 deletions)
@@ -586,7 +586,7 @@ int main(int argc, char ** argv) {
 
             // deal with eot token in infill mode
             if ((llama_sampling_last(ctx_sampling) == llama_token_eot(model) || is_interacting) && params.interactive){
-                if(is_interacting && !params.interactive_first) {
+                if (is_interacting && !params.interactive_first) {
                     // print an eot token
                     printf("%s", llama_token_to_piece(ctx, llama_token_eot(model)).c_str());
                 }
@@ -651,8 +651,8 @@ int main(int argc, char ** argv) {
                 // LOG_TEE("took new input\n");
                 is_interacting = false;
             }
-            // deal with end of text token in interactive mode
-            else if (llama_sampling_last(ctx_sampling) == llama_token_eos(model)) {
+            // deal with end of generation tokens in interactive mode
+            else if (llama_token_is_eog(model, llama_sampling_last(ctx_sampling))) {
                 LOG("found EOS token\n");
 
                 if (params.interactive) {
@@ -731,8 +731,8 @@ int main(int argc, char ** argv) {
             }
         }
 
-        // end of text token
-        if (!embd.empty() && embd.back() == llama_token_eos(model) && !params.interactive) {
+        // end of generation
+        if (!embd.empty() && llama_token_is_eog(model, embd.back()) && !params.interactive) {
             break;
 
 
examples/llama.android/app/src/main/cpp/llama-android.cpp (1 addition, 1 deletion)
@@ -408,7 +408,7 @@ Java_com_example_llama_Llm_completion_1loop(
     const auto new_token_id = llama_sample_token_greedy(context, &candidates_p);
 
     const auto n_cur = env->CallIntMethod(intvar_ncur, la_int_var_value);
-    if (new_token_id == llama_token_eos(model) || n_cur == n_len) {
+    if (llama_token_is_eog(model, new_token_id) || n_cur == n_len) {
         return env->NewStringUTF("");
     }
 
examples/llama.swiftui/llama.cpp.swift/LibLlama.swift (1 addition, 1 deletion)
@@ -158,7 +158,7 @@ actor LlamaContext {
             new_token_id = llama_sample_token_greedy(context, &candidates_p)
         }
 
-        if new_token_id == llama_token_eos(model) || n_cur == n_len {
+        if llama_token_is_eog(model, new_token_id) || n_cur == n_len {
             print("\n")
             let new_token_str = String(cString: temporary_invalid_cchars + [0])
             temporary_invalid_cchars.removeAll()
examples/llava/llava-cli.cpp (1 addition, 1 deletion)
@@ -45,7 +45,7 @@ static const char * sample(struct llama_sampling_context * ctx_sampling,
     const llama_token id = llama_sampling_sample(ctx_sampling, ctx_llama, NULL);
     llama_sampling_accept(ctx_sampling, ctx_llama, id, true);
     static std::string ret;
-    if (id == llama_token_eos(llama_get_model(ctx_llama))) {
+    if (llama_token_is_eog(llama_get_model(ctx_llama), id)) {
         ret = "</s>";
     } else {
         ret = llama_token_to_piece(ctx_llama, id);
examples/lookahead/lookahead.cpp (1 addition, 1 deletion)
@@ -299,7 +299,7 @@ int main(int argc, char ** argv) {
             }
             fflush(stdout);
 
-            if (id == llama_token_eos(model)) {
+            if (llama_token_is_eog(model, id)) {
                 has_eos = true;
             }
 
examples/lookup/lookup.cpp (1 addition, 1 deletion)
@@ -141,7 +141,7 @@ int main(int argc, char ** argv){
             printf("%s", token_str.c_str());
         }
 
-        if (id == llama_token_eos(model)) {
+        if (llama_token_is_eog(model, id)) {
             has_eos = true;
         }
 
examples/main/main.cpp (4 additions, 4 deletions)
@@ -795,8 +795,8 @@ int main(int argc, char ** argv) {
                 }
             }
 
-            // deal with end of text token in interactive mode
-            if (llama_sampling_last(ctx_sampling) == llama_token_eos(model)) {
+            // deal with end of generation tokens in interactive mode
+            if (llama_token_is_eog(model, llama_sampling_last(ctx_sampling))) {
                 LOG("found EOS token\n");
 
                 if (params.interactive) {
@@ -920,8 +920,8 @@ int main(int argc, char ** argv) {
            }
        }
 
-        // end of text token
-        if (!embd.empty() && embd.back() == llama_token_eos(model) && !(params.instruct || params.interactive || params.chatml)) {
+        // end of generation
+        if (!embd.empty() && llama_token_is_eog(model, embd.back()) && !(params.instruct || params.interactive || params.chatml)) {
             LOG_TEE(" [end of text]\n");
             break;
         }
examples/parallel/parallel.cpp (1 addition, 1 deletion)
@@ -359,7 +359,7 @@ int main(int argc, char ** argv) {
             // client.id, client.seq_id, id, client.n_decoded, client.i_batch, token_str.c_str());
 
             if (client.n_decoded > 2 &&
-                (id == llama_token_eos(model) ||
+                (llama_token_is_eog(model, id) ||
                  (params.n_predict > 0 && client.n_decoded + client.n_prompt >= params.n_predict) ||
                  client.response.find("User:") != std::string::npos ||
                  client.response.find('\n') != std::string::npos)) {
examples/passkey/passkey.cpp (2 additions, 2 deletions)
@@ -252,8 +252,8 @@ int main(int argc, char ** argv) {
         // sample the most likely token
         const llama_token new_token_id = llama_sample_token_greedy(ctx, &candidates_p);
 
-        // is it an end of stream?
-        if (new_token_id == llama_token_eos(model) || n_cur == n_len) {
+        // is it an end of generation?
+        if (llama_token_is_eog(model, new_token_id) || n_cur == n_len) {
             LOG_TEE("\n");
 
             break;
examples/server/server.cpp (1 addition, 1 deletion)
@@ -1201,7 +1201,7 @@ struct server_context {
             });
         }
 
-        if (result.tok == llama_token_eos(model)) {
+        if (llama_token_is_eog(model, result.tok)) {
             slot.stopped_eos = true;
             slot.has_next_token = false;
 
examples/server/utils.hpp (0 additions, 4 deletions)
@@ -381,10 +381,6 @@ static json oaicompat_completion_params_parse(
     } else {
         llama_params["stop"] = json_value(body, "stop", json::array());
     }
-    // Some chat templates don't use EOS token to stop generation
-    // We must add their end sequences to list of stop words
-    llama_params["stop"].push_back("<|im_end|>"); // chatml
-    llama_params["stop"].push_back("<end_of_turn>"); // gemma
 
     // Handle "response_format" field
     if (body.contains("response_format")) {
examples/simple/simple.cpp (2 additions, 2 deletions)
@@ -133,8 +133,8 @@ int main(int argc, char ** argv) {
         // sample the most likely token
         const llama_token new_token_id = llama_sample_token_greedy(ctx, &candidates_p);
 
-        // is it an end of stream?
-        if (new_token_id == llama_token_eos(model) || n_cur == n_len) {
+        // is it an end of generation?
+        if (llama_token_is_eog(model, new_token_id) || n_cur == n_len) {
             LOG_TEE("\n");
 
             break;
examples/speculative/speculative.cpp (1 addition, 1 deletion)
@@ -360,7 +360,7 @@ int main(int argc, char ** argv) {
             }
         }
 
-        if (token_id == llama_token_eos(model_tgt)) {
+        if (llama_token_is_eog(model_tgt, token_id)) {
             has_eos = true;
         }
         ++n_predict;