Fix incorrect token_logprobs (due to indexing after sorting) #453
Thanks for the wonderful Python wrapper for llama.cpp.
When using your framework, I found a problem with `output['logprobs']['token_logprobs']`.
For example, on commit ca11673 (the latest main branch as I write this), the problem shows up in a completion call that requests `logprobs`.
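The exact reproduction snippet is not included here; the following is only a minimal sketch, assuming the standard `llama_cpp.Llama` completion API, with an illustrative prompt and parameters:

```python
from llama_cpp import Llama

# Illustrative reproduction: request per-token logprobs along with the completion.
llm = Llama(model_path="/data/yi/llama.cpp/models/30B/ggml-model-q8_0.bin")
output = llm("I", max_tokens=5, logprobs=1, echo=False)
print(output)
```

On that commit, such a call produced: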
```
{'id': 'cmpl-caa5cef2-ae2f-4157-8323-2dc644bb6308', 'object': 'text_completion', 'created': 1688724957, 'model': '/data/yi/llama.cpp/models/30B/ggml-model-q8_0.bin', 'choices': [{'text': "'m a big fan", 'index': 0, 'logprobs': {'tokens': ["'", 'm', ' a', ' big', ' fan'], 'text_offset': [1, 2, 3, 5, 9], 'token_logprobs': [-19.985351200841283, -19.405950866756548, -7.967424092851152, -12.165255948926252, -17.411318060240337], 'top_logprobs': [{"'": -2.2789093215688228}, {'m': -0.5416520462609419}, {' a': -2.119851766190996}, {' big': -2.872671052838605}, {' fan': -0.29880118020493784}]}, 'finish_reason': 'length'}], 'usage': {'prompt_tokens': 2, 'completion_tokens': 5, 'total_tokens': 7}}
```
You can see that the `token_logprobs` are suspiciously low and differ from the numbers in `top_logprobs`, even though the two are expected to be consistent.

When I looked into the code, I found that this unexpected behavior is caused by indexing the sorted logprobs. At L1134 of `llama-cpp-python/llama_cpp/llama.py` (lines 1123 to 1140 on commit ca11673), `sorted_logprobs` is already sorted, so indexing it by token id does not return the logprob of that token. We should instead index the list from before sorting, as is done at L1139. The same fix applies at L961 and L1036.
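Here is a minimal, self-contained sketch of the indexing mistake (simplified; not the exact library code, and the helper below is illustrative):

```python
import math

def to_logprobs(logits):
    # Log-softmax: convert raw logits to log-probabilities.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

logprobs_token = to_logprobs([2.0, 0.5, -1.0, 3.0])

# (logprob, token_id) pairs sorted by logprob, highest first --
# the same shape as `sorted_logprobs` in llama.py.
sorted_logprobs = sorted(
    zip(logprobs_token, range(len(logprobs_token))), reverse=True
)

token = 0  # suppose token id 0 was sampled

# Buggy: position `token` in the *sorted* list belongs to whichever token
# happens to hold that rank, not to token id 0.
buggy = sorted_logprobs[int(token)][0]

# Fixed: index the unsorted per-token list by token id.
fixed = logprobs_token[int(token)]

print(buggy, fixed)  # the two differ unless the sampled token is also ranked first
```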
After the fix, on commit 9e61661, the above code outputs:
```
{'id': 'cmpl-74ba65a7-1978-4dcd-aa43-09b35ec8361c', 'object': 'text_completion', 'created': 1688725240, 'model': '/data/yi/llama.cpp/models/30B/ggml-model-q8_0.bin', 'choices': [{'text': "'m a big fan", 'index': 0, 'logprobs': {'tokens': ["'", 'm', ' a', ' big', ' fan'], 'text_offset': [1, 2, 3, 5, 9], 'token_logprobs': [-2.2789093215688228, -0.5416520462609419, -2.119851766190996, -2.872671052838605, -0.29880118020493784], 'top_logprobs': [{"'": -2.2789093215688228}, {'m': -0.5416520462609419}, {' a': -2.119851766190996}, {' big': -2.872671052838605}, {' fan': -0.29880118020493784}]}, 'finish_reason': 'length'}], 'usage': {'prompt_tokens': 2, 'completion_tokens': 5, 'total_tokens': 7}}
```
Now the `token_logprobs` match the corresponding entries in `top_logprobs`.
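As a quick sanity check, whenever the sampled token appears in that step's `top_logprobs`, its value should equal the corresponding `token_logprobs` entry. A hedged snippet, assuming `output` holds the completion dict as in the sketch above:

```python
lp = output["choices"][0]["logprobs"]
for tok, tok_lp, top in zip(lp["tokens"], lp["token_logprobs"], lp["top_logprobs"]):
    # If the sampled token is among the reported top tokens, the values must match.
    if tok in top:
        assert abs(top[tok] - tok_lp) < 1e-9, (tok, tok_lp, top[tok])
```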
I think #349 is also due to this bug.