Fix incorrect token_logprobs (due to indexing after sorting) #453

Merged: 1 commit merged into abetlen:main on Jul 8, 2023

Conversation

wu-qing-157 (Contributor)

Thanks for the wonderful Python wrapper for llama.cpp.
When using your framework, I found a problem in output['logprobs']['token_logprobs'].

For example, on commit ca11673 (the latest main branch at the time of writing), the following code

from llama_cpp import Llama
llama = Llama('.../llama.cpp/models/30B/ggml-model-q8_0.bin', logits_all=True)
print(llama('I', temperature=0, max_tokens=5, logprobs=1))

will output:

{'id': 'cmpl-caa5cef2-ae2f-4157-8323-2dc644bb6308', 'object': 'text_completion', 'created': 1688724957, 'model': '/data/yi/llama.cpp/models/30B/ggml-model-q8_0.bin', 'choices': [{'text': "'m a big fan", 'index': 0, 'logprobs': {'tokens': ["'", 'm', ' a', ' big', ' fan'], 'text_offset': [1, 2, 3, 5, 9], 'token_logprobs': [-19.985351200841283, -19.405950866756548, -7.967424092851152, -12.165255948926252, -17.411318060240337], 'top_logprobs': [{"'": -2.2789093215688228}, {'m': -0.5416520462609419}, {' a': -2.119851766190996}, {' big': -2.872671052838605}, {' fan': -0.29880118020493784}]}, 'finish_reason': 'length'}], 'usage': {'prompt_tokens': 2, 'completion_tokens': 5, 'total_tokens': 7}}

Note that the token_logprobs values are implausibly low and differ from the corresponding entries in top_logprobs, even though the two should agree.

Looking into the code, I found that this unexpected behavior is caused by indexing into the already-sorted logprobs. The problem is at L1134, inside this loop:

for token, token_str, logprobs_token in zip(
    all_tokens, all_token_strs, all_logprobs
):
    text_offsets.append(text_offset)
    text_offset += len(token_str)
    tokens.append(token_str)
    sorted_logprobs = list(
        sorted(
            zip(logprobs_token, range(len(logprobs_token))), reverse=True
        )
    )
    token_logprobs.append(sorted_logprobs[int(token)][0])
    top_logprob: Optional[Dict[str, float]] = {
        self.detokenize([i]).decode("utf-8", errors="ignore"): logprob
        for logprob, i in sorted_logprobs[:logprobs]
    }
    top_logprob.update({token_str: logprobs_token[int(token)]})
    top_logprobs.append(top_logprob)
Since sorted_logprobs is already sorted by logprob, indexing it by token id does not return that token's logprob. We should instead index the unsorted logprobs_token by token id, as is already done on L1139.
The same fix applies at L961 and L1036.
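A minimal sketch of the one-line change described above (variable names taken from the snippet; the rest of the loop is unchanged):

# Buggy: sorted_logprobs is ordered by logprob, so position int(token)
# holds the int(token)-th most likely candidate, not the logprob of the
# sampled token itself.
token_logprobs.append(sorted_logprobs[int(token)][0])

# Proposed fix: index the unsorted per-position logprobs by token id,
# the same lookup already used for top_logprob on L1139.
token_logprobs.append(logprobs_token[int(token)])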

After the fix, on commit 9e61661, the same code outputs:

{'id': 'cmpl-74ba65a7-1978-4dcd-aa43-09b35ec8361c', 'object': 'text_completion', 'created': 1688725240, 'model': '/data/yi/llama.cpp/models/30B/ggml-model-q8_0.bin', 'choices': [{'text': "'m a big fan", 'index': 0, 'logprobs': {'tokens': ["'", 'm', ' a', ' big', ' fan'], 'text_offset': [1, 2, 3, 5, 9], 'token_logprobs': [-2.2789093215688228, -0.5416520462609419, -2.119851766190996, -2.872671052838605, -0.29880118020493784], 'top_logprobs': [{"'": -2.2789093215688228}, {'m': -0.5416520462609419}, {' a': -2.119851766190996}, {' big': -2.872671052838605}, {' fan': -0.29880118020493784}]}, 'finish_reason': 'length'}], 'usage': {'prompt_tokens': 2, 'completion_tokens': 5, 'total_tokens': 7}}

Here token_logprobs matches the corresponding values in top_logprobs.
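As a quick sanity check (a sketch for illustration, not part of the patch), this consistency can be asserted directly on a completion; with temperature=0 and logprobs=1 the sampled token's logprob should appear in the corresponding top_logprobs entry:

out = llama('I', temperature=0, max_tokens=5, logprobs=1)
lp = out['choices'][0]['logprobs']
for tok, tok_logprob, top in zip(
    lp['tokens'], lp['token_logprobs'], lp['top_logprobs']
):
    # Each top_logprobs entry is a dict mapping a token string to its
    # logprob; after the fix the sampled token's value must match.
    assert top[tok] == tok_logprob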

I think #349 is also due to this bug.

abetlen (Owner) commented on Jul 8, 2023

@wu-qing-157 thanks for the catch! I'd just been testing with openplayground to visualize the logprobs; however, that only uses the top_logprobs. LGTM

@abetlen abetlen merged commit b8e0bed into abetlen:main Jul 8, 2023
antoine-lizee pushed a commit to antoine-lizee/llama-cpp-python that referenced this pull request Oct 30, 2023