Hello!
I have been trying to test the new kv cache loading and ran into an issue: it seems to segfault when running llama_eval.
To save the current cache I do:
import llama_cpp
import pickle
from ctypes import cast

# Some work...

kv_tokens = llama_cpp.llama_get_kv_cache_token_count(ctx)
kv_len = llama_cpp.llama_get_kv_cache_size(ctx)
kv_cache = llama_cpp.llama_get_kv_cache(ctx)
kv_cache = cast(kv_cache, llama_cpp.POINTER(llama_cpp.c_uint8 * kv_len))
kv_cache = bytearray(kv_cache.contents)  # .contents: the bare pointer itself has no buffer to copy from

with open("test.bin", "wb") as f:
    pickle.dump([kv_cache, kv_tokens], f)
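(As a sanity check on the copy itself, the same snapshot can presumably be taken with ctypes.string_at, which reads the bytes straight off the raw pointer and skips the cast entirely; a minimal sketch I have not verified against this version:)

import ctypes

# Alternative snapshot: read kv_len bytes directly from the pointer
# returned by llama_get_kv_cache (sketch, untested).
kv_cache = bytearray(ctypes.string_at(llama_cpp.llama_get_kv_cache(ctx), kv_len))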
Loading:
with open("test.bin", "rb") as f:
    kv_cache, kv_tokens = pickle.load(f)

llama_cpp.llama_set_kv_cache(
    ctx,
    (llama_cpp.c_uint8 * len(kv_cache)).from_buffer(kv_cache),
    len(kv_cache),
    kv_tokens,
)
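(One thought: if the pickled blob's length no longer matches what the context expects, writing it back could plausibly stomp memory. A hedged sketch of that check, assuming the new context was created with the same model and parameters as the saved one:)

# Guard against a size mismatch before restoring; assumes the fresh
# context reports the same cache size as the one that was saved.
expected = llama_cpp.llama_get_kv_cache_size(ctx)
assert len(kv_cache) == expected, f"cache size mismatch: {len(kv_cache)} != {expected}"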
But running llama_cpp.llama_eval afterwards results in a segfault.
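For context, the eval call that crashes has roughly this shape (the token id and thread count below are placeholders, not my real values):

# Hypothetical follow-up evaluation after restoring the cache;
# kv_tokens is the token count loaded from the pickle.
tokens = (llama_cpp.llama_token * 1)(0)  # placeholder token id
llama_cpp.llama_eval(ctx, tokens, 1, kv_tokens, 4)  # segfaults here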
llama-cpp-python version: 0.1.16
How do I fix this?
Thanks