Closed
Description
Inference with CLBlast fails with a segfault after the commit that merged #4256. The crash occurs with context sizes above 2k when all layers are offloaded to the GPU.
Command line:

```
C:\test\llama-b1601-bin-win-clblast-x64>main.exe -m E:\LLaMA\models\airoboros-mistral2.2-7b.Q4_K_S.gguf -c 4096 -b 512 -n 32 -ngl 33 -f C:\test\test.txt
main: build = 1601 (5a7d312)
main: built with MSVC 19.37.32826.1 for x64
main: seed  = 1701534899
ggml_opencl: selecting platform: 'NVIDIA CUDA'
ggml_opencl: selecting device: 'NVIDIA GeForce RTX 2060'
ggml_opencl: device FP16 support: false
```
Result:
Prompt processing starts, then segfaults around the 2k-token mark, before generation begins. It only appears to work if the prompt is short enough (fewer than 2k tokens).
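To narrow down the failure threshold, the same invocation can be repeated at a few context sizes. This is a hypothetical bisection sketch, not something from the report — the model path, prompt file, and the exact crossing point are taken or assumed from the command above:

```shell
REM Assumed bisection over context size; same model, prompt, and flags as above.
REM Expectation based on the report: small contexts complete, -c 4096 segfaults
REM during prompt processing around the 2k-token mark.
main.exe -m E:\LLaMA\models\airoboros-mistral2.2-7b.Q4_K_S.gguf -c 2048 -b 512 -n 32 -ngl 33 -f C:\test\test.txt
main.exe -m E:\LLaMA\models\airoboros-mistral2.2-7b.Q4_K_S.gguf -c 3072 -b 512 -n 32 -ngl 33 -f C:\test\test.txt
main.exe -m E:\LLaMA\models\airoboros-mistral2.2-7b.Q4_K_S.gguf -c 4096 -b 512 -n 32 -ngl 33 -f C:\test\test.txt
```

Reducing `-ngl` (offloading fewer layers) at `-c 4096` would also help confirm that the crash is tied to full GPU offload rather than to the context size alone.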
Reported by AlpinDale.
Labels
Something isn't working