Vulkan based on #9650 #11835
Conversation
# Conflicts:
#	gpu/gpu.go

# Conflicts:
#	gpu/gpu_linux.go
Making AMD GPUs work on ARM architecture with Vulkan
Fix variable name
I included this commit as a patch because it could cause issues with Flash Attention, which is now enabled by default for certain models: ggml-org/llama.cpp#16365
Update: Looks like I had a stale build on the Windows test systems; I refreshed to your latest commit and the PCI IDs are showing up correctly.
EDIT: Sorry, I forgot to rebuild after pulling. Everything is working fine on my side.
@inforithmics the GGML update is now merged.
@inforithmics I've tested your latest commit and things are looking good. I believe once you rebase on main with the latest GGML update, you should be able to drop the extra vulkan patch you had to carry.

What we'd like to do is merge this soon and bring in Vulkan support in 2 phases. First would be local build support. After we merge your PR, I'll follow up with some minor CI changes so we can disable Vulkan in the official binary release temporarily, and make sure it still builds by default for anyone who checks out and builds locally. Then we can continue to test it and work through any remaining major issues as follow-up commits on main in smaller PRs. Once things are looking solid, we'll undo those CI changes so it gets built in the official releases for Linux and Windows. Thanks for sticking in there!

As far as follow-ups I'm tracking after this merges: my test systems include a selection of AMD iGPUs and Intel integrated and discrete GPUs which use Vulkan, and there are some library models that hit GGML asserts in the Vulkan backend. Additionally, the scheduler will need some adjusting to be iGPU-aware instead of naively favoring the GPU with the most VRAM available.
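To make that last point concrete, here is a minimal Go sketch of what an iGPU-aware selection policy could look like. This is purely illustrative: the `GpuInfo` type, its fields, and `pickDevice` are hypothetical names and not ollama's actual scheduler code.

```go
// Hypothetical sketch of an iGPU-aware device picker. The GpuInfo type and
// its fields are illustrative, not ollama's real scheduler API.
package sched

type GpuInfo struct {
	ID         string
	Integrated bool   // true for iGPUs that share system RAM
	FreeVRAM   uint64 // bytes reported as available
}

// pickDevice prefers a discrete GPU that can hold the whole model, then an
// integrated GPU that can, instead of naively taking the largest FreeVRAM.
func pickDevice(gpus []GpuInfo, modelBytes uint64) *GpuInfo {
	var bestDiscrete, bestIntegrated *GpuInfo
	for i := range gpus {
		g := &gpus[i]
		if g.FreeVRAM < modelBytes {
			continue // does not fit; partial offload not considered here
		}
		if g.Integrated {
			if bestIntegrated == nil || g.FreeVRAM > bestIntegrated.FreeVRAM {
				bestIntegrated = g
			}
		} else {
			if bestDiscrete == nil || g.FreeVRAM > bestDiscrete.FreeVRAM {
				bestDiscrete = g
			}
		}
	}
	if bestDiscrete != nil {
		return bestDiscrete
	}
	return bestIntegrated // may be nil if nothing fits
}
```

The idea is simply to treat "fits on a discrete GPU" as the first preference rather than raw free-VRAM size, so a large iGPU memory carve-out does not win over a faster discrete card.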
A pull request based on #9650, updated with the newest patches from main.
Known issues:
- Without this, OLLAMA_KV_CACHE_TYPE=f16 has to be set, or else the llama runner crashes.
- On llama.cpp, Vulkan and ROCm are sometimes in the same performance range; for NVIDIA I cannot say.
- Filter out Vulkan devices whose ID already exists as a ROCm or CUDA device (see the sketch below).
- Vulkan iGPU device selection overhaul and PCI ID API support: ggml-org/llama.cpp#15947
The Vulkan builds ran successfully here: Vulkan Builds on CI inforithmics/ollama#7
For example, GGML_VK_VISIBLE_DEVICES
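To illustrate the device-filtering item above, here is a minimal sketch of the idea: drop any Vulkan device whose PCI ID is already claimed by a ROCm or CUDA device, so the same physical GPU is not offered twice; the remaining devices could then also be narrowed with a mechanism like GGML_VK_VISIBLE_DEVICES. The `device` type and `filterDuplicateVulkan` function below are hypothetical illustrations, not the actual gpu/gpu.go code.

```go
// Hypothetical sketch of filtering out Vulkan devices that duplicate a GPU
// already discovered via ROCm or CUDA; the types here are illustrative.
package discover

type device struct {
	Backend string // "vulkan", "rocm", or "cuda"
	PCIID   string // e.g. "0000:03:00.0"
}

// filterDuplicateVulkan keeps every ROCm/CUDA device and only those Vulkan
// devices whose PCI ID was not already seen on another backend.
func filterDuplicateVulkan(devs []device) []device {
	seen := make(map[string]bool)
	for _, d := range devs {
		if d.Backend != "vulkan" && d.PCIID != "" {
			seen[d.PCIID] = true
		}
	}
	out := make([]device, 0, len(devs))
	for _, d := range devs {
		if d.Backend == "vulkan" && d.PCIID != "" && seen[d.PCIID] {
			continue // same physical GPU already handled by ROCm/CUDA
		}
		out = append(out, d)
	}
	return out
}
```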
Version 12.5
OllamaSetup.zip
Build with build_windows.ps1:
Some interesting links:
Vulkan vs ROCm on Linux:
https://www.phoronix.com/review/llama-cpp-windows-linux/5
https://www.phoronix.com/review/amd-rocm-7-strix-halo/3