Conversation

@inforithmics (Contributor) commented Aug 9, 2025

A pull request based on #9650, updated with the newest patches from main.

Version 12.5

OllamaSetup.zip

Built with build_windows.ps1.

Some interesting links (Vulkan vs. ROCm on Linux):
https://www.phoronix.com/review/llama-cpp-windows-linux/5
https://www.phoronix.com/review/amd-rocm-7-strix-halo/3

@inforithmics (Contributor, Author) commented Oct 7, 2025

I included this commit as a patch because it could cause issues with Flash Attention, which is now enabled by default for certain models: ggml-org/llama.cpp#16365

@dhiltgen (Collaborator) commented Oct 7, 2025

It looks like PCI IDs aren't getting set up properly on Vulkan on Windows. I'll take a look at the code and see if I can spot why, but it seems consistent across GPU vendors on Windows, which is leading to duplicate devices showing up. In my branch I'm pretty sure I had it working, so it's hopefully just a simple glitch somewhere.

Update: Looks like I had a stale build on the windows test systems - refreshed to your latest commit and the PCI IDs are showing up correctly.
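
For anyone reading along, here is a minimal Go sketch of why missing PCI IDs surface as duplicate devices (hypothetical `gpuInfo` type and `dedupByPCIID` helper, not Ollama's actual code): any deduplication keyed on the PCI ID can no longer recognize two enumerations of the same physical GPU once the backend leaves that field empty.

```go
// Minimal sketch, assuming a hypothetical gpuInfo type and dedup helper.
package main

import "fmt"

type gpuInfo struct {
	Name  string
	PCIID string // e.g. "0000:03:00.0"; empty when the backend fails to fill it in
}

// dedupByPCIID keeps the first device seen for each non-empty PCI ID and
// passes through devices whose PCI ID is unknown, which is how duplicates
// can slip back in when the ID is never populated.
func dedupByPCIID(devs []gpuInfo) []gpuInfo {
	seen := map[string]bool{}
	var out []gpuInfo
	for _, d := range devs {
		if d.PCIID != "" && seen[d.PCIID] {
			continue // same physical GPU already listed
		}
		if d.PCIID != "" {
			seen[d.PCIID] = true
		}
		out = append(out, d)
	}
	return out
}

func main() {
	devs := []gpuInfo{
		{Name: "Radeon 780M", PCIID: ""}, // missing PCI ID: cannot be deduplicated
		{Name: "Radeon 780M", PCIID: ""},
	}
	fmt.Println(dedupByPCIID(devs)) // both entries survive -> "duplicate devices"
}
```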

@virajwad commented Oct 7, 2025

Hi, in the current state of the PR I'm also seeing a functional regression where no layers of the model are being put on the GPU, and there are also ggml_uncaught_exceptions showing up in the log (tested on a Meteor Lake iGPU).

ollama_vulkan_11835_MTL_IGPU_exception.txt

Tested the current state on this commit.

EDIT: Sorry, I forgot to rebuild after pulling. Everything is working fine on my side.

@inforithmics

This comment was marked as resolved.

@dhiltgen (Collaborator)

@inforithmics the GGML update is now merged.

@dhiltgen (Collaborator)

@inforithmics I've tested your latest commit and things are looking good. I believe once you rebase on main with the latest GGML update, you should be able to drop the extra Vulkan patch you had to carry.

What we'd like to do is merge this soon and bring in Vulkan support in two phases. The first phase would be local build support: after we merge your PR, I'll follow up with some minor CI changes so we can temporarily disable Vulkan in the official binary release, while making sure it still builds by default for anyone who checks out and builds locally. Then we can continue to test it and work through any remaining major issues as follow-up commits on main in smaller PRs. Once things are looking solid, we'll undo those CI changes so it gets built in the official releases for Linux and Windows. Thanks for sticking in there!

As for follow-ups I'm tracking after this merges: my test systems include a selection of AMD iGPUs and Intel integrated and discrete GPUs that use Vulkan, and there are some library models that hit GGML asserts in the Vulkan backend. Additionally, the scheduler will need some adjusting to be iGPU-aware instead of naively favoring the GPU with the most VRAM available.
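
To illustrate the scheduler point, here is a minimal Go sketch of an iGPU-aware pick (hypothetical `device` type, not the actual Ollama scheduler): prefer discrete GPUs first and only use free VRAM as a tie-breaker, instead of naively taking whichever device reports the most free VRAM.

```go
// Minimal sketch, assuming a hypothetical device type; an iGPU sharing
// system RAM can otherwise "win" a pure free-VRAM comparison.
package main

import (
	"fmt"
	"sort"
)

type device struct {
	Name       string
	Integrated bool
	FreeVRAM   uint64 // bytes
}

// pickDevice sorts discrete GPUs ahead of integrated ones, then by free VRAM.
func pickDevice(devs []device) device {
	sort.SliceStable(devs, func(i, j int) bool {
		if devs[i].Integrated != devs[j].Integrated {
			return !devs[i].Integrated // discrete GPUs sort first
		}
		return devs[i].FreeVRAM > devs[j].FreeVRAM // then by free VRAM
	})
	return devs[0]
}

func main() {
	devs := []device{
		{Name: "Radeon 780M (iGPU)", Integrated: true, FreeVRAM: 32 << 30},
		{Name: "Arc A770", Integrated: false, FreeVRAM: 16 << 30},
	}
	fmt.Println(pickDevice(devs).Name) // Arc A770, despite reporting less free VRAM
}
```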

@inforithmics (Contributor, Author) commented Oct 14, 2025

@dhiltgen

1. I have now merged with main.
2. Synchronized Vulkan and the new *.glsl files.
3. Added a commit that uses the FilteredID (a numeric device number) only for Vulkan, since applying it elsewhere would cause behavioral changes on ROCm and CUDA. The FilteredID is derived from the device's position in the list of returned Vulkan devices. Maybe this should instead be a parameter filled in by the Vulkan backend (see the sketch after this list).
4. Removed an unnecessary patch.
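
As an illustration of item 3, a minimal Go sketch of a positional FilteredID (hypothetical `gpu` struct and `assignFilteredIDs` helper, not the code in this PR): the ID is just the device's index within the enumerated Vulkan devices, and it is only applied for the Vulkan library so ROCm and CUDA numbering stays untouched.

```go
// Minimal sketch, assuming hypothetical types; not Ollama's actual device code.
package main

import "fmt"

type gpu struct {
	Library string // "vulkan", "rocm", "cuda", ...
	Name    string
	ID      string
}

// assignFilteredIDs numbers Vulkan devices by their position in the
// enumeration order; other backends keep whatever ID they already carry.
func assignFilteredIDs(gpus []gpu) {
	idx := 0
	for i := range gpus {
		if gpus[i].Library == "vulkan" {
			gpus[i].ID = fmt.Sprintf("%d", idx)
			idx++
		}
	}
}

func main() {
	gpus := []gpu{
		{Library: "vulkan", Name: "Radeon 780M"},
		{Library: "rocm", Name: "Radeon 780M", ID: "GPU-abc123"},
		{Library: "vulkan", Name: "Arc A770"},
	}
	assignFilteredIDs(gpus)
	fmt.Println(gpus) // Vulkan devices get "0" and "1"; the ROCm ID is untouched
}
```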

@dhiltgen merged commit 2aba569 into ollama:main on Oct 14, 2025
8 checks passed
@ollama deleted a comment from ericcurtin on Oct 14, 2025