sycl: cleanup oneDNN related code #12097


Merged: 1 commit merged into ggml-org:master on Mar 21, 2025

Conversation

@sgeor255 (Contributor) commented on Feb 27, 2025:

This PR cleans up and improves some oneDNN-related code:

- Use the user scratchpad mode when creating matmul primitives to avoid allocations during execution (see the C++ sketch below)
- Clean up the CMake configuration for finding and linking oneDNN (see the CMake sketch further below)
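
For context, a minimal sketch of what the user scratchpad mode looks like with the oneDNN C++ API — illustrative only; the `run_matmul` helper and the caller-managed `scratchpad_buf` are stand-ins, not this PR's actual code:

```cpp
#include "dnnl.hpp"
#include <unordered_map>

// Illustrative only: with scratchpad_mode::user, the caller owns the
// scratchpad buffer, so the primitive performs no allocation at execute time.
void run_matmul(dnnl::engine &eng, dnnl::stream &strm,
                dnnl::memory &a_mem, dnnl::memory &b_mem, dnnl::memory &c_mem,
                void *scratchpad_buf /* pre-allocated, assumed large enough */) {
    dnnl::primitive_attr attr;
    attr.set_scratchpad_mode(dnnl::scratchpad_mode::user);

    auto matmul_pd = dnnl::matmul::primitive_desc(
        eng, a_mem.get_desc(), b_mem.get_desc(), c_mem.get_desc(), attr);

    // Query how much scratchpad this primitive needs and wrap the
    // caller-provided buffer in a dnnl::memory object.
    auto scratchpad_mem = dnnl::memory(matmul_pd.scratchpad_desc(), eng,
                                       scratchpad_buf);

    std::unordered_map<int, dnnl::memory> args = {
        {DNNL_ARG_SRC, a_mem},
        {DNNL_ARG_WEIGHTS, b_mem},
        {DNNL_ARG_DST, c_mem},
        {DNNL_ARG_SCRATCHPAD, scratchpad_mem},
    };
    dnnl::matmul(matmul_pd).execute(strm, args);
}
```

The required size can be queried up front via `matmul_pd.scratchpad_desc().get_size()`, so one buffer can be sized once and reused across executions.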

Marking this as a draft PR because it depends on uxlfoundation/oneDNN#2768, which fixes a bug with missing dependencies in oneDNN.
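
On the CMake side, oneDNN ships a package config (dnnl-config.cmake), which gives a uniform way to find either the oneAPI-bundled library or a from-source build. A minimal sketch of that pattern only — the `ggml-sycl` target and `GGML_SYCL_DNNL` define are illustrative stand-ins, not necessarily what this PR does:

```cmake
# Sketch: find oneDNN through its exported package config. This locates the
# oneAPI-bundled release as well as a from-source build (a from-source build
# is what Nvidia support requires, e.g. one configured with
# -DDNNL_GPU_VENDOR=NVIDIA on the oneDNN side).
find_package(dnnl CONFIG)

if(dnnl_FOUND)
    # Target and define names below are illustrative placeholders.
    target_compile_definitions(ggml-sycl PRIVATE GGML_SYCL_DNNL=1)
    # DNNL::dnnl is the imported target exported by oneDNN's config file.
    target_link_libraries(ggml-sycl PRIVATE DNNL::dnnl)
endif()
```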

The github-actions bot added the documentation, ggml, and SYCL labels on Feb 27, 2025.
@Rbiessy (Collaborator) left a comment:

LGTM overall. Can you paste here some performance numbers using the Nvidia backend with and without oneDNN? Since we are suggesting enabling oneDNN for Nvidia, the performance should be at least as good.

Review thread on ggml/src/ggml-sycl/common.hpp (outdated, resolved).
@sgeor255 force-pushed the svet/llama-onednn branch 2 times, most recently from af9f64c to 3692b1a on February 28, 2025.
@NeoZhangJianyu (Collaborator) commented:

llama.cpp uses the official oneAPI release (which includes oneDNN). Even once the oneDNN PR is merged, it will take a long time before oneAPI picks it up.

So I guess this draft would be pending for a long time.

@sgeor255 (Contributor, Author) replied:

> llama.cpp uses the official oneAPI release (which includes oneDNN). Even once the oneDNN PR is merged, it will take a long time before oneAPI picks it up.
>
> So I guess this draft would be pending for a long time.

@NeoZhangJianyu The official oneDNN release bundled with oneAPI doesn't include Nvidia support, so for Nvidia it needs to be compiled from source, and the oneDNN PR addresses an issue with those Nvidia builds. The changes don't affect Intel devices, since an official oneDNN release can still be used there.

```diff
-auto a_mem = dnnl::memory(a_in_md, eng, const_cast<void*>(a));
-auto b_mem = dnnl::memory(b_in_md, eng, const_cast<void*>(b));
 auto matmul_pd = dnnl::matmul::primitive_desc(eng, a_in_md, b_in_md, c_md);
+auto a_mem = dnnl::memory(a_in_md, eng, (void *) a);
```
A collaborator commented:

Why remove const_cast here?

@sgeor255 (Author) replied:

It was an oversight, I've reverted it, thanks for flagging.

```diff
-auto b_mem = dnnl::memory(b_in_md, eng, const_cast<void*>(b));
 auto matmul_pd = dnnl::matmul::primitive_desc(eng, a_in_md, b_in_md, c_md);
+auto a_mem = dnnl::memory(a_in_md, eng, (void *) a);
+auto b_mem = dnnl::memory(b_in_md, eng, (void *) b);
```
The collaborator commented:

Same question here too.

@sgeor255 (Author) replied:

Same as above.
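
An aside on why the reviewers flagged this: `const_cast` can only strip cv-qualifiers, so the compiler still verifies the pointee type, whereas a C-style cast would also silently reinterpret an unrelated pointer. A tiny sketch (assuming, as the hunks above suggest, that `a` is a `const void *`):

```cpp
#include "dnnl.hpp"

// Sketch: 'a' is assumed to be const void * here, as the original
// const_cast in the hunks above suggests.
dnnl::memory wrap_input(const dnnl::memory::desc &md, const dnnl::engine &eng,
                        const void *a) {
    // const_cast documents that only constness is dropped; if the type of
    // 'a' ever changes, this fails to compile instead of silently
    // reinterpreting the pointer as a C-style cast would.
    return dnnl::memory(md, eng, const_cast<void *>(a));
}
```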

@sgeor255 (Contributor, Author) commented:

> LGTM overall. Can you paste here some performance numbers using the Nvidia backend with and without oneDNN? Since we are suggesting enabling oneDNN for Nvidia, the performance should be at least as good.

@Rbiessy numbers below:

Without oneDNN:

| model | size | params | backend | ngl | threads | sm | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | SYCL | 99 | 8 | none | pp512 | 5699.05 ± 55.43 |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | SYCL | 99 | 8 | none | tg128 | 74.74 ± 0.12 |

build: 3692b1a (4518)

| model | size | params | backend | ngl | threads | sm | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | 8 | none | pp512 | 5515.19 ± 63.72 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | 8 | none | tg128 | 86.25 ± 0.27 |

build: 3692b1a (4518)

| model | size | params | backend | ngl | threads | sm | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama 70B Q4_K - Small | 37.57 GiB | 70.55 B | SYCL | 99 | 8 | none | pp512 | 714.66 ± 2.78 |
| llama 70B Q4_K - Small | 37.57 GiB | 70.55 B | SYCL | 99 | 8 | none | tg128 | 14.57 ± 0.07 |

build: 3692b1a (4518)

With oneDNN:

| model | size | params | backend | ngl | threads | sm | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | SYCL | 99 | 8 | none | pp512 | 5787.95 ± 52.08 |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | SYCL | 99 | 8 | none | tg128 | 74.46 ± 0.23 |

build: 3692b1a (4518)

| model | size | params | backend | ngl | threads | sm | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | 8 | none | pp512 | 5604.82 ± 83.39 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | 8 | none | tg128 | 85.53 ± 0.18 |

build: 3692b1a (4518)

| model | size | params | backend | ngl | threads | sm | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama 70B Q4_K - Small | 37.57 GiB | 70.55 B | SYCL | 99 | 8 | none | pp512 | 711.86 ± 3.66 |
| llama 70B Q4_K - Small | 37.57 GiB | 70.55 B | SYCL | 99 | 8 | none | tg128 | 14.62 ± 0.09 |

build: 3692b1a (4518)

@sgeor255 force-pushed the svet/llama-onednn branch 2 times, most recently from 11bc77b to ec9f879 on February 28, 2025.
@qnixsynapse (Collaborator) left a comment:

Thanks for the changes! Overall LGTM...

@Alcpz (Collaborator) left a comment:

Changes LGTM as well, thanks for adding the performance numbers.

@sgeor255 force-pushed the svet/llama-onednn branch 3 times, most recently from a846423 to 5a3c158 on March 3, 2025.
@sgeor255 force-pushed the svet/llama-onednn branch from 5a3c158 to 5bb51a8 on March 3, 2025.
@NeoZhangJianyu (Collaborator) replied:

> llama.cpp uses the official oneAPI release (which includes oneDNN). Even once the oneDNN PR is merged, it will take a long time before oneAPI picks it up. So I guess this draft would be pending for a long time.

> @NeoZhangJianyu The official oneDNN release bundled with oneAPI doesn't include Nvidia support, so for Nvidia it needs to be compiled from source, and the oneDNN PR addresses an issue with those Nvidia builds. The changes don't affect Intel devices, since an official oneDNN release can still be used there.

OK!

@sgeor255 marked this pull request as ready for review on March 20, 2025.
@sgeor255 (Contributor, Author) commented:

@Rbiessy @Alcpz @qnixsynapse @NeoZhangJianyu the oneDNN PR has been merged, so this PR is no longer a draft. Let me know if you have any more comments! :)

@NeoZhangJianyu (Collaborator) left a comment:

I tested with Intel Arc 770. No impact on performance.

@NeoZhangJianyu merged commit 9ffcc9e into ggml-org:master on Mar 21, 2025. 47 checks passed.
Ivy233 pushed a commit to Ivy233/llama.cpp that referenced this pull request on Mar 23, 2025.
Labels: documentation, ggml, SYCL