sycl: cleanup oneDNN related code #12097


Merged: 1 commit merged into ggml-org:master on Mar 21, 2025

Conversation

@sgeor255 (Contributor) commented on Feb 27, 2025:

This PR cleans up and improves some oneDNN-related code:

- Use the user scratchpad mode when creating matmul primitives to avoid allocations during execution (see the C++ sketch below)
- Clean up the CMake configuration for finding and linking oneDNN (see the CMake sketch further below)
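
For context, a minimal sketch of what the user scratchpad mode looks like with the oneDNN C++ API — illustrative only; the `run_matmul` helper and the caller-managed `scratchpad_buf` are stand-ins, not this PR's actual code:

```cpp
#include "dnnl.hpp"
#include <unordered_map>

// Illustrative only: with scratchpad_mode::user, the caller owns the
// scratchpad buffer, so the primitive performs no allocation at execute time.
void run_matmul(dnnl::engine &eng, dnnl::stream &strm,
                dnnl::memory &a_mem, dnnl::memory &b_mem, dnnl::memory &c_mem,
                void *scratchpad_buf /* pre-allocated, assumed large enough */) {
    dnnl::primitive_attr attr;
    attr.set_scratchpad_mode(dnnl::scratchpad_mode::user);

    auto matmul_pd = dnnl::matmul::primitive_desc(
        eng, a_mem.get_desc(), b_mem.get_desc(), c_mem.get_desc(), attr);

    // Query how much scratchpad this primitive needs and wrap the
    // caller-provided buffer in a dnnl::memory object.
    auto scratchpad_mem = dnnl::memory(matmul_pd.scratchpad_desc(), eng,
                                       scratchpad_buf);

    std::unordered_map<int, dnnl::memory> args = {
        {DNNL_ARG_SRC, a_mem},
        {DNNL_ARG_WEIGHTS, b_mem},
        {DNNL_ARG_DST, c_mem},
        {DNNL_ARG_SCRATCHPAD, scratchpad_mem},
    };
    dnnl::matmul(matmul_pd).execute(strm, args);
}
```

The required size can be queried up front via `matmul_pd.scratchpad_desc().get_size()`, so one buffer can be sized once and reused across executions.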

Marking this as a draft PR because it depends on uxlfoundation/oneDNN#2768, which fixes a bug with missing dependencies in oneDNN.
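
On the CMake side, oneDNN ships a package config (dnnl-config.cmake), which gives a uniform way to find either the oneAPI-bundled library or a from-source build. A minimal sketch of that pattern only — the `ggml-sycl` target and `GGML_SYCL_DNNL` define are illustrative stand-ins, not necessarily what this PR does:

```cmake
# Sketch: find oneDNN through its exported package config. This locates the
# oneAPI-bundled release as well as a from-source build (a from-source build
# is what Nvidia support requires, e.g. one configured with
# -DDNNL_GPU_VENDOR=NVIDIA on the oneDNN side).
find_package(dnnl CONFIG)

if(dnnl_FOUND)
    # Target and define names below are illustrative placeholders.
    target_compile_definitions(ggml-sycl PRIVATE GGML_SYCL_DNNL=1)
    # DNNL::dnnl is the imported target exported by oneDNN's config file.
    target_link_libraries(ggml-sycl PRIVATE DNNL::dnnl)
endif()
```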

The github-actions bot added the documentation, ggml, and SYCL labels on Feb 27, 2025.
@Rbiessy (Collaborator) left a comment:

LGTM overall. Can you paste here some performance numbers using the Nvidia backend with and without oneDNN? Since we are suggesting enabling oneDNN for Nvidia, the performance should be at least as good.

Review thread on ggml/src/ggml-sycl/common.hpp (outdated, resolved).
@sgeor255 force-pushed the svet/llama-onednn branch 2 times, most recently from af9f64c to 3692b1a on February 28, 2025.
@NeoZhangJianyu (Collaborator) commented:

llama.cpp uses the official oneAPI release (which includes oneDNN). Even once the oneDNN PR is merged, it will take a long time before oneAPI picks it up.

So I guess this draft would be pending for a long time.

@sgeor255 (Contributor, Author) replied:

> llama.cpp uses the official oneAPI release (which includes oneDNN). Even once the oneDNN PR is merged, it will take a long time before oneAPI picks it up.
>
> So I guess this draft would be pending for a long time.

@NeoZhangJianyu The official oneDNN release bundled with oneAPI doesn't include Nvidia support, so for Nvidia it needs to be compiled from source, and the oneDNN PR addresses an issue with those Nvidia builds. The changes don't affect Intel devices, since an official oneDNN release can still be used there.

```diff
-auto a_mem = dnnl::memory(a_in_md, eng, const_cast<void*>(a));
-auto b_mem = dnnl::memory(b_in_md, eng, const_cast<void*>(b));
 auto matmul_pd = dnnl::matmul::primitive_desc(eng, a_in_md, b_in_md, c_md);
+auto a_mem = dnnl::memory(a_in_md, eng, (void *) a);
```
A collaborator commented:

Why remove const_cast here?

@sgeor255 (Author) replied:

It was an oversight, I've reverted it, thanks for flagging.

```diff
-auto b_mem = dnnl::memory(b_in_md, eng, const_cast<void*>(b));
 auto matmul_pd = dnnl::matmul::primitive_desc(eng, a_in_md, b_in_md, c_md);
+auto a_mem = dnnl::memory(a_in_md, eng, (void *) a);
+auto b_mem = dnnl::memory(b_in_md, eng, (void *) b);
```
The collaborator commented:

Same question here too.

@sgeor255 (Author) replied:

Same as above.
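
An aside on why the reviewers flagged this: `const_cast` can only strip cv-qualifiers, so the compiler still verifies the pointee type, whereas a C-style cast would also silently reinterpret an unrelated pointer. A tiny sketch (assuming, as the hunks above suggest, that `a` is a `const void *`):

```cpp
#include "dnnl.hpp"

// Sketch: 'a' is assumed to be const void * here, as the original
// const_cast in the hunks above suggests.
dnnl::memory wrap_input(const dnnl::memory::desc &md, const dnnl::engine &eng,
                        const void *a) {
    // const_cast documents that only constness is dropped; if the type of
    // 'a' ever changes, this fails to compile instead of silently
    // reinterpreting the pointer as a C-style cast would.
    return dnnl::memory(md, eng, const_cast<void *>(a));
}
```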

@sgeor255 (Contributor, Author) commented:

> LGTM overall. Can you paste here some performance numbers using the Nvidia backend with and without oneDNN? Since we are suggesting enabling oneDNN for Nvidia, the performance should be at least as good.

@Rbiessy numbers below:

Without oneDNN:

| model | size | params | backend | ngl | threads | sm | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | SYCL | 99 | 8 | none | pp512 | 5699.05 ± 55.43 |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | SYCL | 99 | 8 | none | tg128 | 74.74 ± 0.12 |

build: 3692b1a (4518)

| model | size | params | backend | ngl | threads | sm | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | 8 | none | pp512 | 5515.19 ± 63.72 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | 8 | none | tg128 | 86.25 ± 0.27 |

build: 3692b1a (4518)

| model | size | params | backend | ngl | threads | sm | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama 70B Q4_K - Small | 37.57 GiB | 70.55 B | SYCL | 99 | 8 | none | pp512 | 714.66 ± 2.78 |
| llama 70B Q4_K - Small | 37.57 GiB | 70.55 B | SYCL | 99 | 8 | none | tg128 | 14.57 ± 0.07 |

build: 3692b1a (4518)

With oneDNN:

| model | size | params | backend | ngl | threads | sm | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | SYCL | 99 | 8 | none | pp512 | 5787.95 ± 52.08 |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | SYCL | 99 | 8 | none | tg128 | 74.46 ± 0.23 |

build: 3692b1a (4518)

| model | size | params | backend | ngl | threads | sm | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | 8 | none | pp512 | 5604.82 ± 83.39 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | 8 | none | tg128 | 85.53 ± 0.18 |

build: 3692b1a (4518)

| model | size | params | backend | ngl | threads | sm | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama 70B Q4_K - Small | 37.57 GiB | 70.55 B | SYCL | 99 | 8 | none | pp512 | 711.86 ± 3.66 |
| llama 70B Q4_K - Small | 37.57 GiB | 70.55 B | SYCL | 99 | 8 | none | tg128 | 14.62 ± 0.09 |

build: 3692b1a (4518)

@sgeor255 force-pushed the svet/llama-onednn branch 2 times, most recently from 11bc77b to ec9f879 on February 28, 2025.
@qnixsynapse (Collaborator) left a comment:

Thanks for the changes! Overall LGTM...

@Alcpz (Collaborator) left a comment:

Changes LGTM as well, thanks for adding the performance numbers.

@sgeor255 force-pushed the svet/llama-onednn branch 3 times, most recently from a846423 to 5a3c158 on March 3, 2025.
@sgeor255 force-pushed the svet/llama-onednn branch from 5a3c158 to 5bb51a8 on March 3, 2025.
@NeoZhangJianyu (Collaborator) replied:

> llama.cpp uses the official oneAPI release (which includes oneDNN). Even once the oneDNN PR is merged, it will take a long time before oneAPI picks it up. So I guess this draft would be pending for a long time.

> @NeoZhangJianyu The official oneDNN release bundled with oneAPI doesn't include Nvidia support, so for Nvidia it needs to be compiled from source, and the oneDNN PR addresses an issue with those Nvidia builds. The changes don't affect Intel devices, since an official oneDNN release can still be used there.

OK!

@sgeor255 marked this pull request as ready for review on March 20, 2025.
@sgeor255 (Contributor, Author) commented:

@Rbiessy @Alcpz @qnixsynapse @NeoZhangJianyu the oneDNN PR has been merged, so this PR is no longer a draft. Let me know if you have any more comments! :)

@NeoZhangJianyu (Collaborator) left a comment:

I tested with Intel Arc 770. No impact on performance.

@NeoZhangJianyu merged commit 9ffcc9e into ggml-org:master on Mar 21, 2025. 47 checks passed.
Ivy233 pushed a commit to Ivy233/llama.cpp that referenced this pull request on Mar 23, 2025.
Labels: documentation, ggml, SYCL