Releases: ggml-org/llama.cpp

b5604

07 Jun 13:43
228f34c
SYCL: Implement a few same-quantized-type copy kernels (#13739)

* SYCL: Implement a few same-quantized-type copy kernels

* Use memcpy for copying contiguous tensors

ggml-ci

* feat(sycl): add contiguous tensor copy support and device checks

Adds a memcpy path for contiguous tensors of the same type to optimize data transfer. Updates device support checks to recognize contiguous tensor operations, improving compatibility and performance.
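The fast path described above can be sketched as follows. This is a minimal illustration, not the actual ggml-sycl code: the function name, parameters, and the plain `std::memcpy` (a SYCL backend would use `queue::memcpy` on device buffers) are all assumptions for the sake of the example.

```cpp
#include <cstring>
#include <cstddef>

// Hypothetical fast path: when source and destination tensors share the same
// type and both are contiguous in memory, one bulk memcpy replaces the
// per-element/per-block copy kernel. Returns true if the fast path applied.
static bool try_contiguous_copy(void * dst, const void * src, size_t nbytes,
                                bool same_type, bool contiguous) {
    if (same_type && contiguous) {
        std::memcpy(dst, src, nbytes); // single bulk transfer
        return true;
    }
    return false; // caller falls back to the block-wise copy kernel
}
```

The device-support check mentioned in the commit would mirror this predicate: an operation on contiguous same-type tensors is reported as supported even when no type-specific kernel exists.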

* refactor: replace specific block copy functions with template

The changes replace multiple redundant block copy functions (e.g., cpy_block_q8_0_q8_0, cpy_block_q5_0_q5_0) with a single templated function cpy_blck_q_q. This reduces code duplication by using a generic template that works for any block type, improving maintainability while preserving the same functionality. The template is instantiated with specific block types (e.g., block_q8_0) where needed.
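The templating idea can be sketched like this. The block layout below is illustrative (loosely modeled on `block_q8_0`), and the template body is an assumption: same-type block copies reduce to a fixed-size memcpy of the block struct, so one template covers every quantized type.

```cpp
#include <cstring>
#include <cstdint>

// Illustrative quantized block layout (resembles block_q8_0: one fp16 scale
// plus 32 quantized values). Not the exact ggml definition.
struct block_q8_0_like {
    uint16_t d;      // scale, stored as fp16 bits
    int8_t   qs[32]; // quantized values
};

// Generic same-type block copy: replaces per-type functions such as
// cpy_block_q8_0_q8_0 and cpy_block_q5_0_q5_0 with one template.
template <typename block_t>
static void cpy_blck_q_q(const char * cxi, char * cdsti) {
    std::memcpy(cdsti, cxi, sizeof(block_t));
}
```

At a call site the template is instantiated with the concrete block type, e.g. `cpy_blck_q_q<block_q8_0_like>(src, dst);`, preserving the behavior of the removed per-type functions.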

* Exclude BF16 support for COPY tensors for now
ggml-ci

* perf: adjust SYCL copy kernel block sizes for efficiency

Use ceil_div to ensure full element coverage and update nd_range parameters to better align with SYCL block sizes, improving parallelism and device utilization in copy operations.
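The sizing logic reads as follows; this is a sketch under the usual convention, not the exact ggml-sycl helper. `ceil_div` rounds the work-item count up so that the last partial block is still covered, and the global range is then a multiple of the work-group size, as SYCL's `nd_range` requires.

```cpp
#include <cstddef>

// Round-up division: number of work-groups needed to cover n elements
// with block_size items per group (covers a trailing partial group).
static size_t ceil_div(size_t n, size_t block_size) {
    return (n + block_size - 1) / block_size;
}

// Global range for an nd_range launch: a multiple of the work-group size
// that is at least n. Each work-item guards against i >= n in the kernel.
static size_t global_range(size_t n, size_t block_size) {
    return ceil_div(n, block_size) * block_size;
}
```

For example, copying 1000 elements with a work-group size of 256 launches `ceil_div(1000, 256) = 4` groups, i.e. a global range of 1024 work-items, of which the last 24 do nothing.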

b5603

07 Jun 13:13
0974ad7
llama : fix llama_model_chat_template with template name (LLM_KV with…

b5602

06 Jun 13:05
745aa53
llama : deprecate llama_kv_self_ API (#14030)

* llama : deprecate llama_kv_self_ API

ggml-ci

* llama : allow llama_memory_(nullptr)

ggml-ci

* memory : add flag for optional data clear in llama_memory_clear

ggml-ci

b5601

06 Jun 12:13
487a5e0
context : fix SWA-related warning for multiple sequences (#14045)

b5600

06 Jun 08:02
d17a809
llama : support multiple classifier outputs and labels (#13940)

b5598

05 Jun 14:39
669c13e
vulkan: Enable VK_KHR_cooperative_matrix extension for Intel Xe2 GPUs…

b5596

05 Jun 13:13
7f37b6c
memory : migrate from llama_kv_cache to more generic llama_memory (#1…

b5595

05 Jun 10:16
3a07714
llama : allow using mmap without PrefetchVirtualMemory, apply GGML_WI…

b5593

05 Jun 07:48
9f47fa5
vocab : warn about missing mask token (#14022)

b5592

05 Jun 06:43
9e31bec
context : fix pos_min initialization upon error decode (#14008)

ggml-ci