Releases: ggml-org/llama.cpp

b5604

07 Jun 13:43
228f34c
SYCL: Implement a few same-quantized-type copy kernels (#13739)

* SYCL: Implement a few same-quantized-type copy kernels

* Use memcpy for copying contiguous tensors

ggml-ci

* feat(sycl): add contiguous tensor copy support and device checks

Adds a memcpy path for contiguous tensors of the same type to optimize data transfer. Updates device support checks to recognize contiguous tensor operations, improving compatibility and performance.
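The fast path described above can be sketched as follows. This is a minimal illustration, not the actual ggml-sycl code: the function name, parameters, and the plain `std::memcpy` (a SYCL backend would use `queue::memcpy` on device buffers) are all assumptions for the sake of the example.

```cpp
#include <cstring>
#include <cstddef>

// Hypothetical fast path: when source and destination tensors share the same
// type and both are contiguous in memory, one bulk memcpy replaces the
// per-element/per-block copy kernel. Returns true if the fast path applied.
static bool try_contiguous_copy(void * dst, const void * src, size_t nbytes,
                                bool same_type, bool contiguous) {
    if (same_type && contiguous) {
        std::memcpy(dst, src, nbytes); // single bulk transfer
        return true;
    }
    return false; // caller falls back to the block-wise copy kernel
}
```

The device-support check mentioned in the commit would mirror this predicate: an operation on contiguous same-type tensors is reported as supported even when no type-specific kernel exists.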

* refactor: replace specific block copy functions with template

The changes replace multiple redundant block copy functions (e.g., cpy_block_q8_0_q8_0, cpy_block_q5_0_q5_0) with a single templated function cpy_blck_q_q. This reduces code duplication by using a generic template that works for any block type, improving maintainability while preserving the same functionality. The template is instantiated with specific block types (e.g., block_q8_0) where needed.
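The templating idea can be sketched like this. The block layout below is illustrative (loosely modeled on `block_q8_0`), and the template body is an assumption: same-type block copies reduce to a fixed-size memcpy of the block struct, so one template covers every quantized type.

```cpp
#include <cstring>
#include <cstdint>

// Illustrative quantized block layout (resembles block_q8_0: one fp16 scale
// plus 32 quantized values). Not the exact ggml definition.
struct block_q8_0_like {
    uint16_t d;      // scale, stored as fp16 bits
    int8_t   qs[32]; // quantized values
};

// Generic same-type block copy: replaces per-type functions such as
// cpy_block_q8_0_q8_0 and cpy_block_q5_0_q5_0 with one template.
template <typename block_t>
static void cpy_blck_q_q(const char * cxi, char * cdsti) {
    std::memcpy(cdsti, cxi, sizeof(block_t));
}
```

At a call site the template is instantiated with the concrete block type, e.g. `cpy_blck_q_q<block_q8_0_like>(src, dst);`, preserving the behavior of the removed per-type functions.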

* Exclude BF16 support for COPY tensors for now
ggml-ci

* perf: adjust SYCL copy kernel block sizes for efficiency

Use ceil_div to ensure full element coverage and update nd_range parameters to better align with SYCL block sizes, improving parallelism and device utilization in copy operations.
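The sizing logic reads as follows; this is a sketch under the usual convention, not the exact ggml-sycl helper. `ceil_div` rounds the work-item count up so that the last partial block is still covered, and the global range is then a multiple of the work-group size, as SYCL's `nd_range` requires.

```cpp
#include <cstddef>

// Round-up division: number of work-groups needed to cover n elements
// with block_size items per group (covers a trailing partial group).
static size_t ceil_div(size_t n, size_t block_size) {
    return (n + block_size - 1) / block_size;
}

// Global range for an nd_range launch: a multiple of the work-group size
// that is at least n. Each work-item guards against i >= n in the kernel.
static size_t global_range(size_t n, size_t block_size) {
    return ceil_div(n, block_size) * block_size;
}
```

For example, copying 1000 elements with a work-group size of 256 launches `ceil_div(1000, 256) = 4` groups, i.e. a global range of 1024 work-items, of which the last 24 do nothing.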

b5603

07 Jun 13:13
0974ad7
llama : fix llama_model_chat_template with template name (LLM_KV with…

b5602

06 Jun 13:05
745aa53
llama : deprecate llama_kv_self_ API (#14030)

* llama : deprecate llama_kv_self_ API

ggml-ci

* llama : allow llama_memory_(nullptr)

ggml-ci

* memory : add flag for optional data clear in llama_memory_clear

ggml-ci

b5601

06 Jun 12:13
487a5e0
context : fix SWA-related warning for multiple sequences (#14045)

b5600

06 Jun 08:02
d17a809
llama : support multiple classifier outputs and labels (#13940)

b5598

05 Jun 14:39
669c13e
vulkan: Enable VK_KHR_cooperative_matrix extension for Intel Xe2 GPUs…

b5596

05 Jun 13:13
7f37b6c
memory : migrate from llama_kv_cache to more generic llama_memory (#1…

b5595

05 Jun 10:16
3a07714
llama : allow using mmap without PrefetchVirtualMemory, apply GGML_WI…

b5593

05 Jun 07:48
9f47fa5
vocab : warn about missing mask token (#14022)

b5592

05 Jun 06:43
9e31bec
context : fix pos_min initialization upon error decode (#14008)

ggml-ci