Releases · ishandutta2007/llama.cpp

18 Oct 05:32

8138785

b6792 Latest

opencl: transposed gemm/gemv moe kernel with mxfp4,f32 (#16602)

* opencl: transposed gemm/gemv moe kernel with mxfp4,f32

* add restore kernel for moe transpose

* fix trailing whitespaces

* resolve compilation warnings

Assets 15

cudart-llama-bin-win-cuda-12.4-x64.zip
sha256:8c79a9b226de4b3cacfd1f83d24f962d0773be79f1e7b75c6af4ded7e32ae1d6
373 MB 2025-10-18T05:32:21Z

llama-b6792-bin-macos-arm64.zip
sha256:ffd1677eb2ed88ce8d122f1539eee6b3490ca9d4f86c3d7a16318eb367122b3f
10.4 MB 2025-10-18T05:32:32Z

llama-b6792-bin-macos-x64.zip
sha256:5b610357ba0f0d32ffacefa82773fcb1162089483d19d8a22bd012b318f70164
27 MB 2025-10-18T05:32:33Z

llama-b6792-bin-ubuntu-vulkan-x64.zip
sha256:721cf1548c70dab14cd9d24edc13ae897997709a7bb675eb6e37f81e1c5428af
25.9 MB 2025-10-18T05:32:34Z

llama-b6792-bin-ubuntu-x64.zip
sha256:9fb001e4356bc46c45386b0df2569c15acf009a0d25cd054216ea27bd5173c2f
12.5 MB 2025-10-18T05:32:35Z

llama-b6792-bin-win-cpu-arm64.zip
sha256:2ab9ff173be1837e121e7483dde02c8b1d01d776de555071ff3d17235888565c
10.6 MB 2025-10-18T05:32:36Z

llama-b6792-bin-win-cpu-x64.zip
sha256:4563f511dd7a61d4ac445796f5384c2f0442f6afe9ceca138c8e3c75f8c2d0b2
13.7 MB 2025-10-18T05:32:37Z

llama-b6792-bin-win-cuda-12.4-x64.zip
sha256:da9984b9473834ef7cb831aecbb8c2f67914a7a25b3c706f52a70bcf6e7fbaae
169 MB 2025-10-18T05:32:38Z

llama-b6792-bin-win-hip-radeon-x64.zip
sha256:9fb516b65a0e3a3ef20dd03a6b9166a97e263555ba66253ece976365da3d34bd
321 MB 2025-10-18T05:32:43Z

llama-b6792-bin-win-opencl-adreno-arm64.zip
sha256:9da397c2248725ef1103b353c10091ecbe4e14f30afbcaaf3c9eafb8771d9680
11 MB 2025-10-18T05:32:51Z

Source code (zip)
2025-10-18T00:55:32Z

Source code (tar.gz)
2025-10-18T00:55:32Z
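
Each binary asset above is listed with a SHA-256 digest, so a download can be checked before it is unpacked. The following is a minimal sketch, assuming the archive has already been saved locally. The helper names `sha256_of` and `matches_digest` are hypothetical and not part of llama.cpp; the expected digest would be copied from the listing.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large archives are not loaded at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected):
    """Compare a file against a digest given as 'sha256:<hex>' or bare hex."""
    # Hypothetical helper: strips the optional 'sha256:' prefix used in the listing.
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return sha256_of(path) == expected_hex
```

For an intact download, `matches_digest("llama-b6792-bin-macos-arm64.zip", "sha256:ffd1677eb2ed88ce8d122f1539eee6b3490ca9d4f86c3d7a16318eb367122b3f")` would be expected to return `True`.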

17 Oct 18:04

github-actions

b6791

66b0dbc

llama-model: fix inconsistent ctxs <-> bufs order (#16581)

Assets 15

17 Oct 12:45

github-actions

b6788

342c728

ggml : fix SpaceMit IME array out-of-bounds in task assignment (#16629)

Fix incorrect task-to-batch index calculation in the quantization phase.

The bug caused out-of-bounds access to qnbitgemm_args array when
compute_idx exceeded per_gemm_block_count_m, leading to invalid
pointer dereferences and SIGBUS errors.

Correctly map tasks to batches by dividing compute_idx by
per_gemm_block_count_m instead of block_size_m.

Example:
  batch_feature=1, gemm_m=30, block_size_m=4
  per_gemm_block_count_m = 8, task_count = 8

  Old: gemm_idx = 4/4 = 1 (out of bounds)
  New: gemm_idx = 4/8 = 0 (correct)

Tested on SpaceMit K1 RISC-V64 with qwen2.5:0.5b model.

Co-authored-by: muggle <mingjun.rong@spacemit.com>
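
The arithmetic in the example above can be replayed in a few lines. This is an illustrative sketch only, not the ggml code. It assumes that `per_gemm_block_count_m` is `gemm_m / block_size_m` rounded up and that `task_count` is `batch_feature * per_gemm_block_count_m`, both of which are consistent with the numbers in the commit message but not stated there. The function names are hypothetical.

```python
import math

# Values taken from the example in the commit message.
batch_feature = 1
gemm_m = 30
block_size_m = 4

# Assumption: blocks per GEMM is the row count divided by the block size, rounded up.
per_gemm_block_count_m = math.ceil(gemm_m / block_size_m)
# Assumption: one task per block, per batch entry.
task_count = batch_feature * per_gemm_block_count_m


def old_gemm_idx(compute_idx):
    # Buggy mapping: divides by the block size instead of the block count.
    return compute_idx // block_size_m


def new_gemm_idx(compute_idx):
    # Fixed mapping: divides by the number of blocks per GEMM.
    return compute_idx // per_gemm_block_count_m


for compute_idx in range(task_count):
    old, new = old_gemm_idx(compute_idx), new_gemm_idx(compute_idx)
    status = "out of bounds" if old >= batch_feature else "ok"
    print(f"compute_idx={compute_idx}: old={old} ({status}), new={new}")
```

With a single batch entry, the only valid index is 0. Under these assumptions the old mapping leaves that range from `compute_idx = 4` onward, while the new mapping stays at 0 for every task.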

Assets 15

17 Oct 05:36

github-actions

b6783

ceff6bb

SYCL SET operator optimized for F32 tensors (#16350)

* SYCL/SET: implement operator + wire-up; docs/ops updates; element_wise & ggml-sycl changes

* sycl(SET): re-apply post-rebase; revert manual docs/ops.md; style cleanups

* move SET op to standalone file, GPU-only implementation

* Update SYCL SET operator for F32

* ci: fix editorconfig issues (LF endings, trailing spaces, final newline)

* fixed ggml-sycl.cpp

---------

Co-authored-by: Gitty Burstein <gitty@example.com>

Assets 15

17 Oct 00:46

github-actions

b6782

1bb4f43

mtmd : support home-cooked Mistral Small Omni (#14928)

Assets 15

16 Oct 19:16

github-actions

b6781

683fa6b

fix: added a normalization step for MathJax-style \[\] and \(\) delim…
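
The release title above is truncated and does not say what the delimiters are normalized to. As a rough illustration of what such a step could look like, the sketch below assumes that display math `\[...\]` is rewritten to `$$...$$` and inline math `\(...\)` to `$...$`. That target form and the function name `normalize_math_delimiters` are assumptions, not taken from the commit.

```python
import re

# Assumption: \[...\] becomes $$...$$ and \(...\) becomes $...$.
# DOTALL lets display math span multiple lines.
_DISPLAY = re.compile(r"\\\[(.+?)\\\]", re.DOTALL)
_INLINE = re.compile(r"\\\((.+?)\\\)", re.DOTALL)


def normalize_math_delimiters(text):
    """Rewrite MathJax-style delimiters to dollar-sign delimiters (hypothetical)."""
    text = _DISPLAY.sub(lambda m: "$$" + m.group(1) + "$$", text)
    text = _INLINE.sub(lambda m: "$" + m.group(1) + "$", text)
    return text
```

A production version would also need to skip code spans and fenced blocks, which this sketch does not attempt.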

Assets 9

16 Oct 11:54

github-actions

b6779

7a50cf3

CANN: format code using .clang-format (#15863)

This commit applies .clang-format rules to all source files under the
ggml-cann directory to ensure consistent coding style and readability.
The .clang-format option `SortIncludes: false` has been set to disable
automatic reordering of include directives.
No functional changes are introduced.

Co-authored-by: hipudding <huafengchun@gmail.com>

Assets 15

16 Oct 08:06

github-actions

b6776

ee50ee1

SYCL: Add GGML_OP_MEAN operator support (#16009)

* SYCL: Add GGML_OP_MEAN operator support

* SYCL: Fix formatting for GGML_OP_MEAN case

* Update ggml/src/ggml-sycl/ggml-sycl.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

Assets 15

16 Oct 04:19

github-actions

b6775

7adc79c

gguf-py : add support for endian conversion of BF16 data (#16594)

BF16 requires special handling in this script: each value is 2 bytes
wide, but the data is viewed as 1-byte elements by default.
Switch to the correct 2-byte view before attempting to byteswap.

With this change, it should be possible to correctly byteswap models
such as Meta-Llama-3-8B-Instruct-bf16-GGUF.
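
Why the view matters can be shown with plain NumPy. This is a minimal sketch, not the gguf-py code: swapping bytes on a 1-byte view changes nothing, while reinterpreting the same buffer as 2-byte elements first swaps each BF16 value. The variable names are illustrative.

```python
import numpy as np

# Two BF16 values stored little-endian as raw bytes:
# 1.0 is 0x3F80 (bytes 80 3F) and 2.0 is 0x4000 (bytes 00 40).
raw = np.array([0x80, 0x3F, 0x00, 0x40], dtype=np.uint8)

# Byteswapping 1-byte elements is a no-op: there is nothing to reorder.
swapped_as_bytes = raw.byteswap()

# Reinterpret the buffer as 2-byte elements, swap, then view as bytes again.
swapped_as_u16 = raw.view(np.uint16).byteswap().view(np.uint8)

print([hex(b) for b in swapped_as_bytes])
print([hex(b) for b in swapped_as_u16])
```

NumPy has no native BF16 dtype, so `np.uint16` stands in here as a 2-byte container; only the element width matters for the swap.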

Assets 15

15 Oct 17:36

github-actions

b6771

f9fb33f

Add server-driven parameter defaults and syncing (#16515)

Assets 15
