Commit cb40dfc

ikawrakow and Kawrakow authored
llama : only use Q6_K for output weights if tensor size is multiple of 256 (ggml-org#1932)
* Only use Q6_K for output weights if tensor size is multiple of 256
* Fixed copy/paste mistake

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
1 parent: ca7c3f4 · commit: cb40dfc

File tree: 1 file changed (+6, -2)

llama.cpp

6 additions & 2 deletions
@@ -2495,7 +2495,7 @@ static void llama_model_quantize_internal(const std::string & fname_inp, const s
     if (quantized_type == GGML_TYPE_Q2_K || quantized_type == GGML_TYPE_Q3_K || quantized_type == GGML_TYPE_Q4_K ||
         quantized_type == GGML_TYPE_Q5_K || quantized_type == GGML_TYPE_Q6_K) {
         int nx = tensor.ne.at(0);
-        int ny = tensor.ne.at(0);
+        int ny = tensor.ne.at(1);
         if (nx % QK_K != 0 || ny % QK_K != 0) {
             fprintf(stderr, "\n\n========================= Tensor sizes %d x %d are not divisible by %d\n",nx,ny,QK_K);
             fprintf(stderr, "This is required to be able to use k-quants for now!\n");
@@ -2504,7 +2504,11 @@ static void llama_model_quantize_internal(const std::string & fname_inp, const s
         }
     }
     if (tensor.name == "output.weight") {
-        new_type = GGML_TYPE_Q6_K;
+        int nx = tensor.ne.at(0);
+        int ny = tensor.ne.at(1);
+        if (nx % QK_K == 0 && ny % QK_K == 0) {
+            new_type = GGML_TYPE_Q6_K;
+        }
     } else if (tensor.name.find("attention.wv.weight") != std::string::npos) {
         if (ftype == LLAMA_FTYPE_MOSTLY_Q3_K_M || ftype == LLAMA_FTYPE_MOSTLY_Q2_K) new_type = GGML_TYPE_Q4_K;
         else if (ftype == LLAMA_FTYPE_MOSTLY_Q3_K_L) new_type = GGML_TYPE_Q5_K;
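For context: per the commit title, QK_K is 256, so the added guard simply declines to use Q6_K when either dimension of output.weight is not a multiple of 256, leaving new_type at whatever value was chosen before this branch. Below is a minimal, self-contained sketch of that logic under the assumption QK_K == 256; pick_output_type and the OutType enum are illustrative stand-ins, not part of llama.cpp (the real code assigns GGML_TYPE_Q6_K to new_type inside llama_model_quantize_internal).

    #include <cstdio>

    constexpr int QK_K = 256; // k-quant super-block size (assumed, per the commit title)

    // Hypothetical stand-in for ggml's real type enum.
    enum class OutType { Q6_K, Fallback };

    // Use Q6_K for the output tensor only when both dimensions are
    // multiples of QK_K; otherwise keep the previously chosen type.
    static OutType pick_output_type(int nx, int ny) {
        if (nx % QK_K == 0 && ny % QK_K == 0) {
            return OutType::Q6_K;
        }
        return OutType::Fallback;
    }

    int main() {
        // 4096 x 32000: both multiples of 256 -> Q6_K is usable.
        std::printf("%s\n", pick_output_type(4096, 32000) == OutType::Q6_K ? "Q6_K" : "fallback");
        // 3200 x 32000: 3200 % 256 == 128 -> fall back (illustrative sizes).
        std::printf("%s\n", pick_output_type(3200, 32000) == OutType::Q6_K ? "Q6_K" : "fallback");
        return 0;
    }

Before this change, new_type was set to GGML_TYPE_Q6_K unconditionally, which broke quantization for models whose output matrix dimensions are not multiples of 256.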

0 commit comments
