Commit 6308f21

docs: Update Llama docs

1 parent f03a38e
1 file changed: llama_cpp/llama.py (+15, -11)

llama_cpp/llama.py
@@ -798,17 +798,21 @@ def __init__(
             vocab_only: Only load the vocabulary no weights.
             use_mmap: Use mmap if possible.
             use_mlock: Force the system to keep the model in RAM.
-            seed: Random seed. -1 for random.
-            n_ctx: Context size.
-            n_batch: Batch size for prompt processing (must be >= 32 to use BLAS)
-            n_threads: Number of threads to use. If None, the number of threads is automatically determined.
-            n_threads_batch: Number of threads to use for batch processing. If None, use n_threads.
-            rope_scaling_type: Type of rope scaling to use.
-            rope_freq_base: Base frequency for rope sampling.
-            rope_freq_scale: Scale factor for rope sampling.
-            mul_mat_q: if true, use experimental mul_mat_q kernels
-            f16_kv: Use half-precision for key/value cache.
-            logits_all: Return logits for all tokens, not just the last token.
+            seed: RNG seed, -1 for random
+            n_ctx: Text context, 0 = from model
+            n_batch: Prompt processing maximum batch size
+            n_threads: Number of threads to use for generation
+            n_threads_batch: Number of threads to use for batch processing
+            rope_scaling_type: RoPE scaling type, from `enum llama_rope_scaling_type`. ref: https://github.com/ggerganov/llama.cpp/pull/2054
+            rope_freq_base: RoPE base frequency, 0 = from model
+            rope_freq_scale: RoPE frequency scaling factor, 0 = from model
+            yarn_ext_factor: YaRN extrapolation mix factor, negative = from model
+            yarn_attn_factor: YaRN magnitude scaling factor
+            yarn_beta_fast: YaRN low correction dim
+            yarn_beta_slow: YaRN high correction dim
+            yarn_orig_ctx: YaRN original context size
+            f16_kv: Use fp16 for KV cache, fp32 otherwise
+            logits_all: Return logits for all tokens, not just the last token. Must be True for completion to return logprobs.
             embedding: Embedding mode only.
             last_n_tokens_size: Maximum number of tokens to keep in the last_n_tokens deque.
             lora_base: Optional path to base model, useful if using a quantized base model and you want to apply LoRA to an f16 model.
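
For context, a minimal usage sketch (not part of the commit) of how the parameters documented above might be passed to `Llama.__init__`. The model path is hypothetical, and the values simply mirror the docstring's own conventions ("-1 for random", "0 = from model"); treat them as assumptions rather than the library's guaranteed defaults:

```python
from llama_cpp import Llama

# Minimal sketch, assuming a local GGUF model exists at this hypothetical path.
llm = Llama(
    model_path="./models/model.gguf",  # hypothetical path
    seed=-1,              # RNG seed, -1 for random
    n_ctx=0,              # text context, 0 = from model
    n_batch=512,          # prompt processing maximum batch size
    rope_freq_base=0.0,   # RoPE base frequency, 0 = from model
    rope_freq_scale=0.0,  # RoPE frequency scaling factor, 0 = from model
    logits_all=True,      # per the docstring: must be True for completions
                          # to return logprobs
)
# The YaRN knobs (yarn_ext_factor, yarn_attn_factor, yarn_beta_fast,
# yarn_beta_slow, yarn_orig_ctx) only come into play when a YaRN RoPE
# scaling type is selected via rope_scaling_type (see the llama.cpp
# PR linked in the docstring above).

# With logits_all=True, a completion can report per-token logprobs.
out = llm("Q: What is 2 + 2? A:", max_tokens=8, logprobs=5)
print(out["choices"][0]["logprobs"])
```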

0 commit comments