1 parent 66fb034 · commit d018c7b
llama_cpp/llama.py
@@ -239,6 +239,7 @@ def __init__(
     n_ctx: Maximum context size.
     n_parts: Number of parts to split the model into. If -1, the number of parts is automatically determined.
     seed: Random seed. -1 for random.
+    n_gpu_layers: Number of layers to offload to GPU (-ngl). If -1, all layers are offloaded.
     f16_kv: Use half-precision for key/value cache.
     logits_all: Return logits for all tokens, not just the last token.
     vocab_only: Only load the vocabulary no weights.
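
For context, a minimal usage sketch of the parameter documented by this commit (not part of the diff itself; the model path is a placeholder, and -1 relies on the "all layers are offloaded" behavior described in the new docstring line):

from llama_cpp import Llama

# Offload all model layers to the GPU (-1 = all layers, per the docstring above).
# The model path below is a placeholder for illustration only.
llm = Llama(model_path="./models/7B/ggml-model.bin", n_gpu_layers=-1)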