feat: add support for MoE offloading #12333
Open · +62 −22
This commit introduces a new parameter, `num_moe_offload`, to the Modelfile, allowing users to offload Mixture-of-Experts (MoE) weights to the CPU to reduce VRAM usage.

The `num_moe_offload` parameter can be set to:

- `N` to offload the first `N` MoE layers.
- `-1` to offload all MoE layers.
- `0` (default) to disable offloading.
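For illustration, a minimal Modelfile that enables the new setting might look like this, assuming `num_moe_offload` is set like any other parameter via `PARAMETER` (the model name is a placeholder):

```
FROM qwen3:30b-a3b
# Keep the expert weights of the first 4 MoE layers in CPU memory
PARAMETER num_moe_offload 4
```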
This is implemented by passing tensor override rules to the underlying `llama.cpp` library, which already supports this functionality. The documentation for the new parameter has also been updated.

This PR is an attempt to use Jules to solve #11772.
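To make the mechanism concrete, here is a rough Go sketch of how the parameter could be translated into a `llama.cpp`-style tensor override rule (`<regex>=<buffer>`, the format accepted by its `--override-tensor` option). The function name and the exact mapping are illustrative guesses, not the PR's actual code; the `blk.<i>.ffn_*_exps` tensor names follow llama.cpp's GGUF naming convention for expert weights:

```go
package main

import (
	"fmt"
	"strings"
)

// moeOverridePattern builds a llama.cpp tensor-override rule ("<regex>=<buffer>")
// that pins MoE expert weights to CPU memory. The blk.<i>.ffn_*_exps tensor
// names follow llama.cpp's GGUF convention; everything else here is an
// illustrative sketch of how num_moe_offload might map onto such a rule.
func moeOverridePattern(numMoeOffload, numLayers int) string {
	switch {
	case numMoeOffload == 0:
		// 0 (default): offloading disabled, no override rule.
		return ""
	case numMoeOffload < 0 || numMoeOffload >= numLayers:
		// -1 (or N >= layer count): offload the experts of every layer.
		return `blk\.\d+\.ffn_(up|down|gate)_exps=CPU`
	default:
		// Offload only the first N layers by enumerating their indices.
		ids := make([]string, numMoeOffload)
		for i := range ids {
			ids[i] = fmt.Sprint(i)
		}
		return fmt.Sprintf(`blk\.(%s)\.ffn_(up|down|gate)_exps=CPU`,
			strings.Join(ids, "|"))
	}
}

func main() {
	fmt.Println(moeOverridePattern(2, 48))  // blk\.(0|1)\.ffn_(up|down|gate)_exps=CPU
	fmt.Println(moeOverridePattern(-1, 48)) // blk\.\d+\.ffn_(up|down|gate)_exps=CPU
}
```

Expressing the selection as a regex over tensor names, rather than as a dedicated offload flag, is what lets the feature reuse llama.cpp's existing override machinery unchanged.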