Description
The NUMA feature of llama.cpp does not appear to be supported here, which causes significant performance degradation on servers with multiple NUMA nodes.
Additional Context
NUMA support (quoted from the llama.cpp README):
--numa: Attempt optimizations that help on some systems with non-uniform memory access. This currently consists of pinning an equal proportion of the threads to the cores on each NUMA node, and disabling prefetch and readahead for mmap. The latter causes mapped pages to be faulted in on first access instead of all at once, and in combination with pinning threads to NUMA nodes, more of the pages end up on the NUMA node where they are used. Note that if the model is already in the system page cache, for example because of a previous run without this option, this will have little effect unless you drop the page cache first. This can be done by rebooting the system or on Linux by writing '3' to '/proc/sys/vm/drop_caches' as root.
https://github.com/ggerganov/llama.cpp/blob/master/examples/main/README.md
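For reference, a minimal sketch of how this is exercised with llama.cpp directly on Linux, based only on the README text quoted above. The model path and generation parameters are placeholders; adjust them for your setup.

```sh
# Drop the page cache first so mapped pages fault in on the NUMA node where
# they are used (needs root; rebooting has the same effect).
echo 3 | sudo tee /proc/sys/vm/drop_caches

# Run the main example with NUMA optimizations enabled.
# Placeholder model path and prompt; --numa is the flag described above.
./main -m ./models/model.gguf -p "Hello" -n 128 --numa
```

Exposing an equivalent option here would let downstream users get the same thread-pinning and mmap behavior without calling the llama.cpp binary directly.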
This feature is also needed by a downstream project:
oobabooga/text-generation-webui#3444