
Support for NUMA #571

Closed
@rankaiyx

Description


llama.cpp's NUMA feature does not appear to be supported, which causes significant performance degradation on servers with multiple NUMA nodes.
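For context, llama.cpp gates this behavior behind its backend initialization. Below is a minimal sketch of how a binding could forward the flag via ctypes; the library path and the `llama_backend_init(bool)` signature are assumptions based on the llama.cpp C API at the time, not this project's confirmed interface:

```python
import ctypes

# Hypothetical sketch: forward a NUMA flag to llama.cpp's backend init,
# mirroring what the main example's --numa option does. The library
# name and the bool-taking signature are assumptions, not confirmed API.
libllama = ctypes.CDLL("libllama.so")
libllama.llama_backend_init.argtypes = [ctypes.c_bool]
libllama.llama_backend_init.restype = None

libllama.llama_backend_init(True)  # True ~ running main with --numa
```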

Additional Context
NUMA support
--numa: Attempt optimizations that help on some systems with non-uniform memory access. This currently consists of pinning an equal proportion of the threads to the cores on each NUMA node, and disabling prefetch and readahead for mmap. The latter causes mapped pages to be faulted in on first access instead of all at once, and in combination with pinning threads to NUMA nodes, more of the pages end up on the NUMA node where they are used. Note that if the model is already in the system page cache, for example because of a previous run without this option, this will have little effect unless you drop the page cache first. This can be done by rebooting the system or on Linux by writing '3' to '/proc/sys/vm/drop_caches' as root.
https://github.com/ggerganov/llama.cpp/blob/master/examples/main/README.md
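The last point matters when benchmarking: if the model file is already resident in the page cache from an earlier run, --numa has little effect. A small sketch of dropping the cache on Linux as the quoted docs describe (requires root; calling sync first is a common precaution so dirty pages are written back):

```python
import subprocess

# Flush dirty pages, then drop the page cache so the next mmap of the
# model faults pages in fresh (and onto the NUMA nodes where they are
# used). Requires root, as noted in the llama.cpp README.
subprocess.run(["sync"], check=True)
with open("/proc/sys/vm/drop_caches", "w") as f:
    f.write("3\n")
```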

This downstream project needs it:
oobabooga/text-generation-webui#3444


Metadata

Assignees

No one assigned

Labels

enhancement (New feature or request)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
