Closed
A lot of people would like to run their own server, but don't have the necessary DevOps skills to build and configure a llama-cpp-python + Python + llama.cpp environment.
I'm working on developing some Dockerfiles that are built via a GitHub Action and published to Docker Hub, similar to llama.cpp's workflows/docker.yml, with variants for both OpenBLAS (CPU-only, i.e. no NVIDIA GPU) and cuBLAS (NVIDIA GPU via Docker) support.
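As a rough illustration, a CPU-only (OpenBLAS) variant might look something like the sketch below. The base image, package list, and build flags are assumptions on my part; the exact CMake arguments llama-cpp-python expects depend on its version.

```dockerfile
# Hypothetical CPU-only (OpenBLAS) image -- base image and flags are assumptions.
FROM python:3.11-slim

# Toolchain and OpenBLAS headers needed to compile llama.cpp from source.
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential cmake libopenblas-dev \
    && rm -rf /var/lib/apt/lists/*

# Build llama-cpp-python against OpenBLAS (flag names vary by version).
RUN CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" FORCE_CMAKE=1 \
    pip install --no-cache-dir llama-cpp-python

EXPOSE 8000
CMD ["python", "-m", "llama_cpp.server"]
```

A cuBLAS variant would follow the same shape but start from an NVIDIA CUDA base image and enable the corresponding CUDA build flag instead.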
Which CC-licensed models are now available that are compatible with llama.cpp's new quantized format? Ideally we want to start with small models to keep the Docker image sizes manageable.
Labels: New feature or request, Hardware specific issue, Model specific issue