
You can now deploy any GGUF model on your own endpoint, in just a few clicks!

Simply select GGUF, select a hardware configuration, and you're done! An endpoint powered by llama-server (built from the master branch) will be deployed automatically. It works with all llama.cpp-compatible models of any size, from 0.1B up to 405B parameters.

Try it now --> https://ui.endpoints.huggingface.co/
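Once the endpoint is up, you can query it like any llama-server instance via its OpenAI-compatible chat completions route. Here is a minimal sketch; the endpoint URL is a placeholder for whatever URL your deployment gets, and `HF_TOKEN` is assumed to be your Hugging Face access token set in the environment.

```python
import os
import requests

# Placeholders: replace with your own endpoint URL and token.
ENDPOINT_URL = "https://<your-endpoint>.endpoints.huggingface.cloud"
HF_TOKEN = os.environ["HF_TOKEN"]

# llama-server exposes an OpenAI-compatible /v1/chat/completions route.
response = requests.post(
    f"{ENDPOINT_URL}/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {HF_TOKEN}",
        "Content-Type": "application/json",
    },
    json={
        "messages": [
            {"role": "user", "content": "Hello! What can you do?"}
        ],
        "max_tokens": 128,
        "temperature": 0.7,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```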

And the best part is:

@ggerganov: ggml.ai will be receiving a revenue share from all llama.cpp-powered endpoints used on HF. So for anyone who wants to support us, make sure to give those endpoints a try ♥️

A huge thanks to @ggerganov, @slaren, and the @huggingface team for making this possible!

[Video: llama.hfe.ok.mp4]

ngxson (Collaborator, Author), Sep 27, 2024:

The Hermes 405B model can be deployed on 2xA100 GPUs. The generation speed is around 8 t/s, which is not bad!

[Screenshot: 2024-09-27 at 14:26:50]