
You can now deploy any GGUF model on your own endpoint, in just a few clicks!

Simply select GGUF, select a hardware configuration, and you're done! An endpoint powered by llama-server (built from the master branch) will be deployed automatically. It works with all llama.cpp-compatible models of any size, from 0.1B up to 405B parameters.

Try it now --> https://ui.endpoints.huggingface.co/
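Once the endpoint is up, you can query it like any llama-server instance via its OpenAI-compatible chat completions route. Here is a minimal sketch; the endpoint URL is a placeholder for whatever URL your deployment gets, and `HF_TOKEN` is assumed to be your Hugging Face access token set in the environment.

```python
import os
import requests

# Placeholders: replace with your own endpoint URL and token.
ENDPOINT_URL = "https://<your-endpoint>.endpoints.huggingface.cloud"
HF_TOKEN = os.environ["HF_TOKEN"]

# llama-server exposes an OpenAI-compatible /v1/chat/completions route.
response = requests.post(
    f"{ENDPOINT_URL}/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {HF_TOKEN}",
        "Content-Type": "application/json",
    },
    json={
        "messages": [
            {"role": "user", "content": "Hello! What can you do?"}
        ],
        "max_tokens": 128,
        "temperature": 0.7,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```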

And the best part is:

@ggerganov: ggml.ai will be receiving a revenue share from all llama.cpp-powered endpoints used on HF. So for anyone who wants to support us, make sure to give those endpoints a try ♥️

A huge thanks to @ggerganov, @slaren, and the @huggingface team for making this possible!

[Video: llama.hfe.ok.mp4]

ngxson (Collaborator, Author), Sep 27, 2024:

The Hermes 405B model can be deployed on 2xA100 GPUs. The generation speed is around 8 t/s, which is not bad!

[Screenshot: 2024-09-27 at 14:26:50]