Deploying-Llama-3.3-70B

Deploy and serve Llama 3.3 70B with AWQ quantization using vLLM and BentoML.

Clone the repository

git clone https://github.com/kingabzpro/Deploying-Llama-3.3-70B.git
cd Deploying-Llama-3.3-70B

Deployment

  1. Install dependencies:
pip install -r requirements.txt
  2. Log in to BentoCloud:
bentoml cloud login
  3. Deploy the model:
bentoml deploy .

Inference

The deployed model can be accessed via a cURL command, the BentoML Python client, or the OpenAI Python client:

from openai import OpenAI

# Point the OpenAI client at the BentoCloud deployment endpoint.
client = OpenAI(base_url="<BentoCloud endpoint>", api_key="<Your BentoCloud API key>")

chat_completion = client.chat.completions.create(
    model="casperhansen/llama-3.3-70b-instruct-awq",
    messages=[
        {
            "role": "user",
            "content": "What is a black hole and how does it work?"
        }
    ],
    stream=True,
    stop=["<|eot_id|>", "<|end_of_text|>"],
)

# Print the streamed response as it arrives.
for chunk in chat_completion:
    print(chunk.choices[0].delta.content or "", end="")
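
For the BentoML Python client, a minimal sketch is shown below. It assumes the service exposes a generate endpoint that accepts a prompt and streams text back; the actual endpoint name and parameters depend on the service definition in this repository, so adjust them to match.

import bentoml

# Hypothetical endpoint name and parameters ("generate", "prompt", "max_tokens"):
# adjust to the endpoint defined by the service in this repository.
with bentoml.SyncHTTPClient("<BentoCloud endpoint>") as client:
    response = client.generate(
        prompt="What is a black hole and how does it work?",
        max_tokens=512,
    )
    # Print the streamed text chunks as they arrive.
    for chunk in response:
        print(chunk, end="")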

