List of popular open-source models deployed on AWS using Tensorfuse.
You can run them directly on GPU instances or deploy them with the tensorkube runtime. With tensorkube, each model is served as an API that auto-scales with the traffic it receives.
Examples are organized into folders by modality. Each folder contains the FastAPI code for model inference and its environment as a Dockerfile. You can deploy them in two ways:
**Option 1: Run directly on a GPU EC2 instance**

- Launch and connect to an EC2 instance with the Deep Learning AMI (it ships with NVIDIA drivers pre-installed).
- Install the NVIDIA Container Toolkit; this is required to give your containers GPU access.
- Build and deploy the Docker container (see the sketch after this list).
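A minimal sketch of these steps, assuming an Ubuntu-based Deep Learning AMI and a model folder whose Dockerfile serves FastAPI on port 8000 (the image name `model-api` and the port are placeholders, and the toolkit commands mirror NVIDIA's published install guide, so check their docs for your distro):

```bash
# Add NVIDIA's apt repository and install the Container Toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -sL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit

# Point Docker at the NVIDIA runtime and restart it
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Build the image from the model folder's Dockerfile and run it with GPU access
# (image name and port are placeholders -- match them to the folder you deploy)
docker build -t model-api .
docker run --gpus all -p 8000:8000 model-api
```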
**Option 2: Deploy with the tensorkube runtime**

- You need the AWS CLI installed and configured on your local machine with admin access and `us-east-1` as the default region. Follow these steps to configure.
- You need quotas for the GPUs you plan to run. Read about GPU quotas in AWS and how to apply for them.
- Install the tensorfuse Python package with `pip install tensorkube`. Then configure the tensorkube K8s cluster on your cloud by running `tensorkube configure`. More details here. (A sketch of these setup steps follows this list.)
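A sketch of this setup, assuming the AWS CLI is already installed. The Service Quotas quota code below is the one AWS uses for "Running On-Demand G and VT instances" at the time of writing; verify it in the Service Quotas console before relying on it:

```bash
# Point the AWS CLI at us-east-1 and confirm your credentials work
aws configure set region us-east-1
aws sts get-caller-identity

# Optional: check your On-Demand G/VT instance quota before deploying
# (the quota code is an assumption -- verify it in the Service Quotas console)
aws service-quotas get-service-quota --service-code ec2 --quota-code L-DB2E81BA

# Install the Tensorfuse CLI and provision the tensorkube cluster in your AWS account
pip install tensorkube
tensorkube configure
```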
To deploy the model, run the following command from the root directory of the model files:
`tensorkube deploy --gpus 1 --gpu-type a10g`

Access the endpoint via:

`tensorkube list deployments`

Note: If you encounter issues during deployment, refer to the detailed instructions for that model in our documentation.
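For example, a single-GPU deployment on an A10G could look like the following. The endpoint URL, route, and request body are placeholders; use the URL printed by `tensorkube list deployments` and the route defined in that model folder's FastAPI app:

```bash
# Deploy the model in the current folder on one A10G GPU
tensorkube deploy --gpus 1 --gpu-type a10g

# List deployments to find the endpoint URL of the new service
tensorkube list deployments

# Call the endpoint once it is ready (URL, route, and payload are placeholders)
curl -X POST "https://<your-endpoint-url>/generate" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello, world"}'
```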
MIT License