Fastpull is a lazy-loading snapshotter that starts massive AI/ML container images (>10 GB) in seconds.
AI/ML container images like CUDA, vLLM, and SGLang are large (10 GB+). Traditional Docker pulls take 7-10 minutes, causing:
- 20-30% GPU capacity wasted from overprovisioning
- SLA breaches during traffic spikes
Fastpull uses lazy loading to pull only the files needed to start the container, then fetches the remaining layers on demand. This accelerates start times by roughly 10x.
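The lazy-loading idea itself can be sketched in a few lines of shell. This is a toy illustration only, not fastpull's actual mechanism: a file is fetched from "remote" storage the first time it is read and served from a local cache afterwards.

```shell
#!/bin/sh
# Toy lazy fetch: copy a "remote" file into a local cache only on first access.
CACHE_DIR=${CACHE_DIR:-./cache}

lazy_read() {
  remote=$1
  local_copy="$CACHE_DIR/$(basename "$remote")"
  if [ ! -f "$local_copy" ]; then   # first access: fetch and cache
    mkdir -p "$CACHE_DIR"
    cp "$remote" "$local_copy"      # stand-in for a registry fetch
  fi
  cat "$local_copy"                 # later accesses hit the cache
}
```

Fastpull applies the same principle at the filesystem-snapshot level, so the container can start before the full image has been downloaded.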
For more information, check out the fastpull release blog.
- VM image: works on Debian 12+, Ubuntu, and AL2023 VMs with GPUs; mileage on other AMIs may vary.
- Python >= 3.10, pip, python3-venv, Docker, CUDA drivers, and the NVIDIA Container Toolkit installed
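A quick preflight sketch for the tooling above (the actual installer may perform different or additional checks; `need` is a hypothetical helper name):

```shell
#!/bin/sh
# Preflight sketch: check that required tools are on PATH before installing.
need() {
  command -v "$1" >/dev/null 2>&1 || { echo "missing: $1" >&2; return 1; }
}

check_prereqs() {
  rc=0
  for tool in python3 pip3 docker nvidia-smi; do
    need "$tool" || rc=1
  done
  return $rc
}
```

Run `check_prereqs` before the install step; it prints each missing tool and returns non-zero if anything is absent.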
1. Install fastpull

```shell
git clone https://github.com/tensorfuse/fastpull.git
cd fastpull/
sudo python3 scripts/setup.py
```

You should see: "✅ Fastpull installed successfully on your VM"
2. Run containers
Fastpull requires your images to be in a special format. You can either choose from our templates of pre-built images, such as vLLM, TensorRT, and SGLang, or build your own from a Dockerfile.
Test with vLLM, TensorRT, or SGLang:

```shell
fastpull quickstart tensorrt
fastpull quickstart vllm
fastpull quickstart sglang
```

Each of these runs twice: once with fastpull optimisations, and once the way Docker normally runs it.
After the quickstart runs are complete, we also run `fastpull clean --all`, which cleans up the downloaded images.
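If you want a rough wall-clock comparison of your own commands, independent of fastpull's built-in reporting, a tiny helper like this works in any POSIX shell:

```shell
#!/bin/sh
# Report wall-clock seconds for an arbitrary command.
elapsed() {
  start=$(date +%s)
  "$@" >/dev/null 2>&1
  end=$(date +%s)
  echo $((end - start))
}
```

For example, `elapsed fastpull quickstart vllm` prints how many seconds the whole quickstart took.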
First, authenticate with your registry.

For ECR:

```shell
aws configure
aws ecr get-login-password --region us-east-1 | sudo nerdctl login --username AWS --password-stdin ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com
```

For GAR:

```shell
gcloud auth login
gcloud auth print-access-token | sudo nerdctl login <REGION>-docker.pkg.dev --username oauth2accesstoken --password-stdin
```

For Docker Hub:

```shell
sudo docker login
```
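To confirm the login actually landed in your Docker config, a rough check is to look for the registry host in `config.json` (note: with credential helpers, the token may be stored elsewhere; `has_cred` is a hypothetical helper name):

```shell
#!/bin/sh
# Rough check: does the Docker config.json mention the registry host?
has_cred() {
  grep -q "\"$1\"" "${DOCKER_CONFIG:-$HOME/.docker}/config.json" 2>/dev/null
}
```

Usage: `has_cred ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com && echo "logged in"`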
Build and push from your Dockerfile:

Note
- We support `--registry gar`, `--registry ecr`, and `--registry dockerhub`.
- For `<TAG>`, you can use any name that's convenient, e.g. `v1` or `latest`.
- Two images are created: the overlayfs image with tag `<TAG>`, and the fastpull image with tag `<TAG>-fastpull`.
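For example, given a hypothetical repository URL, the two resulting references differ only in the tag suffix:

```shell
#!/bin/sh
# Hypothetical repository URL; one build produces two references from one tag.
ref="123456789012.dkr.ecr.us-east-1.amazonaws.com/myrepo:v1"
overlayfs_image="$ref"               # normal overlayfs image
fastpull_image="${ref}-fastpull"     # lazy-loadable fastpull image
echo "$fastpull_image"
```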
```shell
# Build and push image
fastpull build --registry <REGISTRY> --dockerfile-path <DOCKERFILE-PATH> --repository-url <ECR/GAR-REPO-URL>:<TAG>
```

To get the run time for your container, you can use either:
Completion Time

Use this if the workload has a defined end point:

```shell
fastpull run --benchmark-mode completion [--FLAGS] <REPO-URL>:<TAG>
fastpull run --benchmark-mode completion --mode normal [--FLAGS] <REPO-URL>:<TAG>
```

Server Endpoint Readiness Time

Use this if you're running a server that responds with a 200 SUCCESS once it is up:

```shell
fastpull run --benchmark-mode readiness --readiness-endpoint localhost:<PORT>/<ENDPOINT> [--FLAGS] <REPO-URL>:<TAG>
fastpull run --benchmark-mode readiness --readiness-endpoint localhost:<PORT>/<ENDPOINT> --mode normal [--FLAGS] <REPO-URL>:<TAG>
```
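Conceptually, a readiness benchmark just polls the endpoint until it answers with a success status. A minimal sketch of that idea (assuming `curl` is available; this is not fastpull's internal probe):

```shell
#!/bin/sh
# Poll an HTTP endpoint until it returns a 2xx status, or give up.
# Usage: wait_ready <url> [max_tries]
wait_ready() {
  url=$1
  max=${2:-30}
  i=0
  while [ "$i" -lt "$max" ]; do
    if curl -sf -o /dev/null "$url"; then
      return 0                      # server answered successfully
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1                          # never became ready
}
```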
Note
- When running for readiness, you must publish the right port, e.g. `-p 8000:8000`, and use `--readiness-endpoint localhost:8000/health`.
- Use `--mode normal` to run plain Docker; running without this flag runs with fastpull optimisations.
- For `[--FLAGS]`, you can use any Docker-compatible flags, e.g. `--gpus all`, `-p PORT:PORT`, `-v <VOLUME_MOUNT>`.
- If using GPUs, make sure you add `--gpus all` as a `fastpull run` flag.
To get accurate cold start numbers, run the clean command after each run:

```shell
fastpull clean --all
```
Results show the startup and completion/readiness times:
Example Output

```
==================================================
BENCHMARK SUMMARY
==================================================
Time to Container Start: 141.295s
Time to Readiness: 329.367s
Total Elapsed Time: 329.367s
==================================================
```

- Tested on GKE
- Tested with COS Operating System for the nodes
- In your K8s cluster, create a GPU node pool. For GKE, ensure Workload Identity is enabled on your cluster.
- Install NVIDIA GPU drivers. For COS:

```shell
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded-latest.yaml
```

- Install the containerd config updater daemonset:

```shell
kubectl apply -f https://raw.githubusercontent.com/tensorfuse/fastpull-gke/main/containerd-daemonset.yaml
```

- Install the Helm chart. For COS:
```shell
helm upgrade --install fastpull-snapshotter oci://registry-1.docker.io/tensorfuse/fastpull-snapshotter \
  --version 0.0.10-gke-helm \
  --create-namespace \
  --namespace fastpull-snapshotter \
  --set 'tolerations[0].key=nvidia.com/gpu' \
  --set 'tolerations[0].operator=Equal' \
  --set 'tolerations[0].value=present' \
  --set 'tolerations[0].effect=NoSchedule' \
  --set 'affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms[0].matchExpressions[0].key=cloud.google.com/gke-accelerator' \
  --set 'affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms[0].matchExpressions[0].operator=Exists'
```
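If you prefer a values file to the long `--set` list, the same toleration and node affinity can be expressed as YAML (a sketch equivalent to the flags above) and passed to Helm with `-f values.yaml`:

```yaml
tolerations:
  - key: nvidia.com/gpu
    operator: Equal
    value: present
    effect: NoSchedule
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: cloud.google.com/gke-accelerator
              operator: Exists
```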
- Build your images, which can be done in one of two ways:
a. On a standalone VM (preferably Ubuntu), install fastpull and build your image.

b. Build in a container:

First, authenticate to your registry and ensure ~/.docker/config.json is updated:

```shell
# for AWS
aws configure
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com

# for GCP
gcloud auth login
gcloud auth print-access-token | sudo nerdctl login <REGION>-docker.pkg.dev --username oauth2accesstoken --password-stdin
```

Then build using our image:

```shell
docker run --rm --privileged \
  -v /path/to/dockerfile-dir:/workspace:ro \
  -v ~/.docker/config.json:/root/.docker/config.json:ro \
  tensorfuse/fastpull-builder:latest \
  REGISTRY/REPO/IMAGE:TAG
```

This creates `IMAGE:TAG` (normal) and `IMAGE:TAG-fastpull` (fastpull-optimized). Use the `-fastpull` tag in your pod spec. See the builder documentation for details.
- Create the pod spec for the image we created. For COS, use a pod spec like this:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test-a100-fastpull
spec:
  tolerations:
    - operator: Exists
  nodeSelector:
    cloud.google.com/gke-accelerator: nvidia-tesla-a100 # Use your GPU type
  runtimeClassName: runc-fastpull
  containers:
    - name: debug-container
      image: IMAGE_PATH:<TAG>-fastpull # USE FASTPULL IMAGE
      resources:
        limits:
          nvidia.com/gpu: 1
      env:
        - name: LD_LIBRARY_PATH
          value: /usr/local/cuda/lib64:/usr/local/nvidia/lib64 # NOTE: this path may vary depending on the base image
```

- Run a pod with this spec:

```shell
kubectl apply -f <POD-SPECFILE>.yaml
```

We welcome contributions! Submit a Pull Request or join our Slack community.
Built with ❤️ by the TensorFuse team


