
tensorfuse/fastpull


Start massive AI/ML container images 10x faster with lazy-loading snapshotter

Join Slack · Read our Blog

Installation · Results · Detailed Usage


What is Fastpull?

Fastpull is a lazy-loading snapshotter that starts massive AI/ML container images (>10 GB) in seconds.

The Cold Start Problem

AI/ML container images like CUDA, vLLM, and sglang are large (10 GB+). Traditional Docker pulls take 7-10 minutes, causing:

  • 20-30% GPU capacity wasted from overprovisioning
  • SLA breaches during traffic spikes

The Solution

Fastpull uses lazy-loading to pull only the files needed to start the container, then fetches remaining layers on demand. This accelerates start times by 10x. See the results below:

(Benchmark chart: container start times with fastpull vs. a standard docker pull.)
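Conceptually, lazy loading boils down to fetching data only on first access and caching it locally. A minimal shell sketch of the idea (illustrative paths and file names, not fastpull's actual implementation):

```shell
# A file is "pulled" from the registry only on first access,
# then served from the local cache on every access after that.
registry=$(mktemp -d)   # stands in for the remote image registry
cache=$(mktemp -d)      # stands in for the node-local cache

echo "layer contents" > "$registry/layer1"

fetch_lazy() {
  local name=$1
  if [ ! -e "$cache/$name" ]; then
    cp "$registry/$name" "$cache/$name"   # the "network pull" happens only here
  fi
  cat "$cache/$name"
}

fetch_lazy layer1   # first access: pulls, then prints the contents
fetch_lazy layer1   # second access: served straight from the cache
```

Fastpull applies this idea at the filesystem layer, so the container can start as soon as the files it actually touches are available.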

For more information, check out the fastpull release blog.


Install fastpull on a VM

Installation Steps

1. Install fastpull

git clone https://github.com/tensorfuse/fastpull.git
cd fastpull/
sudo python3 scripts/setup.py

You should see: "✅ Fastpull installed successfully on your VM"

2. Run containers

Fastpull requires your images to be in a special format. You can either choose from our pre-built template images like vLLM, TensorRT, and SGLang, or build your own from a Dockerfile.

Use pre-built images

Test with vLLM, TensorRT, or SGLang:

fastpull quickstart tensorrt
fastpull quickstart vllm
fastpull quickstart sglang

Each of these runs twice: once with fastpull optimisations, and once the way Docker normally runs it. After the quickstart runs are complete, fastpull clean --all is run automatically to clean up the downloaded images.
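Under the hood, the comparison is just timing the same startup twice. As a generic illustration (not part of the fastpull CLI), a helper like this times any command in milliseconds using GNU date:

```shell
# time_run: print the wall-clock duration of a command in milliseconds
# (requires GNU date for the %N nanosecond format)
time_run() {
  local start end
  start=$(date +%s%N)
  "$@" > /dev/null 2>&1
  end=$(date +%s%N)
  echo "$(( (end - start) / 1000000 )) ms"
}

time_run sleep 0.2   # stand-in workload: prints roughly "200 ms"
```

The quickstart prints analogous timings for the fastpull run and the plain Docker run so you can compare them directly.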

Build custom images

First, authenticate with your registry. For ECR:

aws configure;
aws ecr get-login-password --region us-east-1 | sudo nerdctl login --username AWS --password-stdin ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com

For GAR:

gcloud auth login;
gcloud auth print-access-token | sudo nerdctl login <REGION>-docker.pkg.dev --username oauth2accesstoken --password-stdin

For Dockerhub:

sudo docker login

Build and push from your Dockerfile:

Note

  • We support --registry gar, --registry ecr, and --registry dockerhub
  • For <TAG>, use any name that's convenient, e.g. v1 or latest
  • Two images are created: the normal overlayfs image tagged <TAG> and the fastpull image tagged <TAG>-fastpull
# Build and push image
fastpull build --registry <REGISTRY> --dockerfile-path <DOCKERFILE-PATH> --repository-url <ECR/GAR-REPO-URL>:<TAG>

Benchmarking with Fastpull

To measure the run time of your container, use one of the following:

Completion Time

Use this if the workload has a defined end point.

fastpull run --benchmark-mode completion [--FLAGS] <REPO-URL>:<TAG>
fastpull run --benchmark-mode completion --mode normal [--FLAGS] <REPO-URL>:<TAG>

Server Endpoint Readiness Time

Use this if you're running a server that responds with 200 SUCCESS once it is up.

fastpull run --benchmark-mode readiness --readiness-endpoint localhost:<PORT>/<ENDPOINT> [--FLAGS] <REPO-URL>:<TAG>
fastpull run --benchmark-mode readiness --readiness-endpoint localhost:<PORT>/<ENDPOINT> --mode normal [--FLAGS] <REPO-URL>:<TAG>

Note

  • When benchmarking readiness, you must publish the right port (e.g. -p 8000:8000) and use --readiness-endpoint localhost:8000/health
  • Use --mode normal to run plain Docker; running without this flag runs with fastpull optimisations
  • For [--FLAGS], you can use any Docker-compatible flags, e.g. --gpus all, -p PORT:PORT, -v <VOLUME_MOUNT>
  • If using GPUs, make sure you add --gpus all as a fastpull run flag
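The readiness check can also be reproduced by hand outside of fastpull, e.g. to sanity-check your endpoint before benchmarking. A sketch (the URL and timeout below are illustrative):

```shell
# wait_ready: poll an HTTP endpoint until it returns a success status,
# printing roughly how long it took; gives up after the timeout (seconds).
wait_ready() {
  local url=$1 timeout=${2:-60} i
  i=0
  while [ "$i" -lt "$timeout" ]; do
    if curl -fsS -o /dev/null "$url"; then
      echo "ready after ~${i}s"
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  echo "timed out after ${timeout}s" >&2
  return 1
}

# Example: wait_ready localhost:8000/health 120
```

fastpull's readiness mode performs an equivalent check and reports the elapsed time in its benchmark summary.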

Cleaning after a run

To get the right cold start numbers, run the clean command after each run:

fastpull clean --all

Understanding Test Results

Results show the startup and completion/readiness times:

Example Output

==================================================
BENCHMARK SUMMARY
==================================================
Time to Container Start: 141.295s
Time to Readiness:       329.367s
Total Elapsed Time:      329.367s
==================================================
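If you want to collect these numbers across many runs, the summary lines are easy to scrape. A small awk sketch, assuming the summary format shown above:

```shell
# Extract the start and readiness timings from a benchmark summary
summary='Time to Container Start: 141.295s
Time to Readiness:       329.367s
Total Elapsed Time:      329.367s'

echo "$summary" | awk -F': *' '
  /^Time to Container Start/ { print "start=" $2 }
  /^Time to Readiness/       { print "ready=" $2 }'
```

For the example run above this prints start=141.295s and ready=329.367s, which is convenient for logging into a CSV across repeated runs.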

Install fastpull on a Kubernetes Cluster

Prerequisites

  • Tested on GKE
  • Tested with COS Operating System for the nodes

Installation

  1. In your K8s cluster, create a GPU Nodepool. For GKE, ensure Workload Identity is enabled on your cluster
  2. Install Nvidia GPU drivers. For COS:
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded-latest.yaml
  3. Install the containerd config updater daemonset: kubectl apply -f https://raw.githubusercontent.com/tensorfuse/fastpull-gke/main/containerd-daemonset.yaml
  4. Install the Helm chart. For COS:
helm upgrade --install fastpull-snapshotter oci://registry-1.docker.io/tensorfuse/fastpull-snapshotter \
--version 0.0.10-gke-helm \
--create-namespace \
--namespace fastpull-snapshotter \
--set 'tolerations[0].key=nvidia.com/gpu' \
--set 'tolerations[0].operator=Equal' \
--set 'tolerations[0].value=present' \
--set 'tolerations[0].effect=NoSchedule' \
--set 'affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms[0].matchExpressions[0].key=cloud.google.com/gke-accelerator' \
--set 'affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms[0].matchExpressions[0].operator=Exists'
  5. Build your images, which can be done in two ways:

    a. On a standalone VM (preferably running Ubuntu), install fastpull and build your image

    b. Build in a container:

    First, authenticate to your registry and ensure that ~/.docker/config.json is updated

    # For AWS
    aws configure
    aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com
    # For GCP
    gcloud auth login
    gcloud auth print-access-token | sudo nerdctl login <REGION>-docker.pkg.dev --username oauth2accesstoken --password-stdin

    Then build using our image:

    docker run --rm --privileged \
      -v /path/to/dockerfile-dir:/workspace:ro \
      -v ~/.docker/config.json:/root/.docker/config.json:ro \
      tensorfuse/fastpull-builder:latest \
      REGISTRY/REPO/IMAGE:TAG

    This creates IMAGE:TAG (normal) and IMAGE:TAG-fastpull (fastpull-optimized). Use the -fastpull tag in your pod spec. See builder documentation for details.
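Since the -fastpull tag is a fixed suffix convention, a deploy script can derive it mechanically from the normal image reference (the image name below is illustrative):

```shell
# Derive the fastpull-optimized tag from a normal image reference
image="REGISTRY/REPO/IMAGE:v1"
fastpull_image="${image}-fastpull"
echo "$fastpull_image"   # prints REGISTRY/REPO/IMAGE:v1-fastpull
```

This is the tag you reference in the pod spec below.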

  6. Create the pod spec for the image we created. For COS, use a pod spec like this:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-test-a100-fastpull
spec:
  tolerations:
    - operator: Exists
  nodeSelector:
    cloud.google.com/gke-accelerator: nvidia-tesla-a100 # Use your GPU Type
  runtimeClassName: runc-fastpull
  containers:
  - name: debug-container
    image: IMAGE_PATH:<TAG>-fastpull # USE FASTPULL IMAGE
    resources:
      limits:
        nvidia.com/gpu: 1
    env:
    - name: LD_LIBRARY_PATH
      value: /usr/local/cuda/lib64:/usr/local/nvidia/lib64 # NOTE: This path may vary depending on the base image
  7. Run a pod with this spec:
kubectl apply -f <POD-SPECFILE>.yaml

🤝 Contributing

We welcome contributions! Submit a Pull Request or join our Slack community.


Built with ❤️ by the TensorFuse team

License: MIT
