A high-throughput and memory-efficient inference and serving engine for LLMs
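To make the engine-style API concrete, here is a minimal offline-inference sketch using vLLM's documented LLM and SamplingParams entry points; the model name is only an example.

# Minimal vLLM offline-inference sketch; the model name is an example.
from vllm import LLM, SamplingParams

prompts = ["What is model serving?"]
params = SamplingParams(temperature=0.8, max_tokens=64)

llm = LLM(model="facebook/opt-125m")  # any supported Hugging Face causal LM
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)  # generated completion for each prompt

The same models can also be exposed over an OpenAI-compatible HTTP server via the vllm serve CLI.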
The easiest way to serve AI apps and models: build model inference APIs, job queues, LLM apps, multi-model pipelines, and more!
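As a sketch of what a model inference API looks like in practice, here is a toy service in BentoML's 1.2+ decorator style; the class name and logic are placeholders, not a real model.

# Toy BentoML service sketch (BentoML >= 1.2 decorator API).
# The class and the "model" logic are placeholders.
import bentoml

@bentoml.service
class Echo:
    @bentoml.api
    def predict(self, text: str) -> str:
        # A real service would run model inference here instead of upper-casing.
        return text.upper()

Running bentoml serve against this file exposes predict as an HTTP endpoint.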
A standardized, distributed generative and predictive AI inference platform for scalable, multi-framework deployment on Kubernetes
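The "standardized" part refers to the v2 Open Inference Protocol; the following hedged sketch queries an already-deployed predictor over it with plain requests, where the host, port, and model name are assumptions for illustration.

# Hedged sketch: querying a served model via the v2 Open Inference Protocol.
# Host, port, and model name are assumptions for illustration.
import requests

payload = {
    "inputs": [{
        "name": "input-0",
        "shape": [1, 4],
        "datatype": "FP32",
        "data": [6.8, 2.8, 4.8, 1.4],
    }]
}
resp = requests.post(
    "http://localhost:8080/v2/models/sklearn-iris/infer",
    json=payload,
    timeout=30,
)
print(resp.json()["outputs"])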
In this repository, I will share some useful notes and references about deploying deep learning-based models in production.
FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI job on any GPU cloud or on-premise cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is your generative AI platform at scale.
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
Olares: An Open-Source Personal Cloud to Reclaim Your Data
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
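A hedged sketch of the multi-LoRA idea: the server loads one base model, and each request can name a different fine-tuned adapter. The endpoint shape follows the TGI-style generate API that LoRAX documents; the URL and adapter name below are placeholders.

# Hedged sketch: per-request adapter selection against a multi-LoRA server.
# The URL and adapter_id are placeholders, not real deployments.
import requests

resp = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "Summarize why multi-LoRA serving saves GPU memory.",
        "parameters": {
            "adapter_id": "example-org/example-lora",  # selects the fine-tune
            "max_new_tokens": 64,
        },
    },
    timeout=60,
)
print(resp.json()["generated_text"])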
Awesome System for Machine Learning: AI System Papers and Industry Practice. System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). OSDI, NSDI, SIGCOMM, SoCC, MLSys, etc. Llama3, Mistral, etc. Video Tutorials.
Reproducible development environment for humans and agents
AICI: Prompts as (Wasm) Programs
MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.
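As a minimal sketch of that workflow (assuming an MLRun backend is reachable and a local trainer.py defines a train handler), a project can register and run a function like this:

# Minimal MLRun sketch; assumes a running MLRun backend and a local
# trainer.py that defines a train(context) handler.
import mlrun

project = mlrun.get_or_create_project("demo", context="./")
project.set_function(
    "trainer.py", name="trainer", kind="job",
    image="mlrun/mlrun", handler="train",
)
run = project.run_function("trainer")  # executes train() as a tracked job
print(run.outputs)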
Community-maintained hardware plugin for vLLM on Ascend
High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.
Hopsworks - a data-intensive AI platform with a Feature Store
An open source DevOps tool from the CNCF for packaging and versioning AI/ML models, datasets, code, and configuration into an OCI Artifact.
A framework for efficient model inference with omni-modality models
The simplest way to serve AI/ML models in production
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
A throughput-oriented high-performance serving framework for LLMs