Awesome WebLLM

This page contains a curated list of examples, tutorials, and blog posts about WebLLM use cases. Please send a pull request if you find things that belong here.

Example Projects

Note that all examples below run in-browser and use WebGPU as a backend.
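
Since every example depends on WebGPU, a quick feature check up front avoids a confusing failure during model load. A minimal sketch in TypeScript (assuming WebGPU typings such as @webgpu/types are available at compile time; the helper name is ours):

```ts
// Runtime check for WebGPU support before loading any model.
async function hasWebGPU(): Promise<boolean> {
  if (!("gpu" in navigator)) return false; // API not exposed by this browser
  // requestAdapter() can still resolve to null, e.g. on a blocklisted GPU.
  const adapter = await navigator.gpu.requestAdapter();
  return adapter !== null;
}
```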

Project List

  • get-started: a minimal getting-started example with chat completion (see the chat-completion sketch after this list).

    Open in JSFiddle · Open in Codepen

  • simple-chat-js: a minimal and complete chatbot app in vanilla JavaScript.

    Open in JSFiddle · Open in Codepen

  • simple-chat-ts: a minimal and complete chatbot app in TypeScript.

  • get-started-web-worker: same as get-started, but runs the engine in a Web Worker.

  • next-simple-chat: a minimal and complete chatbot app built with Next.js.

  • subgroups-usage: capability-based routing between baseline and subgroup WebGPU WASM builds.

  • multi-round-chat: multi-round chat usage; while the APIs look stateless, WebLLM internally reuses the KV cache across rounds so shared conversation prefixes are not recomputed.

  • text-completion: demonstrates engine.completions.create(), pure text completion with no conversation structure, as opposed to engine.chat.completions.create().

  • embeddings: demonstrates engine.embeddings.create(), integration with LangChain.js's EmbeddingsInterface and MemoryVectorStore, and RAG with LangChain.js using WebLLM as both the LLM and the embedding model in a single engine.

  • multi-models: demonstrates loading multiple models concurrently in a single engine.
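
To make the list concrete, here is a minimal sketch of what get-started boils down to. The model ID and prompts are illustrative choices, not prescribed by the example:

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Create an engine and load a model; weights are fetched on first use
// and then served from the configured cache.
const engine = await CreateMLCEngine("Llama-3.1-8B-Instruct-q4f32_1-MLC", {
  initProgressCallback: (report) => console.log(report.text), // download/compile progress
});

// OpenAI-style chat completion, running entirely in-browser on WebGPU.
const reply = await engine.chat.completions.create({
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "What is WebGPU?" },
  ],
});
console.log(reply.choices[0].message.content);
```

For the web-worker variant, the same surface is available via CreateWebWorkerMLCEngine, which moves the heavy computation off the UI thread.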

Advanced OpenAI API Capabilities

These examples demonstrate various capabilities via WebLLM's OpenAI-like API.

  • streaming: return output chunks in real time in the form of an AsyncGenerator (see the streaming sketch after this list).
  • json-mode: efficiently ensure the output is in JSON format; see the OpenAI Reference for more.
  • json-schema: beyond guaranteeing JSON output, ensure the output adheres to a specific JSON schema specified by the user.
  • seed-to-reproduce: use the seed field to make output reproducible.
  • function-calling (WIP): function calling with the tools and tool_choice fields (preliminary support).
  • vision-model: process requests with image input using a vision language model (e.g., Phi-3.5-vision).
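
As an illustration, a minimal streaming call might look like the sketch below, reusing the engine from the earlier sketch; the prompt and seed value are arbitrary examples:

```ts
// `stream: true` makes the call return an async iterable of chunks.
const chunks = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Tell me a short story." }],
  stream: true,
  seed: 42, // optional: fixed seed for reproducible output (cf. seed-to-reproduce)
  // For json-mode, you would add: response_format: { type: "json_object" }
});

let text = "";
for await (const chunk of chunks) {
  text += chunk.choices[0]?.delta?.content ?? ""; // a delta may omit content
}
console.log(text);
```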

Chrome Extension

Others

  • logit-processor: while logit_bias is supported, WebLLM additionally supports stateful logit processing where users can specify their own rules (see the sketch after this list). The low-level API forwardTokensAndSample() is also exposed.
  • cache-usage: demonstrates how WebLLM supports multiple cache backends: choose between the Cache API, an IndexedDB cache, OPFS, or the experimental Chrome Cross-Origin Storage extension via appConfig.cacheBackend. Also demonstrates cache utilities such as checking whether a model is cached, deleting a model's weights from the cache, deleting a model library WASM from the cache, etc. Note: the cross-origin backend currently does not support programmatic tensor-cache deletion.
  • simple-chat-upload: demonstrates how to upload local model files to WebLLM instead of downloading them from a URL.
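
As a sketch of the stateful logit processing mentioned in logit-processor: the interface shape below follows WebLLM's LogitProcessor, while the suppression rule itself is a made-up illustration:

```ts
import type { LogitProcessor } from "@mlc-ai/web-llm";

// A toy stateful processor: after the first sampled token, suppress token ID 0.
// Real processors encode whatever user-defined rules are needed.
class SuppressAfterFirst implements LogitProcessor {
  private tokensSeen = 0;

  processLogits(logits: Float32Array): Float32Array {
    if (this.tokensSeen > 0) logits[0] = -1e5; // effectively ban token 0
    return logits;
  }

  processSampledToken(token: number): void {
    this.tokensSeen += 1; // state carried across sampling steps
  }

  resetState(): void {
    this.tokensSeen = 0;
  }
}
```

The example registers processors through the engine config's logit-processor registry and can drive generation step by step with forwardTokensAndSample(); see the example source for the exact wiring.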

Demo Spaces

  • web-llm-embed: document chat prototype using react-llm with transformers.js embeddings
  • DeVinci: AI chat app based on WebLLM and hosted on a decentralized cloud platform