Awesome WebLLM

This page contains a curated list of examples, tutorials, and blog posts about WebLLM use cases. Please send a pull request if you find things that belong here.

Example Projects

Note that all examples below run in-browser and use WebGPU as a backend.
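
Since every example depends on WebGPU, a quick feature check up front avoids a confusing failure during model load. A minimal sketch in TypeScript (assuming WebGPU typings such as @webgpu/types are available at compile time; the helper name is ours):

```ts
// Runtime check for WebGPU support before loading any model.
async function hasWebGPU(): Promise<boolean> {
  if (!("gpu" in navigator)) return false; // API not exposed by this browser
  // requestAdapter() can still resolve to null, e.g. on a blocklisted GPU.
  const adapter = await navigator.gpu.requestAdapter();
  return adapter !== null;
}
```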

Project List

  • get-started: a minimal getting-started example with chat completion (see the chat-completion sketch after this list).

    Open in JSFiddle · Open in Codepen

  • simple-chat-js: a minimal and complete chatbot app in vanilla JavaScript.

    Open in JSFiddle · Open in Codepen

  • simple-chat-ts: a minimal and complete chatbot app in TypeScript.

  • get-started-web-worker: same as get-started, but runs the engine in a Web Worker.

  • next-simple-chat: a minimal and complete chatbot app built with Next.js.

  • subgroups-usage: capability-based routing between baseline and subgroup WebGPU WASM builds.

  • multi-round-chat: multi-round chat usage; while the APIs look stateless, WebLLM internally reuses the KV cache across rounds so shared conversation prefixes are not recomputed.

  • text-completion: demonstrates engine.completions.create(), pure text completion with no conversation structure, as opposed to engine.chat.completions.create().

  • embeddings: demonstrates engine.embeddings.create(), integration with LangChain.js's EmbeddingsInterface and MemoryVectorStore, and RAG with LangChain.js using WebLLM as both the LLM and the embedding model in a single engine.

  • multi-models: demonstrates loading multiple models concurrently in a single engine.
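
To make the list concrete, here is a minimal sketch of what get-started boils down to. The model ID and prompts are illustrative choices, not prescribed by the example:

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Create an engine and load a model; weights are fetched on first use
// and then served from the configured cache.
const engine = await CreateMLCEngine("Llama-3.1-8B-Instruct-q4f32_1-MLC", {
  initProgressCallback: (report) => console.log(report.text), // download/compile progress
});

// OpenAI-style chat completion, running entirely in-browser on WebGPU.
const reply = await engine.chat.completions.create({
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "What is WebGPU?" },
  ],
});
console.log(reply.choices[0].message.content);
```

For the web-worker variant, the same surface is available via CreateWebWorkerMLCEngine, which moves the heavy computation off the UI thread.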

Advanced OpenAI API Capabilities

These examples demonstrate various capabilities via WebLLM's OpenAI-like API.

  • streaming: return output chunks in real time in the form of an AsyncGenerator (see the streaming sketch after this list).
  • json-mode: efficiently ensure the output is in JSON format; see the OpenAI Reference for more.
  • json-schema: beyond guaranteeing JSON output, ensure the output adheres to a specific JSON schema specified by the user.
  • seed-to-reproduce: use the seed field to make output reproducible.
  • function-calling (WIP): function calling with the tools and tool_choice fields (preliminary support).
  • vision-model: process requests with image input using a vision language model (e.g., Phi-3.5-vision).
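
As an illustration, a minimal streaming call might look like the sketch below, reusing the engine from the earlier sketch; the prompt and seed value are arbitrary examples:

```ts
// `stream: true` makes the call return an async iterable of chunks.
const chunks = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Tell me a short story." }],
  stream: true,
  seed: 42, // optional: fixed seed for reproducible output (cf. seed-to-reproduce)
  // For json-mode, you would add: response_format: { type: "json_object" }
});

let text = "";
for await (const chunk of chunks) {
  text += chunk.choices[0]?.delta?.content ?? ""; // a delta may omit content
}
console.log(text);
```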

Chrome Extension

Others

  • logit-processor: while logit_bias is supported, WebLLM additionally supports stateful logit processing where users can specify their own rules (see the sketch after this list). The low-level API forwardTokensAndSample() is also exposed.
  • cache-usage: demonstrates how WebLLM supports multiple cache backends: choose between the Cache API, an IndexedDB cache, OPFS, or the experimental Chrome Cross-Origin Storage extension via appConfig.cacheBackend. Also demonstrates cache utilities such as checking whether a model is cached, deleting a model's weights from the cache, deleting a model library WASM from the cache, etc. Note: the cross-origin backend currently does not support programmatic tensor-cache deletion.
  • simple-chat-upload: demonstrates how to upload local model files to WebLLM instead of downloading them from a URL.
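
As a sketch of the stateful logit processing mentioned in logit-processor: the interface shape below follows WebLLM's LogitProcessor, while the suppression rule itself is a made-up illustration:

```ts
import type { LogitProcessor } from "@mlc-ai/web-llm";

// A toy stateful processor: after the first sampled token, suppress token ID 0.
// Real processors encode whatever user-defined rules are needed.
class SuppressAfterFirst implements LogitProcessor {
  private tokensSeen = 0;

  processLogits(logits: Float32Array): Float32Array {
    if (this.tokensSeen > 0) logits[0] = -1e5; // effectively ban token 0
    return logits;
  }

  processSampledToken(token: number): void {
    this.tokensSeen += 1; // state carried across sampling steps
  }

  resetState(): void {
    this.tokensSeen = 0;
  }
}
```

The example registers processors through the engine config's logit-processor registry and can drive generation step by step with forwardTokensAndSample(); see the example source for the exact wiring.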

Demo Spaces

  • web-llm-embed: document chat prototype using react-llm with transformers.js embeddings
  • DeVinci: AI chat app based on WebLLM and hosted on a decentralized cloud platform