Xinference is a powerful and versatile library designed to serve LLMs,
speech recognition models, and multimodal models, even on your laptop.
With Xorbits Inference, you can effortlessly deploy and serve your own
models or state-of-the-art built-in models using just a single command.
Installation and Setup
Xinference can be installed via pip from PyPI:
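A typical installation pulls in all optional backends via the `all` extras group (the extras name follows the Xinference documentation; adjust if your version differs):

```shell
pip install "xinference[all]"
```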
LLM
Xinference supports various models compatible with GGML, including chatglm, baichuan, whisper, vicuna, and orca. To view the built-in models, run the command:
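The listing command below is a sketch based on the Xinference CLI; the exact subcommand and flags may differ between versions, so check `xinference --help` on your install:

```shell
xinference list --all
```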
Wrapper for Xinference
You can start a local instance of Xinference by running:
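A minimal sketch of starting the server locally (the default endpoint `http://127.0.0.1:9997` is an assumption based on Xinference's documented defaults):

```shell
# Start a local Xinference instance; the server listens on port 9997 by default.
xinference
```

Once the server is up, you can launch a built-in model from another terminal to obtain a model UID for use with the wrapper.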
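With a model launched (for example, `xinference launch -n orca -s 3 -q q4_0`, which prints a model UID), the LangChain community wrapper can be pointed at the running server. This is a sketch: the `server_url` and the placeholder UID are assumptions you must replace with your own values, and it requires a live Xinference server.

```python
# Sketch of using the LangChain community wrapper against a running
# Xinference server. Assumes the server is at the default local endpoint
# and that a model has already been launched via `xinference launch`.
from langchain_community.llms import Xinference

llm = Xinference(
    server_url="http://127.0.0.1:9997",  # assumption: default local endpoint
    model_uid="<model_uid>",             # replace with the UID printed by `xinference launch`
)

# Generate a completion from the launched model.
print(llm.invoke("Q: What is the capital of France? A:"))
```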
Usage
For more information and detailed examples, refer to the example for Xinference LLMs.
Embeddings
Xinference also supports embedding queries and documents. See the example for Xinference embeddings for a more detailed demo.
Xinference LangChain partner package install
Install the integration package with:
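Assuming the partner package is published on PyPI under this name:

```shell
pip install langchain-xinference
```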
Chat Models
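A sketch of the chat-model import, assuming the `langchain-xinference` partner package exposes `ChatXinference` under this module path (verify against the package's own documentation):

```python
# Assumption: import path follows the langchain-xinference partner package layout.
from langchain_xinference.chat_models import ChatXinference
```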
LLM
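A sketch of the LLM import, assuming the partner package mirrors the community wrapper's class name (verify the exact path in the package docs):

```python
# Assumption: import path follows the langchain-xinference partner package layout.
from langchain_xinference.llms import Xinference
```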