Add eos/last_token pooling #1335


Open. Wants to merge 1 commit into main.

@xenova (Collaborator) commented Jun 6, 2025

Adds support for last_token pooling, which is used by the new Qwen3 embedding models. cc @tomaarsen
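For readers unfamiliar with the term, here is a minimal sketch of what last-token pooling generally means: for each sequence, the embedding is the hidden state of the final non-padding token. This is not the code added in this PR. The helper names `lastTokenPool` and `l2Normalize` are hypothetical, and the sketch assumes plain nested arrays with a 0/1 attention mask rather than the library's tensors.

```javascript
// Hypothetical helpers for illustration only; not the library's implementation.
// hiddenStates: [batch][seqLen][hiddenSize] nested arrays of numbers.
// attentionMask: [batch][seqLen] arrays of 0 (padding) or 1 (real token).
function lastTokenPool(hiddenStates, attentionMask) {
    return hiddenStates.map((tokens, i) => {
        // Position of the last real token; works for left or right padding.
        const lastIndex = attentionMask[i].lastIndexOf(1);
        if (lastIndex === -1) {
            throw new Error(`Sequence ${i} has no unmasked tokens`);
        }
        return tokens[lastIndex].slice(); // copy the hidden state at that position
    });
}

// Scale a vector to unit length (the effect one would expect from `normalize: true`).
function l2Normalize(vector) {
    const norm = Math.sqrt(vector.reduce((sum, x) => sum + x * x, 0));
    return vector.map((x) => x / norm);
}

// Toy batch: the first sequence has one padding token at the end.
const hiddenStates = [
    [[1, 0], [3, 4], [9, 9]],
    [[0, 2], [5, 12], [8, 6]],
];
const attentionMask = [
    [1, 1, 0],
    [1, 1, 1],
];

const pooled = lastTokenPool(hiddenStates, attentionMask);
const embeddings = pooled.map(l2Normalize);
console.log(pooled);     // [ [ 3, 4 ], [ 8, 6 ] ]
console.log(embeddings); // [ [ 0.6, 0.8 ], [ 0.8, 0.6 ] ]
```

Using the attention mask instead of always taking the final position keeps padding tokens from being selected when sequences in a batch have different lengths.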

Example usage:

import { pipeline, matmul } from "@huggingface/transformers";

// Create a feature extraction pipeline
const extractor = await pipeline("feature-extraction", "onnx-community/Qwen3-Embedding-0.6B-ONNX", {
    dtype: "fp32", // Options: "fp32", "fp16", "q8"
    // device: "webgpu",
});


function get_detailed_instruct(task_description, query) {
    return `Instruct: ${task_description}\nQuery:${query}`;
}

// Each query must come with a one-sentence instruction that describes the task
const task = 'Given a web search query, retrieve relevant passages that answer the query'

const queries = [
    get_detailed_instruct(task, 'What is the capital of China?'),
    get_detailed_instruct(task, 'Explain gravity')
]
// No need to add instruction for retrieval documents
const documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
]
const input_texts = [...queries, ...documents]

const output = await extractor(input_texts, { pooling: "last_token", normalize: true });
const scores = await matmul(
    output.slice([0, queries.length]), // Query embeddings
    output.slice([queries.length, null]).transpose(1, 0), // Document embeddings
)
console.log(scores.tolist());
// [
//   [ 0.7645590305328369, 0.14142560958862305 ],
//   [ 0.13549776375293732, 0.599955141544342 ]
// ]
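
Because the embeddings are requested with normalize: true, each one has unit length, so the matrix product of the query embeddings with the transposed document embeddings is a table of cosine similarities (rows are queries, columns are documents). The sketch below illustrates that computation on plain arrays with made-up unit vectors. The names `dot`, `similarityMatrix`, and `bestMatches` are hypothetical helpers, not part of the library.

```javascript
// Hypothetical helpers for illustration only; toy vectors, not real model output.
function dot(a, b) {
    return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// Equivalent to queries x documents^T for row-major nested arrays.
function similarityMatrix(queryEmbeddings, documentEmbeddings) {
    return queryEmbeddings.map((q) => documentEmbeddings.map((d) => dot(q, d)));
}

// For each query, the index of the highest-scoring document.
function bestMatches(scoreRows) {
    return scoreRows.map((row) => row.indexOf(Math.max(...row)));
}

// Unit-length toy embeddings (assumed already normalized).
const queryEmbeddings = [
    [1, 0],
    [0.6, 0.8],
];
const documentEmbeddings = [
    [1, 0],
    [0, 1],
];

const toyScores = similarityMatrix(queryEmbeddings, documentEmbeddings);
console.log(toyScores);              // [ [ 1, 0 ], [ 0.6, 0.8 ] ]
console.log(bestMatches(toyScores)); // [ 0, 1 ]
```

In the example output above, the same reading applies: each query scores highest against its matching document (0.76 and 0.60 on the diagonal).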

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
