Loads or creates a SparseEncoder model that can be used to map sentences / text to sparse embeddings.
model_name_or_path (str, optional) – If it is a filepath on disk, it loads the model from that path. If it is not a path, it first tries to download a pre-trained SparseEncoder model. If that fails, tries to construct a model from the Hugging Face Hub with that name.
modules (Iterable[nn.Module], optional) – A list of torch Modules that should be called sequentially, can be used to create custom SparseEncoder models from scratch.
device (str, optional) – Device (like “cuda”, “cpu”, “mps”, “npu”) that should be used for computation. If None, checks if a GPU can be used.
prompts (Dict[str, str], optional) – A dictionary with prompts for the model. The key is the prompt name, the value is the prompt text. The prompt text will be prepended before any text to encode. For example: {“query”: “query: “, “passage”: “passage: “} or {“clustering”: “Identify the main category based on the titles in “}.
default_prompt_name (str, optional) – The name of the prompt that should be used by default. If not set, no prompt will be applied.
similarity_fn_name (str or SimilarityFunction, optional) – The name of the similarity function to use. Valid options are “cosine”, “dot”, “euclidean”, and “manhattan”. If not set, it is automatically set to “dot” if similarity or similarity_pairwise are called while model.similarity_fn_name is still None.
cache_folder (str, optional) – Path to store models. Can also be set by the SENTENCE_TRANSFORMERS_HOME environment variable.
trust_remote_code (bool, optional) – Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
revision (str, optional) – The specific model version to use. It can be a branch name, a tag name, or a commit id, for a stored model on Hugging Face.
local_files_only (bool, optional) – Whether or not to only look at local files (i.e., do not try to download the model).
token (bool or str, optional) – Hugging Face authentication token to download private models.
max_active_dims (int, optional) – The maximum number of active (non-zero) dimensions in the output of the model. Defaults to None, which means there is no limit on the number of active dimensions; this can be slow or memory-intensive if your model wasn’t (yet) finetuned to high sparsity.
model_kwargs (Dict[str, Any], optional) –
Additional model configuration parameters to be passed to the Hugging Face Transformers model. Particularly useful options are:
torch_dtype: Override the default torch.dtype and load the model under a specific dtype.
The different options are:
1. torch.float16, torch.bfloat16 or torch.float: load in a specified dtype, ignoring the model’s config.torch_dtype if one exists. If not specified, the model will be loaded in torch.float (fp32).
2. "auto": a torch_dtype entry in the config.json file of the model will be attempted to be used. If this entry isn’t found, the dtype of the first floating-point weight in the checkpoint is used instead. This loads the model using the dtype it was saved in at the end of training; it cannot be used as an indicator of how the model was trained, since a model trained in a half-precision dtype may still be saved in fp32.
attn_implementation: The attention implementation to use in the model (if relevant). Can be any of
“eager” (manual implementation of the attention), “sdpa” (using F.scaled_dot_product_attention),
or “flash_attention_2” (using Dao-AILab/flash-attention).
By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual “eager”
implementation.
provider: If backend is “onnx”, this is the provider to use for inference, for example “CPUExecutionProvider”,
“CUDAExecutionProvider”, etc. See https://onnxruntime.ai/docs/execution-providers/ for all ONNX execution providers.
file_name: If backend is “onnx” or “openvino”, this is the file name to load, useful for loading optimized
or quantized ONNX or OpenVINO models.
export: If backend is “onnx” or “openvino”, then this is a boolean flag specifying whether this model should
be exported to the backend. If not specified, the model will be exported only if the model repository or directory
does not already contain an exported model.
See the PreTrainedModel.from_pretrained documentation for more details.
tokenizer_kwargs (Dict[str, Any], optional) – Additional tokenizer configuration parameters to be passed to the Hugging Face Transformers tokenizer. See the AutoTokenizer.from_pretrained documentation for more details.
config_kwargs (Dict[str, Any], optional) – Additional model configuration parameters to be passed to the Hugging Face Transformers config. See the AutoConfig.from_pretrained documentation for more details.
model_card_data (SparseEncoderModelCardData, optional) – A model
card data object that contains information about the model. This is used to generate a model card when saving
the model. If not set, a default model card data object is created.
backend (str) – The backend to use for inference. Can be one of “torch” (default), “onnx”, or “openvino”. See https://sbert.net/docs/sentence_transformer/usage/efficiency.html for benchmarking information on the different backends.
Example
from sentence_transformers import SparseEncoder
# Load a pre-trained SparseEncoder model
model = SparseEncoder('naver/splade-cocondenser-ensembledistil')
# Encode some texts
sentences = [
"The weather is lovely today.",
"It's so sunny outside!",
"He drove to the stadium.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 30522)
# Get the similarity scores between all sentences
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 35.629, 9.154, 0.098],
# [ 9.154, 27.478, 0.019],
# [ 0.098, 0.019, 29.553]])
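A minimal sketch of the prompts, default_prompt_name, and max_active_dims constructor arguments described above; the prompt strings are hypothetical and not part of this checkpoint:
from sentence_transformers import SparseEncoder
# Hypothetical prompts; this checkpoint was not trained with these prefixes, they only
# illustrate how prompt names map to prepended text and how max_active_dims caps sparsity.
model = SparseEncoder(
    "naver/splade-cocondenser-ensembledistil",
    prompts={"query": "query: ", "passage": "passage: "},
    default_prompt_name="query",
    max_active_dims=64,
)
embeddings = model.encode(["What is the capital of France?"])
print(embeddings.shape)
# (1, 30522), with at most 64 non-zero values per row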
Initializes internal Module state, shared by both nn.Module and ScriptModule.
If you are not familiar with adapters and PEFT methods, we invite you to read more about them on the PEFT official documentation: https://huggingface.co/docs/peft
Gets the current active adapters of the model. In case of multi-adapter inference (combining multiple adapters for inference) returns the list of all active adapters so that users can deal with them accordingly.
For previous PEFT versions (which do not support multi-adapter inference), module.active_adapter will return a single string.
Adds a fresh new adapter to the current model for training purposes. If no adapter name is passed, a default name is assigned to the adapter to follow the convention of PEFT library (in PEFT we use “default” as the default adapter name).
Requires peft as a backend to load the adapter weights and the underlying model to be compatible with PEFT.
*args – Positional arguments to pass to the underlying AutoModel add_adapter function. More information can be found in the transformers documentation https://huggingface.co/docs/transformers/main/en/main_classes/peft#transformers.integrations.PeftAdapterMixin.add_adapter
**kwargs – Keyword arguments to pass to the underlying AutoModel add_adapter function. More information can be found in the transformers documentation https://huggingface.co/docs/transformers/main/en/main_classes/peft#transformers.integrations.PeftAdapterMixin.add_adapter
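As a rough sketch (assuming the peft package is installed and the underlying transformer is PEFT-compatible), adding a LoRA adapter could look like this; the LoRA settings and target modules are illustrative only:
from peft import LoraConfig
from sentence_transformers import SparseEncoder
model = SparseEncoder("naver/splade-cocondenser-ensembledistil")
# Illustrative LoRA hyperparameters; target_modules depends on the underlying architecture
lora_config = LoraConfig(r=8, lora_alpha=16, target_modules=["query", "value"])
model.add_adapter(lora_config)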
Casts all floating point parameters and buffers to bfloat16 datatype.
Note
This method modifies the module in-place.
self
Compile this Module’s forward using torch.compile().
This Module’s __call__ method is compiled and all arguments are passed as-is
to torch.compile().
See torch.compile() for details on the arguments for this function.
Moves all model parameters and buffers to the CPU.
Note
This method modifies the module in-place.
self
Moves all model parameters and buffers to the GPU.
This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on GPU while being optimized.
Note
This method modifies the module in-place.
device (int, optional) – if specified, all parameters will be copied to that device
self
Decodes the top K tokens and weights from a sparse embedding. If top_k is None, all non-zero tokens and weights are returned.
embeddings (torch.Tensor) – Sparse embedding tensor (batch, vocab) or (vocab).
top_k (int, optional) – Number of top tokens to return per sample. If None, returns all non-zero tokens.
List of tuples (token, weight) for each embedding. If batch input, returns a list of lists of tuples.
list[tuple[str, float]] | list[list[tuple[str, float]]]
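A short usage sketch, assuming the decoder is exposed as SparseEncoder.decode as in recent sentence-transformers releases:
from sentence_transformers import SparseEncoder
model = SparseEncoder("naver/splade-cocondenser-ensembledistil")
embedding = model.encode("The weather is lovely today.")
# Print the 5 most strongly weighted vocabulary tokens for this sentence
for token, weight in model.decode(embedding, top_k=5):
    print(token, round(weight, 3))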
If you are not familiar with adapters and PEFT methods, we invite you to read more about them on the PEFT official documentation: https://huggingface.co/docs/peft
Delete an adapter’s LoRA layers from the underlying model.
*args – Positional arguments to pass to the underlying AutoModel delete_adapter function. More information can be found in the transformers documentation https://huggingface.co/docs/transformers/main/en/main_classes/peft#transformers.integrations.PeftAdapterMixin.delete_adapter
**kwargs – Keyword arguments to pass to the underlying AutoModel delete_adapter function. More information can be found in the transformers documentation https://huggingface.co/docs/transformers/main/en/main_classes/peft#transformers.integrations.PeftAdapterMixin.delete_adapter
Get torch.device from module, assuming that the whole module has one device. In case there are no PyTorch parameters, fall back to CPU.
Disable all adapters that are attached to the model. This leads to inferring with the base model only.
Casts all floating point parameters and buffers to double datatype.
Note
This method modifies the module in-place.
self
Enable adapters that are attached to the model. The model will use self.active_adapter()
Computes sparse sentence embeddings.
Tip
If you are unsure whether you should use encode(), encode_query(), or encode_document(),
your best bet is to use encode_query() and encode_document() for Information Retrieval tasks
with clear query and document/passage distinction, and use encode() for all other tasks.
Note that encode() is the most general method and can be used for any task, including Information
Retrieval, and that if the model was not trained with predefined prompts and/or task types, then all three
methods will return identical embeddings.
sentences (Union[str, List[str]]) – The sentences to embed.
prompt_name (Optional[str], optional) – The name of the prompt to use for encoding. Must be a key in the prompts dictionary,
which is either set in the constructor or loaded from the model configuration. For example if
prompt_name is “query” and the prompts dictionary is {“query”: “query: “, …}, then the sentence “What
is the capital of France?” will be encoded as “query: What is the capital of France?” because the sentence
is appended to the prompt. If prompt is also set, this argument is ignored. Defaults to None.
prompt (Optional[str], optional) – The prompt to use for encoding. For example, if the prompt is “query: “, then the
sentence “What is the capital of France?” will be encoded as “query: What is the capital of France?”
because the sentence is appended to the prompt. If prompt is set, prompt_name is ignored. Defaults to None.
batch_size (int, optional) – The batch size used for the computation. Defaults to 32.
show_progress_bar (bool, optional) – Whether to output a progress bar while encoding sentences. Defaults to None.
convert_to_tensor (bool, optional) – Whether the output should be a single stacked tensor (True) or a list of individual tensors (False). Sparse tensors may be challenging to slice, so this allows you to output lists of tensors instead. Defaults to True.
convert_to_sparse_tensor (bool, optional) – Whether the output should be in the format of a sparse (COO) tensor. Defaults to True.
save_to_cpu (bool, optional) – Whether the output should be moved to the CPU or stay on the device it was computed on. Defaults to False.
device (Union[str, List[str], None], optional) –
Device(s) to use for computation. Can be:
A single device string (e.g., “cuda:0”, “cpu”) for single-process encoding
A list of device strings (e.g., [“cuda:0”, “cuda:1”], [“cpu”, “cpu”, “cpu”, “cpu”]) to distribute encoding across multiple processes
None to auto-detect available device for single-process encoding
If a list is provided, multi-process encoding will be used. Defaults to None.
max_active_dims (int, optional) – The maximum number of active (non-zero) dimensions in the output of the model. If None, the value from the model’s config is used. Defaults to None. If the model’s config also has no value set, there is no limit on the number of active dimensions, which can be slow or memory-intensive if your model wasn’t (yet) finetuned to high sparsity.
pool (Dict[Literal["input", "output", "processes"], Any], optional) – A pool created by start_multi_process_pool() for multi-process encoding. If provided, the encoding will be distributed across multiple processes. This is recommended for large datasets and when multiple GPUs are available. Defaults to None.
chunk_size (int, optional) – Size of chunks for multi-process encoding. Only used with multiprocessing, i.e. when
pool is not None or device is a list. If None, a sensible default is calculated. Defaults to None.
By default, a 2d torch sparse tensor with shape [num_inputs, output_dimension] is returned. If only one string input is provided, then the output is a 1d array with shape [output_dimension]. If save_to_cpu is True, the embeddings are moved to the CPU.
Union[List[Tensor], ndarray, Tensor]
Example
from sentence_transformers import SparseEncoder
# Load a pre-trained SparseEncoder model
model = SparseEncoder("naver/splade-cocondenser-ensembledistil")
# Encode some texts
sentences = [
"The weather is lovely today.",
"It's so sunny outside!",
"He drove to the stadium.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 30522)
Computes sentence embeddings specifically optimized for document/passage representation.
This method is a specialized version of encode() that differs in exactly two ways:
If no prompt_name or prompt is provided, it uses a predefined “document” prompt,
if available in the model’s prompts dictionary.
It sets the task to “document”. If the model has a Router
module, it will use the “document” task type to route the input through the appropriate submodules.
Tip
If you are unsure whether you should use encode(), encode_query(), or encode_document(),
your best bet is to use encode_query() and encode_document() for Information Retrieval tasks
with clear query and document/passage distinction, and use encode() for all other tasks.
Note that encode() is the most general method and can be used for any task, including Information
Retrieval, and that if the model was not trained with predefined prompts and/or task types, then all three
methods will return identical embeddings.
sentences (Union[str, List[str]]) – The sentences to embed.
prompt_name (Optional[str], optional) – The name of the prompt to use for encoding. Must be a key in the prompts dictionary,
which is either set in the constructor or loaded from the model configuration. For example if
prompt_name is “query” and the prompts dictionary is {“query”: “query: “, …}, then the sentence “What
is the capital of France?” will be encoded as “query: What is the capital of France?” because the sentence
is appended to the prompt. If prompt is also set, this argument is ignored. Defaults to None.
prompt (Optional[str], optional) – The prompt to use for encoding. For example, if the prompt is “query: “, then the
sentence “What is the capital of France?” will be encoded as “query: What is the capital of France?”
because the sentence is appended to the prompt. If prompt is set, prompt_name is ignored. Defaults to None.
batch_size (int, optional) – The batch size used for the computation. Defaults to 32.
show_progress_bar (bool, optional) – Whether to output a progress bar while encoding sentences. Defaults to None.
convert_to_tensor (bool, optional) – Whether the output should be a single stacked tensor (True) or a list of individual tensors (False). Sparse tensors may be challenging to slice, so this allows you to output lists of tensors instead. Defaults to True.
convert_to_sparse_tensor (bool, optional) – Whether the output should be in the format of a sparse (COO) tensor. Defaults to True.
save_to_cpu (bool, optional) – Whether the output should be moved to the CPU or stay on the device it was computed on. Defaults to False.
device (Union[str, List[str], None], optional) –
Device(s) to use for computation. Can be:
A single device string (e.g., “cuda:0”, “cpu”) for single-process encoding
A list of device strings (e.g., [“cuda:0”, “cuda:1”], [“cpu”, “cpu”, “cpu”, “cpu”]) to distribute encoding across multiple processes
None to auto-detect available device for single-process encoding
If a list is provided, multi-process encoding will be used. Defaults to None.
max_active_dims (int, optional) – The maximum number of active (non-zero) dimensions in the output of the model. If None, the value from the model’s config is used. Defaults to None. If the model’s config also has no value set, there is no limit on the number of active dimensions, which can be slow or memory-intensive if your model wasn’t (yet) finetuned to high sparsity.
pool (Dict[Literal["input", "output", "processes"], Any], optional) – A pool created by start_multi_process_pool() for multi-process encoding. If provided, the encoding will be distributed across multiple processes. This is recommended for large datasets and when multiple GPUs are available. Defaults to None.
chunk_size (int, optional) – Size of chunks for multi-process encoding. Only used with multiprocessing, i.e. when
pool is not None or device is a list. If None, a sensible default is calculated. Defaults to None.
By default, a 2d torch sparse tensor with shape [num_inputs, output_dimension] is returned. If only one string input is provided, then the output is a 1d array with shape [output_dimension]. If save_to_cpu is True, the embeddings are moved to the CPU.
Union[List[Tensor], ndarray, Tensor]
Example
from sentence_transformers import SparseEncoder
# Load a pre-trained SparseEncoder model
model = SparseEncoder("naver/splade-cocondenser-ensembledistil")
# Encode some texts
sentences = [
"This research paper discusses the effects of climate change on marine life.",
"The article explores the history of artificial intelligence development.",
"This document contains technical specifications for the new product line.",
]
embeddings = model.encode_document(sentences)
print(embeddings.shape)
# (3, 30522)
Computes sentence embeddings specifically optimized for query representation.
This method is a specialized version of encode() that differs in exactly two ways:
If no prompt_name or prompt is provided, it uses a predefined “query” prompt,
if available in the model’s prompts dictionary.
It sets the task to “query”. If the model has a Router
module, it will use the “query” task type to route the input through the appropriate submodules.
Tip
If you are unsure whether you should use encode(), encode_query(), or encode_document(),
your best bet is to use encode_query() and encode_document() for Information Retrieval tasks
with clear query and document/passage distinction, and use encode() for all other tasks.
Note that encode() is the most general method and can be used for any task, including Information
Retrieval, and that if the model was not trained with predefined prompts and/or task types, then all three
methods will return identical embeddings.
sentences (Union[str, List[str]]) – The sentences to embed.
prompt_name (Optional[str], optional) – The name of the prompt to use for encoding. Must be a key in the prompts dictionary,
which is either set in the constructor or loaded from the model configuration. For example if
prompt_name is “query” and the prompts dictionary is {“query”: “query: “, …}, then the sentence “What
is the capital of France?” will be encoded as “query: What is the capital of France?” because the sentence
is appended to the prompt. If prompt is also set, this argument is ignored. Defaults to None.
prompt (Optional[str], optional) – The prompt to use for encoding. For example, if the prompt is “query: “, then the
sentence “What is the capital of France?” will be encoded as “query: What is the capital of France?”
because the sentence is appended to the prompt. If prompt is set, prompt_name is ignored. Defaults to None.
batch_size (int, optional) – The batch size used for the computation. Defaults to 32.
show_progress_bar (bool, optional) – Whether to output a progress bar while encoding sentences. Defaults to None.
convert_to_tensor (bool, optional) – Whether the output should be a single stacked tensor (True) or a list of individual tensors (False). Sparse tensors may be challenging to slice, so this allows you to output lists of tensors instead. Defaults to True.
convert_to_sparse_tensor (bool, optional) – Whether the output should be in the format of a sparse (COO) tensor. Defaults to True.
save_to_cpu (bool, optional) – Whether the output should be moved to the CPU or stay on the device it was computed on. Defaults to False.
device (Union[str, List[str], None], optional) –
Device(s) to use for computation. Can be:
A single device string (e.g., “cuda:0”, “cpu”) for single-process encoding
A list of device strings (e.g., [“cuda:0”, “cuda:1”], [“cpu”, “cpu”, “cpu”, “cpu”]) to distribute encoding across multiple processes
None to auto-detect available device for single-process encoding
If a list is provided, multi-process encoding will be used. Defaults to None.
max_active_dims (int, optional) – The maximum number of active (non-zero) dimensions in the output of the model. If None, the value from the model’s config is used. Defaults to None. If the model’s config also has no value set, there is no limit on the number of active dimensions, which can be slow or memory-intensive if your model wasn’t (yet) finetuned to high sparsity.
pool (Dict[Literal["input", "output", "processes"], Any], optional) – A pool created by start_multi_process_pool() for multi-process encoding. If provided, the encoding will be distributed across multiple processes. This is recommended for large datasets and when multiple GPUs are available. Defaults to None.
chunk_size (int, optional) – Size of chunks for multi-process encoding. Only used with multiprocessing, i.e. when
pool is not None or device is a list. If None, a sensible default is calculated. Defaults to None.
By default, a 2d torch sparse tensor with shape [num_inputs, output_dimension] is returned. If only one string input is provided, then the output is a 1d array with shape [output_dimension]. If save_to_cpu is True, the embeddings are moved to the CPU.
Union[List[Tensor], ndarray, Tensor]
Example
from sentence_transformers import SparseEncoder
# Load a pre-trained SparseEncoder model
model = SparseEncoder("naver/splade-cocondenser-ensembledistil")
# Encode some texts
queries = [
"What are the effects of climate change?",
"History of artificial intelligence",
"Technical specifications product XYZ",
]
embeddings = model.encode_query(queries)
print(embeddings.shape)
# (3, 30522)
Sets the module in evaluation mode.
This has any effect only on certain modules. See documentations of
particular modules for details of their behaviors in training/evaluation
mode, if they are affected, e.g. Dropout, BatchNorm,
etc.
This is equivalent to self.train(False).
See Locally disabling gradient computation for a comparison between .eval() and several similar mechanisms that may be confused with it.
self
Evaluate the model based on an evaluator
evaluator (SentenceEvaluator) – The evaluator used to evaluate the model.
output_path (str, optional) – The path where the evaluator can write the results. Defaults to None.
The evaluation results.
Casts all floating point parameters and buffers to float datatype.
Note
This method modifies the module in-place.
self
If you are not familiar with adapters and PEFT methods, we invite you to read more about them on the PEFT official documentation: https://huggingface.co/docs/peft
Gets the adapter state dict that should only contain the weights tensors of the specified adapter_name adapter. If no adapter_name is passed, the active adapter is used.
*args – Positional arguments to pass to the underlying AutoModel get_adapter_state_dict function. More information can be found in the transformers documentation https://huggingface.co/docs/transformers/main/en/main_classes/peft#transformers.integrations.PeftAdapterMixin.get_adapter_state_dict
**kwargs – Keyword arguments to pass to the underlying AutoModel get_adapter_state_dict function. More information can be found in the transformers documentation https://huggingface.co/docs/transformers/main/en/main_classes/peft#transformers.integrations.PeftAdapterMixin.get_adapter_state_dict
Return the backend used for inference, which can be one of “torch”, “onnx”, or “openvino”.
The backend used for inference.
str
Returns the maximal sequence length that the model accepts. Longer inputs will be truncated.
The maximal sequence length that the model accepts, or None if it is not defined.
Optional[int]
Get the keyword arguments specific to this model for the encode, encode_query, or encode_document methods.
Example
>>> from sentence_transformers import SentenceTransformer, SparseEncoder
>>> SentenceTransformer("all-MiniLM-L6-v2").get_model_kwargs()
[]
>>> SentenceTransformer("jinaai/jina-embeddings-v4", trust_remote_code=True).get_model_kwargs()
['task', 'truncate_dim']
>>> SparseEncoder("opensearch-project/opensearch-neural-sparse-encoding-doc-v3-distill").get_model_kwargs()
['task']
A list of keyword arguments for the forward pass.
list[str]
Returns the number of dimensions in the output of SparseEncoder.encode.
This overrides the parent implementation without accounting for a truncation dimension, because for sparse models the output dimensionality stays the same; only the number of active dimensions changes.
The number of dimensions in the output of encode. If it’s not known, it’s None.
Optional[int]
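A quick check, assuming a BERT-based SPLADE checkpoint whose output dimension equals the 30522-token vocabulary:
from sentence_transformers import SparseEncoder
model = SparseEncoder("naver/splade-cocondenser-ensembledistil")
print(model.get_sentence_embedding_dimension())
# => 30522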
Casts all floating point parameters and buffers to half datatype.
Note
This method modifies the module in-place.
self
Compute the intersection of two sparse embeddings.
embeddings_1 (torch.Tensor) – First embedding tensor, (vocab).
embeddings_2 (torch.Tensor) – Second embedding tensor, (vocab) or (batch_size, vocab).
Intersection of the two embeddings.
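A brief sketch, assuming the method is exposed as SparseEncoder.intersection and that decode is available for inspecting the result:
from sentence_transformers import SparseEncoder
model = SparseEncoder("naver/splade-cocondenser-ensembledistil")
emb_1 = model.encode("The weather is lovely today.")
emb_2 = model.encode("It's so sunny outside!")
# Non-zero only on dimensions (tokens) activated by both sentences
overlap = model.intersection(emb_1, emb_2)
print(model.decode(overlap, top_k=5))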
Load adapter weights from a file or a remote Hub folder. If you are not familiar with adapters and PEFT methods, we invite you to read more about them in the PEFT official documentation: https://huggingface.co/docs/peft
Requires peft as a backend to load the adapter weights and the underlying model to be compatible with PEFT.
*args – Positional arguments to pass to the underlying AutoModel load_adapter function. More information can be found in the transformers documentation https://huggingface.co/docs/transformers/main/en/main_classes/peft#transformers.integrations.PeftAdapterMixin.load_adapter
**kwargs – Keyword arguments to pass to the underlying AutoModel load_adapter function. More information can be found in the transformers documentation https://huggingface.co/docs/transformers/main/en/main_classes/peft#transformers.integrations.PeftAdapterMixin.load_adapter
Returns the maximal input sequence length for the model. Longer inputs will be truncated.
The maximal input sequence length.
int
Example
from sentence_transformers import SparseEncoder
model = SparseEncoder("naver/splade-cocondenser-ensembledistil")
print(model.max_seq_length)
# => 512
alias of SparseEncoderModelCardData
Uploads all elements of this Sparse Encoder to a new HuggingFace Hub repository.
repo_id (str) – Repository name for your model in the Hub, including the user or organization.
token (str, optional) – An authentication token (See https://huggingface.co/settings/token)
private (bool, optional) – Set to true, for hosting a private model
safe_serialization (bool, optional) – If true, save the model using safetensors. If false, save the model the traditional PyTorch way
commit_message (str, optional) – Message to commit while pushing.
local_model_path (str, optional) – Path of the model locally. If set, this file path will be uploaded. Otherwise, the current model will be uploaded
exist_ok (bool, optional) – If true, saving to an existing repository is OK. If false, saving only to a new repository is possible
replace_model_card (bool, optional) – If true, replace an existing model card in the hub with the automatically created model card
train_datasets (List[str], optional) – Datasets used to train the model. If set, the datasets will be added to the model card in the Hub.
revision (str, optional) – Branch to push the uploaded files to
create_pr (bool, optional) – If True, create a pull request instead of pushing directly to the main branch
The url of the commit of your model in the repository on the Hugging Face Hub.
str
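A minimal sketch; the repository id is a placeholder and pushing requires a valid Hugging Face token:
from sentence_transformers import SparseEncoder
model = SparseEncoder("naver/splade-cocondenser-ensembledistil")
# "my-username/my-sparse-encoder" is a placeholder repository id
url = model.push_to_hub("my-username/my-sparse-encoder", private=True, exist_ok=True)
print(url)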
Saves a model and its configuration files to a directory, so that it can be loaded
with SparseEncoder(path) again.
path (str) – Path on disk where the model will be saved.
model_name (str, optional) – Optional model name.
create_model_card (bool, optional) – If True, create a README.md with basic information about this model.
train_datasets (List[str], optional) – Optional list with the names of the datasets used to train the model.
safe_serialization (bool, optional) – If True, save the model using safetensors. If False, save the model the traditional (but unsafe) PyTorch way.
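A short round-trip sketch using the save-and-reload behaviour described above; the directory name is arbitrary:
from sentence_transformers import SparseEncoder
model = SparseEncoder("naver/splade-cocondenser-ensembledistil")
model.save("models/my-sparse-encoder")                 # writes weights and configuration to disk
reloaded = SparseEncoder("models/my-sparse-encoder")   # loads the saved model again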
Sets a specific adapter by forcing the model to use that adapter and disabling the other adapters.
*args – Positional arguments to pass to the underlying AutoModel set_adapter function. More information can be found in the transformers documentation https://huggingface.co/docs/transformers/main/en/main_classes/peft#transformers.integrations.PeftAdapterMixin.set_adapter
**kwargs – Keyword arguments to pass to the underlying AutoModel set_adapter function. More information can be found in the transformers documentation https://huggingface.co/docs/transformers/main/en/main_classes/peft#transformers.integrations.PeftAdapterMixin.set_adapter
Sets the include_prompt attribute in the pooling layer in the model, if there is one.
This is useful for INSTRUCTOR models, as the prompt should be excluded from the pooling strategy for these models.
include_prompt (bool) – Whether to include the prompt in the pooling layer.
None
Compute the similarity between two collections of embeddings. The output will be a matrix with the similarity scores between all embeddings from the first parameter and all embeddings from the second parameter. This differs from similarity_pairwise which computes the similarity between each pair of embeddings. This method supports only embeddings with fp32 precision and does not accommodate quantized embeddings.
embeddings1 (Union[Tensor, ndarray]) – [num_embeddings_1, embedding_dim] or [embedding_dim]-shaped numpy array or torch tensor.
embeddings2 (Union[Tensor, ndarray]) – [num_embeddings_2, embedding_dim] or [embedding_dim]-shaped numpy array or torch tensor.
A [num_embeddings_1, num_embeddings_2]-shaped torch tensor with similarity scores.
Tensor
Example
>>> model = SparseEncoder("naver/splade-cocondenser-ensembledistil")
>>> sentences = [
... "The weather is so nice!",
... "It's so sunny outside.",
... "He's driving to the movie theater.",
... "She's going to the cinema.",
... ]
>>> embeddings = model.encode(sentences)
>>> model.similarity(embeddings, embeddings)
tensor([[ 30.953, 12.871, 0.000, 0.011],
[ 12.871, 27.505, 0.580, 0.578],
[ 0.000, 0.580, 36.068, 15.301],
[ 0.011, 0.578, 15.301, 39.466]])
>>> model.similarity_fn_name
"dot"
>>> model.similarity_fn_name = "cosine"
>>> model.similarity(embeddings, embeddings)
tensor([[ 1.000, 0.441, 0.000, 0.000],
[ 0.441, 1.000, 0.018, 0.018],
[ 0.000, 0.018, 1.000, 0.406],
[ 0.000, 0.018, 0.406, 1.000]])
Return the name of the similarity function used by SparseEncoder.similarity() and SparseEncoder.similarity_pairwise().
If not set, it defaults to “dot” when first called.
Optional[str]
Example
>>> model = SparseEncoder("naver/splade-cocondenser-ensembledistil")
>>> model.similarity_fn_name
'dot'
Compute the similarity between two collections of embeddings. The output will be a vector with the similarity scores between each pair of embeddings. This method supports only embeddings with fp32 precision and does not accommodate quantized embeddings.
embeddings1 (Union[Tensor, ndarray]) – [num_embeddings, embedding_dim] or [embedding_dim]-shaped numpy array or torch tensor.
embeddings2 (Union[Tensor, ndarray]) – [num_embeddings, embedding_dim] or [embedding_dim]-shaped numpy array or torch tensor.
A [num_embeddings]-shaped torch tensor with pairwise similarity scores.
Tensor
Example
>>> model = SparseEncoder("naver/splade-cocondenser-ensembledistil")
>>> sentences = [
... "The weather is so nice!",
... "It's so sunny outside.",
... "He's driving to the movie theater.",
... "She's going to the cinema.",
... ]
>>> embeddings = model.encode(sentences, convert_to_sparse_tensor=False)
>>> model.similarity_pairwise(embeddings[::2], embeddings[1::2])
tensor([12.871, 15.301])
>>> model.similarity_fn_name
"dot"
>>> model.similarity_fn_name = "cosine"
>>> model.similarity_pairwise(embeddings[::2], embeddings[1::2])
tensor([0.441, 0.406])
Transforms a batch from a SmartBatchingDataset into a batch of tensors for the model. Here, batch is a list of InputExample instances: [InputExample(…), …].
batch – a batch from a SmartBatchingDataset
a batch of tensors for the model
Calculate sparsity statistics for the given embeddings, including the mean number of active dimensions and the mean sparsity ratio.
embeddings (torch.Tensor) – The embeddings to analyze.
Dictionary with the mean active dimensions and mean sparsity ratio.
dict[str, float]
from sentence_transformers import SparseEncoder
model = SparseEncoder("naver/splade-cocondenser-ensembledistil")
embeddings = model.encode(["The weather is so nice!", "It's so sunny outside."])
stats = model.sparsity(embeddings)
print(stats)
# => {'active_dims': 44.0, 'sparsity_ratio': 0.9985584020614624}
Returns the chunk size of the SpladePooling module, if present. This chunk size is along the sequence-length dimension (i.e., the number of tokens per chunk). If None, the entire sequence is processed at once. Using smaller chunks reduces memory usage but may lower training and inference speed. Default is None.
The chunk size, or None if SpladePooling is not found or chunk_size is not set.
Optional[int]
Starts a multi-process pool to process the encoding with several independent processes
via SentenceTransformer.encode_multi_process.
This method is recommended if you want to encode on multiple GPUs or CPUs. It is advised to start only one process per GPU. This method works together with encode_multi_process and stop_multi_process_pool.
target_devices (List[str], optional) – PyTorch target devices, e.g. [“cuda:0”, “cuda:1”, …], [“npu:0”, “npu:1”, …], or [“cpu”, “cpu”, “cpu”, “cpu”]. If target_devices is None and CUDA/NPU is available, then all available CUDA/NPU devices will be used. If target_devices is None and CUDA/NPU is not available, then 4 CPU devices will be used.
A dictionary with the target processes, an input queue, and an output queue.
Dict[str, Any]
Stops all processes started with start_multi_process_pool.
pool (Dict[str, object]) – A dictionary containing the input queue, output queue, and process list.
None
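A sketch of the recommended pool lifecycle; the device list is illustrative (use your actual GPUs, or several CPU workers as shown):
from sentence_transformers import SparseEncoder
if __name__ == "__main__":
    model = SparseEncoder("naver/splade-cocondenser-ensembledistil")
    sentences = ["The weather is lovely today."] * 1000
    pool = model.start_multi_process_pool(target_devices=["cpu", "cpu"])
    embeddings = model.encode(sentences, pool=pool)  # chunks are distributed across the pool
    model.stop_multi_process_pool(pool)
    print(embeddings.shape)
    # (1000, 30522)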
Moves and/or casts the parameters and buffers.
This can be called as
Its signature is similar to torch.Tensor.to(), but only accepts
floating point or complex dtypes. In addition, this method will
only cast the floating point or complex parameters and buffers to dtype
(if given). The integral parameters and buffers will be moved to
device, if that is given, but with dtypes unchanged. When
non_blocking is set, it tries to convert/move asynchronously
with respect to the host if possible, e.g., moving CPU Tensors with
pinned memory to CUDA devices.
See below for examples.
Note
This method modifies the module in-place.
device (torch.device) – the desired device of the parameters
and buffers in this module
dtype (torch.dtype) – the desired floating point or complex dtype of
the parameters and buffers in this module
tensor (torch.Tensor) – Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module
memory_format (torch.memory_format) – the desired memory
format for 4D parameters and buffers in this module (keyword
only argument)
self
Examples:
>>> # xdoctest: +IGNORE_WANT("non-deterministic")
>>> linear = nn.Linear(2, 2)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
[-0.5113, -0.2325]])
>>> linear.to(torch.double)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
[-0.5113, -0.2325]], dtype=torch.float64)
>>> # xdoctest: +REQUIRES(env:TORCH_DOCTEST_CUDA1)
>>> gpu1 = torch.device("cuda:1")
>>> linear.to(gpu1, dtype=torch.half, non_blocking=True)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
[-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1')
>>> cpu = torch.device("cpu")
>>> linear.to(cpu)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
[-0.5112, -0.2324]], dtype=torch.float16)
>>> linear = nn.Linear(2, 2, bias=None).to(torch.cdouble)
>>> linear.weight
Parameter containing:
tensor([[ 0.3741+0.j, 0.2382+0.j],
[ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128)
>>> linear(torch.ones(3, 2, dtype=torch.cdouble))
tensor([[0.6122+0.j, 0.1150+0.j],
[0.6122+0.j, 0.1150+0.j],
[0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128)
Tokenizes the texts.
texts (Union[List[str], List[Dict], List[Tuple[str, str]]]) – A list of texts to be tokenized.
A dictionary of tensors with the tokenized texts. Common keys are “input_ids”, “attention_mask”, and “token_type_ids”.
Dict[str, Tensor]
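A quick sketch of the returned feature dictionary; the exact keys depend on the underlying tokenizer:
from sentence_transformers import SparseEncoder
model = SparseEncoder("naver/splade-cocondenser-ensembledistil")
features = model.tokenize(["The weather is lovely today.", "It's so sunny outside!"])
print(sorted(features.keys()))
# e.g. ['attention_mask', 'input_ids', 'token_type_ids']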
Property to get the tokenizer that is used by this model
Sets the module in training mode.
This has any effect only on certain modules. See documentations of
particular modules for details of their behaviors in training/evaluation
mode, if they are affected, e.g. Dropout, BatchNorm,
etc.
mode (bool) – whether to set training mode (True) or evaluation
mode (False). Default: True.
self
Property to get the underlying transformers PreTrainedModel instance, if it exists. Note that it’s possible for a model to have multiple underlying transformers models, but this property will return the first one it finds in the module hierarchy.
The underlying transformers model or None if not found.
PreTrainedModel or None
Example
from sentence_transformers import SparseEncoder
model = SparseEncoder("naver/splade-v3")
# You can now access the underlying transformers model
transformers_model = model.transformers_model
print(type(transformers_model))
# => <class 'transformers.models.bert.modeling_bert.BertForMaskedLM'>
A dataclass storing data used in the model card.
language (Optional[Union[str, List[str]]]) – The model language, either a string or a list, e.g. “en” or [“en”, “de”, “nl”]
license (Optional[str]) – The license of the model, e.g. “apache-2.0”, “mit”, or “cc-by-nc-sa-4.0”
model_name (Optional[str]) – The pretty name of the model, e.g. “SparseEncoder based on answerdotai/ModernBERT-base”.
model_id (Optional[str]) – The model ID when pushing the model to the Hub, e.g. “tomaarsen/se-mpnet-base-ms-marco”.
train_datasets (List[Dict[str, str]]) – A list of the names and/or Hugging Face dataset IDs of the training datasets. e.g. [{“name”: “SNLI”, “id”: “stanfordnlp/snli”}, {“name”: “MultiNLI”, “id”: “nyu-mll/multi_nli”}, {“name”: “STSB”}]
eval_datasets (List[Dict[str, str]]) – A list of the names and/or Hugging Face dataset IDs of the evaluation datasets. e.g. [{“name”: “SNLI”, “id”: “stanfordnlp/snli”}, {“id”: “mteb/stsbenchmark-sts”}]
task_name (str) – The human-readable task the model is trained on, e.g. “semantic search and sparse retrieval”.
tags (Optional[List[str]]) – A list of tags for the model, e.g. [“sentence-transformers”, “sparse-encoder”].
local_files_only (bool) – If True, don’t attempt to find dataset or base model information on the Hub. Defaults to False.
generate_widget_examples (bool) – If True, generate widget examples from the evaluation or training dataset, and compute their similarities. Defaults to True.
Tip
Install codecarbon to automatically track carbon emission usage and include it in your model cards.
Example:
>>> model = SparseEncoder(
... "microsoft/mpnet-base",
... model_card_data=SparseEncoderModelCardData(
... model_id="tomaarsen/se-mpnet-base-allnli",
... train_datasets=[{"name": "SNLI", "id": "stanfordnlp/snli"}, {"name": "MultiNLI", "id": "nyu-mll/multi_nli"}],
... eval_datasets=[{"name": "SNLI", "id": "stanfordnlp/snli"}, {"name": "MultiNLI", "id": "nyu-mll/multi_nli"}],
... license="apache-2.0",
... language="en",
... ),
... )
Enum class for supported similarity functions. The following functions are supported:
SimilarityFunction.COSINE ("cosine"): Cosine similarity
SimilarityFunction.DOT_PRODUCT ("dot", dot_product): Dot product similarity
SimilarityFunction.EUCLIDEAN ("euclidean"): Euclidean distance
SimilarityFunction.MANHATTAN ("manhattan"): Manhattan distance
Returns a list of possible values for the SimilarityFunction enum.
A list of possible values for the SimilarityFunction enum.
list
Example
>>> possible_values = SimilarityFunction.possible_values()
>>> possible_values
['cosine', 'dot', 'euclidean', 'manhattan']
Converts a similarity function name or enum value to the corresponding similarity function.
similarity_function (Union[str, SimilarityFunction]) – The name or enum value of the similarity function.
The corresponding similarity function.
Callable[[Union[Tensor, ndarray], Union[Tensor, ndarray]], Tensor]
ValueError – If the provided function is not supported.
Example
>>> similarity_fn = SimilarityFunction.to_similarity_fn("cosine")
>>> similarity_scores = similarity_fn(embeddings1, embeddings2)
>>> similarity_scores
tensor([[0.3952, 0.0554],
[0.0992, 0.1570]])
Converts a similarity function into a pairwise similarity function.
The pairwise similarity function returns the diagonal vector from the similarity matrix, i.e. it only computes the similarity(a[i], b[i]) for each i in the range of the input tensors, rather than computing the similarity between all pairs of a and b.
similarity_function (Union[str, SimilarityFunction]) – The name or enum value of the similarity function.
The pairwise similarity function.
Callable[[Union[Tensor, ndarray], Union[Tensor, ndarray]], Tensor]
ValueError – If the provided similarity function is not supported.
Example
>>> pairwise_fn = SimilarityFunction.to_similarity_pairwise_fn("cosine")
>>> similarity_scores = pairwise_fn(embeddings1, embeddings2)
>>> similarity_scores
tensor([0.3952, 0.1570])