Solvers for the original linear program OT problem.
Glue transport plans \((\pi_1, ..., \pi_K)\) which have a common first marginal using the (multi-marginal) North-West Corner method. Writing the marginals of each \(\pi_k\in \mathbb{R}^{n\times n_l}\) as \(a \in \mathbb{R}^n\) and \(b_k \in \mathbb{R}^{n_k}\), the output represents a particular K-marginal transport plan \(\rho \in \mathbb{R}^{n_1\times\cdots\times n_K}\) whose k-th marginal is \(b_k\). This K-plan is such that there exists a K+1-marginal transport plan \(\gamma \in \mathbb{R}^{n\times n_1 \times \cdots \times n_K}\) such that \(\sum_i\gamma_{i,j_1,\cdots,j_K} = \rho_{j_1, \cdots, j_K}\) and with Einstein summation convention, \(\gamma_{i, j_1, \cdots, j_K} = [\pi_k]_{i, j_k}\) for all \(k=1,\cdots,K\).
Instead of outputting the full K-multi-marginal plan \(\rho\), this function provides an array J of shape (N, K) where each J[i] is of the form (J[i, 1], …, J[i, K]) with each J[i, k] between 0 and \(n_k-1\), and a weight vector w of size N, such that the K-plan \(\rho\) writes:
This representation is useful for its memory efficiency, as it avoids storing the full K-marginal plan.
If log=True, the function computes the full K+1-marginal transport plan \(\gamma\) and stores it in log_dict[‘gamma’]. Note that this option is extremely costly in memory.
pi_list (list of arrays (n, n_k)) – List of transport plans.
log (bool, optional) – If True, return a log dictionary (computationally expensive).
a (array (n,), optional) – The common first marginal of each transport plan. If None is provided, it is computed as the first marginal of the first transport plan.
nx (backend, optional) – The backend to use. If None is provided, the backend of the first transport plan is used.
J (array (N, K)) – The indices (J[i, 1], …, J[i, K]) of the K-plan \(\rho\).
w (array (N,)) – The weights w_i of the K-plan \(\rho\).
log_dict (dict, optional) – If log=True, a dictionary containing the full K+1-marginal transport plan under the key ‘gamma’.
Compute the Wasserstein barycenter of distributions A
The function solves the following optimization problem [16]:
where :
\(W_1(\cdot,\cdot)\) is the Wasserstein distance (see ot.emd.sinkhorn)
\(\mathbf{a}_i\) are training distributions in the columns of matrix \(\mathbf{A}\)
The linear program is solved using the interior point solver from scipy.optimize. If cvxopt solver if installed it can use cvxopt
Note that this problem do not scale well (both in memory and computational time).
A (np.ndarray (d,n)) – n training distributions a_i of size d
M (np.ndarray (d,d)) – loss matrix for OT
reg (float) – Regularization term >0
weights (np.ndarray (n,)) – Weights of each histogram a_i on the simplex (barycentric coordinates)
verbose (bool, optional) – Print information along iterations
log (bool, optional) – record log if True
solver (string, optional) – the solver used, default ‘interior-point’ use the lp solver from scipy.optimize. None, or ‘glpk’ or ‘mosek’ use the solver from cvxopt.
a ((d,) ndarray) – Wasserstein barycenter
log (dict) – log dictionary return only if log==True in parameters
References
Computes the Wasserstein distance on the circle using the Binary search algorithm proposed in [44]. Samples need to be in \(S^1\cong [0,1[\). If they are on \(\mathbb{R}\), takes the value modulo 1. If the values are on \(S^1\subset\mathbb{R}^2\), it is required to first find the coordinates using e.g. the atan2 function.
where:
\(F_u\) and \(F_v\) are respectively the cdfs of \(u\) and \(v\)
For values \(x=(x_1,x_2)\in S^1\), it is required to first get their coordinates with
using e.g. ot.utils.get_coordinate_circle(x)
The function runs on backend but tensorflow and jax are not supported.
u_values (ndarray, shape (n, ...)) – samples in the source domain (coordinates on [0,1[)
v_values (ndarray, shape (n, ...)) – samples in the target domain (coordinates on [0,1[)
u_weights (ndarray, shape (n, ...), optional) – samples weights in the source domain
v_weights (ndarray, shape (n, ...), optional) – samples weights in the target domain
p (float, optional (default=1)) – Power p used for computing the Wasserstein distance
Lm (int, optional) – Lower bound dC
Lp (int, optional) – Upper bound dC
tm (float, optional) – Lower bound theta
tp (float, optional) – Upper bound theta
eps (float, optional) – Stopping condition
require_sort (bool, optional) – If True, sort the values.
log (bool, optional) – If True, returns also the optimal theta
loss (float/array-like, shape (…)) – Batched cost associated to the optimal transportation
log (dict, optional) – log dictionary returned only if log==True in parameters
Examples
>>> u = np.array([[0.2,0.5,0.8]])%1
>>> v = np.array([[0.4,0.5,0.7]])%1
>>> binary_search_circle(u.T, v.T, p=1)
array([0.1])
References
Delon, Julie, Julien Salomon, and Andrei Sobolevski. “Fast transport optimization for Monge costs on the circle.” SIAM Journal on Applied Mathematics 70.7 (2010): 2239-2258.
Checks whether or not the requested number of threads has a valid value.
Compute the discrete multi-marginal optimal transport of distributions A.
This function operates on distributions whose supports are real numbers on the real line.
The algorithm solves both primal and dual d-MMOT programs concurrently to produce the optimal transport plan as well as the total (minimal) cost. The cost is a ground cost, and the solution is independent of which Monge cost is desired.
The algorithm accepts \(d\) distributions (i.e., histograms) \(a_{1}, \ldots, a_{d} \in \mathbb{R}_{+}^{n}\) with \(e^{\prime} a_{j}=1\) for all \(j \in[d]\). Although the algorithm states that all histograms have the same number of bins, the algorithm can be easily adapted to accept as inputs \(a_{i} \in \mathbb{R}_{+}^{n_{i}}\) with \(n_{i} \neq n_{j}\) [50].
The function solves the following optimization problem[51]:
obj (float) – the value of the primal objective function evaluated at the solution.
log (dict) – A dictionary containing the log of the discrete mmot problem: - ‘A’: a dictionary that maps tuples of indices to the corresponding primal variables. The tuples are the indices of the entries that are set to their minimum value during the algorithm. - ‘primal objective’: a float, the value of the objective function evaluated at the solution. - ‘dual’: a list of arrays, the dual variables corresponding to the input arrays. The i-th element of the list is the dual variable corresponding to the i-th dimension of the input arrays. - ‘dual objective’: a float, the value of the dual objective function evaluated at the solution.
References
Ronak Mehta, Jeffery Kline, Vishnu Suresh Lokhande, Glenn Fung, & Vikas Singh (2023). Efficient Discrete Multi Marginal Optimal Transport Regularization. In The Eleventh International Conference on Learning Representations.
Jeffery Kline. Properties of the d-dimensional earth mover’s problem. Discrete Applied Mathematics, 265: 128-141, 2019.
Leonid V Kantorovich. On the translocation of masses. Dokl. Akad. Nauk SSSR, 37:227-229, 1942.
Minimize the d-dimensional EMD using gradient descent.
Discrete Multi-Marginal Optimal Transport (d-MMOT): Let \(a_1, \ldots, a_d\in\mathbb{R}^n_{+}\) be discrete probability distributions. Here, the d-MMOT is the LP,
The dual linear program of the d-MMOT problem is:
where the indices in the constraints include all \(i_j\in[n]\), :math: jin[d]. Denote by \(\phi(a_1,\ldots,a_d)\), the optimal objective value of the LP in d-MMOT problem. Let \(z^*\) be an optimal solution to the dual program. Then,
Using these dual variables naturally provided by the algorithm in ot.lp.dmmot_monge_1dgrid_loss, gradient steps move each input distribution to minimize their d-mmot distance.
A (nx.ndarray, shape (dim, n_hists)) – The input ndarray containing distributions of n bins in d dimensions.
niters (int, optional (default=100)) – The maximum number of iterations for the optimization algorithm.
lr_init (float, optional (default=1e-5)) – The initial learning rate (step size) for the optimization algorithm.
lr_decay (float, optional (default=0.995)) – The learning rate decay rate in each iteration.
print_rate (int, optional (default=100)) – The rate at which to print the objective value and gradient norm during the optimization algorithm.
verbose (bool, optional) – If True, print debugging information during execution. Default=False.
log (bool, optional) – If True, record log. Default is False.
a (list of ndarrays, each of shape (n,)) – The optimal solution as a list of n approximate barycenters, each of length vecsize.
log (dict) – log dictionary return only if log==True in parameters
References
Ronak Mehta, Jeffery Kline, Vishnu Suresh Lokhande, Glenn Fung, & Vikas Singh (2023). Efficient Discrete Multi Marginal Optimal Transport Regularization. In The Eleventh International Conference on Learning Representations.
Olvi L Mangasarian and RR Meyer. Nonlinear perturbation of linear programs. SIAM Journal on Control and Optimization, 17(6):745-752, 1979
Michael C Ferris and Olvi L Mangasarian. Finite perturbation of convex programs. Applied Mathematics and Optimization, 23(1):263-273, 1991.
See also
ot.lp.dmmot_monge_1dgrid_lossd-Dimensional Earth Mover’s Solver
Solves the Earth Movers distance problem and returns the OT matrix
where :
\(\mathbf{M}\) is the metric cost matrix
\(\mathbf{a}\) and \(\mathbf{b}\) are the sample weights
Warning
Note that the \(\mathbf{M}\) matrix in numpy needs to be a C-order numpy.array in float64 format. It will be converted if not in this format
Note
This function is backend-compatible and will work on arrays from all compatible backends. But the algorithm uses the C++ CPU backend which can lead to copy overhead on GPU arrays.
Note
This function will cast the computed transport plan to the data type of the provided input with the following priority: \(\mathbf{a}\), then \(\mathbf{b}\), then \(\mathbf{M}\) if marginals are not provided. Casting to an integer tensor might result in a loss of precision. If this behaviour is unwanted, please make sure to provide a floating point input.
Note
An error will be raised if the vectors \(\mathbf{a}\) and \(\mathbf{b}\) do not sum to the same value.
Uses the algorithm proposed in [1].
a ((ns,) array-like, float) – Source histogram (uniform weight if empty list)
b ((nt,) array-like, float) – Target histogram (uniform weight if empty list)
M ((ns,nt) array-like, float) – Loss matrix (c-order array in numpy with type float64)
numItermax (int, optional (default=100000)) – The maximum number of iterations before stopping the optimization algorithm if it has not converged.
log (bool, optional (default=False)) – If True, returns a dictionary containing the cost and dual variables. Otherwise returns only the optimal transportation matrix.
center_dual (boolean, optional (default=True)) – If True, centers the dual potential using function
ot.lp.center_ot_dual().
numThreads (int or "max", optional (default=1, i.e. OpenMP is not used)) – If compiled with OpenMP, chooses the number of threads to parallelize. “max” selects the highest number possible.
check_marginals (bool, optional (default=True)) – If True, checks that the marginals mass are equal. If False, skips the check.
gamma (array-like, shape (ns, nt)) – Optimal transportation matrix for the given parameters
log (dict, optional) – If input log is true, a dictionary containing the cost and dual variables and exit status
Examples
Simple example with obvious solution. The function emd accepts lists and perform automatic conversion to numpy arrays
>>> import ot
>>> a=[.5,.5]
>>> b=[.5,.5]
>>> M=[[0.,1.],[1.,0.]]
>>> ot.emd(a, b, M)
array([[0.5, 0. ],
[0. , 0.5]])
References
Bonneel, N., Van De Panne, M., Paris, S., & Heidrich, W. (2011, December). Displacement interpolation using Lagrangian mass transport. In ACM Transactions on Graphics (TOG) (Vol. 30, No. 6, p. 158). ACM.
See also
ot.bregman.sinkhornEntropic regularized OT
ot.optim.cgGeneral regularized OT
Solves the Earth Movers distance problem and returns the loss
where :
\(\mathbf{M}\) is the metric cost matrix
\(\mathbf{a}\) and \(\mathbf{b}\) are the sample weights
Note
This function is backend-compatible and will work on arrays from all compatible backends. But the algorithm uses the C++ CPU backend which can lead to copy overhead on GPU arrays.
Note
This function will cast the computed transport plan and transportation loss to the data type of the provided input with the following priority: \(\mathbf{a}\), then \(\mathbf{b}\), then \(\mathbf{M}\) if marginals are not provided. Casting to an integer tensor might result in a loss of precision. If this behaviour is unwanted, please make sure to provide a floating point input.
Note
An error will be raised if the vectors \(\mathbf{a}\) and \(\mathbf{b}\) do not sum to the same value.
Uses the algorithm proposed in [1].
a ((ns,) array-like, float64) – Source histogram (uniform weight if empty list)
b ((nt,) array-like, float64) – Target histogram (uniform weight if empty list)
M ((ns,nt) array-like, float64) – Loss matrix (for numpy c-order array with type float64)
processes (int, optional (default=1)) – Nb of processes used for multiple emd computation (deprecated)
numItermax (int, optional (default=100000)) – The maximum number of iterations before stopping the optimization algorithm if it has not converged.
log (boolean, optional (default=False)) – If True, returns a dictionary containing dual variables. Otherwise returns only the optimal transportation cost.
return_matrix (boolean, optional (default=False)) – If True, returns the optimal transportation matrix in the log.
center_dual (boolean, optional (default=True)) – If True, centers the dual potential using function
ot.lp.center_ot_dual().
numThreads (int or "max", optional (default=1, i.e. OpenMP is not used)) – If compiled with OpenMP, chooses the number of threads to parallelize. “max” selects the highest number possible.
check_marginals (bool, optional (default=True)) – If True, checks that the marginals mass are equal. If False, skips the check.
W (float, array-like) – Optimal transportation loss for the given parameters
log (dict) – If input log is true, a dictionary containing dual variables and exit status
Examples
Simple example with obvious solution. The function emd accepts lists and perform automatic conversion to numpy arrays
>>> import ot
>>> a=[.5,.5]
>>> b=[.5,.5]
>>> M=[[0.,1.],[1.,0.]]
>>> ot.emd2(a,b,M)
0.0
References
Bonneel, N., Van De Panne, M., Paris, S., & Heidrich, W. (2011, December). Displacement interpolation using Lagrangian mass transport. In ACM Transactions on Graphics (TOG) (Vol. 30, No. 6, p. 158). ACM.
See also
ot.bregman.sinkhornEntropic regularized OT
ot.optim.cgGeneral regularized OT
Solves the Earth Movers distance problem between 1d measures and returns the loss
where :
d is the metric
\(x_a\) and \(x_b\) are the samples
a and b are the sample weights
This implementation only supports metrics of the form \(d(x, y) = |x - y|^p\).
Uses the algorithm detailed in [1]_
x_a (ndarray of float64, shape (ns,) or (ns, 1)) – Source dirac locations (on the real line)
x_b (ndarray of float64, shape (nt,) or (ns, 1)) – Target dirac locations (on the real line)
a (ndarray of float64, shape (ns,), optional) – Source histogram (default is uniform weight)
b (ndarray of float64, shape (nt,), optional) – Target histogram (default is uniform weight)
metric (str, optional (default='sqeuclidean')) – Metric to be used. Only works with either of the strings ‘sqeuclidean’, ‘minkowski’, ‘cityblock’, or ‘euclidean’.
p (float, optional (default=1.0)) – The p-norm to apply for if metric=’minkowski’
dense (boolean, optional (default=True)) – If True, returns \(\gamma\) as a dense ndarray of shape (ns, nt). Otherwise returns a sparse representation using scipy’s coo_matrix format. Only used if log is set to True. Due to implementation details, this function runs faster when dense is set to False.
log (boolean, optional (default=False)) – If True, returns a dictionary containing the transportation matrix. Otherwise returns only the loss.
loss (float) – Cost associated to the optimal transportation
log (dict) – If input log is True, a dictionary containing the Optimal transportation matrix for the given parameters
Examples
Simple example with obvious solution. The function emd2_1d accepts lists and performs automatic conversion to numpy arrays
>>> import ot
>>> a=[.5, .5]
>>> b=[.5, .5]
>>> x_a = [2., 0.]
>>> x_b = [0., 3.]
>>> ot.emd2_1d(x_a, x_b, a, b)
0.5
>>> ot.emd2_1d(x_a, x_b)
0.5
References
Peyré, G., & Cuturi, M. (2017). “Computational Optimal Transport”, 2018.
See also
ot.lp.emd2EMD for multidimensional distributions
ot.lp.emd_1dEMD for 1d distributions (returns the transportation matrix instead of the cost)
Solves the Earth Movers distance problem between 1d measures and returns the OT matrix
where :
d is the metric
\(x_a\) and \(x_b\) are the samples
a and b are the sample weights
This implementation only supports metrics of the form \(d(x, y) = |x - y|^p\).
Uses the algorithm detailed in [1]_
x_a (ndarray of float64, shape (ns,) or (ns, 1)) – Source dirac locations (on the real line)
x_b (ndarray of float64, shape (nt,) or (ns, 1)) – Target dirac locations (on the real line)
a (ndarray of float64, shape (ns,), optional) – Source histogram (default is uniform weight)
b (ndarray of float64, shape (nt,), optional) – Target histogram (default is uniform weight)
metric (str, optional (default='sqeuclidean')) – Metric to be used. Only works with either of the strings ‘sqeuclidean’, ‘minkowski’, ‘cityblock’, or ‘euclidean’.
p (float, optional (default=1.0)) – The p-norm to apply for if metric=’minkowski’
dense (boolean, optional (default=True)) – If True, returns \(\gamma\) as a dense ndarray of shape (ns, nt). Otherwise returns a sparse representation using scipy’s coo_matrix format. Due to implementation details, this function runs faster when ‘sqeuclidean’, ‘minkowski’, ‘cityblock’, or ‘euclidean’ metrics are used.
log (boolean, optional (default=False)) – If True, returns a dictionary containing the cost. Otherwise returns only the optimal transportation matrix.
check_marginals (bool, optional (default=True)) – If True, checks that the marginals mass are equal. If False, skips the check.
gamma (ndarray, shape (ns, nt)) – Optimal transportation matrix for the given parameters
log (dict) – If input log is True, a dictionary containing the cost
Examples
Simple example with obvious solution. The function emd_1d accepts lists and performs automatic conversion to numpy arrays
>>> import ot
>>> a=[.5, .5]
>>> b=[.5, .5]
>>> x_a = [2., 0.]
>>> x_b = [0., 3.]
>>> ot.emd_1d(x_a, x_b, a, b)
array([[0. , 0.5],
[0.5, 0. ]])
>>> ot.emd_1d(x_a, x_b)
array([[0. , 0.5],
[0.5, 0. ]])
References
Peyré, G., & Cuturi, M. (2017). “Computational Optimal Transport”, 2018.
See also
ot.lp.emdEMD for multidimensional distributions
ot.lp.emd2_1dEMD for 1d distributions (returns cost instead of the transportation matrix)
Solves the Earth Movers distance problem between sorted 1d measures and returns the OT matrix and the associated cost
u_weights ((ns,) ndarray, float64) – Source histogram
v_weights ((nt,) ndarray, float64) – Target histogram
u ((ns,) ndarray, float64) – Source dirac locations (on the real line)
v ((nt,) ndarray, float64) – Target dirac locations (on the real line)
metric (str, optional (default='sqeuclidean')) – Metric to be used. Only works with either of the strings ‘sqeuclidean’, ‘minkowski’, ‘cityblock’, or ‘euclidean’.
p (float, optional (default=1.0)) – The p-norm to apply for if metric=’minkowski’
gamma ((n, ) ndarray, float64) – Values in the Optimal transportation matrix
indices ((n, 2) ndarray, int64) – Indices of the values stored in gamma for the Optimal transportation matrix
cost – cost associated to the optimal transportation
Solves the free support (locations of the barycenters are optimized, not the weights) Wasserstein barycenter problem (i.e. the weighted Frechet mean for the 2-Wasserstein distance), formally:
where :
\(w \in \mathbb{(0, 1)}^{N}\)’s are the barycenter weights and sum to one
measure_weights denotes the \(\mathbf{a}_i \in \mathbb{R}^{k_i}\): empirical measures weights (on simplex)
measures_locations denotes the \(\mathbf{X}_i \in \mathbb{R}^{k_i, d}\): empirical measures atoms locations
\(\mathbf{b} \in \mathbb{R}^{k}\) is the desired weights vector of the barycenter
This problem is considered in [20] (Algorithm 2). There are two differences with the following codes:
we do not optimize over the weights
we do not do line search for the locations updates, we use i.e. \(\theta = 1\) in [20] (Algorithm 2). This can be seen as a discrete implementation of the fixed-point algorithm of [43] proposed in the continuous setting.
measures_locations (list of N (k_i,d) array-like) – The discrete support of a measure supported on \(k_i\) locations of a d-dimensional space (\(k_i\) can be different for each element of the list)
measures_weights (list of N (k_i,) array-like) – Numpy arrays where each numpy array has \(k_i\) non-negatives values summing to one representing the weights of each discrete input measure
X_init ((k,d) array-like) – Initialization of the support locations (on k atoms) of the barycenter
b ((k,) array-like) – Initialization of the weights of the barycenter (non-negatives, sum to 1)
weights ((N,) array-like) – Initialization of the coefficients of the barycenter (non-negatives, sum to 1)
numItermax (int, optional) – Max number of iterations
stopThr (float, optional) – Stop threshold on error (>0)
verbose (bool, optional) – Print information along iterations
log (bool, optional) – record log if True
numThreads (int or "max", optional (default=1, i.e. OpenMP is not used)) – If compiled with OpenMP, chooses the number of threads to parallelize. “max” selects the highest number possible.
X ((k,d) array-like) – Support locations (on k atoms) of the barycenter
.. _references-free-support-barycenter
References
Cuturi, Marco, and Arnaud Doucet. “Fast computation of Wasserstein barycenters.” International Conference on Machine Learning. 2014.
Álvarez-Esteban, Pedro C., et al. “A fixed-point approach to barycenters in Wasserstein space.” Journal of Mathematical Analysis and Applications 441.2 (2016): 744-762.
Solves the OT barycenter problem [77] for generic costs using the fixed point algorithm, iterating the ground barycenter function B on transport plans between the current barycenter and the measures.
The problem finds an optimal barycenter support X of given size (n, d) (enforced by the initialisation), minimising a sum of pairwise transport costs for the costs \(c_k\):
where:
\(X\) (n, d) is the barycenter support,
\(a\) (n) is the (fixed) barycenter weights,
\(Y_k\) (m_k, d_k) is the k-th measure support (measure_locations[k]),
\(b_k\) (m_k) is the k-th measure weights (measure_weights[k]),
\(c_k: \mathbb{R}^{n\times d}\times\mathbb{R}^{m_k\times d_k}\rightarrow \mathbb{R}_+^{n\times m_k}\) is the k-th cost function (which computes the pairwise cost matrix)
\(\mathcal{T}_{c_k}(X, a, Y_k, b)\) is the OT cost between the barycenter measure and the k-th measure with respect to the cost \(c_k\):
in other words, \(\mathcal{T}_{c_k}(X, a, Y_k, b_k)\) is ot.emd2(a, b_k, c_k(X, Y_k)).
The function \(B:\mathbb{R}^{n\times d_1}\times \cdots\times\mathbb{R}^{n\times d_K} \longrightarrow \mathbb{R}^{n\times d}\) is an input to the solver. B computes solutions of the following minimisation problem given \((Y_1, \cdots, Y_K) \in \mathbb{R}^{n\times d_1}\times\cdots\times\mathbb{R}^{n\times d_K}\) (broadcasted along n):
The input function B takes a list of K arrays of shape (n, d_k) and returns an array of shape (n, d). For certain costs, \(B\) can be computed explicitly, or through a numerical solver.
This function implements two algorithms:
Algorithm 2 from [77] when method=true_fixed_point is used, which may increase the support size of the barycenter at each iteration, with a maximum final size of \(N_0 + T\sum_k n_k - TK\) for T iterations and an initial support size of \(N_0\). The computation of the iterates is done using the North West Corner multi-marginal gluing method. This method has convergence guarantees [77].
Algorithm 3 from [77] when method=L2_barycentric_proj is used, which is a heuristic simplification which fixes the weights and support size of the barycenter by performing barycentric projections of the pair-wise OT matrices. This method is substantially faster than the first one, but does not have convergence guarantees. (Default)
The implemented methods ([77] Algorithms 2 and 3), generalises [20] and [43] to general costs and includes convergence guarantees, including for discrete measures.
measure_locations (list of array-like) – List of K arrays of measure positions, each of shape (m_k, d_k).
measure_weights (list of array-like) – List of K arrays of measure weights, each of shape (m_k).
X_init (array-like) – Array of shape (n, d) representing initial barycenter points.
cost_list (list of callable or callable) – List of K cost functions \(c_k: \mathbb{R}^{n\times d}\times\mathbb{R}^{m_k\times d_k} \rightarrow \mathbb{R}_+^{n\times m_k}\). If cost_list is a single callable, the same cost is used K times.
ground_bary (callable or None, optional) – Function List(array(n, d_k)) -> array(n, d) accepting a list of K arrays of shape \((n\times d_K)\), computing the ground barycenters (broadcasted over n). If not provided, done with Adam on PyTorch (requires PyTorch backend), inefficiently using the cost functions in cost_list.
a (array-like, optional) – Array of shape (n,) representing weights of the barycenter measure.Defaults to uniform.
numItermax (int, optional) – Maximum number of iterations (default is 100).
method (str, optional) – Barycentre method: ‘L2_barycentric_proj’ (default) for Euclidean barycentric projection, or ‘true_fixed_point’ for iterates using the North West Corner multi-marginal gluing method.
stopThr (float, optional) – If \(W_2^2(a_t, X_t, a_{t+1}, X_{t+1}) < \mathrm{stopThr} \times \frac{1}{n}\|X_t\|_2^2\), terminate (default is 1e-5).
log (bool, optional) – Whether to return the log dictionary (default is False).
ground_bary_lr (float, optional) – Learning rate for the ground barycenter solver (if auto is used).
ground_bary_numItermax (int, optional) – Maximum number of iterations for the ground barycenter solver (if auto is used).
ground_bary_stopThr (float, optional) – Stop threshold for the ground barycenter solver (if auto is used): stop if the energy decreases less than this value.
ground_bary_solver (str, optional) – Solver for auto ground bary solver (torch SGD or Adam). Default is “SGD”.
clean_measure (bool, optional) – For method==’true_fixed_point’, whether to clean the discrete measure (X, a) at each iteration to remove duplicate points and sum their weights (default is False).
X (array-like) – Array of shape (n, d) representing barycenter points.
log_dict (list of array-like, optional) – log containing the exit status, list of iterations and list of displacements if log is True.
References
Tanguy, Eloi and Delon, Julie and Gozlan, Nathaël (2024). Computing barycenters of Measures for Generic Transport Costs. arXiv preprint 2501.04016 (2024)
Cuturi, Marco, and Arnaud Doucet. “Fast computation of Wasserstein barycenters.” International Conference on Machine Learning. 2014.
Álvarez-Esteban, Pedro C., et al. “A fixed-point approach to barycenters in Wasserstein space.” Journal of Mathematical Analysis and Applications 441.2 (2016): 744-762.
Note
For the case of the L2 cost \(c_k(x, y) = \|x-y\|_2^2\), the ground barycenter is simply the Euclidean barycenter, i.e. \(B(y_1, \cdots, y_K) = \sum_k w_k y_k\). In this case, we recover the free-support algorithm from [20].
See also
ot.lp.free_support_barycenter : Free support solver for the case where \(c_k(x,y) = \lambda_k\|x-y\|_2^2\).
ot.lp.generalized_free_support_barycenter : Free support solver for the case where \(c_k(x,y) = \|P_kx-y\|_2^2\) with \(P_k\) linear.
ot.lp.NorthWestMMGluing : gluing method used in the true_fixed_point method.
Solves the free support generalized Wasserstein barycenter problem: finding a barycenter (a discrete measure with a fixed amount of points of uniform weights) whose respective projections fit the input measures. More formally:
where :
\(\gamma = \sum_{l=1}^n b_l\delta_{y_l}\) is the desired barycenter with each \(y_l \in \mathbb{R}^d\)
\(\mathbf{b} \in \mathbb{R}^{n}\) is the desired weights vector of the barycenter
The input measures are \(\nu_i = \sum_{j=1}^{k_i}a_{i,j}\delta_{x_{i,j}}\)
The \(\mathbf{a}_i \in \mathbb{R}^{k_i}\) are the respective empirical measures weights (on the simplex)
The \(\mathbf{X}_i \in \mathbb{R}^{k_i, d_i}\) are the respective empirical measures atoms locations
\(w = (w_1, \cdots w_p)\) are the barycenter coefficients (on the simplex)
Each \(\mathbf{P}_i \in \mathbb{R}^{d, d_i}\), and \(P_i\#\nu_i = \sum_{j=1}^{k_i}a_{i,j}\delta_{P_ix_{i,j}}\)
As show by [42], this problem can be re-written as a Wasserstein Barycenter problem, which we solve using the free support method [20] (Algorithm 2).
X_list (list of p (k_i,d_i) array-like) – Discrete supports of the input measures: each consists of \(k_i\) locations of a d_i-dimensional space (\(k_i\) can be different for each element of the list)
a_list (list of p (k_i,) array-like) – Measure weights: each element is a vector (k_i) on the simplex
P_list (list of p (d_i,d) array-like) – Each \(P_i\) is a linear map \(\mathbb{R}^{d} \rightarrow \mathbb{R}^{d_i}\)
n_samples_bary (int) – Number of barycenter points
Y_init ((n_samples_bary,d) array-like) – Initialization of the support locations (on k atoms) of the barycenter
b ((n_samples_bary,) array-like) – Initialization of the weights of the barycenter measure (on the simplex)
weights ((p,) array-like) – Initialization of the coefficients of the barycenter (on the simplex)
numItermax (int, optional) – Max number of iterations
stopThr (float, optional) – Stop threshold on error (>0)
verbose (bool, optional) – Print information along iterations
log (bool, optional) – record log if True
numThreads (int or "max", optional (default=1, i.e. OpenMP is not used)) – If compiled with OpenMP, chooses the number of threads to parallelize. “max” selects the highest number possible.
eps (Stability coefficient for the change of variable matrix inversion) – If the \(\mathbf{P}_i^T\) matrices don’t span \(\mathbb{R}^d\), the problem is ill-defined and a matrix inversion will fail. In this case one may set eps=1e-8 and get a solution anyway (which may make little sense)
Y – Support locations (on n_samples_bary atoms) of the barycenter
(n_samples_bary,d) array-like
References
Cuturi, M. and Doucet, A.. “Fast computation of Wasserstein barycenters.” International Conference on Machine Learning. 2014.
Delon, J., Gozlan, N., and Saint-Dizier, A.. Generalized Wasserstein barycenters between probability measures living on different subspaces. arXiv preprint arXiv:2105.09755, 2021.
Computes the Linear Circular Optimal Transport distance from [78] using \(\eta=\mathrm{Unif}(S^1)\) as reference measure. Samples need to be in \(S^1\cong [0,1[\). If they are on \(\mathbb{R}\), takes the value modulo 1. If the values are on \(S^1\subset\mathbb{R}^2\), it is required to first find the coordinates using e.g. the atan2 function.
General loss returned:
where \(\hat{\mu}(x)=F_{\mu}^{-1}(x-\int z\mathrm{d}\mu(z)+\frac12) - x\) for all \(x\in [0,1[\), and \(d_{S^1}(x,y)=\min(|x-y|, 1-|x-y|)\) for \(x,y\in [0,1[\).
u_values (ndarray, shape (n, ...)) – samples in the source domain (coordinates on [0,1[)
v_values (ndarray, shape (n, ...), optional) – samples in the target domain (coordinates on [0,1[), if None, compute distance against uniform distribution
u_weights (ndarray, shape (n, ...), optional) – samples weights in the source domain
v_weights (ndarray, shape (n, ...), optional) – samples weights in the target domain
loss – Batched cost associated to the linear optimal transportation
float/array-like, shape (…)
Examples
>>> u = np.array([[0.2,0.5,0.8]])%1
>>> v = np.array([[0.4,0.5,0.7]])%1
>>> linear_circular_ot(u.T, v.T)
array([0.0127])
References
Martin, R. D., Medri, I., Bai, Y., Liu, X., Yan, K., Rohde, G. K., & Kolouri, S. (2024). LCOT: Linear Circular Optimal Transport. International Conference on Learning Representations.
Computes the energy of the OT barycenter functional for a given barycenter support X and weights a: .. math:
V(X, a) = \sum_{k=1}^K w_k \mathcal{T}_{c_k}(X, a, Y_k, b_k),
where: - \(X\) (n, d) is the barycenter support, - \(a\) (n) is the barycenter weights, - \(Y_k\) (m_k, d_k) is the k-th measure support
(measure_locations[k]),
\(b_k\) (m_k) is the k-th measure weights (measure_weights[k]),
rightarrow mathbb{R}_+^{ntimes m_k}` is the k-th cost function (which computes the pairwise cost matrix)
\(\mathcal{T}_{c_k}(X, a, Y_k, b)\) is the OT cost between the barycenter measure and the k-th measure with respect to the cost \(c_k\).
The function computes \(V(X, a)\) as defined above.
measure_locations (list of array-like) – List of K arrays of measure positions, each of shape (m_k, d_k).
measure_weights (list of array-like) – List of K arrays of measure weights, each of shape (m_k).
X (array-like) – Array of shape (n, d) representing barycenter points.
a (array-like) – Array of shape (n,) representing barycenter weights.
cost_list (list of callable or callable) – List of K cost functions \(c_k: \mathbb{R}^{n\times d}\times\mathbb{R}^{m_k\times d_k} \rightarrow \mathbb{R}_+^{n\times m_k}\). If cost_list is a single callable, the same cost is used K times.
nx (backend, optional) – The backend to use.
V – The value of the OT barycenter functional \(V(X, a)\).
References
Tanguy, Eloi and Delon, Julie and Gozlan, Nathaël (2024). Computing barycenters of Measures for Generic Transport Costs. arXiv preprint 2501.04016 (2024)
Computes the closed-form for the 2-Wasserstein distance between samples and a uniform distribution on \(S^1\) Samples need to be in \(S^1\cong [0,1[\). If they are on \(\mathbb{R}\), takes the value modulo 1. If the values are on \(S^1\subset\mathbb{R}^2\), it is required to first find the coordinates using e.g. the atan2 function.
where:
\(\nu=\mathrm{Unif}(S^1)\) and \(\mu_n = \sum_{i=1}^n \alpha_i \delta_{x_i}\)
For values \(x=(x_1,x_2)\in S^1\), it is required to first get their coordinates with
using e.g. ot.utils.get_coordinate_circle(x).
u_values (ndarray, shape (n, ...)) – Samples
u_weights (ndarray, shape (n, ...), optional) – samples weights in the source domain
loss – Batched cost associated to the optimal transportation
float/array-like, shape (…)
Examples
>>> x0 = np.array([[0], [0.2], [0.4]])
>>> semidiscrete_wasserstein2_unif_circle(x0)
array([0.02111111])
References
Bonet, C., Berg, P., Courty, N., Septier, F., Drumetz, L., & Pham, M. T. (2023). Spherical sliced-wasserstein. International Conference on Learning Representations.
Computes the 1 dimensional OT loss [15] between two (batched) empirical distributions
It is formally the p-Wasserstein distance raised to the power p. We do so in a vectorized way by first building the individual quantile functions then integrating them.
This function should be preferred to emd_1d whenever the backend is different to numpy, and when gradients over either sample positions or weights are required.
u_values (array-like, shape (n, ...)) – locations of the first empirical distribution
v_values (array-like, shape (m, ...)) – locations of the second empirical distribution
u_weights (array-like, shape (n, ...), optional) – weights of the first empirical distribution, if None then uniform weights are used
v_weights (array-like, shape (m, ...), optional) – weights of the second empirical distribution, if None then uniform weights are used
p (int, optional) – order of the ground metric used, should be at least 1 (see [2, Chap. 2], default is 1
require_sort (bool, optional) – sort the distributions atoms locations, if False we will consider they have been sorted prior to being passed to the function, default is True
cost – the batched EMD
float/array-like, shape (…)
References
Peyré, G., & Cuturi, M. (2018). Computational Optimal Transport.
Computes the Wasserstein distance on the circle using either [45] for p=1 or the binary search algorithm proposed in [44] otherwise. Samples need to be in \(S^1\cong [0,1[\). If they are on \(\mathbb{R}\), takes the value modulo 1. If the values are on \(S^1\subset\mathbb{R}^2\), it requires to first find the coordinates using e.g. the atan2 function.
General loss returned:
For p=1, [45]
For values \(x=(x_1,x_2)\in S^1\), it is required to first get their coordinates with
using e.g. ot.utils.get_coordinate_circle(x)
The function runs on backend but tensorflow and jax are not supported.
u_values (ndarray, shape (n, ...)) – samples in the source domain (coordinates on [0,1[)
v_values (ndarray, shape (n, ...)) – samples in the target domain (coordinates on [0,1[)
u_weights (ndarray, shape (n, ...), optional) – samples weights in the source domain
v_weights (ndarray, shape (n, ...), optional) – samples weights in the target domain
p (float, optional (default=1)) – Power p used for computing the Wasserstein distance
Lm (int, optional) – Lower bound dC. For p>1.
Lp (int, optional) – Upper bound dC. For p>1.
tm (float, optional) – Lower bound theta. For p>1.
tp (float, optional) – Upper bound theta. For p>1.
eps (float, optional) – Stopping condition. For p>1.
require_sort (bool, optional) – If True, sort the values.
loss – Batched cost associated to the optimal transportation
float/array-like, shape (…)
Examples
>>> u = np.array([[0.2,0.5,0.8]])%1
>>> v = np.array([[0.4,0.5,0.7]])%1
>>> wasserstein_circle(u.T, v.T)
array([0.1])
References
Hundrieser, Shayan, Marcel Klatt, and Axel Munk. “The statistics of circular optimal transport.” Directional Statistics for Innovative Applications: A Bicentennial Tribute to Florence Nightingale. Singapore: Springer Nature Singapore, 2022. 57-82.
Delon, Julie, Julien Salomon, and Andrei Sobolevski. “Fast transport optimization for Monge costs on the circle.” SIAM Journal on Applied Mathematics 70.7 (2010): 2239-2258.