Hey, I've found your package and I think I should be able to apply it to my problem. However, I'm not sure how.

I want to optimize a function whose input is a set of samples, or rather the indices with which I can retrieve them from a larger dataset.

Example:

using LinearAlgebra

datapoints_feature1 = randn(123, 10)
datapoints_feature2 = randn(213, 10)

selected_samples = [1; 5; 7] # this is the input to the function that should be optimized

function to_optimize(selected_samples)
    m1 = datapoints_feature1[:, selected_samples]'datapoints_feature1[:, selected_samples]
    m2 = datapoints_feature2[:, selected_samples]'datapoints_feature2[:, selected_samples]
    
    return dot(m1, m2)
end

Currently, I'm using a genetic algorithm from Metaheuristics.jl. For this, every datapoint gets a score, the datapoints are sorted by their scores, and the top k samples are selected.

For the above example:

scores = randn(10)
select_samples(scores; k=3) = sortperm(scores)[1:k]

I've checked InferOpt.jl and found soft_rank and soft_sort. Looking at the code, it seems I should be able to derive something like a soft_perm from them (by not applying the indexing with invperm). However, as the name says, this would be a soft permutation, which cannot be used directly for indexing. Maybe something like a straight-through estimator would work here (discrete solution in the forward pass, soft version in the reverse pass).
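
To make this concrete, here is a rough sketch of the straight-through idea (soft_top_k is just a hand-made sigmoid relaxation as a placeholder, not something from InferOpt, and the threshold/temperature details are only illustrative):

using Zygote
using ChainRulesCore: ignore_derivatives  # blocks gradients through its argument

# Placeholder smooth relaxation of "select the k smallest scores"; a real version
# would be built from soft_sort/soft_rank instead of this hand-made sigmoid.
function soft_top_k(θ; k=3, τ=0.1)
    # threshold halfway between the k-th and (k+1)-th smallest score, treated as a constant
    thresh = ignore_derivatives(() -> (sort(θ)[k] + sort(θ)[k + 1]) / 2)
    return @. 1 / (1 + exp((θ - thresh) / τ))  # ≈1 for the k smallest scores, ≈0 otherwise
end

# Straight-through trick: the forward value is exactly the discrete 0/1 mask,
# while the gradient is that of the soft relaxation.
function st_top_k(θ; k=3, τ=0.1)
    soft = soft_top_k(θ; k, τ)
    hard = ignore_derivatives(() -> float.(θ .<= sort(θ)[k]))  # discrete mask, gradient blocked
    return hard .+ (soft .- ignore_derivatives(soft))          # value = hard, gradient = d(soft)/dθ
end

Zygote.jacobian(st_top_k, randn(10))[1]  # nonzero thanks to the soft backward pass

The output is then a 0/1 weight vector rather than integer indices, so the cost would have to be rewritten in terms of weights w, e.g. m1 = (datapoints_feature1 .* w')' * (datapoints_feature1 .* w'), which I think gives the same dot(m1, m2) as the indexed version whenever w is exactly 0/1.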

Do you have any recommendation on how to implement such an optimizer for my problem? I would like to try gradient-based optimization, because most of the computation in the objective is differentiable and I think exploiting this information might help.

Thanks and best regards,
Heiner

Hello!
Sorry for the delay, I'm currently on holiday; I should be able to look at your question tomorrow or on Monday.

Best,
Léo

Hey,
No worries and thanks a lot. I really appreciate your help.

Best,
Heiner

Hello,

I looked a bit into your question this afternoon.

soft_sort and soft_rank are specific implementations for sorting and ranking; I don't know whether they can be adapted to build a soft_perm, and as you said, you need integer indices to compute your cost function.

A straightforward way to tackle your problem with InferOpt, without any additional assumptions, is to differentiate directly through the black-box cost function using a perturbed maximizer. This may not be the best method, but it can be applied to any problem. Here is how you could implement it:

using InferOpt
using LinearAlgebra
using Optimisers
using Plots
using Zygote

M = 123
K = 10
datapoints_feature1 = randn(M, K)
datapoints_feature2 = randn(M, K)

select_samples(scores; k=3) = sortperm(scores)[1:k]  # indices of the k lowest scores
function top_k(scores; k=3)
    # binary mask with ones at the selected samples
    res = falses(length(scores))
    res[select_samples(scores; k=k)] .= true
    return res
end

# objective from the original post, evaluated on a column mask y
function cost(y)
    m1 = datapoints_feature1[:, y]'datapoints_feature1[:, y]
    m2 = datapoints_feature2[:, y]'datapoints_feature2[:, y]
    return dot(m1, m2)
end

perturbed = PerturbedAdditive(top_k; ε=0.5, nb_samples=1000)  # Monte-Carlo perturbed version of top_k, differentiable in θ
loss = Pushforward(perturbed, cost)  # expected cost of the perturbed selection, differentiable
pipeline = cost ∘ top_k  # true (non-differentiable) objective, used for monitoring

# Training loop
θ_start = randn(K)
perturbed(θ_start)  # average top_k mask under perturbation (sanity check)
θ = copy(θ_start)
optimizer = Optimisers.setup(Descent(1e-3), θ)
N = 10
cost_history = Float64[pipeline(θ_start)]
loss_history = Float64[]
for i in 1:N
    grad = gradient(loss, θ)[1]
    Optimisers.update!(optimizer, θ, grad)  # update! mutates θ (and the optimizer state) in place
    @info "$i: $(loss(θ)) $(norm(grad))"
    push!(cost_history, pipeline(θ))
    push!(loss_history, loss(θ))
end

plot(loss_history)
plot(cost_history)

pipeline(θ_start)  # cost of the initial selection
pipeline(θ)  # cost of the selection after training

I have some questions about your setting:

  • Do you have only one datapoints_feature1 and datapoints_feature2, or do you have several of them, so that you want to learn an algorithm that solves the problem for any datapoints_feature input? If you have only a single instance, Bayesian optimization might work better than InferOpt, since the scores dimension is small (= 10).
  • If the answer to the previous question is yes, do you have an offline way to compute good solutions for some datapoints_feature examples? If so, there is another method that will probably work better.

@h-spiess

Thanks a lot for the detailed reply and the code. I will try it next week.

I have only the two, datapoints_feature1 and datapoints_feature2; these are like datasets, and for them I want to find minimizing/maximizing subsets. In practice, however, the datasets are much larger: e.g. I'm sometimes looking for 1000 samples out of 10000. Bayesian optimization probably does not scale to that.

What do you mean by an offline way? Is this just related to the multiple datasets setting?

Best,
Heiner

@BatyLeo

Ok, since you only have the two datapoints_feature1 and datapoints_feature2, my questions do not really apply, and the "offline way" I was referring to does not make sense in your setting.

The simplest method is then to try the PerturbedAdditive wrapper to make your function differentiable.
Maybe there is a way to build a specialized soft_perm that is more efficient; you would need to look at the maths behind soft_rank and soft_sort to see whether the differentiation rules can be extended to your case (see the original paper: https://arxiv.org/abs/2002.08871).
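
As a very rough illustration of that direction (untested, and the default behaviour and keyword arguments of soft_rank would need to be checked against the docs), a relaxed selection mask could be built by squashing the soft ranks through a sigmoid:

using InferOpt

# Hypothetical relaxed selection mask built on soft_rank: samples whose soft rank
# is at most k (roughly the k lowest scores, matching select_samples above) get a
# mask value close to 1. The temperature τ is an extra assumption.
function relaxed_top_k(scores; k=3, τ=0.1)
    r = soft_rank(scores)  # differentiable ranks, ≈1 for the lowest score
    return @. 1 / (1 + exp((r - (k + 0.5)) / τ))  # smooth step: ≈1 if rank ≤ k
end

As with a straight-through approach, the output is only approximately 0/1, so the cost would have to accept a weight vector instead of integer indices.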
