Hey, I've found your package and I think I should be able to apply it to my problem. However, I'm not sure how.

I want to optimize a function whose input is a set of samples, or rather the indices with which I can retrieve them from a larger dataset.

Example:

using LinearAlgebra

datapoints_feature1 = randn(123, 10)
datapoints_feature2 = randn(213, 10)

selected_samples = [1; 5; 7] # this is the input to the function that should be optimized

function to_optimize(selected_samples)
    m1 = datapoints_feature1[:, selected_samples]'datapoints_feature1[:, selected_samples]
    m2 = datapoints_feature2[:, selected_samples]'datapoints_feature2[:, selected_samples]
    
    return dot(m1, m2)
end

Currently, I'm using a genetic algorithm from Metaheuristics.jl. For this, every datapoint gets a score, the datapoints are sorted by their scores, and the top k samples are selected.

For the above example:

scores = randn(10)
select_samples(scores; k=3) = sortperm(scores)[1:k]

I've checked InferOpt.jl and found soft_rank and soft_sort. Looking at the code, it seems I should be able to derive something like a soft_perm from them (by not applying the indexing with invperm). However, as the name says, this would be a soft permutation, which cannot be used directly for indexing. Maybe something like a straight-through estimator would work here (discrete solution in the forward pass, soft version in the reverse pass).
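
To make this concrete, here is a rough sketch of the straight-through idea (soft_top_k is just a hand-made sigmoid relaxation as a placeholder, not something from InferOpt, and the threshold/temperature details are only illustrative):

using Zygote
using ChainRulesCore: ignore_derivatives  # blocks gradients through its argument

# Placeholder smooth relaxation of "select the k smallest scores"; a real version
# would be built from soft_sort/soft_rank instead of this hand-made sigmoid.
function soft_top_k(θ; k=3, τ=0.1)
    # threshold halfway between the k-th and (k+1)-th smallest score, treated as a constant
    thresh = ignore_derivatives(() -> (sort(θ)[k] + sort(θ)[k + 1]) / 2)
    return @. 1 / (1 + exp((θ - thresh) / τ))  # ≈1 for the k smallest scores, ≈0 otherwise
end

# Straight-through trick: the forward value is exactly the discrete 0/1 mask,
# while the gradient is that of the soft relaxation.
function st_top_k(θ; k=3, τ=0.1)
    soft = soft_top_k(θ; k, τ)
    hard = ignore_derivatives(() -> float.(θ .<= sort(θ)[k]))  # discrete mask, gradient blocked
    return hard .+ (soft .- ignore_derivatives(soft))          # value = hard, gradient = d(soft)/dθ
end

Zygote.jacobian(st_top_k, randn(10))[1]  # nonzero thanks to the soft backward pass

The output is then a 0/1 weight vector rather than integer indices, so the cost would have to be rewritten in terms of weights w, e.g. m1 = (datapoints_feature1 .* w')' * (datapoints_feature1 .* w'), which I think gives the same dot(m1, m2) as the indexed version whenever w is exactly 0/1.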

Do you have any recommendation on how to implement such an optimizer for my problem? I would like to try gradient-based optimization, because most of the computation in the objective is differentiable and I think exploiting this information might help.

Thanks and best regards,
Heiner

Hello!
Sorry for the delay, I'm currently on holiday; I should be able to look at your question tomorrow or on Monday.

Best,
Léo

Hey,
No worries and thanks a lot. I really appreciate your help.

Best,
Heiner

Hello,

I looked a bit into your question this afternoon.

soft_sort and soft_rank are specific implementations for sorting and ranking; I don't know whether they can be adapted to build a soft_perm, and as you said, you need integer indices to compute your cost function.

A straightforward way to tackle your problem with InferOpt, without any additional assumptions, is to differentiate directly through the black-box cost function using a perturbed maximizer. This may not be the best method, but it can be applied to any problem. Here is how you could implement it:

using InferOpt
using LinearAlgebra
using Optimisers
using Plots
using Zygote

M = 123
K = 10
datapoints_feature1 = randn(M, K)
datapoints_feature2 = randn(M, K)

select_samples(scores; k=3) = sortperm(scores)[1:k]  # indices of the k lowest scores
function top_k(scores; k=3)
    # binary mask with ones at the selected samples
    res = falses(length(scores))
    res[select_samples(scores; k=k)] .= true
    return res
end

# objective from the original post, evaluated on a column mask y
function cost(y)
    m1 = datapoints_feature1[:, y]'datapoints_feature1[:, y]
    m2 = datapoints_feature2[:, y]'datapoints_feature2[:, y]
    return dot(m1, m2)
end

perturbed = PerturbedAdditive(top_k; ε=0.5, nb_samples=1000)  # Monte-Carlo perturbed version of top_k, differentiable in θ
loss = Pushforward(perturbed, cost)  # expected cost of the perturbed selection, differentiable
pipeline = cost ∘ top_k  # true (non-differentiable) objective, used for monitoring

# Training loop
θ_start = randn(K)
perturbed(θ_start)  # average top_k mask under perturbation (sanity check)
θ = copy(θ_start)
optimizer = Optimisers.setup(Descent(1e-3), θ)
N = 10
cost_history = Float64[pipeline(θ_start)]
loss_history = Float64[]
for i in 1:N
    grad = gradient(loss, θ)[1]
    Optimisers.update!(optimizer, θ, grad)  # update! mutates θ (and the optimizer state) in place
    @info "$i: $(loss(θ)) $(norm(grad))"
    push!(cost_history, pipeline(θ))
    push!(loss_history, loss(θ))
end

plot(loss_history)
plot(cost_history)

pipeline(θ_start)  # cost of the initial selection
pipeline(θ)  # cost of the selection after training

I have some questions about your setting:

  • Do you have only one datapoints_feature1 and datapoints_feature2, or do you have several of them, so that you want to learn an algorithm that solves the problem for any datapoints_feature input? If you have only a single instance, Bayesian optimization might work better than InferOpt, since the scores dimension is small (= 10).
  • If the answer to the previous question is yes, do you have an offline way to compute good solutions for some datapoints_feature examples? If so, there is another method that will probably work better.

@h-spiess

Thanks a lot for the detailed reply and the code. I will try it next week.

I have only the two, datapoints_feature1 and datapoints_feature2; these are like datasets, and for them I want to find minimizing/maximizing subsets. In practice, however, the datasets are much larger: e.g. I'm sometimes looking for 1000 samples out of 10000. Bayesian optimization probably does not scale to that.

What do you mean by an offline way? Is this just related to the multiple datasets setting?

Best,
Heiner

@BatyLeo

Ok, since you only have the two datapoints_feature1 and datapoints_feature2, my questions do not really apply, and the "offline way" I was referring to does not make sense in your setting.

The simplest method is then to try the PerturbedAdditive wrapper to make your function differentiable.
Maybe there is a way to build a specialized soft_perm that is more efficient; you would need to look at the maths behind soft_rank and soft_sort to see whether the differentiation rules can be extended to your case (see the original paper: https://arxiv.org/abs/2002.08871).
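
As a very rough illustration of that direction (untested, and the default behaviour and keyword arguments of soft_rank would need to be checked against the docs), a relaxed selection mask could be built by squashing the soft ranks through a sigmoid:

using InferOpt

# Hypothetical relaxed selection mask built on soft_rank: samples whose soft rank
# is at most k (roughly the k lowest scores, matching select_samples above) get a
# mask value close to 1. The temperature τ is an extra assumption.
function relaxed_top_k(scores; k=3, τ=0.1)
    r = soft_rank(scores)  # differentiable ranks, ≈1 for the lowest score
    return @. 1 / (1 + exp((r - (k + 0.5)) / τ))  # smooth step: ≈1 if rank ≤ k
end

As with a straight-through approach, the output is only approximately 0/1, so the cost would have to accept a weight vector instead of integer indices.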
