Deserved attr is not correctly calculated in proportion plugin #729
Description
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened:
I have a job that is not getting scheduled even though there are enough resources in the cluster. I looked into the log and found that, in the allocate action, the queue is marked as overused:
I0410 04:06:16.001089 1 allocate.go:72] Queue <11073333> is overused, ignore it.
I0410 04:06:16.001094 1 allocate.go:72] Queue <11073333> is overused, ignore it
The queue is overused because the proportion plugin does not calculate the deserved value for the queue correctly:
	attr.deserved.Add(remaining.Clone().Multi(float64(attr.weight) / float64(totalWeight)))
	if !attr.deserved.LessEqual(attr.request) {
		attr.deserved = helpers.Min(attr.deserved, attr.request)
		meet[attr.queueID] = struct{}{}
	}
For example, when attr.deserved is <cpu 523750.00, memory 3076011404288.00, GPU 3750.00> and attr.request is <cpu 608000.00, memory 5153960755200.00, GPU 0.00>, !attr.deserved.LessEqual(attr.request) returns true because the deserved GPU exceeds the requested GPU. The queue is then added to meet even though its CPU and memory requests are not yet covered, so it is not allocated enough resources.
I think we should use attr.request.LessEqual(attr.deserved) instead.
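The comparison above can be reproduced with a minimal sketch. The Resource type and its LessEqual method here are hypothetical, simplified stand-ins (a plain per-dimension comparison), not the scheduler's actual implementation; only the numbers come from the example in this report.

```go
package main

import "fmt"

// Resource is a hypothetical, simplified stand-in for the scheduler's
// resource type. It is an assumption made for illustration only.
type Resource struct {
	CPU, Memory, GPU float64
}

// LessEqual reports whether every dimension of r is <= the matching
// dimension of o (assumed semantics: all dimensions must hold).
func (r Resource) LessEqual(o Resource) bool {
	return r.CPU <= o.CPU && r.Memory <= o.Memory && r.GPU <= o.GPU
}

func main() {
	deserved := Resource{CPU: 523750, Memory: 3076011404288, GPU: 3750}
	request := Resource{CPU: 608000, Memory: 5153960755200, GPU: 0}

	// Current condition: true, because GPU alone exceeds the request,
	// so the queue would be treated as having met its request.
	fmt.Println("current check: ", !deserved.LessEqual(request))

	// Proposed condition: false, because the CPU and memory requests
	// are not yet covered by deserved.
	fmt.Println("proposed check:", request.LessEqual(deserved))
}
```

Under these assumptions the two conditions disagree for this queue: the current one marks it as met, while the proposed one keeps it eligible for more resources.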
Another problem is in the calculation of the total increase in deserved:
deserved.Add(attr.deserved.Clone().Sub(oldDeserved))
This assumes that attr.deserved is greater than oldDeserved, which is not always true. For example, oldDeserved can be <cpu 523750.00, memory 3076011404288.00, GPU 3750.00> and attr.deserved can be <cpu 500750.00, memory 5076011404288.00, GPU 0.00>: memory has increased, but GPU and CPU have decreased.
We should handle both the increased and the decreased values.
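To make the mixed-sign case concrete, here is a small sketch that computes the per-dimension change for the values above. The Resource type and the signedDelta helper are hypothetical names introduced for illustration; they are not part of the plugin.

```go
package main

import "fmt"

// Resource is a hypothetical, simplified stand-in for the scheduler's
// resource type. It is an assumption made for illustration only.
type Resource struct {
	CPU, Memory, GPU float64
}

// signedDelta (hypothetical helper) returns newR - oldR per dimension.
// Individual dimensions may be negative.
func signedDelta(newR, oldR Resource) Resource {
	return Resource{
		CPU:    newR.CPU - oldR.CPU,
		Memory: newR.Memory - oldR.Memory,
		GPU:    newR.GPU - oldR.GPU,
	}
}

func main() {
	oldDeserved := Resource{CPU: 523750, Memory: 3076011404288, GPU: 3750}
	newDeserved := Resource{CPU: 500750, Memory: 5076011404288, GPU: 0}

	d := signedDelta(newDeserved, oldDeserved)
	// CPU and GPU come out negative while memory is positive, so the
	// change is not a pure increase.
	fmt.Printf("cpu %.2f, memory %.2f, GPU %.2f\n", d.CPU, d.Memory, d.GPU)
}
```

Any fix to the running total would need to apply such a delta with its sign per dimension rather than treating it as a non-negative increase.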