-
Notifications
You must be signed in to change notification settings - Fork 8.1k
Fix Stable Unique sort not being stable #17189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
PaulHigin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR description should include an example of the problem being addressed.
| dataIndex--; | ||
| } | ||
|
|
||
| dataToSort.RemoveAt(dataIndex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Performing a remove operation does not seem like the right fix for performance. @daxian-dbw please take a look.
I have included a new pester test in the PR which correctly fails due to the unique stable sort being broken. The test case below when running the released versions will result in the record "A2" being overwrote with B2. This results in B2 being before record B1 causing B2 to be returned and thus not being stable. $Record1=[PSCustomObject]@{Key="A";Record="A1"}
$Record2=[PSCustomObject]@{Key="A";Record="A2"}
$Record3=[PSCustomObject]@{Key="B";Record="B1"}
$Record4=[PSCustomObject]@{Key="B";Record="B2"}
$Records = @($Record1,$Record2,$Record3,$Record4)
$results = $Records | Sort-Object -Stable -Unique -Property KeyI believe this will result in a performance regression compared to the current code (Stable Unique only) but since the current results may be incorrect I am not sure if that is a fair comparison. |
|
I rewrote the logic to avoid the use of RemoveAt in favor of array manipulation. I expect it to be more favorable when there are a large amount of duplicate elements since elements are only moved when there is not a duplicate. |
test/powershell/Modules/Microsoft.PowerShell.Utility/Sort-Object.Tests.ps1
Outdated
Show resolved
Hide resolved
|
This PR has Quantification details
Why proper sizing of changes matters
Optimal pull request sizes drive a better predictable PR flow as they strike a
What can I do to optimize my changes
How to interpret the change counts in git diff output
Was this comment helpful? 👍 :ok_hand: :thumbsdown: (Email) |
|
@m1k0net It doesn't matter if tests include an example of the problem being solved, that is expected. The PR is not well formed if it does not include a description of the problem, or a link to an issue that describes the problem:
Please include this in the PR description. |
PaulHigin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me. But I would like @daxian-dbw to also have a look.
daxian-dbw
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
|
||
| // If we're doing a unique sort and the entry is not unique, discard the duplicate entry | ||
| if (Unique && !uniqueSet.Add(dataToSort[dataIndex])) | ||
| if (Unique && !uniqueSet.Add(dataToSort[dataIndex + discardedDuplicates])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems we could have discardedDuplicates as absolute value and exclude the additions in loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to keep with the previous code's practice with it calculated inline. I don't know the impacts of one vs the other so I left that part alone.
|
🎉 Handy links: |
This corrects an issue where the movement of objects in the array to be sorted where removing duplicate elements may result in a side effect of objects being out of order for a unique stable sort as seen in issue #16907.
This can be reproduced with the below code
This results in
Where the expected results is.
This problem occurs when a duplicate element in the array is found and then replaced with an element from the end. In the example above causing record B2 to be put in place of A2. After that, record B2 is treated as the first object for the unique selection despite the stable flag being set to retain the order.