Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

Conversation

@m1k0net
Copy link
Contributor

@m1k0net m1k0net commented Apr 25, 2022

This corrects an issue where the movement of objects in the array to be sorted where removing duplicate elements may result in a side effect of objects being out of order for a unique stable sort as seen in issue #16907.

This can be reproduced with the below code

$Record1=[PSCustomObject]@{Key="A";Record="A1"}
$Record2=[PSCustomObject]@{Key="A";Record="A2"}
$Record3=[PSCustomObject]@{Key="B";Record="B1"}
$Record4=[PSCustomObject]@{Key="B";Record="B2"}
$Records = @($Record1,$Record2,$Record3,$Record4)
$results = $Records | Sort-Object -Stable -Unique -Property Key
$results  | Format-Table

This results in

Key Record
--- ------
A   A1
B   B2

Where the expected results is.

Key Record
--- ------
A   A1
B   B1

This problem occurs when a duplicate element in the array is found and then replaced with an element from the end. In the example above causing record B2 to be put in place of A2. After that, record B2 is treated as the first object for the unique selection despite the stable flag being set to retain the order.

@ghost
Copy link

ghost commented Apr 25, 2022

CLA assistant check
All CLA requirements met.

Copy link
Contributor

@PaulHigin PaulHigin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description should include an example of the problem being addressed.

dataIndex--;
}

dataToSort.RemoveAt(dataIndex);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performing a remove operation does not seem like the right fix for performance. @daxian-dbw please take a look.

@ghost ghost added the Waiting on Author The PR was reviewed and requires changes or comments from the author before being accept label Apr 25, 2022
@PaulHigin PaulHigin requested a review from daxian-dbw April 25, 2022 17:17
@m1k0net
Copy link
Contributor Author

m1k0net commented Apr 25, 2022

The PR description should include an example of the problem being addressed.

I have included a new pester test in the PR which correctly fails due to the unique stable sort being broken. The test case below when running the released versions will result in the record "A2" being overwrote with B2. This results in B2 being before record B1 causing B2 to be returned and thus not being stable.

			$Record1=[PSCustomObject]@{Key="A";Record="A1"}
			$Record2=[PSCustomObject]@{Key="A";Record="A2"}
			$Record3=[PSCustomObject]@{Key="B";Record="B1"}
			$Record4=[PSCustomObject]@{Key="B";Record="B2"}
			$Records = @($Record1,$Record2,$Record3,$Record4)
			$results = $Records | Sort-Object -Stable -Unique -Property Key

I believe this will result in a performance regression compared to the current code (Stable Unique only) but since the current results may be incorrect I am not sure if that is a fair comparison.

@ghost ghost removed the Waiting on Author The PR was reviewed and requires changes or comments from the author before being accept label Apr 25, 2022
@m1k0net
Copy link
Contributor Author

m1k0net commented Apr 25, 2022

I rewrote the logic to avoid the use of RemoveAt in favor of array manipulation. I expect it to be more favorable when there are a large amount of duplicate elements since elements are only moved when there is not a duplicate.

@pull-request-quantifier-deprecated

This PR has 19 quantified lines of changes. In general, a change size of upto 200 lines is ideal for the best PR experience!


Quantification details

Label      : Extra Small
Size       : +14 -5
Percentile : 7.6%

Total files changed: 2

Change summary by file extension:
.cs : +5 -5
.ps1 : +9 -0

Change counts above are quantified counts, based on the PullRequestQuantifier customizations.

Why proper sizing of changes matters

Optimal pull request sizes drive a better predictable PR flow as they strike a
balance between between PR complexity and PR review overhead. PRs within the
optimal size (typical small, or medium sized PRs) mean:

  • Fast and predictable releases to production:
    • Optimal size changes are more likely to be reviewed faster with fewer
      iterations.
    • Similarity in low PR complexity drives similar review times.
  • Review quality is likely higher as complexity is lower:
    • Bugs are more likely to be detected.
    • Code inconsistencies are more likely to be detetcted.
  • Knowledge sharing is improved within the participants:
    • Small portions can be assimilated better.
  • Better engineering practices are exercised:
    • Solving big problems by dividing them in well contained, smaller problems.
    • Exercising separation of concerns within the code changes.

What can I do to optimize my changes

  • Use the PullRequestQuantifier to quantify your PR accurately
    • Create a context profile for your repo using the context generator
    • Exclude files that are not necessary to be reviewed or do not increase the review complexity. Example: Autogenerated code, docs, project IDE setting files, binaries, etc. Check out the Excluded section from your prquantifier.yaml context profile.
    • Understand your typical change complexity, drive towards the desired complexity by adjusting the label mapping in your prquantifier.yaml context profile.
    • Only use the labels that matter to you, see context specification to customize your prquantifier.yaml context profile.
  • Change your engineering behaviors
    • For PRs that fall outside of the desired spectrum, review the details and check if:
      • Your PR could be split in smaller, self-contained PRs instead
      • Your PR only solves one particular issue. (For example, don't refactor and code new features in the same PR).

How to interpret the change counts in git diff output

  • One line was added: +1 -0
  • One line was deleted: +0 -1
  • One line was modified: +1 -1 (git diff doesn't know about modified, it will
    interpret that line like one addition plus one deletion)
  • Change percentiles: Change characteristics (addition, deletion, modification)
    of this PR in relation to all other PRs within the repository.


Was this comment helpful? 👍  :ok_hand:  :thumbsdown: (Email)
Customize PullRequestQuantifier for this repository.

@PaulHigin
Copy link
Contributor

@m1k0net It doesn't matter if tests include an example of the problem being solved, that is expected. The PR is not well formed if it does not include a description of the problem, or a link to an issue that describes the problem:

  • repro steps
  • results
  • expected results

Please include this in the PR description.

@m1k0net m1k0net changed the title Fix Stable Unique sort not being stable #16907 Fix Stable Unique sort not being stable Apr 26, 2022
Copy link
Contributor

@PaulHigin PaulHigin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. But I would like @daxian-dbw to also have a look.

Copy link
Member

@daxian-dbw daxian-dbw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@daxian-dbw daxian-dbw assigned daxian-dbw and unassigned anmenaga Apr 26, 2022
@daxian-dbw daxian-dbw added the CL-General Indicates that a PR should be marked as a general cmdlet change in the Change Log label Apr 26, 2022
@daxian-dbw daxian-dbw merged commit 0067dec into PowerShell:master Apr 26, 2022

// If we're doing a unique sort and the entry is not unique, discard the duplicate entry
if (Unique && !uniqueSet.Add(dataToSort[dataIndex]))
if (Unique && !uniqueSet.Add(dataToSort[dataIndex + discardedDuplicates]))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems we could have discardedDuplicates as absolute value and exclude the additions in loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to keep with the previous code's practice with it calculated inline. I don't know the impacts of one vs the other so I left that part alone.

@ghost
Copy link

ghost commented May 23, 2022

🎉v7.3.0-preview.4 has been released which incorporates this pull request.:tada:

Handy links:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CL-General Indicates that a PR should be marked as a general cmdlet change in the Change Log Extra Small

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants

Morty Proxy This is a proxified and sanitized view of the page, visit original site.