Change 16-bit swizzle from vector to C arrays #190

Merged
merged 1 commit into from
Mar 27, 2025


sterrettm2
Contributor

This gives around a 4x speedup for int16_t and uint16_t, and a smaller speedup (about 12% less time) for _Float16. The other types are unchanged to within 1%.

Benchmark                                                                 Time             CPU      Time Old      Time New       CPU Old       CPU New
------------------------------------------------------------------------------------------------------------------------------------------------------
[simdsort/random_10m/ vs. simdsort/random_10m/]uint64_t                -0.0050         -0.0050      76843507      76455793      76842001      76455226
[simdsort/random_10m/ vs. simdsort/random_10m/]int64_t                 -0.0032         -0.0031      77150199      76902515      77140564      76900428
[simdsort/random_10m/ vs. simdsort/random_10m/]uint32_t                -0.0044         -0.0044      31186978      31048516      31183252      31044913
[simdsort/random_10m/ vs. simdsort/random_10m/]int32_t                 -0.0044         -0.0043      31110415      30974272      31107703      30973732
[simdsort/random_10m/ vs. simdsort/random_10m/]uint16_t                -0.7547         -0.7548     113662270      27878061     113660292      27874788
[simdsort/random_10m/ vs. simdsort/random_10m/]int16_t                 -0.7572         -0.7572     114264023      27737899     114252418      27735801
[simdsort/random_10m/ vs. simdsort/random_10m/]float                   -0.0045         -0.0045      30462857      30326576      30462539      30325958
[simdsort/random_10m/ vs. simdsort/random_10m/]double                  -0.0078         -0.0079      63941783      63443562      63939974      63432449
[simdsort/random_10m/ vs. simdsort/random_10m/]_Float16                -0.1168         -0.1168      77634297      68570359      77623152      68554958
OVERALL_GEOMEAN                                                        -0.2814         -0.2815             0             0             0             0
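
To relate the relative-change columns to the "around 4x" figure, here is a minimal sketch that recomputes both from the old and new times in the table. It assumes the first two columns report `(new - old) / old`, which matches the numbers shown; the function names are illustrative and not part of the benchmark tooling.

```python
# Old/new wall-clock times copied from the 16-bit rows of the table above.
# Only ratios are used, so the time unit does not matter.
times = {
    "uint16_t": (113662270, 27878061),
    "int16_t": (114264023, 27737899),
    "_Float16": (77634297, 68570359),
}


def relative_change(old, new):
    """Fractional change in time, assumed to be (new - old) / old."""
    return (new - old) / old


def speedup(old, new):
    """How many times faster the new run is than the old one."""
    return old / new


for dtype, (old, new) in times.items():
    print(
        f"{dtype}: change {relative_change(old, new):+.4f}, "
        f"speedup {speedup(old, new):.2f}x"
    )
```

A change of about -0.75 therefore corresponds to roughly a 4.1x speedup for the two integer types, while -0.1168 for _Float16 is about 1.13x.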

Contributor

r-devulap left a comment

LGTM. Thanks for fixing it!

r-devulap merged commit 9fd995b into intel:main Mar 27, 2025
11 checks passed
r-devulap added a commit to r-devulap/numpy that referenced this pull request Apr 1, 2025
Pulls in 2 major changes:

(1) Fixes a performance regression on 16-bit dtype sorting (see
intel/x86-simd-sort#190)

(2) Adds openmp support for quicksort which speeds up sorting arrays >
100,000 by up to 3x. See: intel/x86-simd-sort#179
r-devulap added a commit to r-devulap/numpy that referenced this pull request Apr 15, 2025
r-devulap added a commit to r-devulap/numpy that referenced this pull request May 1, 2025