vxsort is a fast, somewhat novel, hybrid, vectorized quicksort+bitonic primitive sorter implemented in C++.
The name vxsort stands for vectorized 10x sort.
It currently supports the following combination of vector ISA and primitive types:
| i64 | i32 | i16 | u64 | u32 | u16 | f64 | f32 | f16 | |
|---|---|---|---|---|---|---|---|---|---|
| AVX2 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| AVX512 | ✅ | ✅ | ✅1 | ✅ | ✅ | ✅1 | ✅ | ✅ | ❌ |
| ARM-Neon | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| ARM-SVE2 | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| RiscV-V 1.0 | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
1 - Requires AVX512/VBMI2 support which is available for all Intel AVX512 CPUs post Icelake, AMD post Zen4
mkdir build
cd build
# For better code-gen
export CC=clang CXX=clang++
cmake .. -G Ninja
ninja
To run tests, use ctest, preferrably with -J $(nproc) to avoid waiting for a long time:
ctest -J $(nproc)
Tests are built into 3 executables (signed integers, unsigned integer, floating-point) per supported vector ISA. This allows for easy to expolit parallelization both whne building and executing the tests.
- Plug in to a power-source if on laptop
- Try to kill CPU hogs -or-
- Alternatively, rely on the supplied
run.shtokill -STOP / -CONTwell known CPU hogs
# This script will kill -STOP a bunch of CPU hogs like chrome,
# Then executes
# ./bench/vxsort_bench --benchmark_counters_tabular
# and resume them after it's done
./bench/run.sh