sevagh/basicpitch.cpp

basicpitch.cpp

C++20 inference for the Spotify basic-pitch automatic music transcription/MIDI generator neural network with ONNXRuntime, Eigen, and libremidi. Two demo apps are provided: a WebAssembly/Emscripten web demo and a command-line (CLI) app.

I use ONNXRuntime and scripts from the excellent ort-builder project to implement the neural network inference like so:

  • Convert the ONNX model to ORT (onnxruntime)
  • Include only the operations and types needed for the specific neural network, cutting down code size
  • Compile the model weights to a .c and .h file to include it in the built binaries
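To make the last step above concrete, here is a minimal sketch of what "compiling weights to a .c file" amounts to: the model bytes are rendered as a C array that the linker places inside the binary, so no model file has to be loaded at runtime. The helper name bytes_to_c_array and the symbol name model_ort are hypothetical and chosen for illustration; the real ort-builder scripts may generate their output differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of ort-builder): render raw model bytes as
// C source text that defines a byte array and its length.
std::string bytes_to_c_array(const std::vector<std::uint8_t>& bytes,
                             const std::string& symbol) {
    assert(!symbol.empty());
    assert(!bytes.empty());  // an empty initializer list is not portable C

    std::ostringstream out;
    out << "const unsigned char " << symbol << "[] = {";
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        if (i != 0) {
            out << ", ";
        }
        // Cast so the byte is printed as a number, not as a character.
        out << static_cast<unsigned>(bytes[i]);
    }
    out << "};\n";
    out << "const unsigned long " << symbol << "_len = " << bytes.size() << ";\n";
    return out.str();
}
```

Calling bytes_to_c_array on the contents of a model file and writing the result to a .c file, with matching extern declarations in a .h file, gives the general shape of the generated sources.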

After the neural network inference, I use libremidi to replicate the end-to-end MIDI file creation of the real basic-pitch project. I didn't run any formal benchmarks, but the WASM demo site is much faster than Spotify's own web demo.
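As background on what MIDI file creation involves, the sketch below shows one detail of the Standard MIDI File format: the delta time before each event is stored as a variable-length quantity, seven bits per byte with the high bit set on every byte except the last. This is not libremidi's API, and in this project the encoding is presumably handled by the library; encode_vlq is a hypothetical name used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encode a value as a MIDI variable-length quantity (most significant
// 7-bit group first). The MIDI file format limits values to 28 bits.
std::vector<std::uint8_t> encode_vlq(std::uint32_t value) {
    assert(value <= 0x0FFFFFFFu);

    std::vector<std::uint8_t> out;
    // The last byte carries the low 7 bits with the continuation bit clear.
    out.push_back(static_cast<std::uint8_t>(value & 0x7Fu));
    value >>= 7;
    // Every earlier byte carries 7 more bits with the continuation bit set.
    while (value > 0) {
        out.insert(out.begin(),
                   static_cast<std::uint8_t>((value & 0x7Fu) | 0x80u));
        value >>= 7;
    }
    return out;
}
```

For example, a delta time of 128 ticks encodes to the two bytes 0x81 0x00, while any value below 128 fits in a single byte.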

Project design

  • ort-model contains the model in ONNX form, ORT form, and the generated .h and .c files
  • scripts contains the ORT model build scripts
  • src is the shared inference and MIDI creation code
  • src_wasm is the main WASM function, used in the web demo
  • src_cli is a Linux cli app (for debugging purposes) that uses libnyquist to load the audio files
  • vendor contains third-party/vendored libraries
  • web contains basic HTML/JavaScript code to host the WASM demo

Usage

To compare the files output by basicpitch.cpp against those of the real basic-pitch, I recommend midicsv. It dumps MIDI events in CSV format, so you can inspect them without more complicated MIDI software.

Python

To run Spotify's original Python inference code against the ONNX model, use the included inference script:

$ python scripts/python_inference.py --dest-dir ./midi-out-python/ ~/Downloads/clip.wav
...
Using model: /home/sevagh/repos/basicpitch.cpp/ort-model/model.onnx
Writing MIDI outputs to ./midi-out-python/

Predicting MIDI for /home/sevagh/Downloads/clip.wav...
...

CLI app

After following the build instructions below:

$ ./build/build-cli/basicpitch ~/Downloads/clip.wav ./midi-out-cpp
basicpitch.cpp Main driver program
Predicting MIDI for: /home/sevagh/Downloads/clip.wav
Input samples: 441000
Length in seconds: 10
Number of channels: 2
Resampling from 44100 Hz to 22050 Hz
output_to_notes_polyphonic
note_events_to_midi
Before iterating over note events
After iterating over note events
Now creating instrument track
done!
MIDI data size: 889
Wrote MIDI file to: "./midi-out-cpp/clip.mid"
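The log above reports the clip length, the channel count, and a resample from 44100 Hz to 22050 Hz. The sketch below illustrates that kind of preprocessing under loud assumptions: it treats "Input samples" as per-channel frames (441000 / 44100 = 10 seconds), assumes the model wants mono input as the Python basic-pitch does, and uses a naive pair-averaging decimator in place of a real low-pass resampler. All function names are hypothetical and do not come from the basicpitch.cpp sources.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Duration in seconds of `frames` sample frames at `sample_rate` Hz.
double duration_seconds(std::size_t frames, int sample_rate) {
    assert(sample_rate > 0);
    return static_cast<double>(frames) / sample_rate;
}

// Average interleaved channels (L R L R ...) into one mono signal.
std::vector<float> downmix_to_mono(const std::vector<float>& interleaved,
                                   int channels) {
    assert(channels > 0);
    const std::size_t ch = static_cast<std::size_t>(channels);
    assert(interleaved.size() % ch == 0);

    std::vector<float> mono(interleaved.size() / ch);
    for (std::size_t frame = 0; frame < mono.size(); ++frame) {
        float sum = 0.0f;
        for (std::size_t c = 0; c < ch; ++c) {
            sum += interleaved[frame * ch + c];
        }
        mono[frame] = sum / static_cast<float>(channels);
    }
    return mono;
}

// Naive 2:1 decimation: average each adjacent pair of samples.
// Illustration only; a production resampler applies a proper
// anti-aliasing filter. A trailing unpaired sample is dropped.
std::vector<float> halve_sample_rate(const std::vector<float>& mono) {
    std::vector<float> out(mono.size() / 2);
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = 0.5f * (mono[2 * i] + mono[2 * i + 1]);
    }
    return out;
}
```

Chaining downmix_to_mono and halve_sample_rate on a stereo 44100 Hz buffer yields a mono signal at half the original rate, which is the general shape of input the log suggests is fed to the network.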

WebAssembly/web demo

For web testing, serve the static contents of the web directory with the Python HTTP server:

$ cd web && python -m http.server 8000

Use the website:

[Screenshot of the web demo]

Build instructions

(Only tested on Linux, Pop!_OS 22.04.) I'm assuming you have a typical C/C++ toolchain for your OS, e.g. make, cmake, and gcc/g++. You also need to set up the Emscripten SDK for compiling to WebAssembly.

Clone the repo with submodules:

$ git clone --recurse-submodules https://github.com/sevagh/basicpitch.cpp

Create a Python venv (or conda env) and install the requirements:

$ pip install -r ./scripts/requirements.txt

Activate your venv and run the ONNXRuntime builder scripts:

$ activate my-env
$ ./scripts/build-ort-linux.sh
$ ./scripts/build-ort-wasm.sh

Check the outputs:

$ ls build/build-ort-*/MinSizeRel/libonnx.a
build/build-ort-linux/MinSizeRel/libonnx.a  build/build-ort-wasm/MinSizeRel/libonnx.a

Optional: if you want to re-convert the ONNX model to ORT in the ort-model directory, run scripts/convert-model-to-ort.sh ./ort-model/model.onnx. The ONNX model is copied from ./vendor/basic-pitch/basic_pitch/saved_models/icassp_2022/nmp.onnx

Build the CLI app:

$ make cli
$ ls build/build-cli/basicpitch
build/build-cli/basicpitch

For WebAssembly, first, set up the Emscripten SDK. Then, build the WASM app with your EMSDK env script:

$ export EMSDK_ENV_PATH=/path/to/emsdk/emsdk_env.sh
$ make wasm
$ ls build/build-wasm/basicpitch.wasm
build/build-wasm/basicpitch.wasm

This also copies the updated basicpitch.{wasm,js} to the ./web directory.
