Distributed Training on MNIST using PyTorch C++ Frontend (Libtorch)

This folder contains an example of data-parallel training of a convolutional neural network on the MNIST dataset. Parallelization is done with the Message Passing Interface (MPI).
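To make the data-parallel idea concrete: each process typically computes gradients on its own share of the data, and the gradients are then averaged so that every process applies the same update. The sketch below imitates that averaging step in plain C++ with no MPI dependency. The helper name `average_gradients` and the nested-vector layout are hypothetical and chosen for illustration only; this is not taken from dist-mnist.cpp. In an MPI program the summation would usually be done by a collective such as `MPI_Allreduce` with `MPI_SUM`, followed by a division by the number of processes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of dist-mnist.cpp): element-wise average of
// the gradients computed by each process. per_rank_grads[r][i] is the i-th
// gradient entry produced by rank r.
std::vector<float> average_gradients(
    const std::vector<std::vector<float>>& per_rank_grads) {
  assert(!per_rank_grads.empty());
  const std::size_t n = per_rank_grads.front().size();

  // Sum step: with MPI, an all-reduce using MPI_SUM would leave this sum
  // on every rank.
  std::vector<float> avg(n, 0.0f);
  for (const auto& grads : per_rank_grads) {
    assert(grads.size() == n);  // every rank must hold the same parameters
    for (std::size_t i = 0; i < n; ++i) {
      avg[i] += grads[i];
    }
  }

  // Divide by the number of ranks so the result is a mean, not a sum.
  const float world_size = static_cast<float>(per_rank_grads.size());
  for (float& g : avg) {
    g /= world_size;
  }
  return avg;
}
```

For example, calling `average_gradients` with the two per-rank vectors `{1, 2}` and `{3, 6}` gives `{2, 4}`, which is the gradient both processes would then use for their optimizer step.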

The entire code is contained in dist-mnist.cpp.

You can find instructions on how to install MPI [here](https://www.open-mpi.org/faq/?category=building). This code was tested on Open MPI, but it should also run on other MPI distributions such as MPICH and MVAPICH.

To build the code, run the following commands from the terminal:

$ cd distributed
$ mkdir build
$ cd build
$ cmake -DCMAKE_PREFIX_PATH=/path/to/libtorch ..
$ make

where /path/to/libtorch is the path to the unzipped LibTorch distribution. Note that the LibTorch build from the [PyTorch homepage](https://pytorch.org/get-started/locally/) does not include MPI headers and cannot be used for this example. You have to compile LibTorch manually. A set of guidelines is provided [here](https://gist.github.com/lasagnaphil/3e0099816837318e8e8bcab7edcfd5d9), although the exact steps may vary between systems.

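To run the code, launch it with mpirun from the build directory, replacing {NUM-PROCS} with the number of processes: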

mpirun -np {NUM-PROCS} ./dist-mnist
