MiCRO: Near-Zero Cost Gradient Sparsification for Scaling and Accelerating Distributed DNN Training

Description

MiCRO is an abbreviation of 'minimizing compression ratio error on-the-fly'. MiCRO sparsifies gradients within partitioned gradient vectors and estimates an accurate threshold for the sparsification condition by minimizing the compression ratio error. The benchmarks include: 1) image classification using CNNs, 2) language modelling using LSTMs, and 3) recommendation using NCF.
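
For intuition only, here is a minimal sketch (not the repository's implementation) of threshold-based sparsification on one gradient partition: a worker keeps only elements whose magnitude exceeds the current threshold, then nudges that threshold toward a target density so the compression ratio error shrinks on the next step. The function names, the correction rule, and the gain parameter are illustrative assumptions.

import torch

def sparsify_partition(grad_chunk, threshold):
    # Keep only elements whose magnitude exceeds the threshold.
    mask = grad_chunk.abs() > threshold
    indices = mask.nonzero(as_tuple=False).flatten()
    return indices, grad_chunk[indices]

def update_threshold(threshold, num_selected, numel, target_density, gain=0.5):
    # Illustrative correction: raise the threshold if too many elements were
    # selected, lower it if too few, so the actual density tracks the target.
    actual_density = num_selected / numel
    return threshold * (1.0 + gain * (actual_density - target_density) / target_density)

# Example: sparsify one partition with a 1% target density (hypothetical values).
grad_chunk = torch.randn(1_000_000)
threshold = 2.5
indices, values = sparsify_partition(grad_chunk, threshold)
threshold = update_threshold(threshold, values.numel(), grad_chunk.numel(), target_density=0.01)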

How to setup

To install the necessary dependencies, create a Conda environment from environment.yml by running the following commands. Note: check the compatibility of each package version, such as cudatoolkit and cudnn, with your device; for example, the NVIDIA Tesla V100 GPU is compatible.

$ conda env create --file environment.yml
$ conda activate micro
$ python -m spacy download en
$ conda deactivate
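
As a quick sanity check (a sketch, not part of the repository), you can verify from inside the activated environment which CUDA and cuDNN builds PyTorch sees on your device:

import torch

# Report the CUDA/cuDNN builds PyTorch was compiled against and whether a GPU is visible.
print("CUDA available:", torch.cuda.is_available())
print("cudatoolkit version:", torch.version.cuda)
print("cuDNN version:", torch.backends.cudnn.version())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))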

How to start

The scripts that run the code are written for the SLURM workload manager. The source code supports distributed training across multiple nodes and multiple GPUs. In run.sh, you can specify the model, dataset, reducer, and world_size.

Overview of shell scripts

  • If you use SLURM, use pararun and modify it for your configuration. The script pararun executes run.sh in parallel, and run.sh contains the setup for distributed training.
  • If you do not use SLURM, you do not need pararun. Instead, run run.sh on each of your nodes; PyTorch's rendezvous then connects the processes (see the sketch after this list).
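
For reference, below is a minimal sketch of the env:// rendezvous that this kind of launch relies on. The hostip and port variable names mirror the commands in the next sections; the rank/world_size sources and the backend choice are assumptions, not the repository's exact code.

import os
import torch
import torch.distributed as dist

# Hypothetical mapping of hostip/port onto PyTorch's env:// rendezvous.
os.environ["MASTER_ADDR"] = os.environ.get("hostip", "127.0.0.1")
os.environ["MASTER_PORT"] = os.environ.get("port", "29500")

# rank and world_size would normally come from SLURM or mpirun environment variables.
rank = int(os.environ.get("RANK", "0"))
world_size = int(os.environ.get("WORLD_SIZE", "1"))

dist.init_process_group(backend="nccl" if torch.cuda.is_available() else "gloo",
                        init_method="env://", rank=rank, world_size=world_size)
print(f"Process {dist.get_rank()}/{dist.get_world_size()} connected.")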

CNN and LSTM benchmarks

  • If you use SLURM, use the following command.
$ sbatch pararun
  • If you do not use SLURM, use the following command on each node.
$ hostip=<ip> port=<port> mpirun -np <world_size> run.sh

Neural Collaborative Filtering (NCF) benchmarks

1. Prepare dataset

  • To download the dataset, use the following command.
$ ./prepare_dataset.sh

2. Run training script

  • If you use SLURM, use the following command.
$ sbatch pararun
  • If you do not use SLURM, use the following command on each node.
$ hostip=<ip> port=<port> mpirun -np <world_size> run.sh

Publication

If you use this code, please cite the following [Paper]:

  • MiCRO: Near-Zero Cost Gradient Sparsification for Scaling and Accelerating Distributed DNN Training. Daegun Yoon, Sangyoon Oh. HiPC 2023, Dec. 2023.
@article{yoon2023micro,
  title={MiCRO: Near-Zero Cost Gradient Sparsification for Scaling and Accelerating Distributed DNN Training},
  author={Yoon, Daegun and Oh, Sangyoon},
  journal={arXiv preprint arXiv:2310.00967},
  year={2023}
}

Contact

If you have any questions about this project, contact me via one of the following:
