GitHub - codecaution/EvoMoE

EvoMoE: An Evolutional Mixture-of-Experts Training Framework via Dense-To-Sparse Gate

This code is for paper "EvoMoE: An Evolutional Mixture-of-Experts Training Framework via Dense-To-Sparse Gate" [PDF] (Under Review), which is implemented based on Fairseq.

Dependencies and Installation:

Please install the following packages at first.

# Apex
pip install -v --no-cache-dir --global-option="--cpp_ext" \
    --global-option="--cuda_ext" --global-option="--deprecated_fused_adam" \
    --global-option="--xentropy" --global-option="--fast_multihead_attn" \
    git+git://github.com/NVIDIA/apex.git@e2083df5eb96643c61613b9df48dd4eea6b07690

# FairScale  
pip install fairscale==0.4.0

# Hydra
pip install hydra-core==1.0.7 omegaconf==2.0.6

#For large datasets, please install PyArrow
pip install pyarrow

Follow the fairseq installation instructions as original repo. To install EvoMoE and develop locally:

git clone https://github.com/pytorch/EvoMoE
cd fairseq
pip install --editable ./

# on MacOS:
# CFLAGS="-stdlib=libc++" pip install --editable ./

# to install the latest stable release (0.10.x)
# pip install fairseq

Training

Prepare the training data by following the official examplelink.
Train a Dense-to-Sparse MoE model as:

GPU_PER_NODE_COUNT=$DLWS_NUM_GPU_PER_WORKER
MASTER_ADDR=$MASTER_IP
MASTER_PORT=$MASTER_PORT

if [[ -z "${OMPI_COMM_WORLD_SIZE}" ]]
then
  ddp_options="--distributed-world-size $GPU_PER_NODE_COUNT --distributed-rank 0 --distributed-init-method tcp://${MASTER_ADDR}:${MASTER_PORT}"
else
  if (( $OMPI_COMM_WORLD_SIZE == 1))
  then
	ddp_options="--distributed-world-size $GPU_PER_NODE_COUNT --distributed-rank 0 --distributed-init-method tcp://${MASTER_ADDR}:${MASTER_PORT}"
  else
    local_rank=$(($GPU_PER_NODE_COUNT*$OMPI_COMM_WORLD_RANK))
    total_gpu=$(($OMPI_COMM_WORLD_SIZE*$GPU_PER_NODE_COUNT))
    ddp_options="--distributed-world-size $total_gpu --distributed-rank $local_rank --distributed-init-method tcp://${MASTER_ADDR}:${MASTER_PORT}"
  fi
fi
echo "ddp_options: ${ddp_options}"

python train.py ${ddp_options} \
      ${DATA_PATH} \
      --task language_modeling \
      --arch transformer_lm_gptxl_moe \
      --share-decoder-input-output-embed \
      --tokens-per-sample ${TOKENS_PER_SAMPLE} --batch-size ${BATCH_SIZE} --update-freq ${UPDATE_FREQ} \
      --lr ${LR} --lr-scheduler polynomial_decay --warmup-updates ${WARMUP_STEPS}  \
      --optimizer adam --adam-betas '(0.9, 0.98)' --adam-eps 1e-08 \
      --clip-norm 5.0 --weight-decay 0.1 --dropout 0.1 \
      --criterion cross_entropy \
      --write-checkpoints-asynchronously \
      --save-dir ${CHECKPOINT_PATH} \
      --save-interval-updates ${CHECKPOINT_FREQUENCY} \
      --num-workers ${DLWS_NUM_WORKER}\
      --ddp-backend fully_sharded \
      --checkpoint-activations \
      --max-update ${MAX_UPDATES} \
      --total-num-update ${MAX_UPDATES} \
      --validate-interval-updates ${CHECKPOINT_FREQUENCY} \
      --log-format json --log-interval 100 \
      --symlink \
      --seed 1234 2>&1 | tee -a $LOG_PATH

Citation

Please cite as:

@article{nie2021densetosparse,
  title = {EvoMoE: An Evolutional Mixture-of-Experts Training Framework via Dense-To-Sparse Gate},
  author = {Nie, Xiaonan and Cao, Shijie and Miao, Xupeng and Ma, Lingxiao and Xue, Jilong and Miao, Youshan and Yang, Zichao and Yang, Zhi and Cui, Bin},
  journal = {arXiv preprint arXiv:2112.14397},
  year = {2021},
}

Name	Name	Last commit message	Last commit date
Latest commit History 1 Commit 1 Commit
.github	.github
docs	docs
examples	examples
fairseq	fairseq
fairseq_cli	fairseq_cli
.gitignore	.gitignore
.gitmodules	.gitmodules
CODE_OF_CONDUCT.md	CODE_OF_CONDUCT.md
CONTRIBUTING.md	CONTRIBUTING.md
LICENSE	LICENSE
README.md	README.md
hubconf.py	hubconf.py
pyproject.toml	pyproject.toml
setup.py	setup.py
train.py	train.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

EvoMoE: An Evolutional Mixture-of-Experts Training Framework via Dense-To-Sparse Gate

Dependencies and Installation:

Training

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Search code, repositories, users, issues, pull requests...

Folders and files

Latest commit

History

Repository files navigation

EvoMoE: An Evolutional Mixture-of-Experts Training Framework via Dense-To-Sparse Gate

Dependencies and Installation:

Training

Citation

About

Resources

License

Code of conduct

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages