GitHub - csuhan/OneLLM: [CVPR 2024] OneLLM: One Framework to Align All Modalities with Language

OneLLM: One Framework to Align All Modalities with Language

[Project Page] [Paper] [HF Demo🤗] [Modelscope Demo🤖] [Model🤗] [Data]

News

2024.02.27 OneLLM is accepted by CVPR 2024!🎉
2023.12.01 Release model weights and inference code.

torchrun --nproc_per_node=8 main_pretrain.py \
--epochs 1 --dataset image \
--batch_size 40 --accum_iter 16 \
--model_parallel_size 1 \
--data_parallel sdp \
--save_consolidated \
--llama_type onellm \
--llama_ckpt_dir ${LLAMA_7B_PATH} \
--llama_config config/llama2/7B.json \
--tokenizer_path config/llama2/tokenizer.model \
--auto_resume \
--weight_decay 0.1 --output_dir ${OUTPUT_DIR} \
--warmup_iters 2000 --lr_decay_iters 200000 --lr 5e-5 --min_lr 5e-6 --clip_grad 2 \
--save_freq 1000 \
2>&1 | tee -a ${OUTPUT_DIR}/output.log

Multi Nodes DDP Training:

Run N scripts on N nodes at the time, then we can launch a multi-node DDP training. Following is an example script for one node:

MASTER_ADDR=IP_ADDRESS_OF_NODE_1
NNODES=N
MASTER_PORT=29500
NPROC_PER_NODE=8

RANK=0 or 1 or ... or N

bash
torchrun \
--nnodes=$NNODES \
--nproc_per_node=8 \
--node_rank=$RANK \
--master_port=$MASTER_PORT \
--master_addr=$MASTER_ADDR \
main_pretrain.py \
--epochs 1 --dataset image \
--batch_size 40 --accum_iter 16 \
--model_parallel_size 1 \
--data_parallel sdp \
--save_consolidated \
--llama_type onellm \
--llama_ckpt_dir ${LLAMA_7B_PATH} \
--llama_config config/llama2/7B.json \
--tokenizer_path config/llama2/tokenizer.model \
--auto_resume \
--weight_decay 0.1 --output_dir ${OUTPUT_DIR} \
--warmup_iters 2000 --lr_decay_iters 200000 --lr 5e-5 --min_lr 5e-6 --clip_grad 2 \
--save_freq 1000 \
2>&1 | tee -a ${OUTPUT_DIR}/output.log

Multi Node SLURM Training: exps/image_text_pretrain_slurm.sh

#!/bin/bash
#SBATCH --gres=gpu:8
#SBATCH -n 16
#SBATCH -N 2
#SBATCH --cpus-per-task=16

srun python -u main_pretrain.py \
--epochs 1 --dataset image \
--batch_size 40 --accum_iter 8 \
--model_parallel_size 1 \
--data_parallel sdp \
--save_consolidated \
--llama_type onellm \
--llama_ckpt_dir ${LLAMA_7B_PATH} \
--llama_config config/llama2/7B.json \
--tokenizer_path config/llama2/tokenizer.model \
--auto_resume \
--weight_decay 0.1 --output_dir ${OUTPUT_DIR} \
--warmup_iters 2000 --lr_decay_iters 200000 --lr 5e-5 --min_lr 5e-6 --clip_grad 2 \
--save_freq 1000 \
2>&1 | tee -a ${OUTPUT_DIR}/output.log

Multimodal-Text Pretraining

Stage II Pretraining: Assume we have the pretrained ${IMAGE_TEXT_MODEL}, run exps/multimodal_text_pretrain_stage2.sh for video-audio-point-text pretraining.

Stage III Pretraining: Assume we have the pretrained ${STAGE2_MODEL}, run exps/multimodal_text_pretrain_stage3.sh for depth-normal-imu-fmri-text pretraining.

Instruction Tuning

Assume we have the pretrained ${STAGE3_MODEL}, run exps/multimodal_text_finetune.sh for multimodal instruction tuning.

Citation

@InProceedings{han2023onellm,
  title={OneLLM: One Framework to Align All Modalities with Language},
  author={Han, Jiaming and Gong, Kaixiong and Zhang, Yiyuan and Wang, Jiaqi and Zhang, Kaipeng and Lin, Dahua and Qiao, Yu and Gao, Peng and Yue, Xiangyu},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2024}
}

Acknowledgement

LLaMA, LLaMA-Adapter, LLaMA2-Accessory, Meta-Transformer, ChatBridge

License

This project is developed based on Llama 2, please refer to the LLAMA 2 Community License.

Name	Name	Last commit message	Last commit date
Latest commit History 24 Commits 24 Commits
config/llama2	config/llama2
data	data
demos	demos
docs	docs
eval	eval
exps	exps
model	model
util	util
.gitignore	.gitignore
LICENSE_llama2	LICENSE_llama2
OneLLM_Arxiv.pdf	OneLLM_Arxiv.pdf
README.md	README.md
engine_finetune.py	engine_finetune.py
engine_pretrain.py	engine_pretrain.py
main_finetune.py	main_finetune.py
main_pretrain.py	main_pretrain.py
requirements.txt	requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

OneLLM: One Framework to Align All Modalities with Language

News

Contents

Install

Models

Demo

Data

Evaluation

Training

Image-Text Pretraining

Multimodal-Text Pretraining

Instruction Tuning

Citation

Acknowledgement

License

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Search code, repositories, users, issues, pull requests...

Folders and files

Latest commit

History

Repository files navigation

OneLLM: One Framework to Align All Modalities with Language

News

Contents

Install

Models

Demo

Data

Evaluation

Training

Image-Text Pretraining

Multimodal-Text Pretraining

Instruction Tuning

Citation

Acknowledgement

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages