Long-Range-Grouping-Transformer

Official PyTorch implementation of the paper:

Long-Range Grouping Transformer for Multi-View 3D Reconstruction

Authors: Liying Yang, Zhenwei Zhu, Xuxin Lin, Jian Nong, Yanyan Liang.

News!!

We share the pre-trained LRGT model, which was trained with 3-view input. Download link (extraction code: ynq7). Unlike the model in the paper, this model was trained on two NVIDIA 4090 GPUs. Its performance is similar to that reported in the paper.

Performance

Results on ShapeNet for each number of input views, reported as IoU / F-Score@1%.

Methods	1 view	2 views	3 views	4 views	5 views	8 views	12 views	16 views	20 views
3D-R2N2	0.560 / 0.351	0.603 / 0.368	0.617 / 0.372	0.625 / 0.378	0.634 / 0.382	0.635 / 0.383	0.636 / 0.382	0.636 / 0.382	0.636 / 0.383
AttSets	0.642 / 0.395	0.662 / 0.418	0.670 / 0.426	0.675 / 0.430	0.677 / 0.432	0.685 / 0.444	0.688 / 0.445	0.692 / 0.447	0.693 / 0.448
Pix2Vox++	0.670 / 0.436	0.695 / 0.452	0.704 / 0.455	0.708 / 0.457	0.711 / 0.458	0.715 / 0.459	0.717 / 0.460	0.718 / 0.461	0.719 / 0.462
GARNet	0.673 / 0.418	0.705 / 0.455	0.716 / 0.468	0.722 / 0.475	0.726 / 0.479	0.731 / 0.486	0.734 / 0.489	0.736 / 0.491	0.737 / 0.492
GARNet+	0.655 / 0.399	0.696 / 0.446	0.712 / 0.465	0.719 / 0.475	0.725 / 0.481	0.733 / 0.491	0.737 / 0.498	0.740 / 0.501	0.742 / 0.504
EVolT	- / -	- / -	- / -	0.609 / 0.358	- / -	0.698 / 0.448	0.720 / 0.475	0.729 / 0.486	0.735 / 0.492
LegoFormer	0.519 / 0.282	0.644 / 0.392	0.679 / 0.428	0.694 / 0.444	0.703 / 0.453	0.713 / 0.464	0.717 / 0.470	0.719 / 0.472	0.721 / 0.472
3D-C2FT	0.629 / 0.371	0.678 / 0.424	0.695 / 0.443	0.702 / 0.452	0.702 / 0.458	0.716 / 0.468	0.720 / 0.475	0.723 / 0.477	0.724 / 0.479
3D-RETR (3 view)	0.674 / -	0.707 / -	0.716 / -	0.720 / -	0.723 / -	0.727 / -	0.729 / -	0.730 / -	0.731 / -
3D-RETR*	0.680 / -	0.701 / -	0.716 / -	0.725 / -	0.736 / -	0.739 / -	0.747 / -	0.755 / -	0.757 / -
UMIFormer	0.6802 / 0.4281	0.7384 / 0.4919	0.7518 / 0.5067	0.7573 / 0.5127	0.7612 / 0.5168	0.7661 / 0.5213	0.7682 / 0.5232	0.7696 / 0.5245	0.7702 / 0.5251
UMIFormer+	0.5672 / 0.3177	0.7115 / 0.4568	0.7447 / 0.4947	0.7588 / 0.5104	0.7681 / 0.5216	0.7790 / 0.5348	0.7843 / 0.5415	0.7873 / 0.5451	0.7886 / 0.5466
LRGT (Ours)	0.6962 / 0.4461	0.7462 / 0.5005	0.7590 / 0.5148	0.7653 / 0.5214	0.7692 / 0.5257	0.7744 / 0.5311	0.7766 / 0.5337	0.7781 / 0.5347	0.7786 / 0.5353
LRGT+ (Ours)	0.5847 / 0.3378	0.7145 / 0.4618	0.7476 / 0.4989	0.7625 / 0.5161	0.7719 / 0.5271	0.7833 / 0.5403	0.7888 / 0.5467	0.7912 / 0.5497	0.7922 / 0.5510

* The results in this row come from models trained individually for each number of input views.
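
For readers unfamiliar with the voxel IoU metric that is typically reported for ShapeNet reconstruction, the following is a minimal sketch of how such a score can be computed on flattened occupancy grids. The function name `voxel_iou` and the 0.5 binarization threshold are illustrative assumptions, not taken from this repository.

```python
from typing import Sequence


def voxel_iou(pred: Sequence[float], target: Sequence[int], threshold: float = 0.5) -> float:
    """Intersection over union of two flattened occupancy grids.

    pred holds predicted occupancy probabilities; target holds 0/1 labels.
    The threshold used to binarize pred is an assumed value.
    """
    if len(pred) != len(target):
        raise ValueError("grids must have the same number of voxels")
    intersection = 0
    union = 0
    for p, t in zip(pred, target):
        occupied = p > threshold
        filled = t > 0
        intersection += int(occupied and filled)
        union += int(occupied or filled)
    # Two empty grids agree perfectly; avoid dividing by zero.
    return intersection / union if union else 1.0


if __name__ == "__main__":
    # Three voxels are occupied in either grid, two of them in both.
    print(round(voxel_iou([0.9, 0.8, 0.2, 0.6], [1, 0, 0, 1]), 4))  # 0.6667
```

A real evaluation would run on the 32x32x32 grids of the voxelized models, most likely with tensor operations rather than a Python loop.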

TODO

The code and pre-trained models are coming soon.

Release the pre-trained models
Release the code

Installation

The environment was tested on Ubuntu 16.04.5 LTS and Ubuntu 20.04.5 LTS. We trained LRGT on 2 Tesla V100s for about 1 day and LRGT+ on 8 Tesla V100s for about 2.5 days.

Clone the code repository

git clone https://github.com/LiyingCV/Long-Range-Grouping-Transformer.git

Create a new environment from environment.yml

cd Long-Range-Grouping-Transformer
conda env create -f environment.yml
conda activate lrgt

Or create the environment manually and install the Python dependencies with pip

cd Long-Range-Grouping-Transformer
conda create -n lrgt python=3.6
conda activate lrgt
pip install -r requirements.txt
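
After either installation route, it can help to confirm that the active environment actually sees the dependencies. The sketch below is a hypothetical helper (`check_environment` is not part of this repository) that only uses the standard library. It checks for `torch` by default because this is a PyTorch implementation; add any other package names from requirements.txt as needed.

```python
import importlib.util
import sys


def check_environment(packages=("torch",)):
    """Report the Python version and whether each package can be imported."""
    report = {"python": "{}.{}".format(*sys.version_info[:2])}
    for name in packages:
        # find_spec returns None when a top-level module is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(key, value)
```

If `torch` is reported as `False`, the `lrgt` environment is probably not activated in the current shell.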

Demo

Datasets

We use the ShapeNet and Pix3D datasets in our experiments. They are available at the links below:

ShapeNet rendering images: http://cvgl.stanford.edu/data2/ShapeNetRendering.tgz
ShapeNet voxelized models: http://cvgl.stanford.edu/data2/ShapeNetVox32.tgz
Pix3D images & voxelized models: http://pix3d.csail.mit.edu/data/pix3d.zip
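
Once the three archives are downloaded, they need to be unpacked somewhere the code can read them. The following is a minimal sketch of that step, assuming the archives keep the file names from the URLs above. The helper name `unpack_datasets` and the directory arguments are hypothetical, and where the repository expects the data to live is not specified here.

```python
import shutil
from pathlib import Path

# Archive file names taken from the download URLs above.
ARCHIVES = ("ShapeNetRendering.tgz", "ShapeNetVox32.tgz", "pix3d.zip")


def unpack_datasets(download_dir, data_root):
    """Unpack every downloaded archive found in download_dir into data_root.

    Returns the names of the archives that were unpacked; missing ones are skipped.
    """
    download_dir = Path(download_dir)
    data_root = Path(data_root)
    data_root.mkdir(parents=True, exist_ok=True)
    unpacked = []
    for name in ARCHIVES:
        archive = download_dir / name
        if not archive.is_file():
            continue
        # unpack_archive picks tar or zip handling from the file extension.
        shutil.unpack_archive(str(archive), str(data_root))
        unpacked.append(name)
    return unpacked
```

For example, `unpack_datasets("downloads", "data")` would extract whichever of the three archives are present in `downloads/` into `data/`. Only unpack archives obtained from the official links.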

Getting Started

Training

We provide a training script, which you can run as follows: sh train.sh

We use torch.distributed for multi-GPU training, so you can change CUDA_VISIBLE_DEVICES and nproc_per_node to train on more devices or on a single device.
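
The two settings go together: with one process per GPU, nproc_per_node should equal the number of devices listed in CUDA_VISIBLE_DEVICES. The sketch below shows one way to keep them in sync. It assumes the entry point is runner.py and that training is started through `python -m torch.distributed.launch`. Both are assumptions, so check train.sh for the actual command and arguments. `build_launch` is a hypothetical helper.

```python
import os


def build_launch(gpu_ids, script="runner.py"):
    """Return (env, command) for a one-process-per-GPU torch.distributed launch.

    Assumption: the entry point is runner.py and it is started through
    `python -m torch.distributed.launch`; check train.sh for the real command.
    """
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    command = [
        "python", "-m", "torch.distributed.launch",
        # One worker process per visible GPU.
        "--nproc_per_node={}".format(len(gpu_ids)),
        script,
    ]
    return env, command


if __name__ == "__main__":
    env, command = build_launch([0, 1])
    print("CUDA_VISIBLE_DEVICES=" + env["CUDA_VISIBLE_DEVICES"], " ".join(command))
```

The returned pair can be passed to `subprocess.run(command, env=env)`. Calling `build_launch([0])` gives the single-device variant.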

Evaluation

We provide a testing script, which you can run as follows: sh test.sh

Citation

If you find our code or paper useful in your research, please consider citing:

@InProceedings{Yang_2023_ICCV,
    author    = {Yang, Liying and Zhu, Zhenwei and Lin, Xuxin and Nong, Jian and Liang, Yanyan},
    title     = {Long-Range Grouping Transformer for Multi-View 3D Reconstruction},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {18257-18267}
}

Further Information

Please check out other works on multi-view reconstruction from our group:

