Situat3DChange is a 3D visual-language benchmark designed to assess multimodal large language models (MLLMs) on real-world change understanding tasks, including change description, rearrangement planning, and question answering, all with situation awareness.
- 📂 Dataset on Hugging Face: lrp123/Situat3DChange
- 🤖 Baseline model: SCReasoner
- 📊 Evaluation tools: scripts for both traditional NLP metrics and GPT-based evaluation
We recommend setting up the environment by following the steps in embodied-generalist, as SCReasoner builds on similar infrastructure.
Clone the repo:
git clone https://github.com/RuipingL/Situat3DChange.git
cd Situat3DChange
- Download Checkpoints
Download checkpoints.zip from the Hugging Face dataset page and extract it into:
Situat3DChange/SCReasoner/
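The download and extraction step can also be scripted. The following is a minimal sketch, assuming that `checkpoints.zip` sits at the top level of the `lrp123/Situat3DChange` dataset repository and that the `huggingface_hub` package is installed. The helper names `download_checkpoints` and `extract_checkpoints` are hypothetical and not part of this repo.

```python
import zipfile
from pathlib import Path


def download_checkpoints(repo_id="lrp123/Situat3DChange", filename="checkpoints.zip"):
    """Fetch the archive from the Hugging Face dataset repo and return its local path.

    Assumption: the archive is stored at the repo root under this filename.
    """
    # Imported lazily so that extraction alone works without huggingface_hub.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=repo_id, filename=filename, repo_type="dataset")


def extract_checkpoints(zip_path, dest="SCReasoner"):
    """Extract the archive into dest and return the names of the extracted members."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()
```

Run from the root of the cloned repo, `extract_checkpoints(download_checkpoints())` would place the archive contents under `SCReasoner/`. Check the dataset page if the file layout differs from what is assumed here.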
- Launch Training
Use the following command to train SCReasoner with SLURM and Submitit:
python launch.py \
--mode submitit \
--config configs/default.yaml \
--name default \
--time 48 \
--num_nodes 1 \
--partition accelerated \
--gpu_per_node 4 \
--mem_per_gpu 100 \
--port 2050
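If you launch several runs with different cluster settings, it can help to build the command programmatically. The sketch below only reassembles the flags shown in the command above. The helper `build_launch_command` is hypothetical, and the flag values are copied from this README rather than from the `launch.py` source, so treat it as an illustration.

```python
import shlex

# Values copied from the command above; adjust them to your cluster.
DEFAULT_ARGS = {
    "mode": "submitit",
    "config": "configs/default.yaml",
    "name": "default",
    "time": 48,
    "num_nodes": 1,
    "partition": "accelerated",
    "gpu_per_node": 4,
    "mem_per_gpu": 100,
    "port": 2050,
}


def build_launch_command(**overrides):
    """Return the launch.py invocation as an argument list."""
    unknown = set(overrides) - set(DEFAULT_ARGS)
    if unknown:
        raise ValueError(f"unknown launch arguments: {sorted(unknown)}")
    args = {**DEFAULT_ARGS, **overrides}
    command = ["python", "launch.py"]
    for key, value in args.items():
        command += [f"--{key}", str(value)]
    return command


if __name__ == "__main__":
    # Print a copy-pasteable command with two settings changed.
    print(shlex.join(build_launch_command(name="debug", gpu_per_node=2)))
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the directory that contains `launch.py`.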
To evaluate question answering, run:
python eval_qa/eval.py
For long-form outputs, compute the traditional metrics (BLEU-4, ROUGE, CIDEr, METEOR, BERTScore) with:
python eval_longform/eval.py
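To give a feel for what an n-gram overlap metric such as BLEU-4 measures, here is a minimal single-reference sketch. It is not the implementation used by `eval_longform/eval.py`: it assumes whitespace tokenization, one reference per candidate, and no smoothing, so scores from the actual evaluation script may differ.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, reference, n):
    """Fraction of candidate n-grams found in the reference, with counts clipped."""
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


def bleu4(candidate, reference):
    """Unsmoothed sentence-level BLEU-4 against a single reference."""
    precisions = [clipped_precision(candidate, reference, n) for n in range(1, 5)]
    if min(precisions) == 0.0:
        return 0.0  # without smoothing, any zero precision zeroes the score
    log_mean = sum(math.log(p) for p in precisions) / 4
    c, r = len(candidate), len(reference)
    # Brevity penalty: penalize candidates shorter than the reference.
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)
```

For example, `bleu4(pred.split(), ref.split())` returns a value between 0 and 1, where identical sentences of at least four tokens score 1.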
For GPT-based evaluation:
python eval_longform/eval_gpt.py
Results for SCReasoner, including GPT scores, are stored in:
results/SCReasoner/
If you use this project or dataset, please cite us (citation coming soon).
We thank the LEO project, upon which our project is based.