MIFNet: Learning Modality-Invariant Features for Generalizable Multimodal Image Matching

This repository provides the official implementation of our paper:

MIFNet: Learning Modality-Invariant Features for Generalizable Multimodal Image Matching
Accepted to IEEE Transactions on Image Processing (TIP), 2025
[arXiv Paper]


📌 Introduction

Multimodal image matching is challenging because appearance and texture can differ drastically across modalities. MIFNet addresses this by learning modality-invariant features that generalize well to unseen domains: it combines low-level geometric features with high-level semantic guidance derived from a pretrained Stable Diffusion model, and a lightweight graph neural network (GNN) performs semantic-aware feature aggregation.

Key contributions:

  • Introduces semantic features from Stable Diffusion for multimodal matching.
  • Proposes a cross-modal hybrid aggregation network with a GNN backbone.
  • Demonstrates strong generalization on various unseen multimodal datasets.
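To make the fusion idea concrete, here is a minimal PyTorch sketch of semantic-aware aggregation: per-keypoint geometric descriptors and semantic features are projected into a shared space and aggregated with one attention layer standing in for the paper's lightweight GNN. All module names and dimensions below are illustrative assumptions, not the repository's actual API.

import torch
import torch.nn as nn

class SemanticAwareAggregator(nn.Module):
    """Toy illustration (not the repo's implementation): fuse per-keypoint
    geometric descriptors with semantic features, then aggregate with a
    single self-attention layer as a stand-in for a lightweight GNN."""
    def __init__(self, desc_dim=256, sem_dim=1280, hidden=256, heads=4):
        super().__init__()
        # Project both feature types into a shared embedding space.
        self.desc_proj = nn.Linear(desc_dim, hidden)
        self.sem_proj = nn.Linear(sem_dim, hidden)
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.out = nn.Linear(hidden, hidden)

    def forward(self, desc, sem):
        # desc: (B, N, desc_dim) geometric descriptors at N keypoints
        # sem:  (B, N, sem_dim) semantic features sampled at the same keypoints
        x = self.desc_proj(desc) + self.sem_proj(sem)  # simple additive fusion
        x, _ = self.attn(x, x, x)                      # message passing among keypoints
        return self.out(x)

# Usage with random tensors standing in for real features.
agg = SemanticAwareAggregator()
desc = torch.randn(1, 512, 256)    # 512 keypoints, 256-d descriptors
sem = torch.randn(1, 512, 1280)    # semantic features at the same keypoints
print(agg(desc, sem).shape)        # torch.Size([1, 512, 256])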

Framework Overview


🔧 Installation

conda create -n mifnet python=3.10
conda activate mifnet
pip install -r requirements.txt
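Before downloading any weights, you can sanity-check the environment. This assumes PyTorch is among the pinned requirements, which the inference and training scripts need:

import torch

# Print the installed PyTorch version and whether a GPU is visible.
print(torch.__version__)
print("CUDA available:", torch.cuda.is_available())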

You can download our released pretrained MIFNet models from the following link:

👉 Pretrained Checkpoints (Dropbox)
Please place the downloaded files under the checkpoints/ directory.

You also need the pretrained Stable Diffusion v2.1 weights. Use the following Python snippet to download them from Hugging Face:

from huggingface_hub import snapshot_download

snapshot_download(repo_id="stabilityai/stable-diffusion-2-1", local_dir="./stable-diffusion-2-1/")

After downloading, move the entire folder to:

diffusion_weight/
└── stable-diffusion-2-1/
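The move can be scripted, for example as below; an equivalent alternative is to pass local_dir="./diffusion_weight/stable-diffusion-2-1/" to snapshot_download in the first place.

import shutil
from pathlib import Path

# Create the target directory and move the downloaded weights into it.
Path("diffusion_weight").mkdir(exist_ok=True)
shutil.move("./stable-diffusion-2-1", "diffusion_weight/stable-diffusion-2-1")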

🧪 Inference

You can run inference with the provided script. The matching visualizations will be saved to output_images/.

cd scripts
python test_xfeat_mifnet.py --mode cf-fa       # cf-fa, cf-oct, ema-octa, opt-sar, opt-nir

Example output:

(Example matching visualization, as written to output_images/.)
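For reference, this kind of side-by-side visualization can be reproduced from any set of matched keypoints with OpenCV. The snippet below uses synthetic data; kpts0 and kpts1 are hypothetical (N, 2) arrays standing in for a matcher's output, not the script's actual variables:

import os
import cv2
import numpy as np

# Synthetic stand-ins for two images and N matched keypoint coordinates.
img0 = np.zeros((480, 640, 3), dtype=np.uint8)
img1 = np.zeros((480, 640, 3), dtype=np.uint8)
kpts0 = np.random.rand(20, 2) * [640, 480]
kpts1 = kpts0 + np.random.randn(20, 2) * 2.0

# Convert to OpenCV structures and draw one line per correspondence.
cv_kp0 = [cv2.KeyPoint(float(x), float(y), 1) for x, y in kpts0]
cv_kp1 = [cv2.KeyPoint(float(x), float(y), 1) for x, y in kpts1]
matches = [cv2.DMatch(i, i, 0.0) for i in range(len(cv_kp0))]
vis = cv2.drawMatches(img0, cv_kp0, img1, cv_kp1, matches, None)

os.makedirs("output_images", exist_ok=True)
cv2.imwrite("output_images/example_matches.png", vis)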


🏋️‍♂️ Training

Please organize your training data under the data/ directory with the following structure:

data/
└── retina/
    ├── Auxilliary_Training/      # Contains image pairs for auxiliary training
    └── retina_aux.txt            # List of training image pairs

  • Auxilliary_Training/ contains the actual training image files; you can download them from the Retina Dataset.
  • retina_aux.txt lists the image pair paths used during training, one per line, e.g. Auxilliary_Training/1184.png (a sketch for regenerating this file follows below).
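As referenced above, the list file can be regenerated from whatever images are present. A small sketch, assuming .png images and paths written relative to data/retina/:

from pathlib import Path

# Write one relative path per line, matching the example entry above.
root = Path("data/retina")
pairs = sorted((root / "Auxilliary_Training").glob("*.png"))
with open(root / "retina_aux.txt", "w") as f:
    for p in pairs:
        f.write(f"Auxilliary_Training/{p.name}\n")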

To train MIFNet, first navigate to the scripts directory and execute the training script:

cd scripts
sh train.sh

This will start the training process and generate an output/ directory to store model checkpoints and training logs.


📖 Citation

If you find this work useful, please consider citing our paper:

@article{liu2025mifnet,
  title     = {MIFNet: Learning Modality-Invariant Features for Generalizable Multimodal Image Matching},
  author    = {Liu, Yepeng and Sun, Zhichao and Yu, Baosheng and Zhao, Yitian and Du, Bo and Xu, Yongchao and Cheng, Jun},
  journal   = {IEEE Transactions on Image Processing},
  volume    = {34},
  pages     = {3593--3608},
  year      = {2025},
  doi       = {10.1109/TIP.2025.3574937}
}

🙏 Acknowledgments

We thank the following open-source projects that inspired and supported our work:

  • LightGlue: lightweight attention-based matcher for local features.
  • DIFT: semantic feature extraction using Stable Diffusion.

Their contributions significantly accelerated the development of MIFNet.


📬 Contact

For questions or collaboration, feel free to contact: Yepeng Liu


📘 License

This project is licensed under the MIT License.
