MIFNet: Learning Modality-Invariant Features for Generalizable Multimodal Image Matching

This repository provides the official implementation of our paper:

MIFNet: Learning Modality-Invariant Features for Generalizable Multimodal Image Matching
Accepted to IEEE Transactions on Image Processing (TIP), 2025
[arXiv Paper]


📌 Introduction

Multimodal image matching is challenging because appearance and texture can differ drastically across modalities. MIFNet addresses this by learning modality-invariant features that generalize well to unseen domains: it combines low-level geometric features with high-level semantic guidance derived from a pretrained Stable Diffusion model, and a lightweight graph neural network (GNN) performs semantic-aware feature aggregation.

Key contributions:

  • Introduces semantic features from Stable Diffusion for multimodal matching.
  • Proposes a cross-modal hybrid aggregation network with a GNN backbone.
  • Demonstrates strong generalization on various unseen multimodal datasets.
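To make the fusion idea concrete, here is a minimal PyTorch sketch of semantic-aware aggregation: per-keypoint geometric descriptors and semantic features are projected into a shared space and aggregated with one attention layer standing in for the paper's lightweight GNN. All module names and dimensions below are illustrative assumptions, not the repository's actual API.

import torch
import torch.nn as nn

class SemanticAwareAggregator(nn.Module):
    """Toy illustration (not the repo's implementation): fuse per-keypoint
    geometric descriptors with semantic features, then aggregate with a
    single self-attention layer as a stand-in for a lightweight GNN."""
    def __init__(self, desc_dim=256, sem_dim=1280, hidden=256, heads=4):
        super().__init__()
        # Project both feature types into a shared embedding space.
        self.desc_proj = nn.Linear(desc_dim, hidden)
        self.sem_proj = nn.Linear(sem_dim, hidden)
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.out = nn.Linear(hidden, hidden)

    def forward(self, desc, sem):
        # desc: (B, N, desc_dim) geometric descriptors at N keypoints
        # sem:  (B, N, sem_dim) semantic features sampled at the same keypoints
        x = self.desc_proj(desc) + self.sem_proj(sem)  # simple additive fusion
        x, _ = self.attn(x, x, x)                      # message passing among keypoints
        return self.out(x)

# Usage with random tensors standing in for real features.
agg = SemanticAwareAggregator()
desc = torch.randn(1, 512, 256)    # 512 keypoints, 256-d descriptors
sem = torch.randn(1, 512, 1280)    # semantic features at the same keypoints
print(agg(desc, sem).shape)        # torch.Size([1, 512, 256])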

Framework Overview


🔧 Installation

conda create -n mifnet python=3.10
conda activate mifnet
pip install -r requirements.txt
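Before downloading any weights, you can sanity-check the environment. This assumes PyTorch is among the pinned requirements, which the inference and training scripts need:

import torch

# Print the installed PyTorch version and whether a GPU is visible.
print(torch.__version__)
print("CUDA available:", torch.cuda.is_available())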

You can download our released pretrained MIFNet models from the following link:

👉 Pretrained Checkpoints (Dropbox)
Please place the downloaded files under the checkpoints/ directory.

You also need the pretrained Stable Diffusion v2.1 weights. Use the following Python snippet to download them from Hugging Face:

from huggingface_hub import snapshot_download

snapshot_download(repo_id="stabilityai/stable-diffusion-2-1", local_dir="./stable-diffusion-2-1/")

After downloading, move the entire folder to:

diffusion_weight/
└── stable-diffusion-2-1/
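The move can be scripted, for example as below; an equivalent alternative is to pass local_dir="./diffusion_weight/stable-diffusion-2-1/" to snapshot_download in the first place.

import shutil
from pathlib import Path

# Create the target directory and move the downloaded weights into it.
Path("diffusion_weight").mkdir(exist_ok=True)
shutil.move("./stable-diffusion-2-1", "diffusion_weight/stable-diffusion-2-1")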

🧪 Inference

You can run inference with the provided script. The matching visualizations will be saved to output_images/.

cd scripts
python test_xfeat_mifnet.py --mode cf-fa       # cf-fa, cf-oct, ema-octa, opt-sar, opt-nir

Example output:

(Example matching visualization, as written to output_images/.)
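For reference, this kind of side-by-side visualization can be reproduced from any set of matched keypoints with OpenCV. The snippet below uses synthetic data; kpts0 and kpts1 are hypothetical (N, 2) arrays standing in for a matcher's output, not the script's actual variables:

import os
import cv2
import numpy as np

# Synthetic stand-ins for two images and N matched keypoint coordinates.
img0 = np.zeros((480, 640, 3), dtype=np.uint8)
img1 = np.zeros((480, 640, 3), dtype=np.uint8)
kpts0 = np.random.rand(20, 2) * [640, 480]
kpts1 = kpts0 + np.random.randn(20, 2) * 2.0

# Convert to OpenCV structures and draw one line per correspondence.
cv_kp0 = [cv2.KeyPoint(float(x), float(y), 1) for x, y in kpts0]
cv_kp1 = [cv2.KeyPoint(float(x), float(y), 1) for x, y in kpts1]
matches = [cv2.DMatch(i, i, 0.0) for i in range(len(cv_kp0))]
vis = cv2.drawMatches(img0, cv_kp0, img1, cv_kp1, matches, None)

os.makedirs("output_images", exist_ok=True)
cv2.imwrite("output_images/example_matches.png", vis)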


🏋️‍♂️ Training

Please organize your training data under the data/ directory with the following structure:

data/
└── retina/
    ├── Auxilliary_Training/      # Contains image pairs for auxiliary training
    └── retina_aux.txt            # List of training image pairs

  • Auxilliary_Training/ contains the actual training image files; you can download them from the Retina Dataset.
  • retina_aux.txt lists the image pair paths used during training, one per line, e.g. Auxilliary_Training/1184.png (a sketch for regenerating this file follows below).
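As referenced above, the list file can be regenerated from whatever images are present. A small sketch, assuming .png images and paths written relative to data/retina/:

from pathlib import Path

# Write one relative path per line, matching the example entry above.
root = Path("data/retina")
pairs = sorted((root / "Auxilliary_Training").glob("*.png"))
with open(root / "retina_aux.txt", "w") as f:
    for p in pairs:
        f.write(f"Auxilliary_Training/{p.name}\n")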

To train MIFNet, first navigate to the scripts directory and execute the training script:

cd scripts
sh train.sh

This will start the training process and generate an output/ directory to store model checkpoints and training logs.


📖 Citation

If you find this work useful, please consider citing our paper:

@article{liu2025mifnet,
  title     = {MIFNet: Learning Modality-Invariant Features for Generalizable Multimodal Image Matching},
  author    = {Liu, Yepeng and Sun, Zhichao and Yu, Baosheng and Zhao, Yitian and Du, Bo and Xu, Yongchao and Cheng, Jun},
  journal   = {IEEE Transactions on Image Processing},
  volume    = {34},
  pages     = {3593--3608},
  year      = {2025},
  doi       = {10.1109/TIP.2025.3574937}
}

🙏 Acknowledgments

We thank the following open-source projects that inspired and supported our work:

  • LightGlue: lightweight attention-based matcher for local features.
  • DIFT: semantic feature extraction using Stable Diffusion.

Their contributions significantly accelerated the development of MIFNet.


📬 Contact

For questions or collaboration, feel free to contact: Yepeng Liu


📘 License

This project is licensed under the MIT License.
