mauro-nievoff/MultiCaRe_Dataset


🏥 MultiCaRe — A Multimodal Clinical Case Dataset

The MultiCaRe dataset is an open-source clinical case dataset for medical image classification and multimodal AI applications, built from PubMed Central open-access case reports.

📊 Key Facts

  • 98,000+ de-identified clinical cases from 72,000+ PubMed Central case reports
  • 139,000+ medical images across multiple specialties
  • Covers diverse domains including oncology, cardiology, pathology, surgery, and more
  • Image taxonomy with 140+ classes organized in a hierarchical structure with logical constraints (mutual exclusivity, subsumption, etc.)
  • Fully open-source under the CC0 license
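The logical constraints mentioned above (subsumption and mutual exclusivity) can be illustrated with a small sketch. The class names, the parent map, and the exclusive groups below are hypothetical stand-ins chosen for illustration; they are not the real MultiCaRe taxonomy, which lives in the MultiCaRe_Taxonomy/ folder.

```python
# Hypothetical mini-taxonomy: these names and relations are assumptions
# for illustration only, not the actual MultiCaRe classes.
PARENT = {
    "radiology": None,
    "pathology": None,
    "mri": "radiology",
    "ct": "radiology",
}

# Groups of classes assumed to be mutually exclusive for a single image.
EXCLUSIVE_GROUPS = [
    {"mri", "ct"},
    {"radiology", "pathology"},
]


def ancestors(label):
    """Return every ancestor of a label, nearest first."""
    found = []
    node = PARENT[label]
    while node is not None:
        found.append(node)
        node = PARENT[node]
    return found


def expand(labels):
    """Subsumption: a label implies all of its ancestors."""
    full = set(labels)
    for label in labels:
        full.update(ancestors(label))
    return full


def violations(labels):
    """List each exclusive group that has more than one member present."""
    full = expand(labels)
    return [sorted(group & full) for group in EXCLUSIVE_GROUPS if len(group & full) > 1]
```

Under these assumptions, `expand({"mri"})` also yields the parent class, and `violations({"mri", "ct"})` reports the conflicting pair. A check of this kind can flag inconsistent label sets before they are used for training.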

🗂️ Dataset Structure

The dataset contains the following data elements:

[Figure: MultiCaRe data elements]


✅ Create Your Own Custom Subset

The multiversity Python library lets you create customized subsets of MultiCaRe based on filters like patient demographics, clinical keywords, image labels, and more.

Installation

pip install multiversity

Basic Usage

from multiversity.multicare_dataset import MedicalDatasetCreator

# Load the dataset (downloads from Zenodo, takes 5–10 min)
mdc = MedicalDatasetCreator(directory='medical_datasets')

# Define filters
filters = [
    {'field': 'min_age', 'string_list': ['18']},
    {'field': 'gender', 'string_list': ['Male']},
    {'field': 'case_strings', 'string_list': ['tumor', 'cancer', 'carcinoma'], 'operator': 'any'},
    {'field': 'caption', 'string_list': ['metastasis', 'tumor', 'mass'], 'operator': 'any'},
    {'field': 'label', 'string_list': ['mri', 'head']}
]

# Create the dataset; dataset_type can be 'multimodal', 'text', 'image', or 'case_series'
mdc.create_dataset(
    dataset_name='male_brain_tumor_dataset',
    filter_list=filters,
    dataset_type='multimodal'
)
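To make the role of the 'operator' key more concrete, here is a minimal sketch of how a string-list filter might be evaluated against a piece of text. This is not the multiversity implementation: the helper name `text_matches`, the case-insensitive substring matching, and the default of 'all' are assumptions made for illustration.

```python
def text_matches(text, string_list, operator="all"):
    """Hypothetical helper: check a text against a list of search strings.

    Assumes case-insensitive substring matching; the real library may
    tokenize or normalize text differently.
    """
    lowered = text.lower()
    hits = [s.lower() in lowered for s in string_list]
    if operator == "any":
        return any(hits)  # at least one string must appear
    if operator == "all":
        return all(hits)  # every string must appear
    raise ValueError(f"unsupported operator: {operator!r}")


caption = "Axial MRI showing a large mass in the left temporal lobe"
print(text_matches(caption, ["metastasis", "tumor", "mass"], operator="any"))
print(text_matches(caption, ["mri", "mass"]))
```

Read this way, a filter with 'operator': 'any' keeps a case when at least one listed string appears, while a filter without that key would require all of them, if the default is indeed 'all'.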

➡️ For full library documentation, visit the multiversity repository.


🔍 Exploring the Data

mdc.display_example()

This will render a sample case with its clinical narrative, image, image labels, and citation metadata.


📁 Repository Contents

| Folder | Description |
| --- | --- |
| Dataset_Creation_Process/ | Notebooks detailing how the dataset was built |
| Demos/ | Example notebooks for creating subsets and classification datasets |
| MultiCaReClassifier/ | Classification model trained on MultiCaRe |
| MultiCaRe_Taxonomy/ | The full image taxonomy (140+ classes) |

💡 Useful Resources

  1. 📄 Data Article (MDPI Data) — Full description of the dataset
  2. 🗄️ Dataset on Zenodo — Download the data
  3. 📓 Subset creation demo
  4. 🖼️ Image classification demo
  5. 🏷️ MultiCaRe Taxonomy

📦 Legacy Code

If you need to work with MultiCaRe v1.0:

from multiversity.multicare_v1 import *

🤓 How to Cite

If you use MultiCaRe in your work, please cite:

Data Article:

Nievas Offidani, M., Roffet, F., González Galtier, M. C., Massiris, M., & Delrieux, C. (2025).
An Open-Source Clinical Case Dataset for Medical Image Classification and Multimodal AI Applications.
Data, 10(8), 123. https://doi.org/10.3390/data10080123

Dataset (Zenodo v3):

Nievas Offidani, M. (2025). MultiCaRe: An open-source clinical case dataset for medical image
classification and multimodal AI applications (version 3) [Data set].
Zenodo. https://doi.org/10.5281/zenodo.10079369

🤝 Contributing

Contributions are welcome! Feel free to open issues or submit pull requests.

If you find this project useful, please consider giving it a ⭐ — it helps a lot with visibility.

For questions or collaborations, reach out on LinkedIn.
