The MultiCaRe dataset is an open-source clinical case dataset for medical image classification and multimodal AI applications, built from PubMed Central open-access case reports.
- 98,000+ de-identified clinical cases from 72,000+ PubMed Central case reports
- 139,000+ medical images across multiple specialties
- Covers diverse domains including oncology, cardiology, pathology, surgery, and more
- Image taxonomy with 140+ classes organized in a hierarchical structure with logical constraints (mutual exclusivity, subsumption, etc.)
- Fully open-source under CC0 license
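
The logical constraints mentioned above (mutual exclusivity and subsumption) can be illustrated with a small sketch. The class names, the `PARENT` map, and the `EXCLUSIVE_GROUPS` list below are hypothetical and are not taken from the real MultiCaRe taxonomy; `expand` and `is_consistent` are illustrative helpers, not part of any MultiCaRe tooling.

```python
# Hypothetical mini-taxonomy, for illustration only (not the real MultiCaRe taxonomy).
# Each class maps to its parent; top-level classes map to None.
PARENT = {"radiology": None, "pathology": None, "mri": "radiology", "ct": "radiology"}

# Groups of classes assumed to be mutually exclusive.
EXCLUSIVE_GROUPS = [{"mri", "ct"}, {"radiology", "pathology"}]


def expand(labels):
    """Subsumption: return the labels plus all of their ancestors.

    Unknown labels raise KeyError.
    """
    closed = set()
    for label in labels:
        while label is not None:
            closed.add(label)
            label = PARENT[label]
    return closed


def is_consistent(labels):
    """Mutual exclusivity: at most one class from each exclusive group."""
    closed = expand(labels)
    return all(len(closed & group) <= 1 for group in EXCLUSIVE_GROUPS)


print(sorted(expand({"mri"})))
print(is_consistent({"mri"}))
print(is_consistent({"mri", "ct"}))
```

In this sketch, a label set is first closed under subsumption (an `mri` image is also a `radiology` image) and then checked against each exclusivity group, so `{"mri", "pathology"}` is rejected even though the two labels never appear in the same group directly.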
The dataset contains the following data elements:
The multiversity Python library lets you create customized subsets of MultiCaRe based on filters like patient demographics, clinical keywords, image labels, and more.
```bash
pip install multiversity
```

```python
from multiversity.multicare_dataset import MedicalDatasetCreator

# Load the dataset (downloads from Zenodo, takes 5–10 min)
mdc = MedicalDatasetCreator(directory='medical_datasets')

# Define filters
filters = [
    {'field': 'min_age', 'string_list': ['18']},
    {'field': 'gender', 'string_list': ['Male']},
    {'field': 'case_strings', 'string_list': ['tumor', 'cancer', 'carcinoma'], 'operator': 'any'},
    {'field': 'caption', 'string_list': ['metastasis', 'tumor', 'mass'], 'operator': 'any'},
    {'field': 'label', 'string_list': ['mri', 'head']}
]

# Create dataset (multimodal, text, image, or case_series)
mdc.create_dataset(
    dataset_name='male_brain_tumor_dataset',
    filter_list=filters,
    dataset_type='multimodal'
)
```

➡️ For full library documentation, visit the multiversity repository.
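
To make the filter dictionaries above more concrete, here is a minimal sketch of how the `operator` key could be interpreted when a `string_list` is matched against a piece of text. This is not multiversity's implementation: `matches` is a hypothetical helper, and both the case-insensitive substring matching and the choice of `'all'` as the default when no operator is given are assumptions made for illustration.

```python
def matches(text, string_list, operator="all"):
    """Hypothetical helper (not part of multiversity).

    Returns True if `text` contains any/all of the strings in `string_list`,
    using case-insensitive substring matching.
    """
    lowered = text.lower()
    hits = [s.lower() in lowered for s in string_list]
    if operator == "any":
        return any(hits)
    if operator == "all":
        return all(hits)
    raise ValueError(f"unsupported operator: {operator!r}")


caption = "Axial MRI showing a mass in the left temporal lobe"

# 'any': one matching term ("mass") is enough.
print(matches(caption, ["metastasis", "tumor", "mass"], operator="any"))

# Assumed default 'all': fails here because "head" does not appear in the caption.
print(matches(caption, ["mri", "head"]))
```

Under these assumptions, a filter with several terms and `'operator': 'any'` widens the subset, while omitting the operator narrows it to items that mention every term.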
```python
mdc.display_example()
```

This will render a sample case with its clinical narrative, image, image labels, and citation metadata.
| Folder | Description |
|---|---|
| Dataset_Creation_Process/ | Notebooks detailing how the dataset was built |
| Demos/ | Example notebooks for creating subsets and classification datasets |
| MultiCaReClassifier/ | Classification model trained on MultiCaRe |
| MultiCaRe_Taxonomy/ | The full image taxonomy (140+ classes) |
- 📄 Data Article (MDPI Data) — Full description of the dataset
- 🗄️ Dataset on Zenodo — Download the data
- 📓 Subset creation demo
- 🖼️ Image classification demo
- 🏷️ MultiCaRe Taxonomy
If you need to work with MultiCaRe v1.0:
```python
from multiversity.multicare_v1 import *
```

If you use MultiCaRe in your work, please cite:
Data Article:
Nievas Offidani, M., Roffet, F., González Galtier, M. C., Massiris, M., & Delrieux, C. (2025).
An Open-Source Clinical Case Dataset for Medical Image Classification and Multimodal AI Applications.
Data, 10(8), 123. https://doi.org/10.3390/data10080123

Dataset (Zenodo v3):
Nievas Offidani, M. (2025). MultiCaRe: An open-source clinical case dataset for medical image
classification and multimodal AI applications (version 3) [Data set].
Zenodo. https://doi.org/10.5281/zenodo.10079369

Contributions are welcome! Feel free to open issues or submit pull requests.
If you find this project useful, please consider giving it a ⭐ — it helps a lot with visibility.
For questions or collaborations, reach out on LinkedIn.