Abstract
In pathology, the deployment of artificial intelligence (AI) in clinical settings is constrained by limitations in data collection and in model transparency and interpretability. Here we describe a digital pathology framework, nuclei.io, that incorporates active learning and human-in-the-loop real-time feedback for the rapid creation of diverse datasets and models. We validate the effectiveness of the framework via two crossover user studies that leveraged collaboration between the AI and the pathologist, including the identification of plasma cells in endometrial biopsies and the detection of colorectal cancer metastasis in lymph nodes. In both studies, nuclei.io yielded considerable diagnostic performance improvements. Collaboration between clinicians and AI will aid digital pathology by enhancing accuracies and efficiencies.
Data availability
The data supporting the results in this study are available within the paper and its Supplementary Information. The deidentified nuclei image patches and pathologists’ annotations are available at https://huangzhii.github.io/nuclei-HAI. Source data are provided with this paper.
Code availability
The source code of nuclei.io is available at https://huangzhii.github.io/nuclei-HAI.
Acknowledgements
J.Z. is supported by the Chan-Zuckerberg Biohub Investigator Award. We thank M. Yuksekgonul and F. Bianchi for their helpful suggestions in improving our manuscript.
Author information
Authors and Affiliations
Contributions
Z.H. conducted study design, software development, experimental setup, data analysis, data visualization and manuscript writing. E.Y. provided numerous insights and participated in the PC study. J.S. provided numerous insights and participated in the CRC LN study. D.G. provided feedback on both studies and participated in the CRC LN study. F.E. helped collect data for the PC study and participated in it. B.L. and J.N. participated in both studies. D.B., A.M.D., C.K. and R.R. participated in the most time-consuming CRC LN study. A.G., A.L.C.-G., B.E.H., Y.L., E.E.R., T.B.T. and X.Z. participated in the second most time-consuming PC study. A.F. helped with data collection. E.J.F. and K.S.M. were partially involved in designing the study. T.J.M. and J.Z. oversaw the project, conducted study design, experimental setup, data analysis and manuscript writing.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Biomedical Engineering thanks Jakob Nikolas Kather and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Time comparison for colorectal cancer lymph node identification study.
(a) Overall time comparison between AI-assisted mode and unassisted mode. (b) Time comparison between AI-assisted mode and unassisted mode compared within lymph node positive cases (LN+) and lymph node negative cases (LN-). (c) Time comparison between AI-assisted mode and unassisted mode compared across the 8 pathologists. Note: A few slides were missed/skipped by some pathologists during the experiments and were thus excluded from the final comparison, leading to a reduced sample size (N < 137). (d) Time comparison between AI-assisted mode and unassisted mode stratified by different pathologist groups and lymph node status. P-values were calculated using a two-sided t-test without adjustment. For the boxplots, the interior horizontal line represents the median value, the upper and lower box edges represent the 75th and 25th percentile, and the upper and lower bars represent the 90th and 10th percentiles, respectively.
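The legend above describes time comparisons evaluated with an unadjusted two-sided t-test. As a minimal illustrative sketch (not the authors' analysis code), the comparison can be reproduced with `scipy.stats.ttest_ind`; the two arrays of per-slide reading times below are synthetic placeholders, not study data.

```python
# Sketch of the two-sided t-test used for the time comparisons in
# Extended Data Fig. 1. All numbers here are synthetic placeholders.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
assisted = rng.normal(loc=60.0, scale=15.0, size=120)    # seconds per slide, AI-assisted mode
unassisted = rng.normal(loc=75.0, scale=20.0, size=120)  # seconds per slide, unassisted mode

# ttest_ind is two-sided by default; no multiple-comparison adjustment is applied.
t_stat, p_value = stats.ttest_ind(assisted, unassisted)
print(f"t = {t_stat:.2f}, p = {p_value:.3g}")
```

In the study's stratified comparisons (panels b-d), the same test would simply be repeated within each subgroup (LN+/LN-, per pathologist).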
Extended Data Fig. 2 A lymph node from an experimental slide.
The experimental slide is used for evaluation, with tumor regions highlighted by green rectangles and tumor cells marked as red scatter points.
Extended Data Fig. 3 Evaluating individualized model performance with respect to inspection errors (false negatives).
(a) Approach to calculating the ratio of positive nuclei inside the tumor region (green contour) to the lymph node; this ratio is also known as sensitivity (TP/P). (b) Comparison of the ratio of positive nuclei inside the tumor region to the lymph node when false negatives appear. The tumor regions were manually annotated for all lymph node slides. Abbreviations: lymph node (LN), isolated tumor cells (ITC), micro-metastasis (micromet), macro-metastasis (macromet). P-values were calculated using a two-sided Spearman test without adjustment, implemented with the Python ‘scipy’ package. For the boxplots, the interior horizontal line represents the median value, the upper and lower box edges represent the 75th and 25th percentile, and the upper and lower bars represent the 90th and 10th percentiles, respectively.
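The two quantities in this legend, the sensitivity-style ratio TP/P and the two-sided Spearman test via scipy, can be sketched as follows. This is an assumption-laden illustration, not the study's code: the `tumor_ratio` helper and all per-slide numbers below are hypothetical.

```python
# Sketch of the Extended Data Fig. 3 quantities: sensitivity (TP/P) and an
# unadjusted two-sided Spearman correlation. Values are hypothetical.
from scipy.stats import spearmanr

def tumor_ratio(true_positives: int, positives: int) -> float:
    """Sensitivity TP/P: fraction of tumor-region nuclei the model flags as positive."""
    return true_positives / positives if positives else 0.0

# Hypothetical per-slide ratios against a metastasis size grade
# (ITC = 0, micromet = 1, macromet = 2), just to show the spearmanr call.
ratios = [0.12, 0.35, 0.80, 0.91, 0.40, 0.88]
size_grade = [0, 1, 2, 2, 1, 2]

rho, p = spearmanr(ratios, size_grade)  # two-sided by default, no adjustment
print(f"rho = {rho:.2f}, p = {p:.3g}")
```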
Extended Data Fig. 4 A screenshot of the plasma cell classifier applied to an external slide from colorectal tissue.
In the screenshot, green squares generated by the program mark the predicted potential plasma cells (N = 27). Upon further manual verification, we highlighted five potential false positives with red circles.
Supplementary information
Supplementary Information
Supplementary figures and tables.
Source data
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Huang, Z., Yang, E., Shen, J. et al. A pathologist–AI collaboration framework for enhancing diagnostic accuracies and efficiencies. Nat. Biomed. Eng. 9, 455–470 (2025). https://doi.org/10.1038/s41551-024-01223-5