Abstract
In pathology, the deployment of artificial intelligence (AI) in clinical settings is constrained by limitations in data collection and in model transparency and interpretability. Here we describe a digital pathology framework, nuclei.io, that incorporates active learning and human-in-the-loop real-time feedback for the rapid creation of diverse datasets and models. We validate the effectiveness of the framework via two crossover user studies that leveraged collaboration between the AI and the pathologist, including the identification of plasma cells in endometrial biopsies and the detection of colorectal cancer metastasis in lymph nodes. In both studies, nuclei.io yielded considerable diagnostic performance improvements. Collaboration between clinicians and AI will aid digital pathology by enhancing accuracies and efficiencies.
Data availability
The data supporting the results in this study are available within the paper and its Supplementary Information. The deidentified nuclei image patches and pathologists’ annotations are available at https://huangzhii.github.io/nuclei-HAI. Source data are provided with this paper.
Code availability
The source code of nuclei.io is available at https://huangzhii.github.io/nuclei-HAI.
Acknowledgements
J.Z. is supported by the Chan-Zuckerberg Biohub Investigator Award. We thank M. Yuksekgonul and F. Bianchi for their helpful suggestions in improving our manuscript.
Author information
Authors and Affiliations
Contributions
Z.H. conducted study design, software development, experimental setup, data analysis, data visualization and manuscript writing. E.Y. provided numerous insights and participated in the PC study. J.S. provided numerous insights and participated in the CRC LN study. D.G. provided feedback on both studies and participated in the CRC LN study. F.E. helped collect data for the PC study and participated in it. B.L. and J.N. participated in both studies. D.B., A.M.D., C.K. and R.R. participated in the most time-consuming CRC LN study. A.G., A.L.C.-G., B.E.H., Y.L., E.E.R., T.B.T. and X.Z. participated in the second most time-consuming PC study. A.F. helped with data collection. E.J.F. and K.S.M. were partially involved in designing the study. T.J.M. and J.Z. oversaw the project, conducted study design, experimental setup, data analysis and manuscript writing.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Biomedical Engineering thanks Jakob Nikolas Kather and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Time comparison for colorectal cancer lymph node identification study.
(a) Overall time comparison between AI-assisted mode and unassisted mode. (b) Time comparison between AI-assisted mode and unassisted mode compared within lymph node positive cases (LN+) and lymph node negative cases (LN-). (c) Time comparison between AI-assisted mode and unassisted mode compared across the 8 pathologists. Note: A few slides were missed/skipped by some pathologists during the experiments and were thus excluded from the final comparison, leading to a reduced sample size (N < 137). (d) Time comparison between AI-assisted mode and unassisted mode stratified by different pathologist groups and lymph node status. P-values were calculated using a two-sided t-test without adjustment. For the boxplots, the interior horizontal line represents the median value, the upper and lower box edges represent the 75th and 25th percentile, and the upper and lower bars represent the 90th and 10th percentiles, respectively.
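The legend above describes time comparisons evaluated with an unadjusted two-sided t-test. As a minimal illustrative sketch (not the authors' analysis code), the comparison can be reproduced with `scipy.stats.ttest_ind`; the two arrays of per-slide reading times below are synthetic placeholders, not study data.

```python
# Sketch of the two-sided t-test used for the time comparisons in
# Extended Data Fig. 1. All numbers here are synthetic placeholders.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
assisted = rng.normal(loc=60.0, scale=15.0, size=120)    # seconds per slide, AI-assisted mode
unassisted = rng.normal(loc=75.0, scale=20.0, size=120)  # seconds per slide, unassisted mode

# ttest_ind is two-sided by default; no multiple-comparison adjustment is applied.
t_stat, p_value = stats.ttest_ind(assisted, unassisted)
print(f"t = {t_stat:.2f}, p = {p_value:.3g}")
```

In the study's stratified comparisons (panels b-d), the same test would simply be repeated within each subgroup (LN+/LN-, per pathologist).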
Extended Data Fig. 2 A lymph node from an experimental slide.
The experimental slide is used for evaluation, with tumor regions highlighted by green rectangles and tumor cells marked as red scatter points.
Extended Data Fig. 3 Evaluating individualized model performance with respect to inspection errors (false negatives).
(a) Approach to calculating the ratio of positive nuclei inside the tumor region (green contour) to the lymph node; this ratio is also known as sensitivity (TP/P). (b) Comparison of the ratio of positive nuclei inside the tumor region to the lymph node when false negatives appear. The tumor regions were manually annotated for all lymph node slides. Abbreviations: lymph node (LN), isolated tumor cells (ITC), micro-metastasis (micromet), macro-metastasis (macromet). P-values were calculated using a two-sided Spearman test without adjustment, implemented with the Python ‘scipy’ package. For the boxplots, the interior horizontal line represents the median value, the upper and lower box edges represent the 75th and 25th percentile, and the upper and lower bars represent the 90th and 10th percentiles, respectively.
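The two quantities in this legend, the sensitivity-style ratio TP/P and the two-sided Spearman test via scipy, can be sketched as follows. This is an assumption-laden illustration, not the study's code: the `tumor_ratio` helper and all per-slide numbers below are hypothetical.

```python
# Sketch of the Extended Data Fig. 3 quantities: sensitivity (TP/P) and an
# unadjusted two-sided Spearman correlation. Values are hypothetical.
from scipy.stats import spearmanr

def tumor_ratio(true_positives: int, positives: int) -> float:
    """Sensitivity TP/P: fraction of tumor-region nuclei the model flags as positive."""
    return true_positives / positives if positives else 0.0

# Hypothetical per-slide ratios against a metastasis size grade
# (ITC = 0, micromet = 1, macromet = 2), just to show the spearmanr call.
ratios = [0.12, 0.35, 0.80, 0.91, 0.40, 0.88]
size_grade = [0, 1, 2, 2, 1, 2]

rho, p = spearmanr(ratios, size_grade)  # two-sided by default, no adjustment
print(f"rho = {rho:.2f}, p = {p:.3g}")
```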
Extended Data Fig. 4 A screenshot of the plasma cell classifier applied to an external slide from colorectal tissue.
In the screenshot, green squares generated by the program mark the predicted potential plasma cells (N = 27). Upon further manual verification, we highlighted five potential false positives with red circles.
Supplementary information
Supplementary Information
Supplementary figures and tables.
Source data
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Huang, Z., Yang, E., Shen, J. et al. A pathologist–AI collaboration framework for enhancing diagnostic accuracies and efficiencies. Nat. Biomed. Eng. 9, 455–470 (2025). https://doi.org/10.1038/s41551-024-01223-5