Richard J. Chen

I am a 5th year Ph.D. Candidate (and NSF-GRFP Fellow) advised by Faisal Mahmood at Harvard University, and also within Brigham and Women’s Hospital, Dana-Farber Cancer Institute, and the Broad Institute.

Prior to starting my Ph.D., I obtained my B.S./M.S. in Biomedical Engineering and Computer Science at Johns Hopkins University, where I worked with Nicholas Durr and Alan Yuille. In industry, I have also worked at Apple Inc. in the Health Special Project and Applied Machine Learning Groups (with Belle Tseng and Andrew Trister), and at Microsoft Research in the BioML Group (with Rahul Gopalkrishnan).

Research Highlights

Multimodal Integration: Multimodal learning has emerged as an interdisciplinary field to solve many core problems in machine perception, human-computer interaction, and recently in biology & medicine, in which there is often an enormous wealth of multimodal data collected in parallel to study the same underlying disease. I have worked on a range of problems in multimodal learning for integrating: 1) multimodal sensor streams from the Apple Watch and iPhone to predict mild cognitive decline, 2) RGB and depth images for non-polypoid lesion classification and SLAM in surgical robotics, and 3) pathology images and genomics for cancer prognosis.
Representation Learning for Gigapixel Images: Though deep learning has revolutionized computer vision in many disciplines, gigapixel whole-slide imaging (WSI) in computational pathology is a complex computer vision domain that renders traditional, Convnet-based supervised learning approaches infeasible. To address this issue, I have been working on interpreting large gigapixel images as permutation-invariant sets (or bags in MIL literature), and then developing Transformer-based approaches for weakly-supervised learning and self-supervised learning in WSIs.
Generative AI & Healthcare Policy: “What constitutes authenticity, and how would the lack of authenticity shape our perception of reality?” The American science fiction writer Philip K. Dick posited similar questions throughout his literary career and, in particular, in his 1978 essay “How to build a universe that doesn’t fall apart two days later”. I am interested in using synthetic data for domain adaptation and generalization, developing synthetic environments for simulating challenging scenarios for neural networks, and the policy challenges in training AI-SaMDs with synthetic data.

Recent News

Aug, 2023	Excited to share our latest preprint on UNI, a general-purpose self-supervised model for computational pathology. In addition, my Master’s student, Tong Ding, is joining the Computer Science Ph.D. program at Harvard University (SEAS). Congratulations Tong!
Jul, 2023	Our perspective on algorithmic fairness in AI for medicine and healthcare was published in Nature BME. In addition, excited to share our latest preprint on CONCH (CONtrastive learning from Captions for Histopathology), a visual-language foundation model for computational pathology. Stay tuned!
Jun, 2023	Our work on zero-shot slide classification with visual-language pretraining was published in CVPR. Code + pretrained model weights are made available.
Aug, 2022	Our work on PORPOISE (Pathology-Omic Research Platform for Integrated Survival Estimation), and our review on multimodal learning for oncology were both published in Cancer Cell. See the associated demo!
Jun, 2022	Our work on Hierarchical Image Pyramid Transformer (HIPT) is highlighted as an Oral Presentation in CVPR, and as a Spotlight Talk in the Transformers 4 Vision (T4V) CVPR Workshop. Code + pretrained model weights are made available. Lastly, my visiting student, Yicong Li, is joining the Computer Science Ph.D. program at Harvard University (SEAS). Congratulations Yicong!
Mar, 2022	Our work on CRANE was published in Nature Medicine. Also, code + pretrained model weights are made available for our recent Self-Supervised ViT work in NeurIPSW LMRL 2021. Lastly, our work on federated learning for CPATH (HistoFL) was published in Medical Image Analysis.
Jul, 2021	Joined Microsoft Research as a PhD Research Intern, working with Rahul Gopalkrishnan in the BioML Group. Our commentary on synthetic data for machine learning and healthcare was also published in Nature BME. Lastly, two papers, Patch-GCN and Multimodal Co-Attention Transformers (MCAT), were accepted into MICCAI and ICCV respectively.

Select Publications

A General-Purpose Self-Supervised Model for Computational Pathology

Richard J. Chen, Tong Ding, Ming Y. Lu, Drew F. K. Williamson, Guillaume Jaume, Bowen Chen, Andrew Zhang, Daniel Shao, Andrew H. Song, Muhammad Shaban, Mane Williams, Anurag Vaidya, Sharifa Sahai, Lukas Oldenburg, Luca L. Weishaupt, Judy J. Wang, Walt Williams, Long Phi Le, Georg Gerber, and Faisal Mahmood

arXiv preprint arXiv:TBD 2023

Tissue phenotyping is a fundamental computational pathology (CPath) task in learning objective characterizations of histopathologic biomarkers in anatomic pathology. However, whole-slide imaging (WSI) poses a complex computer vision problem in which the large-scale image resolutions of WSIs and the enormous diversity of morphological phenotypes preclude large-scale data annotation. Current efforts have proposed using pretrained image encoders with either transfer learning from natural image datasets or self-supervised pretraining on publicly-available histopathology datasets, but have not been extensively developed and evaluated across diverse tissue types at scale. We introduce UNI, a general-purpose, self-supervised model for pathology, pretrained using over 100 million tissue patches from over 100,000 diagnostic haematoxylin and eosin-stained WSIs across 20 major tissue types, and evaluated on 33 representative clinical tasks in CPath of varying diagnostic difficulties. In addition to outperforming previous state-of-the-art models, we demonstrate new modeling capabilities in CPath such as resolution-agnostic tissue classification, slide classification using few-shot class prototypes, and disease subtyping generalization in classifying up to 108 cancer types in the OncoTree code classification system. UNI advances unsupervised representation learning at scale in CPath in terms of both pretraining data and downstream evaluation, enabling data-efficient AI models that can generalize and transfer to a gamut of diagnostically-challenging tasks and clinical workflows in anatomic pathology.
```
@article{chen2023general,
  title = {A General-Purpose Self-Supervised Model for Computational Pathology},
  author = {Chen, Richard J. and Ding, Tong and Lu, Ming Y. and Williamson, Drew F. K. and Jaume, Guillaume and Chen, Bowen and Zhang, Andrew and Shao, Daniel and Song, Andrew H. and Shaban, Muhammad and Williams, Mane and Vaidya, Anurag and Sahai, Sharifa and Oldenburg, Lukas and Weishaupt, Luca L. and Wang, Judy J. and Williams, Walt and Le, Long Phi and Gerber, Georg and Mahmood, Faisal},
  journal = {arXiv preprint arXiv:TBD},
  year = {2023},
  arxiv = {TBD},
  abbr = {chen2023general.jpg},
  selected = {true}
}
```
Scaling Vision Transformers to Gigapixel Images via Hierarchical Self-Supervised Learning

Richard J Chen, Chengkuan Chen, Yicong Li, Tiffany Y Chen, Andrew D Trister, Rahul G Krishnan, and Faisal Mahmood

In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2022

Oral Presentation

Vision Transformers (ViTs) and their multi-scale and hierarchical variations have been successful at capturing image representations but their use has been generally studied for low-resolution images (e.g., 256x256, 384x384). For gigapixel whole-slide imaging (WSI) in computational pathology, WSIs can be as large as 150000x150000 pixels at 20x magnification and exhibit a hierarchical structure of visual tokens across varying resolutions: from 16x16 images capturing spatial patterns among cells, to 4096x4096 images characterizing interactions within the tissue microenvironment. We introduce a new ViT architecture called the Hierarchical Image Pyramid Transformer (HIPT), which leverages the natural hierarchical structure inherent in WSIs using two levels of self-supervised learning to learn high-resolution image representations. HIPT is pretrained across 33 cancer types using 10,678 gigapixel WSIs, 408,218 4096x4096 images, and 104M 256x256 images. We benchmark HIPT representations on 9 slide-level tasks, and demonstrate that: 1) HIPT with hierarchical pretraining outperforms current state-of-the-art methods for cancer subtyping and survival prediction, 2) self-supervised ViTs are able to model important inductive biases about the hierarchical structure of phenotypes in the tumor microenvironment.
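
The token hierarchy in this abstract can be checked with a little arithmetic. The sketch below is illustrative only: the helper name `tokens_per_image` is hypothetical (it is not taken from the HIPT codebase), and it assumes non-overlapping square tiling with every region fully tiled.

```python
def tokens_per_image(image_side: int, token_side: int) -> int:
    """Number of non-overlapping square tokens that tile a square image."""
    if image_side % token_side != 0:
        raise ValueError("image side must be a multiple of token side")
    per_side = image_side // token_side
    return per_side * per_side


# Sizes taken from the abstract: 16x16 cells, 256x256 patches, 4096x4096 regions.
cell_tokens_per_patch = tokens_per_image(256, 16)      # 16 x 16 grid of cell tokens
patch_tokens_per_region = tokens_per_image(4096, 256)  # 16 x 16 grid of patch tokens

# Assumption: every one of the 408,218 regions is fully tiled into patches.
regions = 408_218
total_patches = regions * patch_tokens_per_region

print(cell_tokens_per_patch, patch_tokens_per_region, total_patches)
```

Under the full-tiling assumption, the patch count comes out at about 104.5 million, which is in line with the 104M 256x256 images quoted above.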
```
@inproceedings{chen2022scaling,
  title = {Scaling Vision Transformers to Gigapixel Images via Hierarchical Self-Supervised Learning},
  author = {Chen, Richard J and Chen, Chengkuan and Li, Yicong and Chen, Tiffany Y and Trister, Andrew D and Krishnan, Rahul G and Mahmood, Faisal},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year = {2022},
  pages = {16144--16155},
  code = {https://github.com/mahmoodlab/HIPT},
  url = {https://openaccess.thecvf.com/content/CVPR2022/html/Chen_Scaling_Vision_Transformers_to_Gigapixel_Images_via_Hierarchical_Self-Supervised_Learning_CVPR_2022_paper.html},
  arxiv = {2206.02647},
  abbr = {chen2022scaling.png},
  selected = {true},
  honor = {Oral Presentation}
}
```
Developing Measures of Cognitive Impairment in the Real World from Consumer-Grade Multimodal Sensor Streams

Richard J. Chen*, Filip Jankovic*, Nikki Marinsek*, Luca Foschini, Lampros Kourtis, Alessio Signorini, Melissa Pugh, Jie Shen, Roy Yaari, Vera Maljkovic, Marc Sunga, Han Hee Song, Hyun Joon Jung, Belle Tseng, and Andrew Trister

In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining 2019

Oral Presentation & Best Paper Runner-Up

The ubiquity and remarkable technological progress of wearable consumer devices and mobile-computing platforms (smart phone, smart watch, tablet), along with the multitude of sensor modalities available, have enabled continuous monitoring of patients and their daily activities. Such rich, longitudinal information can be mined for physiological and behavioral signatures of cognitive impairment and provide new avenues for detecting mild cognitive impairment (MCI) in a timely and cost-effective manner. In this work, we present a platform for remote and unobtrusive monitoring of symptoms related to cognitive impairment using several consumer-grade smart devices. We demonstrate how the platform has been used to collect a total of 16TB of data during the Lilly Exploratory Digital Assessment Study, a 12-week feasibility study which monitored 31 people with cognitive impairment and 82 without cognitive impairment in free living conditions. We describe how careful data unification, time-alignment, and imputation techniques can handle missing data rates inherent in real-world settings and ultimately show utility of these disparate data in differentiating symptomatics from healthy controls based on features computed purely from device data.
```
@inproceedings{chen2019developing,
  doi = {10.1145/3292500.3330690},
  url = {https://machinelearning.apple.com/research/developing-measures-of-cognitive-impairment-in-the-real-world-from-consumer-grade-multimodal-sensor-streams},
  year = {2019},
  month = jul,
  publisher = {{ACM}},
  author = {Chen*, Richard J. and Jankovic*, Filip and Marinsek*, Nikki and Foschini, Luca and Kourtis, Lampros and Signorini, Alessio and Pugh, Melissa and Shen, Jie and Yaari, Roy and Maljkovic, Vera and Sunga, Marc and Song, Han Hee and Jung, Hyun Joon and Tseng, Belle and Trister, Andrew},
  title = {Developing Measures of Cognitive Impairment in the Real World from Consumer-Grade Multimodal Sensor Streams},
  booktitle = {Proceedings of the 25th {ACM} {SIGKDD} International Conference on Knowledge Discovery {\&} Data Mining},
  honor = {Oral Presentation & Best Paper Runner-Up},
  press = {https://www.technologyreview.com/2019/08/08/102821/your-apple-watch-might-one-day-spot-if-youre-developing-alzheimers/},
  abbr = {chen2019developing.png},
  oral = {https://www.youtube.com/watch?v=H_wTI4LUW7A},
  selected = {true}
}
```
Pan-Cancer Integrative Histology-Genomic Analysis via Multimodal Deep Learning

Richard J Chen, Ming Y Lu, Drew FK Williamson, Tiffany Y Chen, Jana Lipkova, Muhammad Shaban, Maha Shady, Mane Williams, Bumjin Joo, Zahra Noor, and Faisal Mahmood

Cancer Cell 2022

Best Paper, Case Western Artificial Intelligence in Oncology Symposium, 2020. Cover Art of Cancer Cell (Volume 40 Issue 8).

The rapidly emerging field of computational pathology has demonstrated promise in developing objective prognostic models from histology images. However, most prognostic models are either based on histology or genomics alone and do not address how these data sources can be integrated to develop joint image-omic prognostic models. Additionally, identifying explainable morphological and molecular descriptors from these models that govern such prognosis is of interest. We use multimodal deep learning to jointly examine pathology whole-slide images and molecular profile data from 14 cancer types. Our weakly supervised, multimodal deep-learning algorithm is able to fuse these heterogeneous modalities to predict outcomes and discover prognostic features that correlate with poor and favorable outcomes. We present all analyses for morphological and molecular correlates of patient prognosis across the 14 cancer types at both a disease and a patient level in an interactive open-access database to allow for further exploration, biomarker discovery, and feature assessment.
```
@article{chen2021pan,
  title = {Pan-Cancer Integrative Histology-Genomic Analysis via Multimodal Deep Learning},
  journal = {Cancer Cell},
  volume = {40},
  number = {8},
  pages = {865--878.e6},
  year = {2022},
  issn = {1535-6108},
  doi = {10.1016/j.ccell.2022.07.004},
  url = {https://www.sciencedirect.com/science/article/pii/S1535610822003178},
  author = {Chen, Richard J and Lu, Ming Y and Williamson, Drew FK and Chen, Tiffany Y and Lipkova, Jana and Shaban, Muhammad and Shady, Maha and Williams, Mane and Joo, Bumjin and Noor, Zahra and Mahmood, Faisal},
  keywords = {deep learning, artificial intelligence, multimodal integration, cancer prognosis, multimodal prognostic models, pan-cancer, biomarker discovery, data fusion, computational pathology, digital pathology},
  arxiv = {2108.02278},
  code = {https://github.com/mahmoodlab/PORPOISE},
  demo = {http://pancancer.mahmoodlab.org},
  selected = {true},
  abbr = {chen2022pan.png},
  honor = {Best Paper, Case Western Artificial Intelligence in Oncology Symposium, 2020. Cover Art of Cancer Cell (Volume 40 Issue 8).}
}
```
Multimodal Co-Attention Transformer for Survival Prediction in Gigapixel Whole Slide Images

Richard J. Chen, Ming Y. Lu, Wei H. Weng, Tiffany Y Chen, Drew FK Williamson, Trevor Manz, Maha Shady, and Faisal Mahmood

In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) 2021

Survival outcome prediction is a challenging weakly-supervised and ordinal regression task in computational pathology that involves modeling complex interactions within the tumor microenvironment in gigapixel whole slide images (WSIs). Despite recent progress in formulating WSIs as bags for multiple instance learning (MIL), representation learning of entire WSIs remains an open and challenging problem, especially in overcoming: 1) the computational complexity of feature aggregation in large bags, and 2) the data heterogeneity gap in incorporating biological priors such as genomic measurements. In this work, we present a Multimodal Co-Attention Transformer (MCAT) framework that learns an interpretable, dense co-attention mapping between WSIs and genomic features formulated in an embedding space. Inspired by approaches in Visual Question Answering (VQA) that can attribute how word embeddings attend to salient objects in an image when answering a question, MCAT learns how histology patches attend to genes when predicting patient survival. In addition to visualizing multimodal interactions, our co-attention transformation also reduces the space complexity of WSI bags, which enables the adaptation of Transformer layers as a general encoder backbone in MIL. We apply our proposed method on five different cancer datasets (4,730 WSIs, 67 million patches). Our experimental results demonstrate that the proposed method consistently achieves superior performance compared to the state-of-the-art methods.
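
The idea of genomic features attending over histology patches can be sketched with plain scaled dot-product attention. This is a minimal illustration, not the MCAT implementation: the function names are hypothetical, the learned projection layers are omitted, and the toy embeddings are made up. The point it shows is that a bag of M patch embeddings is summarized by N genomic-guided vectors, which is how the bag shrinks before later layers.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def co_attention(genomic, patches):
    """Each genomic embedding (query) attends over all patch embeddings (keys/values).

    Returns (attended, weights). `attended` holds one vector per genomic
    embedding, so M patches are reduced to N vectors; `weights` holds the
    N x M attention map that can be visualized.
    """
    d = len(genomic[0])
    attended, weights = [], []
    for q in genomic:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in patches]
        w = softmax(scores)
        attended.append(
            [sum(w[j] * patches[j][c] for j in range(len(patches))) for c in range(d)]
        )
        weights.append(w)
    return attended, weights


# Toy data (made up): N = 2 genomic embeddings, M = 5 patch embeddings, d = 3.
genomic = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
patches = [
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.2],
    [0.3, 0.3, 0.3],
    [1.0, 0.0, 0.5],
    [0.1, 0.8, 0.0],
]
attended, weights = co_attention(genomic, patches)
print(len(attended), len(attended[0]))  # N vectors of dimension d
```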
```
@inproceedings{chen2021multimodal,
  title = {Multimodal Co-Attention Transformer for Survival Prediction in Gigapixel Whole Slide Images},
  url = {https://openaccess.thecvf.com/content/ICCV2021/html/Chen_Multimodal_Co-Attention_Transformer_for_Survival_Prediction_in_Gigapixel_Whole_Slide_ICCV_2021_paper.html},
  author = {Chen, Richard J. and Lu, Ming Y. and Weng, Wei H. and Chen, Tiffany Y and Williamson, Drew FK and Manz, Trevor and Shady, Maha and Mahmood, Faisal},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  pages = {4015--4025},
  year = {2021},
  code = {https://github.com/mahmoodlab/MCAT},
  abbr = {chen2021multimodal.png},
  selected = {true}
}
```
Algorithmic fairness in artificial intelligence for medicine and healthcare

Richard J. Chen, Judy J. Wang, Drew FK. Williamson, Tiffany Y. Chen, Jana Lipkova, Ming Y. Lu, Sharifa Sahai, and Faisal Mahmood

Nature Biomedical Engineering 2023

In healthcare, the development and deployment of insufficiently fair systems of artificial intelligence (AI) can undermine the delivery of equitable care. Assessments of AI models stratified across subpopulations have revealed inequalities in how patients are diagnosed, treated and billed. In this Perspective, we outline fairness in machine learning through the lens of healthcare, and discuss how algorithmic biases (in data acquisition, genetic variation and intra-observer labelling variability, in particular) arise in clinical workflows and the resulting healthcare disparities. We also review emerging technology for mitigating biases via disentanglement, federated learning and model explainability, and their role in the development of AI-based software as a medical device.
```
@article{chen2023algorithmic,
  title = {Algorithmic fairness in artificial intelligence for medicine and healthcare},
  author = {Chen, Richard J. and Wang, Judy J. and Williamson, Drew FK. and Chen, Tiffany Y. and Lipkova, Jana and Lu, Ming Y. and Sahai, Sharifa and Mahmood, Faisal},
  journal = {Nature Biomedical Engineering},
  volume = {7},
  number = {6},
  pages = {719--742},
  year = {2023},
  publisher = {Nature Publishing Group UK London},
  selected = {true},
  abbr = {chen2021algorithm.png},
  url = {https://www.nature.com/articles/s41551-023-01056-8}
}
```
Synthetic Data in Machine Learning for Medicine and Healthcare

Richard J. Chen, Ming Y. Lu, Tiffany Y. Chen, Drew F. K. Williamson, and Faisal Mahmood

Nature Biomedical Engineering 2021

The proliferation of synthetic data in artificial intelligence for medicine and healthcare raises concerns about the vulnerabilities of the software and the challenges of current policy.
```
@article{chen2021synthetic,
  doi = {10.1038/s41551-021-00751-8},
  url = {https://doi.org/10.1038/s41551-021-00751-8},
  year = {2021},
  month = jun,
  publisher = {Springer Science and Business Media {LLC}},
  volume = {5},
  number = {6},
  pages = {493--497},
  author = {Chen, Richard J. and Lu, Ming Y. and Chen, Tiffany Y. and Williamson, Drew F. K. and Mahmood, Faisal},
  title = {Synthetic Data in Machine Learning for Medicine and Healthcare},
  journal = {Nature Biomedical Engineering},
  abbr = {chen2021synthetic.png},
  selected = {true}
}
```
Federated Learning for Computational Pathology on Gigapixel Whole Slide Images

Ming Y. Lu*, Richard J. Chen*, Dehan Kong, Jana Lipkova, Rajendra Singh, Drew FK. Williamson, Tiffany Y. Chen, and Faisal Mahmood

Medical Image Analysis 2022

Deep learning-based computational pathology algorithms have demonstrated profound ability to excel in a wide array of tasks that range from characterization of well-known morphological phenotypes to predicting non-human-identifiable features from histology such as molecular alterations. However, the development of robust, adaptable, and accurate deep learning-based models often relies on the collection and time-costly curation of large, high-quality annotated training data that should ideally come from diverse sources and patient populations to cater for the heterogeneity that exists in such datasets. Multi-centric and collaborative integration of medical data across multiple institutions can naturally help overcome this challenge and boost the model performance but is limited by privacy concerns amongst other difficulties that may arise in the complex data sharing process as models scale towards using hundreds of thousands of gigapixel whole slide images. In this paper, we introduce privacy-preserving federated learning for gigapixel whole slide images in computational pathology using weakly-supervised attention multiple instance learning and differential privacy. We evaluated our approach on two different diagnostic problems using thousands of histology whole slide images with only slide-level labels. Additionally, we present a weakly-supervised learning framework for survival prediction and patient stratification from whole slide images and demonstrate its effectiveness in a federated setting. Our results show that using federated learning, we can effectively develop accurate weakly supervised deep learning models from distributed data silos without direct data sharing and its associated complexities, while also preserving differential privacy using randomized noise generation.
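
To make the federated setting concrete, here is a minimal sketch of one aggregation round: each site perturbs its local weights with Gaussian noise before sharing, and a server averages them weighted by local dataset size. The function names, toy weights, site sizes, and noise scale are all assumptions made for illustration. This is not the HistoFL code, and the noise here is not calibrated to any formal privacy budget.

```python
import random


def add_gaussian_noise(weights, sigma, rng):
    """Perturb each local weight with zero-mean Gaussian noise before sharing."""
    return [w + rng.gauss(0.0, sigma) for w in weights]


def federated_average(site_weights, site_sizes):
    """Average per-site weight vectors, weighted by the number of local samples."""
    total = sum(site_sizes)
    n_params = len(site_weights[0])
    return [
        sum(w[i] * n for w, n in zip(site_weights, site_sizes)) / total
        for i in range(n_params)
    ]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
local = [[0.2, -0.1], [0.4, 0.3], [0.0, 0.1]]  # toy weights from three sites
sizes = [100, 300, 600]                        # toy slide counts per site

# Only the noisy weights leave each site; raw slides are never shared.
noisy = [add_gaussian_noise(w, sigma=0.01, rng=rng) for w in local]
global_weights = federated_average(noisy, sizes)
print(global_weights)
```

Without noise, the weighted average of the toy weights is 0.14 for both parameters; the noisy result stays close to that value because the perturbation is small relative to the weights.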
```
@article{lu2022federated,
  title = {Federated Learning for Computational Pathology on Gigapixel Whole Slide Images},
  url = {https://www.sciencedirect.com/science/article/pii/S1361841521003431},
  author = {Lu*, Ming Y. and Chen*, Richard J. and Kong, Dehan and Lipkova, Jana and Singh, Rajendra and Williamson, Drew FK. and Chen, Tiffany Y. and Mahmood, Faisal},
  journal = {Medical Image Analysis},
  volume = {76},
  pages = {102298},
  year = {2022},
  publisher = {Elsevier},
  code = {https://github.com/mahmoodlab/HistoFL},
  abbr = {lu2022histofl.png},
  selected = {true},
  arxiv = {2009.10190}
}
```
Pathomic Fusion: An Integrated Framework for Fusing Histopathology and Genomic Features for Cancer Diagnosis and Prognosis

Richard J. Chen, Ming Y. Lu, Jingwen Wang, Drew F. K. Williamson, Scott J. Rodig, Neal I. Lindeman, and Faisal Mahmood

IEEE Transactions on Medical Imaging 2020

Top 5 Posters, NVIDIA GTC 2020

Cancer diagnosis, prognosis, and therapeutic response predictions are based on morphological information from histology slides and molecular profiles from genomic data. However, most deep learning-based objective outcome prediction and grading paradigms are based on histology or genomics alone and do not make use of the complementary information in an intuitive manner. In this work, we propose Pathomic Fusion, an interpretable strategy for end-to-end multimodal fusion of histology image and genomic (mutations, CNV, RNA-Seq) features for survival outcome prediction. Our approach models pairwise feature interactions across modalities by taking the Kronecker product of unimodal feature representations, and controls the expressiveness of each representation via a gating-based attention mechanism. Following supervised learning, we are able to interpret and saliently localize features across each modality, and understand how feature importance shifts when conditioning on multimodal input. We validate our approach using glioma and clear cell renal cell carcinoma datasets from The Cancer Genome Atlas (TCGA), which contain paired whole-slide image, genotype, and transcriptome data with ground truth survival and histologic grade labels. In a 15-fold cross-validation, our results demonstrate that the proposed multimodal fusion paradigm improves prognostic determinations from ground truth grading and molecular subtyping, as well as unimodal deep networks trained on histology and genomic data alone. The proposed method establishes insight and theory on how to train deep networks on multimodal biomedical data in an intuitive manner, which will be useful for other problems in medicine that seek to combine heterogeneous data streams for understanding diseases and predicting response and resistance to treatment. Code and trained models are made available at: https://github.com/mahmoodlab/PathomicFusion.
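
The two ingredients named in this abstract, gating and a Kronecker product of unimodal features, can be sketched in a few lines. This is a simplified illustration with hypothetical function names and made-up numbers, not the released PathomicFusion code. The gate logits are fixed here, whereas a trained model would learn them. Appending a constant 1 to each vector before the product is an assumption of this sketch, used so that the unimodal features also appear in the fused vector.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gate(features, gate_logits):
    """Scale each feature by a sigmoid gate in (0, 1)."""
    return [f * sigmoid(g) for f, g in zip(features, gate_logits)]


def kronecker_fusion(h_a, h_b):
    """Kronecker (outer) product of two vectors, flattened row by row.

    A constant 1 is appended to each vector first so that unimodal features
    survive alongside the pairwise interaction terms (sketch assumption).
    """
    a = h_a + [1.0]
    b = h_b + [1.0]
    return [x * y for x in a for y in b]


histology = [0.5, -1.0]    # toy histology features
genomic = [2.0, 0.0, 1.0]  # toy genomic features

# Hypothetical gate logits; sigmoid(0) = 0.5, so every feature is halved.
h_gated = gate(histology, [0.0, 0.0])
g_gated = gate(genomic, [0.0, 0.0, 0.0])

fused = kronecker_fusion(h_gated, g_gated)
print(len(fused))  # (2 + 1) * (3 + 1) entries
```

The fused vector grows multiplicatively with the unimodal dimensions, which is why gating (or otherwise keeping the unimodal vectors small) matters before taking the product.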
```
@article{chen2019pathomic,
  doi = {10.1109/tmi.2020.3021387},
  year = {2020},
  publisher = {Institute of Electrical and Electronics Engineers ({IEEE})},
  pages = {1--1},
  author = {Chen, Richard J. and Lu, Ming Y. and Wang, Jingwen and Williamson, Drew F. K. and Rodig, Scott J. and Lindeman, Neal I. and Mahmood, Faisal},
  title = {Pathomic Fusion: An Integrated Framework for Fusing Histopathology and Genomic Features for Cancer Diagnosis and Prognosis},
  journal = {{IEEE} Transactions on Medical Imaging},
  arxiv = {1912.08937},
  abbr = {chen2019pathomic.png},
  honor = {Top 5 Posters, NVIDIA GTC 2020},
  oral = {https://www.youtube.com/watch?v=TrjGEUVX5YE},
  code = {https://github.com/mahmoodlab/PathomicFusion},
  press = {https://blogs.nvidia.com/blog/2019/11/07/harvard-pathology-lab-data-fusion-ai-cancer/},
  selected = {true}
}
```