I am a postdoctoral researcher with Ji Zhu and Liza Levina in the University of Michigan Statistics Department. My research interests are primarily problem-driven and lie broadly at the intersection of applied statistics/data science and medicine. I will be joining the University of Notre Dame as an Assistant Professor in Fall 2024. Previously, I received my PhD in Statistics from UC Berkeley, where I was advised by Bin Yu.
PhD Statistics, 2023
University of California, Berkeley
BS Mathematics, BA Statistics, 2018
Rice University
With the growing volume and complexity of data in today’s society, I am excited by the opportunity to work closely with scientists and doctors to extract data-driven, reproducible, and actionable insights from the craziness that is data to improve human health.
Cardiovascular disease is the leading cause of death globally and in the US. We expand upon our current understanding of cardiac structure and function through the lens of epistasis, that is, non-additive gene-gene interactions. Through close interdisciplinary collaboration, we combine machine learning and novel experimental techniques to study the effects of these gene-gene interactions on cardiomyocyte cell sizes.
An R package for tidy, high-quality simulation studies with efficient distributed computation, caching, and automated documentation and visualization of results.
In many high-impact applications, it is crucial not only to achieve high prediction accuracy but also to identify the most important features involved in the real-world phenomena under study. We develop tools to extract stable feature importances as well as feature interactions under challenging scenarios with low-signal, highly correlated, and high-dimensional data.
We leverage the rapid advancement of new biomedical technologies and the expansion of ‑omics data (e.g., genomics, epigenomics, proteomics, metabolomics) to solve various problems in precision cancer medicine. This includes collaborative work on the early detection of pancreatic cancer, drug response prediction, and gene regulatory networks for ovarian cancer.
Data integration, or the strategic analysis of multiple sources of data simultaneously, can often lead to discoveries that remain hidden when each data source is analyzed on its own. To facilitate such integrative analyses, we develop practical tools to perform dimension reduction, pattern recognition, and feature selection for integrated data (also called multi-view or multi-modal data).
A Python package for fitting interpretable machine learning models.
To support the community-wide fight against COVID-19, we curated a large open-source corpus of COVID-19-related data from 20+ sources. Using this data, we created an ensemble to forecast the short-term trajectory of COVID-19-related recorded deaths. These forecasts were used by the non-profit organization Response4Life to determine medical supply needs and to distribute PPE accordingly.
An open-source data repository with COVID-19-related information from over 20 sources. Includes data on COVID-19 cases and death counts, demographics, socioeconomic characteristics, health risk factors, social mobility, and more.
Numerous human judgment calls are inevitably made throughout any data analysis. This includes choices like how to preprocess the data, which models to fit, how to evaluate the performance, and more. If not carefully chosen, these decisions may inadvertently result in spurious downstream conclusions. To mitigate this possibility, we provide tools and stability-driven protocols to facilitate scientific reproducibility and transparent substantive research.
An R package for seamless documentation of data analyses via “lab notebooks” to encourage transparent and reliable data science (in early development).
A collection of utility functions and modern themes for ggplot2 plots and R Markdown documents.