I am a fourth-year PhD student in the UC Berkeley Statistics Department, advised by Bin Yu. My research interests are primarily problem-driven and lie broadly at the intersection of applied statistics/data science and medicine. I am grateful to be supported by the NSF Graduate Research Fellowship. Previously, I studied mathematics and statistics at Rice University, where I was advised by Genevera Allen. I have also spent summers at Genentech and Baylor College of Medicine.
PhD in Statistics, 2018-present
University of California, Berkeley
BS in Mathematics, 2018
BA in Statistics, 2018
With the growing volume and complexity of data in today’s society, I am excited by the opportunity to work closely with scientists and doctors to extract data-driven, reproducible, and actionable insights from the craziness that is data to improve human health.
Block Randomized Adaptive Iterative Lasso (B-RAIL) is a practical tool for selecting important features in high-dimensional multi-view data with mixed data types (e.g., continuous, binary, count-valued). B-RAIL serves as a versatile data integration method for both sparse regression and graph selection problems. In our ovarian cancer case study, B-RAIL successfully identifies well-known biomarkers and hints at novel candidates for future ovarian cancer research.
To support the community-wide fight against COVID-19, we are continuously curating a large open-source corpus of COVID-19-related data from 20+ sources. Using this data, we build an ensemble to forecast the short-term trajectory of recorded COVID-19-related deaths. These forecasts are being used by the non-profit organization Response4Life to determine the medical supply needs of individual hospitals and have directly contributed to the distribution of medical supplies across the country.
Integrated Principal Components Analysis (iPCA) generalizes classical PCA to the integrated data setting, in which multiple related data sets are analyzed simultaneously. iPCA can be used for dimension reduction and exploratory data analysis to find and visualize patterns common to multiple data sets. We use iPCA to study the genomic basis of Alzheimer’s Disease (AD) and the genes that contribute to dominant expression patterns in AD.
Random News and Other Activities