Tiffany Tang

I am a postdoctoral researcher with Ji Zhu and Liza Levina in the Department of Statistics at the University of Michigan. My research interests are primarily problem-driven and lie broadly at the intersection of applied statistics/data science and medicine. I will be joining the University of Notre Dame as an Assistant Professor in Fall 2024. Previously, I received my PhD in Statistics from UC Berkeley, where I was advised by Bin Yu.

Interests

  • Statistical Machine Learning
  • Applied Statistics
  • Biomedical Data Science
  • Data Integration
  • Genomics

Education

  • PhD Statistics, 2023

    University of California, Berkeley

  • BS Mathematics, BA Statistics, 2018

    Rice University

Research Overview

With the growing volume and complexity of data in today’s society, I am excited by the opportunity to work closely with scientists and doctors to extract data-driven, reproducible, and actionable insights from the craziness that is data to improve human health.

Cardiovascular Genomics

Cardiovascular disease is the leading cause of death globally and in the US. We expand upon our current understanding of cardiac structure and function through the lens of epistasis, that is, non-additive gene-gene interactions. Through close interdisciplinary collaboration, we combine machine learning and novel experimental techniques to study the effects of these gene-gene interactions on cardiomyocyte cell sizes.

simChef

An R package for tidy, high-quality simulation studies with efficient distributed computation, caching, and automated documentation and visualization of results.

Interpretability

In many high-impact applications, it is crucial not only to achieve high prediction accuracy but also to identify the most important features involved in the real-world phenomena under study. We develop tools to extract stable feature importances as well as feature interactions under challenging scenarios with low-signal, highly correlated, and high-dimensional data.

Precision Cancer Medicine

We leverage the rapid advancement of new biomedical technologies and the expansion of ‑omics data (e.g., genomics, epigenomics, proteomics, metabolomics) to solve various problems in precision cancer medicine. This includes collaborative work on the early detection of pancreatic cancer, drug response prediction, and gene regulatory networks for ovarian cancer.

Data Integration/Fusion

Data integration, or the strategic analysis of multiple sources of data simultaneously, can often lead to discoveries that would remain hidden in separate analyses of each data source. To facilitate such integrative analyses, we develop practical tools to perform dimension reduction, pattern recognition, and feature selection for integrated data (also called multi-view or multi-modal data).

imodels

A Python package for fitting interpretable machine learning models.

COVID-19

To support the community-wide fight against COVID-19, we curated a large open-source corpus of COVID-19-related data from 20+ sources. Using this data, we created an ensemble to forecast the short-term trajectory of recorded COVID-19 deaths. These forecasts were used by the non-profit organization Response4Life to determine medical supply needs and to distribute PPE accordingly.

COVID-19 Data Repository

An open-source data repository with COVID-19-related information from over 20 sources. Includes data on COVID-19 cases and death counts, demographics, socioeconomic characteristics, health risk factors, social mobility, and more.

Scientific Reproducibility

Numerous human judgment calls are inevitably made throughout any data analysis. This includes choices like how to preprocess the data, which models to fit, how to evaluate the performance, and more. If not carefully chosen, these decisions may inadvertently result in spurious downstream conclusions. To mitigate this possibility, we provide tools and stability-driven protocols to facilitate scientific reproducibility and transparent substantive research.

vdocs

An R package for seamless documentation of data analyses via “lab notebooks” to encourage transparent and reliable data science (in early development).

vthemes

A collection of utility functions and modern themes for ggplot2 plots and R Markdown documents.

Publications & Preprints

(2023). Epistasis regulates genetic control of cardiac hypertrophy.

(2023). A blood-based metabolomic signature predictive of risk for pancreatic cancer. Cell Reports Medicine.

(2023). MDI+: A Flexible Random Forest-Based Feature Importance Framework.

(2021). The Future will be Different than Today: Model Evaluation Considerations when Developing Translational Clinical Biomarker. KDD Health Day - DSHealth Workshop.

(2021). Integrated Principal Components Analysis. Journal of Machine Learning Research.

Teaching

UC Berkeley Graduate Student Instructor

Contact