Specializing In:
- Probabilistic graphical models
- Predictive modeling
- Clinical research
Biography
I received my medical degree from China Medical University in 2010, with a focus on oncology, and my PhD in Biostatistics from the University at Buffalo in 2018. My long-term research interests are clinical research, probabilistic graphical models, network-based data integration, and predictive modeling. Since joining Roswell Park in September 2018, I have participated in multiple Phase I and II clinical trials and served on the Data Safety Monitoring Committee. Using network-based and multi-omics approaches, I have contributed to multiple studies elucidating the genomic and metabolic changes that occur under environmental exposures or therapeutic interventions. I am also experienced in applying machine learning techniques to biomedical research. My model for protein abundance prediction in ovarian and breast cancers, which combines biological networks with machine learning techniques, was one of the top-performing models in the NCI-CPTAC DREAM Proteogenomics Computational Challenge. I was subsequently invited to join a collaborative effort and co-author an overview paper on this competition.
Positions
Roswell Park Comprehensive Cancer Center
- Assistant Professor of Oncology
- Co-Director, Biostatistics and Statistical Genomics Shared Resource
- Department of Biostatistics and Bioinformatics
Background
Education and Training
- 2018 - PhD - Biostatistics, School of Public Health and Health Professions, State University of New York at Buffalo
- 2016 - MA - Biostatistics, School of Public Health and Health Professions, State University of New York at Buffalo
- 2013 - MS - Neuroscience, State University of New York at Buffalo
- 2010 - MBBS & MS - Medicine, China Medical University, China
Professional Memberships
- American Statistical Association
Research Overview
Many biological systems can be well described as networks. My primary research focus is developing statistical methods that can effectively analyze biological networks. We proposed an attribute-based module detection approach and applied it to cell signaling pathways and protein-protein interaction networks, together with gene expression data from breast cancer patients. This method has been shown to be effective in identifying functional gene groups associated with patients’ clinical outcomes. Currently, I am working on integrating network information into predictive modeling. My second research focus is Bayesian networks, in which the directed edges encode the conditional independence relationships between variables. Bayesian networks are graphical representations of complex relationships among variables and can be used to answer probabilistic queries. They are broadly applied to facilitate decision-making and to predict the effects of perturbations in networks. The BayesNetBP R package I developed is open-source software that supports probabilistic reasoning in conditional Gaussian Bayesian networks.
Publications
- Yu, H., Moharil, J. & Blair, R.H. (2020). BayesNetBP: An R package for probabilistic reasoning in Bayesian Networks. Journal of Statistical Software, 94(3), 1–31.
- Yang, M., Petralia, F., Li, Z., Li, H., Ma, W., Song, X., Kim, S., Lee, H., Yu, H., Lee, B., Bae, S., Heo, E., Kaczmarczyk, J., Stępniak, P., Warchol, M., Yu, T., Calinawan, A.P., Boutros, P.C., Payne, S.H., Reva, B., NCI-CPTAC-DREAM Consortium, Boja, E., Rodriguez, H., Stolovitzky, G., Guan, Y., Kang, J., Wang, P., Fenyo, D., & Saez-Rodriguez, J. (2020). Community assessment of the predictability of cancer protein and phosphoprotein levels from genomics and transcriptomics. Cell Systems. https://doi.org/10.1016/j.cels.2020.06.013.
- Yu, H. & Blair, R.H. (2019). Integration of probabilistic regulatory networks into constraint-based models of metabolism with applications to Alzheimer’s disease. BMC Bioinformatics, 20(1), 386.
- Yu, H., Chapman, B., Di Florio, A., Eishen, E., Gotz, D., Jacob, M. & Blair, R.H. (2019). Bootstrapping estimates of stability for clusters, observations and model selection. Computational Statistics, 34(1), 349-372.
- Yu, H. & Blair, R.H. (2016). A framework for attribute-based community detection with applications to integrated functional genomics. Pacific Symposium on Biocomputing, 21, 69-80.