Externally Funded Research Projects Undertaken by Longhai Li
Predictive Analysis for High-throughput Data
The rapid development of high-throughput sequencing technologies has made it affordable to collect high-dimensional molecular-level profiles, such as gene expression levels, which are generally called features. It is of great interest to identify the features associated with a phenotype (e.g., cancer status or another health disorder). Many researchers have advocated applying statistical learning methods to perform predictive analysis of high-throughput data. The results can be used in many ways: to diagnose human diseases; to predict a patient's response to a medicine (personalized medicine); to help plant and animal breeders choose an optimal gene subset for further experiments; and, through the subset of features extracted from good predictive models, to help uncover the biological mechanism underlying a phenotype. Unfortunately, high dimensionality causes severe overfitting in predictive analysis, even with very simple models, and the chance of finding falsely predictive features or patterns is extremely high. Guarding against false discovery is therefore a central challenge in predictive analysis. My research in this theme aims to develop new tools for honestly measuring the predictivity (such as error rate and AUC) of selected features, and new tools for identifying truly predictive features and for building sharper predictive models for phenotypes. I also practice predictive analysis with specific high-throughput datasets in a variety of scientific problems related to human health.
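The overfitting risk described above can be illustrated with a small simulation. The following is a minimal sketch, not a method from this research program: it generates pure-noise features, then compares a leave-one-out error estimate when features are selected once on the full dataset against an estimate in which selection is repeated inside each fold. All names (make_noise_data, top_features, centroid_predict, loo_error), the nearest-centroid classifier, and the sizes n, p, and k are assumptions chosen for illustration.

```python
import random


def make_noise_data(n, p, seed=0):
    """Simulate n samples with p pure-noise features and balanced labels."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [i % 2 for i in range(n)]  # labels are independent of X by construction
    return X, y


def top_features(X, y, rows, k):
    """Return the k features with the largest absolute class-mean difference,
    computed only from the samples listed in `rows`."""
    r0 = [i for i in rows if y[i] == 0]
    r1 = [i for i in rows if y[i] == 1]
    scores = []
    for j in range(len(X[0])):
        m0 = sum(X[i][j] for i in r0) / len(r0)
        m1 = sum(X[i][j] for i in r1) / len(r1)
        scores.append((abs(m1 - m0), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]


def centroid_predict(X, y, train, test_row, feats):
    """Nearest-centroid prediction for one sample, using only `feats`."""
    dist = {}
    for c in (0, 1):
        rc = [i for i in train if y[i] == c]
        d = 0.0
        for j in feats:
            m = sum(X[i][j] for i in rc) / len(rc)
            d += (X[test_row][j] - m) ** 2
        dist[c] = d
    return 0 if dist[0] <= dist[1] else 1


def loo_error(X, y, k, select_inside):
    """Leave-one-out error rate. If select_inside is False, features are
    chosen once from all samples (including each held-out one)."""
    n = len(X)
    all_rows = list(range(n))
    fixed = None if select_inside else top_features(X, y, all_rows, k)
    wrong = 0
    for i in range(n):
        train = [r for r in all_rows if r != i]
        feats = top_features(X, y, train, k) if select_inside else fixed
        wrong += centroid_predict(X, y, train, i, feats) != y[i]
    return wrong / n


X, y = make_noise_data(n=40, p=500, seed=1)
biased = loo_error(X, y, k=10, select_inside=False)
honest = loo_error(X, y, k=10, select_inside=True)
print(f"selection outside cross-validation: {biased:.2f}")
print(f"selection inside cross-validation:  {honest:.2f}")
```

Because no feature carries any signal here, an honest error estimate should sit near chance. The estimate that reuses the held-out sample during feature selection is expected to look much better than that, which is the kind of false predictivity the paragraph warns about.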
Predictive Model Evaluation Methods for Spatial-Temporal Data
In science, a theory is tested by making predictions about future observations. Significant discrepancies between observations and predictions suggest that the theory is incorrect or flawed. Similarly, examining out-of-sample predictions is a straightforward way to compare statistical models and to check their goodness-of-fit (GOF). Today, increasingly complex models are being proposed for a variety of correlated data, such as temporal, spatial, and repeated-measures data. More widely applicable predictive methods for comparing and checking such complex models are needed. My research in this theme aims to develop new tools for evaluating complex Bayesian and non-Bayesian models with correlated random effects, with applications in many areas such as epidemiology, ecology, and environmental sciences.
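The idea of comparing models by their out-of-sample predictions can be sketched in a few lines. The example below is only a toy illustration under simplifying assumptions: observations are treated as independent (unlike the correlated settings this theme targets), the data are made up, and the names fit_constant, fit_line, and loo_mse are hypothetical. It compares a constant-mean model with a straight-line model by leave-one-out mean squared prediction error.

```python
def fit_constant(xs, ys):
    """Fit a constant-mean model; returns a prediction function."""
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Fit a least-squares straight line; returns a prediction function."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def loo_mse(fit, xs, ys):
    """Leave-one-out mean squared prediction error for a fitting routine."""
    total = 0.0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        predict = fit(train_x, train_y)  # model never sees observation i
        total += (ys[i] - predict(xs[i])) ** 2
    return total / len(xs)


# Made-up data: a linear trend with a small alternating perturbation.
xs = [float(t) for t in range(10)]
ys = [1.0 + 2.0 * x + (0.1 if i % 2 == 0 else -0.1) for i, x in enumerate(xs)]

print("constant model LOO MSE:", loo_mse(fit_constant, xs, ys))
print("linear model LOO MSE:  ", loo_mse(fit_line, xs, ys))
```

The model with the smaller held-out error is the one whose predictions agree better with data it has not seen. With correlated random effects, leaving out one observation at a time is no longer this simple, which is part of what motivates more widely applicable predictive checks.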
Predictive Methods for Analyzing High-throughput and Spatial-temporal Data, NSERC Individual Discovery Grant, $100,000, 2019 - 2024, PI.
Genotype & Environment to Phenotype, sub-project of the Canada First Research Excellence Fund (CFREF) Project "Designing Crops for Global Food Security", $756,918, 2016 - 2019, Co-Investigator (PI: Prof. Kusalik).
Applications of Neural Network Curve Fitting Methods for Least-squares Monte Carlo Simulations in Financial Risk Management, MITACS Accelerate Internship Fund, $15,000, 2016, PI.
Bayesian Methods for High-dimensional and Correlated Data, NSERC Individual Discovery Grant, $70,000, 2014 - 2019, PI.
Efficient Bayesian Analysis for Complex Models, NSERC Individual Discovery Grant, $95,000, 2009 - 2014, PI.
A Computer Cluster for Research on Efficient Bayesian Statistical Methods, CFI Leaders Opportunity Fund, $160,000, 2009, PI.
Clustering Analysis for Detecting the Types of Vehicles, MITACS Accelerate Internship Fund, $15,000, 2008, Co-PI with Prof. Laverty.