Statistics Seminar Series

Department of Mathematics and Statistics and
Collaborative Graduate Program in Biostatistics

University of Saskatchewan

The Statistics Seminar Series is a forum for researchers with interest in statistics to share their ideas or problems and forge collaborative relationships.


Our meetings in term 2 are scheduled for Thursdays, 2-3pm. If you wish to invite a speaker or speak yourself at this seminar, please contact the current organizer, Longhai Li. Please check our tentative schedule of future talks for available time slots. The current organizer particularly invites researchers on the U of S campus who are seeking interdisciplinary work with statisticians to introduce their research problems.

Upcoming and Past Talks

April 5, 2-3, Educ 2014 Mohammed Obeidat, Ph.D. Candidate, Department of Mathematics and Statistics, University of Saskatchewan

Analysis of Time Series Models for Count Data

My research aims to analyze time series models for count data via both frequentist and Bayesian approaches. A parameter-driven Poisson model will be fitted to the count data. In a parameter-driven model, the distribution of the observed data depends on a latent (unobserved) process in such a way that the observed data are assumed to be independent given this latent process, while they are correlated marginally. The estimation of such models is not an easy problem, as the likelihood function of the observed data involves high-dimensional integrals over the distribution of the latent process. I will discuss the estimation of such models using the full likelihood function and a pseudo-likelihood function. The motivation for using a pseudo-likelihood is to reduce the computational burden by avoiding the evaluation of high-dimensional integrals. Moreover, it has been shown that the maximum pseudo-likelihood estimator is asymptotically unbiased and normally distributed.
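As a rough illustration of the parameter-driven structure described in this abstract, the sketch below simulates counts that are conditionally Poisson given a latent process. The AR(1) form of the latent process, the log link, and all names and parameter values (`beta0`, `phi`, `sigma`) are assumptions made for illustration only, not the model from the talk.

```python
import math
import random


def poisson_draw(rng, mean):
    """Draw one Poisson variate by Knuth's multiplication method (fine for moderate means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_counts(n, beta0=1.0, phi=0.8, sigma=0.5, seed=1):
    """Simulate counts that are independent Poisson given a latent AR(1) process (assumed form)."""
    rng = random.Random(seed)
    # Start the latent process from its stationary distribution.
    latent = rng.gauss(0.0, sigma / math.sqrt(1.0 - phi ** 2))
    counts = []
    for _ in range(n):
        latent = phi * latent + rng.gauss(0.0, sigma)
        # Given the latent value, the count is Poisson with a log-linear mean.
        counts.append(poisson_draw(rng, math.exp(beta0 + latent)))
    return counts


def lag1_autocorrelation(xs):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    return cov / var


counts = simulate_counts(5000)
print("lag-1 autocorrelation:", round(lag1_autocorrelation(counts), 3))
```

Although each count is drawn independently once the latent value is fixed, the simulated series shows positive serial correlation, because neighbouring counts share nearby latent values.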
Mar 22, 2-3, Murry 299
Matthew Schmirler, M.Sc. Candidate, Department of Mathematics and Statistics, University of Saskatchewan

Multiple Markov Chain Simulations of an Interacting Self-Avoiding Polygon Model

My research is motivated by the type II topoisomerase enzyme. This enzyme helps to unknot DNA molecules during cellular replication via a 'strand passage' process, in which one segment of DNA is passed through another. This process is essential to replication, as a DNA molecule that is knotted cannot successfully replicate. It is not fully understood how this enzyme chooses a location to act on the DNA; better understanding this mechanism could possibly lead to more effective disease treatments (by inhibiting the topoisomerase process, which would lead to cell death). The talk will focus on modelling DNA molecules and the topoisomerase strand passage process using self-avoiding polygons in the cubic lattice, on using multiple Markov chain (MMC) algorithms to generate random self-avoiding polygons, and on the introduction of an interactive energy term which represents the effect of 'adding salt' to the model. Brief introductions to knot theory and DNA topology will also be included.

Feb 16, 2-3, Educ 2014
Dr. Bassirou Chitou, US Centers for Disease Control and Prevention, Rwanda

Methods for a Behavioral Surveillance Survey of Female Sex Workers in Rwanda

Abstract: see the PDF file.
Feb 02, 2-3, Murry 299 Prof. Mikelis Bickis, Department of Mathematics and Statistics, University of Saskatchewan

The geometry of imprecise inference

A statistical model can be constructed from a null probability distribution on the observation space by defining a set of functions representing the log-likelihood ratios of alternative distributions to the null distribution. Conversely, any model all of whose members have the same null sets can be expressed in this way. The set of functions parametrizes the model, which can be extended to a convex subset in their linear span. An exhaustive model using only a finite number of basis functions constitutes an exponential family model. Given an arbitrary "prior" distribution on this parameter space, the space of functions generates a family of possible posterior distributions parametrized by elements of the observation space. Bayesian updating of a prior distribution can then be visualized as a translation by an update function.

Inference by Bayesian updating can be justified by axioms of rationality such as those proposed by de Finetti or Savage. Weakening these axioms leads to the imprecise inference of Walley in which there is not a single distinguished prior distribution, but a set of priors. Updating is now achieved by translation of the entire set, leading to upper and lower limits on posterior expectations. A crucial but seemingly arbitrary choice of this inferential paradigm is the definition of a suitable set of priors giving maximum imprecision a priori yet leading to informative inferences after observing data. The shape of the set of priors can affect various additional desiderata for rules of inference.
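To make the idea of updating a set of priors concrete, here is a minimal sketch using the imprecise Beta model for a Bernoulli success probability, a standard textbook example in Walley's framework. It is not necessarily the construction used in the talk. The prior set is assumed to be all Beta(s*t, s*(1-t)) distributions with 0 < t < 1, and the function names and the choice s = 2 are illustrative.

```python
from fractions import Fraction


def posterior_mean(successes, trials, t, s):
    """Posterior mean under the single prior Beta(s*t, s*(1-t)).

    Updating translates the hyperparameters by (successes, failures).
    """
    a = s * t + successes
    b = s * (1 - t) + (trials - successes)
    return a / (a + b)


def posterior_mean_bounds(successes, trials, s=2):
    """Infimum and supremum of the posterior mean over all t in (0, 1)."""
    lower = Fraction(successes, trials + s)
    upper = Fraction(successes + s, trials + s)
    return lower, upper


# Check the closed-form bounds against a grid of priors from the set.
grid = [Fraction(i, 100) for i in range(1, 100)]
means = [posterior_mean(7, 10, t, Fraction(2)) for t in grid]
lower, upper = posterior_mean_bounds(7, 10)
print(lower, upper)
```

Before any data are observed, the bounds are 0 and 1, so the prior set is maximally imprecise about the mean; after observing data the interval narrows, with width s / (trials + s).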

Jan 19, 2-3, Educ 1024 Prof. Jill Johnstone, Department of Biology, University of Saskatchewan

Process uncertainty, measurement error, complex systems, and messy data: Perspectives from the desktop of a field ecologist

Ecological research often generates datasets with a suite of characteristics that make statistical analysis a formidable and daunting task for ecologists. Common attributes of ecological datasets include hierarchical (nested) data structures, spatial or temporal autocorrelation, multicollinearity of predictor data, unknown measurement error, and complex interactions between dependent and independent variables. This seminar will provide some examples of datasets from field investigations of plant ecology that have great potential for ecological insight, but have often left me struggling with a Pandora's box of statistical challenges. The intent is to stimulate dialogue to improve understanding between ecologists and statisticians about more effective ways to address the challenges of collecting and analyzing messy ecological data.
Jan 05, 2-3, Arts 241 Prof. Lisa Lix, School of Public Health, University of Saskatchewan

Comparing Variable Importance Measures for Two Independent Groups

Descriptive discriminant analysis (DDA), logistic regression analysis (LRA), and stepwise multivariate analysis of variance (MANOVA) procedures can be used to produce measures of the relative importance of a set of correlated variables for distinguishing between two independent groups. This research uses Monte Carlo techniques to compare six measures of relative importance, based on DDA, LRA, and MANOVA models, for rank ordering a set of correlated variables.

Powerpoint slides are available for this talk.

Jan 03, 2-3, Arts 105 Dr. Yunqi Ji, Postdoc Fellow, Faculty of Medicine, McGill University

Analysis of Imperfect Longitudinal Data Subject to Misclassification and Informative "Unsure" Responses

In epidemiological studies, respondents are often required to answer some questions from pretested questionnaires using "Yes", "No" or "Unsure" as the response. An "Unsure" answer leads to loss of information about the respondent's inherent status. In addition, even a "Yes" or "No" response may misclassify the respondent's true status. An unbalanced misclassification model is presented to describe the misclassification and "Unsure" responses. We examine the impact of misclassification and "Unsure" responses on model estimation. An estimating approach is proposed to correct the attenuation and improve the efficiency of statistical inference, taking into account both misclassification and "Unsure" responses.
Nov 24, 1:15-2:15, Arts 105 Yaqing Liu, M.Sc. Candidate, University of Saskatchewan

Bias Analysis for Logistic Regression with a Misclassified Multi-categorical Exposure

In epidemiological studies, one common issue is that, for various reasons, errors may contaminate the exposure variable. The term "measurement error" refers to a continuous exposure variable, and the term "misclassification" refers to a categorical or discrete exposure. Mismeasurement affects the detection of the actual relationship between the exposure and the health outcome. In fact, biased estimates with falsely small standard errors may be obtained if investigators naively ignore the mismeasurement. The aim of my talk is to assess the asymptotic bias when the misclassification in a multi-categorical exposure is ignored. The theoretical result of my work extends the work by Davidov et al. (2003) from a binary exposure to a multi-categorical exposure in the context of logistic regression models. The result of this study is useful as a guide for large-scale prospective cohort and case-control studies.
Nov 10, 1:15-2:15, Arts 213 Tolulope Sajobi, PhD Candidate, University of Saskatchewan

Robust Descriptive Discriminant Analysis for Repeated Measures Data

Discriminant analysis procedures based on parsimonious mean and/or covariance structures have recently been proposed for repeated measures data. However, these procedures rest on the assumption of a multivariate normal distribution. This study examines repeated measures DA (RMDA) procedures based on maximum likelihood (ML) and coordinatewise trimming (CT) estimation methods and investigates bias and root mean square error (RMSE) in discriminant function coefficients (DFCs) of these procedures under non-normal distributions. Our study results suggest that the average bias of CT estimates of DFCs for RMDA procedures that assume unstructured group means was at least 40% smaller than the values for corresponding procedures based on ML estimators. However, the average RMSE for the former was about 10% smaller than the values for the latter procedures when the data were sampled from extremely skewed or heavy-tailed distributions. The proposed robust procedures can be used to identify the measurement occasions that make the largest contribution to group separation when the data are sampled from multivariate skewed or heavy-tailed distributions.
Oct 27, 1:15-2:15, Arts 105 Lai Jiang, PhD Candidate, University of Saskatchewan

Classification and Feature Selection via Bayesian t-Probit Model

The purpose of this talk is to introduce our recent work on high-dimensional classification problems with a heavy-tailed t-probit model. In genomics studies, the sparsity of high-dimensional data intensifies the outlier problem, so that the traditional Gaussian assumption fails and leads to nonrobust classifiers that are vulnerable to type 2 errors. In this talk we propose a hierarchical Bayesian auxiliary model that incorporates heavy-tailed t distributions for both the noise and the regression parameters. We compare our model with other methods (e.g. logistic regression) and show that one can obtain a robust classifier with a heavy-tailed and symmetric t prior.
Oct 13, 1:15-2:15, Arts 105 Prof. Hyun Lim, University of Saskatchewan

Semi-Parametric Additive Hazards Model to Competing Risks Analysis

When subjects possess different demographic and disease characteristics and are exposed to more than one type of failure, a practical problem is to assess the covariate effect on each type of failure as well as on all-cause failure. The most widely used method adopts Cox models for the cause-specific or all-cause hazards. It has been pointed out that this method causes the problem of internal inconsistency. To resolve this problem, the additive hazards model has been advocated as an alternative method. In this talk, both constant and time-varying covariate effects in cause-specific hazard models are specified. We illustrate that the covariate effect on all-cause failure can be estimated by the sum of the effects on all competing risks. Using an illustrative example, we show that the proposed method gives a simple interpretation of the final results when the primary covariate effect is constant in the additive manner on all cause-specific hazards. Based on the cause-specific hazard models, estimation of the adjusted overall survival and cumulative incidence curves is presented.
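The contrast drawn in this abstract can be sketched numerically. Under one common reading of "internal inconsistency", proportional (Cox-type) effects on each cause-specific hazard do not generally yield a proportional effect on the all-cause hazard, whereas additive effects simply sum. The baseline hazards and coefficient values below are hypothetical and chosen only for illustration; the sketch does not reproduce the talk's estimation method.

```python
import math


def baseline_hazards(t):
    """Hypothetical baseline hazards: cause 1 is constant, cause 2 increases with time."""
    return 0.10, 0.02 * t


def cox_all_cause_ratio(t, b1, b2):
    """All-cause hazard ratio (z=1 vs z=0) when each cause follows a Cox-type model."""
    h1, h2 = baseline_hazards(t)
    return (h1 * math.exp(b1) + h2 * math.exp(b2)) / (h1 + h2)


def additive_all_cause_difference(t, g1, g2):
    """All-cause hazard difference (z=1 vs z=0) when each cause follows an additive model."""
    h1, h2 = baseline_hazards(t)
    with_covariate = (h1 + g1) + (h2 + g2)
    without_covariate = h1 + h2
    return with_covariate - without_covariate


for t in (1.0, 10.0):
    print(t, cox_all_cause_ratio(t, 0.5, -0.5), additive_all_cause_difference(t, 0.05, 0.03))
```

In this sketch the all-cause hazard ratio under the Cox-type models changes with time, so the all-cause hazard is not itself proportional, while the additive all-cause difference stays equal to the sum of the two cause-specific effects at every time point.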
Sep 30, 3:30-4:30, Arts 108 Prof. Peng Zhang, University of Alberta

Efficient estimation for subject-specific effects in longitudinal data using nonnormal linear mixed models

We propose a new class of nonnormal linear mixed models that provide an efficient estimation of subject-specific disease progression in the analysis of longitudinal data from the Modification of Diet in Renal Disease (MDRD) trial. We assume a log-gamma distribution for the random effects and provide the maximum likelihood inference for the proposed nonnormal linear mixed model. This method is extended to model associations among subject-specific effects in a multiple characteristics longitudinal study. More reliable estimates of correlations between random effects are obtained using the log-gamma mixed model. To validate the adequacy of the log-gamma assumption versus the usual normality assumption for the random effects, we propose a lack-of-fit test that clearly indicates a better fit for the log-gamma modeling in the analysis of the MDRD data and the glaucoma study.

Note: This is a joint talk with the Colloquium of the Department of Mathematics and Statistics.

Tentatively Scheduled Talks