Variable Categorization and Selection in the Context of Inverse Intensity Weighting for Longitudinal Data

Presenter: Di Shan

Supervisory Committee: Eleanor Pullenayegum (Supervisor), Charlie Keown-Stoneman and Jessica Gronsbell

Date and Time: Thursday, September 28th, 2023 at 11:30am EST

Location: Health Sciences Building (155 College St), Room HS650

Zoom: https://utoronto.zoom.us/j/81894847575

Abstract:

The main objective of this thesis is to address the challenge of analyzing longitudinal data with irregular follow-up times, which may be related to the outcome. While traditional methods like generalized estimating equations (GEEs) have limitations in handling such data, inverse-intensity weighted GEEs (IIW-GEEs) offer a broader range of applications. The selection of variables in the observation intensity model plays a crucial role in ensuring consistent inferences in IIW-GEEs. However, the impact of including different types of variables on the variance of estimated regression parameters is not well studied. This thesis aims to investigate how variable selection in the observation intensity model affects the variance of IIW-GEE estimates.

Through mathematical derivation, this thesis demonstrates that the asymptotic variance of the estimated regression parameter remains unchanged when the observation intensity model incorporates variables that are predictive of neither the outcome nor the observation intensity. In contrast, the variance decreases when including variables that are predictive of the outcome only, and it increases when adding variables that are predictive of the observation intensity only. To assess the magnitude of these effects, simulations are conducted to determine how much the variance of the estimated regression parameter changes after adding each of these three types of variables. The simulation results reveal that adding variables predictive of the outcome only to the intensity model decreases the variance, although the reduction is sometimes minimal. Conversely, incorporating variables predictive of the observation intensity only increases the variance. Furthermore, the practical application of these findings is demonstrated using data from a randomized trial investigating treatments for major depressive disorder. The results from this application show no consistent changes in variance, possibly because no covariates are strong predictors of either the outcome alone or the visit process alone.

In conclusion, this thesis provides insights that can assist researchers working with longitudinal data in optimizing their variable selection process when applying IIW-GEEs. By understanding the effects of different types of variables on the variance of estimated regression parameters, researchers can make informed decisions regarding variable inclusion, ultimately benefiting their analyses.

All are welcome!