Projects in Progress
|
Integrating ground monitoring data and satellite observations to estimate PM2.5 concentrations in the eastern United States, 2000-2006 |
|
Project description: I have funding from the Health Effects Institute to estimate monthly PM2.5 concentrations in the eastern United States for the period 2000-2006. The basic notion is that state-of-the-art estimates for a large area over a long time period could be used in a variety of epidemiological studies. We will use satellite observations of aerosol optical depth (AOD) from the MODIS, MISR, and GOES instruments to supplement the ground network and provide information in areas far from monitors. The monitors will serve as ground truth and will provide higher resolution in urban areas. The disparate sources of information will be integrated via a Bayesian spatial-temporal statistical model. Initial results indicate that the satellite proxies may not be as useful as many have hoped. This appears to be caused by high levels of missing AOD retrievals, high levels of uncertainty in AOD as a proxy for PM2.5, and spatially-varying bias in the relationship between AOD and PM2.5. I have a draft manuscript nearing submission on this work. As a pilot project within this funding mechanism, with Yang Liu, Hortensia Moreno-Macias and Shobha Kondragunta, I investigated the usefulness of the GOES satellite measurements of AOD to improve PM2.5 concentration estimates. GOES is of great interest because retrievals for 1995-1998 could be processed, filling a gap in our PM2.5 estimates, as the ground monitoring network was very sparse before 1999. In addition, GOES provides data at higher spatial and temporal resolution than MODIS and MISR. However, there has been limited validation of GOES AOD to this point. A manuscript (Paciorek et al. 2008) is in press in Environmental Science and Technology.
Results indicate that GOES retrievals are reasonably well-correlated with PM2.5 and may be particularly useful because of their high data density in time compared to MODIS and MISR, but there is little indication that the high nominal spatial resolution is very helpful. |
|
Collaborators: Yang Liu (Harvard Environmental Health), Shobha Kondragunta (NOAA-NESDIS) |
|
Bias in regression models with spatial confounding |
|
Project description: It is common in environmental applications and other areas with spatial data that residuals in regression models are spatially correlated. Researchers often account for spatial structure in the residuals using a spatial term in the mean or a spatial covariance structure. I have a draft manuscript in which I develop a simple framework for understanding bias and precision in such spatial regression models. If the covariate(s) are spatially correlated, then the residuals may be correlated with the covariate(s). In this context, the correlation may be due to an unmeasured confounder and one may hope to account for confounding by including spatial structure in the model. Results indicate that the scales of spatial variation are critical and that bias from unmeasured confounders is reduced only if the confounder varies at a scale larger than the spatial variation in the covariate of interest (or of a component of the covariate). Furthermore, random effects models, kriging, and other forms of penalized models reduce bias less than fixed effects models that remove spatial variability up to pre-specified scales. This is because of the inherent bias-variance tradeoff in penalized models. Ongoing work will explore effects of measurement error in this context as well as spatial correlation in the context of areal spatial data and standard conditional auto-regressive (CAR) models. |
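As a rough illustration of the scale argument (a toy simulation, not the framework or code from the manuscript), the sketch below assumes a one-dimensional domain, a large-scale unmeasured confounder `z`, and a covariate `x` that shares that large-scale component but also varies at a fine scale. All names, the sinusoidal confounder, and the low-frequency Fourier basis are assumptions chosen for illustration. Adjusting with fixed effects that remove large-scale spatial variation should leave the fine-scale variation in `x` to identify the coefficient.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
s = np.sort(rng.uniform(size=n))          # locations on [0, 1] (assumed 1-D domain)

z = np.sin(2 * np.pi * s)                 # unmeasured confounder, large spatial scale
x = z + rng.normal(size=n)                # covariate: large-scale part plus fine-scale part
beta = 1.0                                # true coefficient of interest
y = beta * x + z + rng.normal(scale=0.5, size=n)


def ols_slope(design, response):
    """Least-squares fit; returns the coefficient on the second column (x)."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef[1]


# Naive model: intercept and x only, so z is absorbed into the residual.
naive_design = np.column_stack([np.ones(n), x])

# Adjusted model: add fixed effects for low-frequency spatial variation.
low_freq = np.column_stack(
    [np.sin(2 * np.pi * k * s) for k in (1, 2)]
    + [np.cos(2 * np.pi * k * s) for k in (1, 2)]
)
adjusted_design = np.column_stack([np.ones(n), x, low_freq])

beta_naive = ols_slope(naive_design, y)
beta_adjusted = ols_slope(adjusted_design, y)
print(f"naive: {beta_naive:.3f}  adjusted: {beta_adjusted:.3f}  truth: {beta}")
```

In this setup the naive estimate is pulled away from the truth because `x` and the residual share the large-scale signal, while the adjusted estimate relies only on the fine-scale component of `x`. If `x` had no variation at scales finer than the confounder, the adjustment would have nothing left to work with.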
|
Smoothing methods for effects of emissions sources, wind, and buffer-type variables in environmental exposure modeling |
|
Project description: Researchers analyzing exposure to pollutants often face the problem of accounting for the effects of known and unknown sources, mediated by distance, direction, emissions strength, and wind, among other factors. I am working on relatively simple statistical smoothing methods in the context of regression modeling to better account for distance and direction from source to receptor (exposure location) and wind direction and speed. As part of this, I have developed a simple approach to estimate the smooth effect of distance to emissions sources, accounting for source strength and multiple sources. This work is motivated by a variety of studies here at HSPH in the Environmental Health department, including air pollution exposure in Brooklyn from two major highways and at TF Green Airport in Providence, exposure in eastern Massachusetts in conjunction with multiple HSPH health studies, as well as large-scale estimation of PM exposure as part of my HEI-funded work and as part of the NHS exposure project. |
|
Collaborators: Len Zwack and Jon Levy (Harvard Environmental Health), Brent Coull (Harvard Biostatistics) |
Projects with Submitted Manuscripts
|
Post-glacial tree dynamics in New England |
|
Project description: We are interested in understanding tree population dynamics over the last 15,000 years. To investigate dynamics based on pollen deposited in pond sediments over the last 2000 years in south-central New England, we have built a Bayesian hierarchical model to relate fossil pollen from sediment cores to tree populations, calibrated by modern tree plots and colonial records. The model is run in predictive mode to estimate composition back in time when only pollen data are available. The model will be used to consider changes in tree populations over time and will eventually be linked to molecular data that provide additional information about population spread. We have completed an extensive technical report about the project (Paciorek and McLachlan). We are in the process of revising and resubmitting to JASA Applications and Case Studies. |
|
Collaborators: Jason McLachlan (Notre Dame Biology), Wyatt Oswald (Harvard Forest), Aaron Ellison (Harvard Forest) |
|
Spatio-temporal estimation of particulate matter exposure in the Nurses' Health Study |
|
Project description: We have been investigating the health effects of particulate matter air pollution in one of the major cohort studies, the Nurses' Health Study. Our part of the project was to estimate individual exposure to particulate matter. We have built a spatio-temporal model to estimate monthly exposure to PM10 and PM2.5 for 1988-2002 in the northeast U.S. using government monitoring data and GIS covariates. As only sparse PM2.5 data are available before 1999, we are estimating PM2.5 based on PM10 measurements and visibility information. As part of this latter effort, we have devised a method for using airport visibility data to estimate PM2.5 while properly accounting for the uncertainty and truncation in the visibility data. We are in the process of revising a manuscript (Paciorek, Yanosky, and Suh) for the Annals of Applied Statistics. One scientific manuscript has been accepted in Atmospheric Environment (Yanosky et al., 2008), another is under revision for American Journal of Epidemiology (Puett et al.), and an additional manuscript has been submitted to Environmental Health Perspectives (Yanosky et al.). |
|
Collaborators: primarily Jeff Yanosky and Helen Suh (Harvard Environmental Health) on the exposure modeling side of the larger project |
|
Measurement error induced by spatial exposure estimation |
|
Project description: Motivated in part by the exposure estimation work on the Nurses' Health Study and collaborative work of Alexandros and Brent, we are interested in the measurement error problem induced when spatial exposure estimates are used in epidemiological models of health outcomes based on cohort data. We have developed a framework for thinking about the problem and argue that some standard approaches in the literature are flawed. We suggest and investigate the performance of several alternative approaches to adjusting for measurement error in the epidemiological models. We are revising a manuscript (Gryparis et al.) for Biostatistics on this work. |
|
Collaborators: Alexandros Gryparis, Brent Coull (both with Harvard Biostatistics) |
Completed Projects
|
Fourier basis representation for spatial data |
|
Project description: This work builds on my work on spatial logistic regression for large datasets. The Fourier basis representation of spatial processes is an efficient way to represent Gaussian spatial processes on a fine grid with substantial computational advantages. This approach has been pioneered by Chris Wikle at the University of Missouri. I have explored various parameterizations and MCMC algorithms based on the Fourier representation and have written an article (Paciorek 2007b) and template R code published in Journal of Statistical Software. |
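To give a flavor of why the FFT makes this representation computationally attractive, here is a minimal sketch that simulates a stationary Gaussian field on a periodic grid by weighting the Fourier coefficients of white noise. It is not the parameterization from the article or its template R code: the function name `simulate_periodic_field`, the squared-exponential-type spectral weights, and the `range_param` argument are all assumptions made for illustration.

```python
import numpy as np


def simulate_periodic_field(n=64, range_param=0.1, seed=0):
    """Draw one stationary Gaussian field on an n x n periodic grid over the unit square.

    The spectral weights below correspond to a squared-exponential-type
    covariance with length scale `range_param` (an assumed, illustrative choice).
    """
    rng = np.random.default_rng(seed)

    # Integer frequencies (cycles per unit domain) in FFT ordering.
    freq = np.fft.fftfreq(n, d=1.0 / n)
    wx, wy = np.meshgrid(freq, freq, indexing="ij")

    # Spectral density: high frequencies are damped, so the field is smooth.
    spec = np.exp(-0.5 * (2 * np.pi * range_param) ** 2 * (wx ** 2 + wy ** 2))

    # Weight the Fourier coefficients of white noise by sqrt(spectral density).
    noise = rng.normal(size=(n, n))
    coef = np.sqrt(spec) * np.fft.fft2(noise)

    # The weights are symmetric in frequency, so the inverse transform is
    # real up to rounding error; keep the real part.
    return np.fft.ifft2(coef).real


field = simulate_periodic_field()
print(field.shape, float(field.std()))
```

Each draw costs O(n^2 log n) operations through the FFT, rather than requiring a Cholesky factorization of an n^2 by n^2 covariance matrix. Note that the field is periodic on the grid, which in practice is usually handled by embedding the region of interest in a larger grid.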
|
Fitting spatial models for large datasets with binary outcomes |
|
Project description: Epidemiological researchers are interested in models for binary data that treat space as a covariate. These models tend to be hard to fit, because one is fitting the space covariate nonparametrically in two dimensions. I have compared several approaches to fitting such data, including Bayesian Gaussian process models, penalized quasi-likelihood, and Simon Wood's multipenalty spline optimization coded in the mgcv library for R. The Bayesian approaches rely on efficient representations of the underlying spatial risk surface. In particular, I have modified Chris Wikle's work with the Fourier basis, which allows use of the FFT to speed computation, and Kammann and Wand's (2003) work with basis function representation approximations to standard covariance functions. The Fourier basis outperforms other methods in simulations, fitting better than both penalized likelihood and Bayesian methods, and computing more quickly than other Bayesian methods. See the preprint of the paper (Paciorek 2007a), published in Computational Statistics and Data Analysis. |
|
Collaborator: Louise Ryan (Harvard Biostatistics) |
|
A new class of nonstationary covariances for spatial data |
|
Project description: I have a paper in Environmetrics (Paciorek and Schervish 2006) on the spatial modelling in my dissertation. In particular, I use a nonstationary covariance that generalizes the work of Higdon, Swall and Kern (1998). In contrast to my dissertation, this work uses the efficient representations described under the previous project to increase the speed of fitting the nonstationary Gaussian process. The efficient representations are used for stationary processes in the hierarchy of the model that determine the structure of the nonstationarity. |
|
Collaborator: Mark Schervish (CMU Statistics) |
|
Misinformation in the conjugate prior for the linear model |
|
Project description: I wrote a short paper (Paciorek 2006) for Bayesian Analysis on the conjugate prior for the normal linear model. When the prior mean in this model is poorly chosen, the resulting posterior for the error variance and the posterior variance for the coefficients can be inflated. This is of particular concern with the unit information prior and free-knot spline models (e.g., BARS) that use the unit information prior (in fact the paper grew out of anomalous results I obtained when using BARS in my thesis work). |
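The effect can be seen from the standard normal-inverse-gamma update. The sketch below is an illustration under assumed settings, not code from the paper: it takes the prior beta | sigma^2 ~ N(m0, sigma^2 * inv(L0)) with unit-information precision L0 = X'X / n and sigma^2 ~ Inverse-Gamma(a0, b0), and compares the posterior mean of the error variance under a well-chosen and a poorly chosen prior mean. The function name, hyperparameter values, and simulated data are all hypothetical.

```python
import numpy as np


def posterior_sigma2_mean(X, y, m0, a0=2.0, b0=1.0):
    """Posterior mean of sigma^2 under a conjugate normal-inverse-gamma prior
    with unit-information prior precision L0 = X'X / n (illustrative settings)."""
    n = X.shape[0]
    XtX = X.T @ X
    L0 = XtX / n                                   # unit information prior precision
    Ln = XtX + L0                                  # posterior precision (scaled by sigma^2)
    mn = np.linalg.solve(Ln, L0 @ m0 + X.T @ y)    # posterior mean of beta
    an = a0 + n / 2.0
    # The quadratic terms equal RSS plus a penalty that grows with the
    # distance between the prior mean m0 and the least-squares estimate.
    bn = b0 + 0.5 * (y @ y + m0 @ L0 @ m0 - mn @ Ln @ mn)
    return bn / (an - 1.0)


rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(scale=1.0, size=n)  # true error variance is 1

good = posterior_sigma2_mean(X, y, m0=beta_true)
poor = posterior_sigma2_mean(X, y, m0=np.array([20.0, -20.0]))
print(f"sigma^2 posterior mean, good prior mean: {good:.2f}, poor prior mean: {poor:.2f}")
```

With the unit information prior, the penalty term works out to (b_hat - m0)' X'X (b_hat - m0) / (n + 1), where b_hat is the least-squares estimate. Because X'X grows with n, this term does not vanish as the sample size increases, so a badly placed prior mean keeps inflating the variance estimate.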
|
Effects of elevated wild pig abundance on tropical forest plant demography |
|
Project description: With Kalan Ickes (and the hard-earned data from his doctoral research in ecology at LSU), I worked on an analysis of the effects of elevated wild pig populations on the demographics of the woody plant community at the Pasoh Forest Dynamics Plot in peninsular Malaysia. This analysis (Ickes, Paciorek, and Thomas 2005) has been published in Ecology. |
|
Collaborators: Kalan Ickes (Clemson Biology), Sean Thomas (Toronto Forestry) |
|
Ph.D. dissertation on nonstationary covariance modelling |
|
Project description: I finished my PhD dissertation in May 2003. The dissertation focused on nonstationary covariance modelling for spatial data and nonparametric regression. Details are here. |
|
Collaborator: Mark Schervish (CMU Statistics) |
|
False discovery rate testing for spatial data |
|
Project description: We wrote a paper (Ventura, Paciorek, and Risbey 2004) giving an overview of the false discovery rate (FDR) methodology, first introduced by Benjamini and Hochberg (1995), for multiple testing with climatological and geophysical data in which the multiple tests are done at multiple spatial locations. The spatial aspect introduces dependence between the tests. We assessed the robustness and power of several FDR approaches and concluded that the simple original Benjamini and Hochberg algorithm works well with spatial data. We also presented a simple modification that substantially increases power when there are many significant locations. An earlier technical report version is also available. |
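For readers unfamiliar with the procedure, the original Benjamini and Hochberg step-up rule can be sketched in a few lines. This is a generic implementation, not code from the paper, and it does not include the power-increasing modification mentioned above. The function name and the example p-values (imagined as one test per grid location) are made up for illustration.

```python
import numpy as np


def benjamini_hochberg(p_values, q=0.05):
    """Return a boolean array marking which hypotheses are rejected at FDR level q."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                       # indices of p-values, smallest first
    sorted_p = p[order]
    thresholds = q * np.arange(1, m + 1) / m    # q * i / m for rank i = 1..m
    below = sorted_p <= thresholds

    reject = np.zeros(m, dtype=bool)
    if below.any():
        # Step-up rule: find the largest rank whose p-value is under its
        # threshold, then reject that hypothesis and all with smaller p-values.
        k = np.max(np.nonzero(below)[0])
        reject[order[: k + 1]] = True
    return reject


# Hypothetical p-values, one per spatial location.
p_grid = np.array([0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.36])
rejected = benjamini_hochberg(p_grid, q=0.05)
print(rejected)
```

Here only the two smallest p-values fall under their rank-specific thresholds (0.005 and 0.01), so two locations are declared significant, whereas unadjusted testing at the 0.05 level would flag five.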
|
Collaborators: Valerie Ventura (CMU Statistics), James Risbey (CSIRO Tasmania) |
|
Trends in storminess in the northern hemisphere based on multiple indicators |
|
Project description: For my applied research qualifier at CMU, I completed an analysis (Paciorek, Risbey, Ventura, and Rosen 2002), published in Journal of Climate, of storminess in the Northern Hemisphere over the past 50 years using the NCEP/NCAR reanalysis dataset. The goals were (1) to compare different indices of storminess, and (2) to investigate trends in storminess over time. The maps in the paper were created with a package of UNIX mapping tools called GMT. Data files of the storm indices we use in this analysis are available here. |
|
Collaborators: James Risbey (CSIRO Tasmania), Valerie Ventura (CMU Statistics), Richard Rosen (NOAA) |
|
Statistical language modelling |
|
Project description: We built exponential language models based on features of whole sentences. The goal was to rerank n-best lists produced by an initial acoustic/n-gram model. An early approach to this problem, presented at the 2000 Speech Transcription Workshop, was applied to Switchboard conversational speech with little success. One way of reranking n-best lists is via Powell's algorithm. I devised a local regression approach to reranking n-best lists, but this approach has not been successful. |
|
Collaborator: Roni Rosenfeld (CMU Computer Science) |
|
Tree resprouting on Barro Colorado Island |
|
Project description: For my master's thesis in ecology, I analyzed resprouting in trees and shrubs on Barro Colorado Island, Panama. This work (Paciorek, Condit, Hubbell, and Foster 2000) has been published in Journal of Ecology. |
|
Collaborator: Rick Condit (Smithsonian Tropical Research Institute, Panama) |
Last updated: June 2008