Scholarly Communications Project


Document Type: Dissertation
Name: Shin Cheng Wang
Email address: dswang@vt.edu
URN: 1998/00241
Title: ANALYSIS OF ZERO-HEAVY DATA USING A MIXTURE MODEL APPROACH
Degree: Doctor of Philosophy
Department: Statistics
Committee Chair: Eric P. Smith
Chair's email: epsmith@vt.edu
Committee Members: Jesse C. Arnold, Clint W. Coakley, Klaus H. Hinkelmann, Keying Ye
Keywords: Ceriodaphnia dubia, Chronic toxicity testing, Generalized Estimating Equations, Inhibition Concentration, Longitudinal Data, Principal Component Analysis, Zero-inflated Poisson
Date of defense: March 18, 1998
Availability: Release the entire work for Virginia Tech access only. After one year release worldwide only with written permission of the student and the advisory committee chair.

Abstract:

The problem of a high proportion of zeroes has long been of interest in data analysis and modeling; however, no single solution to it exists. The appropriate solution depends on the particular situation and the design of the experiment. For example, different biological, chemical, or physical processes may follow different distributions and behave differently, and different mechanisms may generate the zeroes and require different modeling approaches. A unique or general solution would therefore be impractical and inflexible.

In this dissertation, I focus on cases where zeroes are produced by mechanisms that create distinct sub-populations of zeroes. The dissertation is motivated by problems in chronic toxicity testing, where the data contain a high proportion of zeroes. The analysis of chronic test data is complicated because there are two different sources of zeroes: mortality and non-reproduction. Researchers therefore have to separate the zeroes due to mortality from those due to fecundity. A mixture model approach, which combines the two mechanisms, is appropriate here because it can incorporate the extra zeroes due to mortality.

A zero-inflated Poisson (ZIP) model is used to model fecundity in the Ceriodaphnia dubia toxicity test. A generalized estimating equation (GEE) based ZIP model is developed to handle longitudinal data with zeroes due to mortality. A joint estimate of the inhibition concentration (ICx) is also developed as a potency estimate based on the mixture model approach. The ZIP model is found to perform better than the regular Poisson model when mortality is high. This kind of toxicity testing also involves longitudinal data, in which the same subject is measured over a period of seven days. The GEE model provides the flexibility to incorporate the extra zeroes and a correlation structure among the repeated measures.
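
As a rough illustration of why a ZIP model can fit better than a regular Poisson model when mortality is high, the sketch below writes out the standard ZIP probability mass function and compares log-likelihoods on a small made-up sample. The function names, the counts, and the plugged-in parameter values are all assumptions chosen for illustration; this is not the estimation procedure developed in the dissertation, and it ignores covariates and repeated measures.

```python
import math

def zip_pmf(y, lam, pi):
    """P(Y = y) for a zero-inflated Poisson.

    With probability pi the count is a structural zero (for example,
    the organism died); otherwise it is drawn from Poisson(lam).
    """
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    if y == 0:
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson

def zip_loglik(counts, lam, pi):
    """Log-likelihood of independent counts under ZIP(lam, pi)."""
    return sum(math.log(zip_pmf(y, lam, pi)) for y in counts)

def poisson_loglik(counts, lam):
    """Regular Poisson log-likelihood: the ZIP model with pi = 0."""
    return zip_loglik(counts, lam, 0.0)

# Hypothetical offspring counts: four zeroes plus six reproducing animals.
counts = [0, 0, 0, 0, 12, 15, 9, 11, 14, 10]

# Poisson maximum likelihood estimate: the overall sample mean.
lam_poisson = sum(counts) / len(counts)

# Simple plug-in values for the ZIP model (an approximation, not a full
# maximum likelihood fit): the observed share of zeroes and the mean of
# the non-zero counts.
nonzero = [y for y in counts if y > 0]
pi_zip = 1.0 - len(nonzero) / len(counts)
lam_zip = sum(nonzero) / len(nonzero)

print("Poisson log-likelihood:", poisson_loglik(counts, lam_poisson))
print("ZIP log-likelihood:    ", zip_loglik(counts, lam_zip, pi_zip))
```

On data like these, the single Poisson mean has to compromise between the zeroes and the large counts, while the ZIP model assigns the zeroes to a separate component, which is the intuition behind the mixture approach described above.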

The problem of zero-heavy data also exists in environmental studies in which the growth or reproduction rates of multiple species are measured, which gives rise to multivariate data. Since the inter-relationships between different species are embedded in the correlation structure, the study of the information in the correlation of the variables, which is often accessed through principal component analysis, is one of the major interests in multivariate data analysis. In the case where mortality influences the variables of interest, but mortality itself is not of interest, the mixture approach can be applied to recover the information in the correlation structure. To investigate the effect of zeroes on multivariate data, simulation studies on principal component analysis are performed. A method that recovers the information in the correlation structure is also presented.
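
The effect of extra zeroes on a correlation structure can be sketched with a small simulation. The example below is only an assumed toy setup, not the simulation design or the recovery method of the dissertation: two positively correlated responses are generated, each is independently replaced by zero with a fixed "mortality" probability, and the correlation is computed both on all pairs and on pairs where both responses survived. For two standardized variables with correlation r, the correlation matrix has eigenvalues 1 + r and 1 - r, so the first principal component explains (1 + |r|) / 2 of the total variance. All names and parameter values are hypothetical.

```python
import math
import random

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def first_pc_share(r):
    """Variance share of the first principal component of a 2x2
    correlation matrix with off-diagonal entry r."""
    return (1.0 + abs(r)) / 2.0

def simulate(n=5000, rho=0.8, death_prob=0.3, seed=1):
    """Return (correlation with zeroes, correlation among survivors).

    Assumption: mortality strikes each variable independently and
    replaces the response with zero.
    """
    rng = random.Random(seed)
    xs, ys, both_alive = [], [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        # Correlated responses with mean 20, standard deviation 3.
        x = 20 + 3 * z1
        y = 20 + 3 * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
        x_alive = rng.random() >= death_prob
        y_alive = rng.random() >= death_prob
        xs.append(x if x_alive else 0.0)
        ys.append(y if y_alive else 0.0)
        both_alive.append(x_alive and y_alive)
    r_all = pearson(xs, ys)
    sx = [x for x, ok in zip(xs, both_alive) if ok]
    sy = [y for y, ok in zip(ys, both_alive) if ok]
    r_survivors = pearson(sx, sy)
    return r_all, r_survivors

r_all, r_survivors = simulate()
print("correlation with zeroes:    ", r_all, "PC1 share:", first_pc_share(r_all))
print("correlation among survivors:", r_survivors, "PC1 share:", first_pc_share(r_survivors))
```

In this toy setting the zeroes swamp the underlying correlation, so a principal component analysis of the raw data would mostly describe mortality rather than the relationship between the responses. Restricting to survivors is used here only as a naive stand-in for a proper mixture-based correction.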


List of Attached Files

Bibliography.pdf Fig3-1.pdf Fig3-10.pdf
Fig3-2.pdf Fig3-3.pdf Fig3-4.pdf
Fig3-5.pdf Fig3-6.pdf Fig3-7.pdf
Fig3-8.pdf Fig3-9.pdf Fig4-1.pdf
abs.pdf appendix.pdf ch1.pdf
ch2.pdf ch3.pdf ch4.pdf
ch5.pdf ch6.pdf vita.pdf

At the author's request, all materials (PDF files, images, etc.) associated with this ETD are accessible from the Virginia Tech network only.


The author grants to Virginia Tech or its agents the right to archive and display their thesis or dissertation in whole or in part in the University Libraries in all forms of media, now or hereafter known. The author retains all proprietary rights, such as patent rights. The author also retains the right to use in future works (such as articles or books) all or part of this thesis or dissertation.