Title page for ETD etd-08072012-143006


Type of Document Dissertation
Author Maiti, Dipayan
Author's Email Address dipayanm@vt.edu
URN etd-08072012-143006
Title Multiset Model Selection and Averaging, and Interactive Storytelling
Degree PhD
Department Statistics
Advisory Committee
  Advisor Name             Title
  Leman, Scotland C.       Committee Chair
  House, Leanna L.         Committee Member
  Kim, Inyoung             Committee Member
  North, Christopher L.    Committee Member
  Ramakrishnan, Naren      Committee Member
  Smith, Eric P.           Committee Member
Keywords
  • supervised topic modeling
  • visual analytics
  • Bayesian model averaging
  • Bayesian model selection
Date of Defense 2012-07-23
Availability unrestricted
Abstract
The Multiset Sampler [Leman et al., 2009] has previously been deployed and developed for efficient sampling from complex stochastic processes. We extend the sampler and the surrounding theory to model selection problems. In such problems, efficient exploration of the model space becomes a challenge, since independent and ad hoc proposals might not be able to jointly propose multiple parameter sets that correctly explain a newly proposed model. To overcome this, we propose a multiset on the model space to enable efficient exploration of multiple model modes with almost no tuning. The Multiset Model Selection (MSMS) framework is based on independent priors for the parameters and model indicators on variables. We show that posterior model probabilities can be easily obtained from multiset averaged posterior model probabilities in MSMS. We also obtain typical Bayesian model averaged estimates for the parameters from MSMS. We apply our algorithm to linear regression, where it allows easy moves between parameter modes of different models, and to probit regression, where it allows jumps between widely varying model-specific covariance structures in the latent space of a hierarchical model.

The Storytelling algorithm [Kumar et al., 2006] constructs stories by discovering and connecting latent connections between documents in a network. Such automated algorithms often do not agree with the user’s mental map of the data. Hence systems that incorporate feedback from the user through visual interaction are of immediate importance. We propose a visual analytic framework in which such interactions are naturally incorporated into the existing Storytelling algorithm through a redefinition of the latent topic space used in the similarity measure of the network. The document network can be explored using the newly learned normalized topic weights for each document. Hence our algorithm compensates for the limitations of human sensemaking capabilities in large document networks by providing a collaborative framework between the underlying model and the user. We formulate the problem as a supervised topic modeling problem in which the supervision is based on relationships imposed by the user as a set of inequalities derived from tolerances on edge costs from an inverse shortest path problem. We show a probabilistic modeling of the relationships based on auxiliary variables and propose a Gibbs sampling based strategy. We provide detailed results from simulated data and the Atlantic Storm data set.

Files
  Filename            Size     Approximate Download Time (Hours:Minutes:Seconds)
                               28.8 Modem  56K Modem  ISDN (64 Kb)  ISDN (128 Kb)  Higher-speed Access
  Maiti_D_D_2012.pdf  1.61 Mb  00:07:27    00:03:49   00:03:21      00:01:40       00:00:08
