Title page for ETD etd-06192012-223659


Type of Document Dissertation
Author Hossain, Mahmud Shahriar
Author's Email Address msh@vt.edu
URN etd-06192012-223659
Title Exploratory Data Analysis using Clusters and Stories
Degree PhD
Department Computer Science
Advisory Committee
  Advisor Name             Title
  Ramakrishnan, Naren      Committee Chair
  Davidson, Ian            Committee Member
  Fox, Edward Alan         Committee Member
  North, Christopher L.    Committee Member
  Watson, Layne T.         Committee Member
Keywords
  • Alternative clustering
  • Guided clustering
  • Storytelling
  • Connecting the dots
Date of Defense 2012-06-08
Availability unrestricted
Abstract
Exploratory data analysis aims to study datasets through the use of iterative, investigative, and visual analytic algorithms. Because the growing volume of unstructured data is difficult to manage and access, exploratory analysis of datasets has become harder than ever and a topic of interest to data mining researchers. In this dissertation, we study new algorithms for exploratory analysis of data collections using clusters and stories. Clustering brings together similar entities, whereas stories connect dissimilar objects. The former helps organize datasets into regions of interest, and the latter explores latent information by connecting the dots between disjoint instances. This dissertation focuses on five research aspects to demonstrate the applicability and usefulness of clusters and stories as exploratory data analysis tools. In the area of clustering, we investigate whether clustering algorithms can be automatically "alternatized" and how they can be guided to obtain alternative results using flexible constraints as "scatter-gather" operations. We demonstrate these ideas in many application domains, including studying the bat biosonar system and designing sustainable products. In the area of storytelling, we develop algorithms that can generate stories using distance, clique, and syntactic constraints. We explore the use of storytelling for studying document collections in the biomedical literature and intelligence analysis domains.
Files
  Approximate download times are given as Hours:Minutes:Seconds.

  Filename                Size       28.8 Modem   56K Modem   ISDN (64 Kb)   ISDN (128 Kb)   Higher-speed Access
  Hossain_MS_D_2012.pdf   16.60 Mb   01:16:50     00:39:31    00:34:34       00:17:17        00:01:28
