Title page for ETD etd-02202009-080341


Type of Document Dissertation
Author Tadepalli, Sriram Satish
Author's Email Address stadepal@vt.edu
URN etd-02202009-080341
Title Schemas of Clustering
Degree PhD
Department Computer Science
Advisory Committee
  Advisor Name             Title
  Dr. Naren Ramakrishnan   Committee Chair
  Dr. Layne Watson         Committee Member
  Dr. Liqing Zhang         Committee Member
  Dr. Richard Helm         Committee Member
  Dr. T. M. Murali         Committee Member
Keywords
  • relational clustering
  • Clustering
  • multi-criteria optimization
  • bioinformatics
  • contingency tables
Date of Defense 2009-01-29
Availability unrestricted
Abstract
Data mining techniques, such as clustering, have become a mainstay in many applications,
including bioinformatics, geographic information systems, and marketing. Over the last decade,
due to new demands posed by these applications, clustering techniques have been significantly
adapted and extended. One such extension is the idea of finding clusters in a dataset that
preserve information about some auxiliary variable. These approaches tend to guide clustering
algorithms, which are traditionally unsupervised learning techniques, with background
knowledge of the auxiliary variable. The auxiliary information could be some prior class label
attached to the data samples, or it could be the relations between data samples across different
datasets. In this dissertation, we consider the latter problem of simultaneously clustering
several vector-valued datasets by taking into account the relationships between the data samples.
We formulate objective functions that can be used to find clusters that are local in each
individual dataset and at the same time maximally similar or dissimilar with respect to clusters
across datasets. We introduce diverse applications of these clustering algorithms: (1) time
series segmentation, (2) reconstructing temporal models from time series segmentations,
(3) simultaneously clustering several datasets according to database schemas using
multi-criteria optimization, and (4) clustering datasets with many-to-many relationships
between data samples. For each of the above, we demonstrate applications, including modeling
the yeast cell cycle and the yeast metabolic cycle, understanding the temporal relationships
between yeast biological processes, and cross-genomic studies involving multiple organisms and
multiple stresses. The key contribution is to structure the design of complex clustering
algorithms over a database schema in terms of clustering algorithms over the underlying entity
sets.

Files
  Filename     Size      Approximate Download Time (Hours:Minutes:Seconds)
                         28.8 Modem   56K Modem   ISDN (64 Kb)   ISDN (128 Kb)   Higher-speed Access
  thesis.pdf   8.16 Mb   00:37:47     00:19:26    00:17:00       00:08:30        00:00:43
