

Type of Document: Dissertation
Author: Tadepalli, Sriram Satish
Author's Email Address: stadepal@vt.edu
URN: etd-02202009-080341
Title: Schemas of Clustering
Degree: PhD
Department: Computer Science
Advisory Committee (Advisor Name, Title)
- Dr. Naren Rama Krishnan, Committee Chair
- Dr. Layne Watson, Committee Member
- Dr. Liqing Zhang, Committee Member
- Dr. Richard Helm, Committee Member
- Dr. T. M. Murali, Committee Member
Keywords
- relational clustering
- Clustering
- multi-criteria optimization
- bioinformatics
- contingency tables
Date of Defense: 2009-01-29
Availability: unrestricted
Abstract
Data mining techniques, such as clustering, have become a mainstay in many applications such as bioinformatics, geographic information systems, and marketing. Over the last decade,
due to new demands posed by these applications, clustering techniques have been significantly
adapted and extended. One such extension is the idea of finding clusters in a dataset that
preserve information about some auxiliary variable. These approaches guide clustering
algorithms, which are traditionally unsupervised learning techniques, with background
knowledge of the auxiliary variable. The auxiliary information could be a prior class label
attached to the data samples or it could be the relations between data samples across different
datasets. In this dissertation, we consider the latter problem of simultaneously clustering
several vector-valued datasets by taking into account the relationships between the data samples.
We formulate objective functions that can be used to find clusters that are local in each
individual dataset and at the same time maximally similar or dissimilar with respect to clusters
across datasets. We introduce diverse applications of these clustering algorithms: (1) time
series segmentation, (2) reconstructing temporal models from time series segmentations, (3) simultaneously
clustering several datasets according to database schemas using multi-criteria
optimization, and (4) clustering datasets with many-to-many relationships between data samples.
For each of the above, we demonstrate applications, including modeling the yeast cell cycle
and the yeast metabolic cycle, understanding the temporal relationships between yeast biological
processes, and cross-genomic studies involving multiple organisms and multiple stresses.
The key contribution is to structure the design of complex clustering algorithms over a database
schema in terms of clustering algorithms over the underlying entity sets.
Files
Filename: thesis.pdf
Size: 8.16 Mb
Approximate Download Time (Hours:Minutes:Seconds)
- 28.8 Modem: 00:37:47
- 56K Modem: 00:19:26
- ISDN (64 Kb): 00:17:00
- ISDN (128 Kb): 00:08:30
- Higher-speed Access: 00:00:43