

Type of Document: Dissertation
Author: Kumar, Deept
Author's Email Address: dkumar@vt.edu
URN: etd-05032007-223232
Title: Redescription Mining: Algorithms and Applications in Bioinformatics
Degree: PhD
Department: Computer Science
Advisory Committee
- Ramakrishnan, Naren (Committee Chair)
- Helm, Richard Frederick (Committee Member)
- Murali, T. M. (Committee Member)
- North, Christopher L. (Committee Member)
- Potts, Malcolm (Committee Member)
Keywords
- bioinformatics
- storytelling
- redescription mining
- redescriptions
Date of Defense: 2007-04-19
Availability: unrestricted
Abstract
Scientific data mining purports to extract useful knowledge from massive datasets curated through computational science efforts, e.g., in bioinformatics, cosmology, geographic sciences,
and computational chemistry. In the recent past, we have witnessed major transformations
of these applied sciences into data-driven endeavors. In particular, scientists are now
faced with an overload of vocabularies for describing domain entities. All of these vocabularies
offer alternative and mostly complementary (sometimes, even contradictory) ways to
organize information and each vocabulary provides a different perspective into the problem
being studied. To further knowledge discovery, computational scientists need tools to help
uniformly reason across vocabularies, integrate multiple forms of characterizing datasets, and
situate knowledge gained from one study in terms of others.
This dissertation defines a new pattern class called redescriptions that provides high-level capabilities
for reasoning across domain vocabularies. A redescription is a shift of vocabulary, or
a different way of communicating the same information; redescription mining finds concerted
sets of objects that can be defined in (at least) two ways using given descriptors. We present
the CARTwheels algorithm for mining redescriptions by exploiting equivalences of partitions
induced by distinct descriptor classes as well as applications of CARTwheels to several bioinformatics
datasets. We then outline how we can build more complex data mining operations
by cascading redescriptions to realize a story, leading to a new data mining capability called
storytelling. Besides applications to characterizing gene sets, we showcase its uses in other
datasets as well. Finally, we extend the core CARTwheels algorithm by introducing a theoretical
framework, based on partitions, to systematically explore redescription space; generalizing
from mining redescriptions (and stories) within a single domain to relating descriptors across
different domains, to support complex relational data mining scenarios; and exploiting structure
of the underlying descriptor space to yield more effective algorithms for specific classes
of datasets.
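The definition of a redescription given in the abstract (one set of objects that can be defined in at least two ways) can be illustrated with a small sketch. This is not the CARTwheels algorithm: it is a brute-force comparison of single descriptors from two vocabularies. The gene names, descriptor names, and the use of Jaccard similarity as the measure of agreement are all assumptions made for illustration, not details taken from the abstract.

```python
# Toy data (hypothetical): each descriptor maps to the set of objects it covers.
descriptors_a = {
    "expressed_under_heat": {"g1", "g2", "g3"},
    "expressed_under_cold": {"g4", "g5"},
}
descriptors_b = {
    "annotated_stress_response": {"g1", "g2", "g3"},
    "annotated_transport": {"g3", "g4"},
}


def jaccard(x, y):
    """Overlap of two object sets: |x & y| / |x | y| (0.0 if both are empty)."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def find_redescriptions(vocab_a, vocab_b, threshold=1.0):
    """Return (descriptor_a, descriptor_b, score) triples meeting the threshold.

    Assumption: a pair counts as a redescription when the two descriptors
    select (nearly) the same objects, as measured by Jaccard similarity.
    """
    return [
        (name_a, name_b, jaccard(objs_a, objs_b))
        for name_a, objs_a in vocab_a.items()
        for name_b, objs_b in vocab_b.items()
        if jaccard(objs_a, objs_b) >= threshold
    ]


matches = find_redescriptions(descriptors_a, descriptors_b)
```

With the default threshold of 1.0, only descriptor pairs that select exactly the same objects are reported; lowering the threshold admits approximate redescriptions.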
Files
Filename: deept_redescs.pdf
Size: 2.87 Mb
Approximate Download Time (Hours:Minutes:Seconds)
- 28.8 Modem: 00:13:16
- 56K Modem: 00:06:49
- ISDN (64 Kb): 00:05:58
- ISDN (128 Kb): 00:02:59
- Higher-speed Access: 00:00:15