

Type of Document: Dissertation
Author: Jin, Ying
URN: etd-12302009-142944
Title: New Algorithms for Mining Network Datasets: Applications to Phenotype and Pathway Modeling
Degree: PhD
Department: Computer Science
Advisory Committee
- Ramakrishnan, Naren (Committee Chair)
- Fox, Edward Alan (Committee Member)
- Heath, Lenwood S. (Committee Member)
- Helm, Richard Frederick (Committee Member)
- Murali, T. M. (Committee Member)
Keywords
- partial orders
- biclusters
- graph separators
- relative importance methods
- biological networks
Date of Defense: 2009-12-08
Availability: restricted
Abstract
Biological network data is plentiful, with practically every experimental methodology giving ‘network views’ into cellular function and behavior. Bioinformatic screens that yield network data
include, for example, genome-wide deletion screens, protein-protein interaction assays, RNA interference
experiments, and methods to probe metabolic pathways. Efficient and comprehensive
computational approaches are required to model these screens and gain insight into the nature of biological
networks. This thesis presents three new algorithms to model and mine network datasets.
First, we present an algorithm that models genome-wide perturbation screens by deriving relations
between phenotypes and then using these relations locally to derive gene-phenotype
relationships. We show that this algorithm outperforms all previously described algorithms
for gene-phenotype modeling, and we provide theoretical insight into its convergence and
accuracy properties. Second, we define a new data mining problem, constrained
minimal separator mining, propose algorithms for it, and apply them to modeling gene perturbation
screens by viewing the perturbed genes as a graph separator. Both of these data mining
applications are evaluated on network datasets from S. cerevisiae and C. elegans. Finally, we
present an approach to model the relationship between metabolic pathways and operon structure in
prokaryotic genomes. In this approach, we introduce a new pattern class, biclusters over domains
with supplied partial orders, and develop algorithms for systematically detecting such biclusters.
Together, our data mining algorithms provide a comprehensive arsenal of techniques for modeling
gene perturbation screens and metabolic pathways.
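The abstract models perturbed genes as a graph separator. As a hedged illustration of that idea only (not the dissertation's constrained mining algorithm, whose constraints are not described here), the sketch below checks whether a vertex set is a minimal s-t separator in an undirected graph: removing it disconnects s from t, and no proper subset does. The adjacency-dict representation, the helper names (`from_edges`, `is_separator`, `is_minimal_separator`), and the toy node labels are all hypothetical choices made for this example.

```python
from collections import deque

# Hypothetical representation: undirected graph as node -> set of neighbours.
Graph = dict[str, set[str]]


def from_edges(edges: list[tuple[str, str]]) -> Graph:
    """Build a symmetric adjacency dict from an undirected edge list."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def connected(graph: Graph, s: str, t: str, removed: frozenset[str]) -> bool:
    """Breadth-first search: is t reachable from s once `removed` is deleted?"""
    if s in removed or t in removed:
        return False
    seen = {s}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        for v in graph.get(u, ()):
            if v not in seen and v not in removed:
                seen.add(v)
                queue.append(v)
    return False


def is_separator(graph: Graph, s: str, t: str, sep: set[str]) -> bool:
    """True if sep excludes s and t and its removal disconnects s from t."""
    removed = frozenset(sep)
    return s not in removed and t not in removed and not connected(graph, s, t, removed)


def is_minimal_separator(graph: Graph, s: str, t: str, sep: set[str]) -> bool:
    """True if sep separates s from t and no proper subset of sep does."""
    removed = frozenset(sep)
    if not is_separator(graph, s, t, removed):
        return False
    # Adding vertices to a separator never reconnects s and t, so it is
    # enough to check that dropping any single vertex restores a path.
    return all(connected(graph, s, t, removed - {v}) for v in removed)


if __name__ == "__main__":
    g = from_edges([("s", "a"), ("a", "t"), ("s", "b"), ("b", "t"), ("s", "c"), ("c", "a")])
    print(is_minimal_separator(g, "s", "t", {"a", "b"}))       # True
    print(is_minimal_separator(g, "s", "t", {"a", "b", "c"}))  # False: "c" is redundant
```

In the toy graph, {a, b} is a minimal separator, while {a, b, c} still separates s from t but is not minimal because dropping c leaves s and t disconnected. A real application would replace the toy nodes with genes and network edges and would add whatever constraints the mining problem imposes.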
Files
Jin_Ying_D_2009.pdf (1.59 Mb)