

Type of Document Master's Thesis Author Goddard, Christopher Joseph Author's Email Address cgoddard@vt.edu URN etd-09192007-155445 Title Analysis and Abstraction of Parallel Sequence Search Degree Master of Science Department Computer Science Advisory Committee
Advisor Name Title Feng, Wu-Chun Committee Chair Back, Godmar V. Committee Member Tilevich, Eli Committee Member Keywords
- sequence search
- BLAST
- parallelism
- grid framework
Date of Defense 2007-09-05 Availability unrestricted Abstract The ability to compare two biological sequences is extremely valuable, as matches can suggest evolutionary origins of genes or the purposes of particular amino acids. Results of such comparisons can be used in the creation of drugs, can help combat newly discovered viruses, or can assist in treating diseases.
Unfortunately, the rate of sequence acquisition is outpacing our ability to compute on these data. Further, traditional dynamic programming algorithms are too slow to meet the needs of biologists, who wish to compare millions of sequences daily. While heuristic algorithms improve upon the performance of these dated applications, they still cannot keep up with the steadily expanding search space.
Parallel sequence search implementations were developed to address this issue. By partitioning databases into work units for distributed computation, applications like mpiBLAST are able to achieve super-linear speedup over their sequential counterparts. However, such implementations are limited to clusters and require significant effort to work in a grid environment. Further, their parallelization strategies are typically specific to the target sequence search, so future applications require a reimplementation if they wish to run in parallel.
This thesis analyzes the performance of two versions of mpiBLAST, noting trends as well as differences between them. Results suggest that these embarrassingly parallel applications are dominated by the time required to search vast amounts of data, and not by the communication necessary to support such searches. Consequently, a framework named gridRuby is introduced which alleviates two main issues with current parallel sequence search applications; namely, the requirement of a tightly knit computing environment and the specific, hand-crafted nature of parallelization. Results show that gridRuby can parallelize an application across a set of machines through minimal implementation effort, and can still exhibit super-linear speedup.
Files
Filename Size Approximate Download Time (Hours:Minutes:Seconds)
28.8 Modem 56K Modem ISDN (64 Kb) ISDN (128 Kb) Higher-speed Access Thesis.pdf 684.84 Kb 00:03:10 00:01:37 00:01:25 00:00:42 00:00:03
If you have questions or technical problems, please Contact DLA.