

Type of Document Master's Thesis Author Aji, Ashwin Mandayam Author's Email Address aaji@cs.vt.edu URN etd-06162008-171147 Title Exploiting Multigrain Parallelism in Pairwise Sequence Search on Emergent CMP Architectures Degree Master of Science Department Computer Science Advisory Committee
Advisor Name Title Feng, Wu-Chun Committee Chair Cameron, Kirk W. Committee Member Nikolopoulos, Dimitrios S. Committee Member Keywords
- GPGPU
- Cell Broadband Engine
- Smith-Waterman
- Bioinformatics
Date of Defense 2008-05-30 Availability unrestricted Abstract With the emerging hybrid multi-core and many-core compute platforms deliveringunprecedented high performance within a single chip, and making rapid strides toward the
commodity processor market, they are widely expected to replace the
multi-core processors in the existing High-Performance Computing (HPC)
infrastructures, such as large scale clusters, grids and supercomputers.
On the other hand in the realm of bioinformatics, the size of genomic
databases is doubling every 12 months, and hence the need for novel approaches
to parallelize sequence search algorithms has become increasingly important.
This thesis puts a significant step forward in bridging the gap between
software and hardware by presenting an efficient and scalable model to
accelerate one of the popular sequence alignment algorithms by exploiting
multigrain parallelism that is exposed by the emerging multiprocessor architectures.
Specifically, we parallelize a
dynamic programming algorithm called Smith-Waterman both within and across multiple
Cell Broadband Engines and within an nVIDIA GeForce General Purpose Graphics
Processing Unit (GPGPU).
Cell Broadband Engine: We parallelize the Smith-Waterman algorithm
within a Cell node by performing a blocked data decomposition of the
dynamic programming matrix followed by pipelined execution of the blocks across the synergistic
processing elements (SPEs) of the Cell.
We also introduce
novel optimization methods that completely utilize the vector
processing power of the SPE. As a result, we achieve near-linear scalability
or near-constant efficiency for up to 16 SPEs on the dual-Cell QS20 blades, and
our design is highly scalable to more cores, if available.
We further extend this design to accelerate the Smith-Waterman algorithm
across nodes on both the IBM QS20 and the
PlayStation3 Cell cluster platforms and achieve a maximum speedup of 44,
when compared to the execution times on a single Cell node.
We then
introduce an analytical model to accurately estimate the execution times of
parallel sequence alignments and wavefront algorithms in
general on the Cell cluster platforms.
Lastly, we contribute and evaluate TOSS -- a Throughput-Oriented Sequence
Scheduler, which leverages the performance prediction model and
dynamically partitions the available processing elements to
simultaneously align multiple sequences. This scheme succeeds in aligning more sequences per unit
time with an improvement of 33.5% over the
naive first-come, first-serve (FCFS) scheduler.
nVIDIA GPGPU: We parallelize the Smith-Waterman algorithm on the GPGPU
by optimizing the code in stages, which include optimal data
layout strategies, coalesced memory accesses and blocked data decomposition techniques.
Results show that our methods provide a maximum speedup of 3.6 on
the nVIDIA GPGPU when compared to the performance of the naive
implementation of Smith-Waterman.
Files
Filename Size Approximate Download Time (Hours:Minutes:Seconds)
28.8 Modem 56K Modem ISDN (64 Kb) ISDN (128 Kb) Higher-speed Access aji_ms_thesis.pdf 3.06 Mb 00:14:10 00:07:17 00:06:22 00:03:11 00:00:16
If you have questions or technical problems, please Contact DLA.