Title page for ETD etd-06162008-171147


Type of Document Master's Thesis
Author Aji, Ashwin Mandayam
Author's Email Address aaji@cs.vt.edu
URN etd-06162008-171147
Title Exploiting Multigrain Parallelism in Pairwise Sequence Search on Emergent CMP Architectures
Degree Master of Science
Department Computer Science
Advisory Committee
Advisor Name Title
Feng, Wu-Chun Committee Chair
Cameron, Kirk W. Committee Member
Nikolopoulos, Dimitrios S. Committee Member
Keywords
  • GPGPU
  • Cell Broadband Engine
  • Smith-Waterman
  • Bioinformatics
Date of Defense 2008-05-30
Availability unrestricted
Abstract
With the emerging hybrid multi-core and many-core compute platforms delivering

unprecedented high performance within a single chip, and making rapid strides toward the

commodity processor market, they are widely expected to replace the

multi-core processors in the existing High-Performance Computing (HPC)

infrastructures, such as large scale clusters, grids and supercomputers.

On the other hand in the realm of bioinformatics, the size of genomic

databases is doubling every 12 months, and hence the need for novel approaches

to parallelize sequence search algorithms has become increasingly important.

This thesis puts a significant step forward in bridging the gap between

software and hardware by presenting an efficient and scalable model to

accelerate one of the popular sequence alignment algorithms by exploiting

multigrain parallelism that is exposed by the emerging multiprocessor architectures.

Specifically, we parallelize a

dynamic programming algorithm called Smith-Waterman both within and across multiple

Cell Broadband Engines and within an nVIDIA GeForce General Purpose Graphics

Processing Unit (GPGPU).

Cell Broadband Engine: We parallelize the Smith-Waterman algorithm

within a Cell node by performing a blocked data decomposition of the

dynamic programming matrix followed by pipelined execution of the blocks across the synergistic

processing elements (SPEs) of the Cell.

We also introduce

novel optimization methods that completely utilize the vector

processing power of the SPE. As a result, we achieve near-linear scalability

or near-constant efficiency for up to 16 SPEs on the dual-Cell QS20 blades, and

our design is highly scalable to more cores, if available.

We further extend this design to accelerate the Smith-Waterman algorithm

across nodes on both the IBM QS20 and the

PlayStation3 Cell cluster platforms and achieve a maximum speedup of 44,

when compared to the execution times on a single Cell node.

We then

introduce an analytical model to accurately estimate the execution times of

parallel sequence alignments and wavefront algorithms in

general on the Cell cluster platforms.

Lastly, we contribute and evaluate TOSS -- a Throughput-Oriented Sequence

Scheduler, which leverages the performance prediction model and

dynamically partitions the available processing elements to

simultaneously align multiple sequences. This scheme succeeds in aligning more sequences per unit

time with an improvement of 33.5% over the

naive first-come, first-serve (FCFS) scheduler.

nVIDIA GPGPU: We parallelize the Smith-Waterman algorithm on the GPGPU

by optimizing the code in stages, which include optimal data

layout strategies, coalesced memory accesses and blocked data decomposition techniques.

Results show that our methods provide a maximum speedup of 3.6 on

the nVIDIA GPGPU when compared to the performance of the naive

implementation of Smith-Waterman.

Files
  Filename       Size       Approximate Download Time (Hours:Minutes:Seconds) 
 
 28.8 Modem   56K Modem   ISDN (64 Kb)   ISDN (128 Kb)   Higher-speed Access 
  aji_ms_thesis.pdf 3.06 Mb 00:14:10 00:07:17 00:06:22 00:03:11 00:00:16

Browse All Available ETDs by ( Author | Department )

dla home
etds imagebase journals news ereserve special collections
virgnia tech home contact dla university libraries

If you have questions or technical problems, please Contact DLA.