Title page for ETD etd-07272012-152625


Type of Document Master's Thesis
Author Lee, Kenneth Sydney
Author's Email Address KLee1@vt.edu
URN etd-07272012-152625
Title Characterization and Exploitation of GPU Memory Systems
Degree Master of Science
Department Computer Science and Applications
Advisory Committee
Advisor Name Title
Feng, Wu-Chun Committee Chair
Cao, Yong Committee Member
Lin, Heshan Committee Member
Keywords
  • Data Transfer
  • Performance Modeling
  • GPGPU
  • APU
  • GPU
  • Memory Systems
Date of Defense 2012-07-06
Availability unrestricted
Abstract
Graphics Processing Units (GPUs) are workhorses of modern performance due to their ability to achieve massive speedups on parallel applications. The massive number of threads that can be run concurrently on these systems allows applications with data-parallel computations to achieve better performance than on traditional CPU systems. However, the GPU is not perfect for all types of computation. The massively parallel SIMT architecture of the GPU can still be constraining in terms of achievable performance. GPU-based systems will typically only be able to achieve between 40% and 60% of their peak performance. One of the major problems affecting this efficiency is the GPU memory system, which is tailored to the needs of graphics workloads instead of general-purpose computation.

This thesis intends to show the importance of memory optimizations for GPU systems. In particular, this work addresses problems of data transfer and global atomic memory contention. Using the novel AMD Fusion architecture, we gain overall performance improvements over discrete GPU systems for data-intensive applications. The fused architecture systems offer an interesting trade-off by increasing data transfer rates at the cost of some raw computational power. We characterize the performance of the different memory paths that are possible because of the shared memory space present on the fused architecture. In addition, we provide a theoretical model that can be used to correctly predict the comparative performance of memory movement techniques for a given data-intensive application and system. In terms of global atomic memory contention, we show improvements in scalability and performance for global synchronization primitives by avoiding contentious global atomic memory accesses. In general, this work shows the importance of understanding the memory system of the GPU architecture to achieve better application performance.

Files
  Approximate Download Time (Hours:Minutes:Seconds)

  Filename           Size     28.8 Modem  56K Modem  ISDN (64 Kb)  ISDN (128 Kb)  Higher-speed Access
  Lee_KS_T_2012.pdf  4.11 MB  00:19:02    00:09:47   00:08:34      00:04:17       00:00:21
