Open Collections
UBC Theses and Dissertations
GPU compute memory systems
Yuan, George Lai
Abstract
Modern Graphics Processing Units (GPUs) offer orders of magnitude more raw computing power than contemporary CPUs by using many simpler in-order single-instruction, multiple-data (SIMD) cores optimized for multi-thread performance rather than single-thread performance. As such, GPUs operate much closer to the "Memory Wall" and therefore require much more careful memory management.

This thesis proposes changes to the memory system of our detailed GPU performance simulator, GPGPU-Sim, to allow proper simulation of general-purpose applications written using NVIDIA's Compute Unified Device Architecture (CUDA) framework. To test these changes, fourteen CUDA applications with varying degrees of memory intensity were collected. With these changes, we show that our simulator predicts the performance of commodity GPU hardware with 86% correlation. Furthermore, we show that increasing chip resources to allow more threads to run concurrently does not necessarily increase performance, due to increased contention for the shared memory system.

Moreover, this thesis proposes a hybrid analytical DRAM performance model that uses memory address traces to predict the efficiency of a DRAM system under a conventional First-Ready First-Come First-Serve (FR-FCFS) memory scheduling policy. To stress the proposed model, a massively multithreaded architecture based upon contemporary high-end GPUs is simulated to generate the required memory address traces. The results show that the hybrid analytical model predicts DRAM efficiency to within 11.2% absolute error when arithmetically averaged across a memory-intensive subset of the CUDA applications introduced in the first part of this thesis.

Finally, this thesis proposes a complexity-effective solution to memory scheduling that recovers most of the performance loss incurred by a naive in-order First-In First-Out (FIFO) DRAM scheduler relative to an aggressive out-of-order FR-FCFS scheduler. While FR-FCFS scheduling re-orders memory requests to improve row access locality, we instead employ an interconnection network arbitration scheme that preserves the inherently high row access locality of memory request streams from individual "shader cores" and, in doing so, achieve DRAM efficiency and system performance close to that of FR-FCFS with a simpler design. We evaluate our interconnection network arbitration scheme using crossbar, ring, and mesh networks and show that, when coupled with a banked in-order FIFO scheduler, it obtains up to 91.0% of the performance obtainable with an out-of-order memory scheduler with eight-entry DRAM controller queues.
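The FR-FCFS policy the abstract contrasts with in-order FIFO scheduling can be illustrated with a minimal sketch. This toy model, including the function names, the `(bank, row)` request tuples, and the interleaved trace, is purely illustrative and not taken from the thesis: it only counts row-buffer hits, ignoring real DRAM timing, to show why reordering recovers row access locality that interleaved streams destroy.

```python
def frfcfs_schedule(queue, open_rows):
    """Pick the oldest request that hits an open row ("first-ready");
    if none is ready, fall back to the oldest request overall."""
    for i, (bank, row) in enumerate(queue):
        if open_rows.get(bank) == row:      # row-buffer hit: serve first
            return queue.pop(i)
    return queue.pop(0)                     # first-come first-serve fallback

def fifo_schedule(queue, open_rows):
    """Naive in-order scheduler: always take the oldest request."""
    return queue.pop(0)

def run(trace, policy):
    """Replay a trace of (bank, row) requests and count row-buffer hits."""
    queue = list(trace)
    open_rows = {}                          # bank -> currently open row
    hits = 0
    while queue:
        bank, row = policy(queue, open_rows)
        if open_rows.get(bank) == row:
            hits += 1                       # no precharge/activate needed
        open_rows[bank] = row
    return hits

# Requests from two sources alternating between rows 1 and 2 of bank 0:
# in order, every access closes the previous row, so FIFO gets no hits.
trace = [(0, 1), (0, 2), (0, 1), (0, 2), (0, 1), (0, 2)]
print(run(trace, fifo_schedule))    # 0 row-buffer hits
print(run(trace, frfcfs_schedule))  # 4 hits: reordering groups same-row requests
```

The gap between the two counts is the locality that the thesis's interconnect arbitration scheme aims to preserve before requests ever reach the DRAM controller, so that a simple banked FIFO scheduler suffices.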
Item Metadata
Title: GPU compute memory systems
Creator: Yuan, George Lai
Publisher: University of British Columbia
Date Issued: 2009
Extent: 1076848 bytes
File Format: application/pdf
Language: eng
Date Available: 2009-11-27
Provider: Vancouver : University of British Columbia Library
Rights: Attribution-NonCommercial-NoDerivatives 4.0 International
DOI: 10.14288/1.0068207
Degree Grantor: University of British Columbia
Graduation Date: 2010-05
Scholarly Level: Graduate
Aggregated Source Repository: DSpace