Scalable and deterministic timing-driven parallel placement for FPGAs

Files in this item

Files                              Size      Format
ubc_2011_fall_wang_chaochris.pdf   2.077Mb   Adobe Portable Document Format
Title: Scalable and deterministic timing-driven parallel placement for FPGAs
Author: Wang, Chao Chris
Degree: Master of Applied Science - MASc
Program: Electrical and Computer Engineering
Copyright Date: 2011
Publicly Available in cIRcle: 2011-10-24
Abstract: This thesis describes a parallel implementation of the timing-driven VPR 5.0 simulated-annealing placement engine. By partitioning the grid into regions and allowing distant data to grow stale, it is possible to consider a large number of non-conflicting moves in parallel and achieve a deterministic result. The full timing-driven placement algorithm is parallelized, including swap evaluation, bounding-box calculation and the detailed timing-analysis updates. The partitioned-region approach slightly degrades the placement quality, but this is necessary to expose greater parallelism. We also suggest a method to recover the lost quality. In simulated annealing, runtime can be shortened at the expense of quality. By trading quality for runtime in this way, the serial placer can achieve a maximum speedup of 100X while its quality metrics degrade by as much as 100%. In contrast, the parallel placer can scale beyond 500X with all quality metrics degrading by less than 30%. Specifically, at the point where the parallel placer begins to dominate over the serial placer, the post-routing minimum channel width, wirelength and critical-path delay degrade by 13%, 10% and 7%, respectively, on average compared to VPR’s original algorithm, while achieving a 140X to 200X speedup with 25 threads. Finally, it is shown that the amount of degradation in the parallel placer is independent of the number of threads used.
URI: http://hdl.handle.net/2429/38168
Scholarly Level: Graduate
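
The region-partitioning idea in the abstract can be illustrated with a small sketch. This is not the thesis implementation or VPR code. It is a toy wirelength-only annealer, and every name in it (place, anneal_region, hpwl, the fixed vertical strips, the cooling schedule, the seeding formula) is an assumption made for illustration. Each region works on a private copy of a stale snapshot, swaps only cells inside its own strip, and draws random numbers from a seed that depends on the pass and region rather than on the thread, so the result should not depend on the number of workers.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def hpwl(net, pos):
    """Half-perimeter wirelength of one net under placement `pos`."""
    xs = [pos[b][0] for b in net]
    ys = [pos[b][1] for b in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def swap_cells(cell, local, a, b):
    """Exchange the contents of cells a and b (either may be empty)."""
    ba, bb = cell.pop(a, None), cell.pop(b, None)
    if ba is not None:
        cell[b], local[ba] = ba, b
    if bb is not None:
        cell[a], local[bb] = bb, a


def anneal_region(region, snapshot, nets, nets_of, temp, seed, moves):
    """Anneal one region against a stale snapshot of the whole grid."""
    x_lo, x_hi, height = region
    rng = random.Random(seed)   # seed depends on (pass, region), not on the thread
    local = dict(snapshot)      # private copy; data from other regions stays stale
    cell = {xy: b for b, xy in local.items() if x_lo <= xy[0] < x_hi}
    for _ in range(moves):
        a = (rng.randrange(x_lo, x_hi), rng.randrange(height))
        b = (rng.randrange(x_lo, x_hi), rng.randrange(height))
        moved = [blk for blk in (cell.get(a), cell.get(b)) if blk is not None]
        if a == b or not moved:
            continue
        touched = {n for blk in moved for n in nets_of[blk]}
        before = sum(hpwl(nets[n], local) for n in touched)
        swap_cells(cell, local, a, b)
        delta = sum(hpwl(nets[n], local) for n in touched) - before
        if delta > 0 and rng.random() >= math.exp(-delta / temp):
            swap_cells(cell, local, a, b)   # reject: swapping again undoes the move
    return {blk: xy for xy, blk in cell.items()}


def place(blocks, nets, width, height, regions, workers,
          seed=0, passes=20, moves=50):
    """Toy deterministic parallel placer over fixed vertical strips."""
    pos = {b: (i % width, i // width) for i, b in enumerate(blocks)}
    nets_of = {b: [] for b in blocks}
    for i, net in enumerate(nets):
        for b in net:
            nets_of[b].append(i)
    strip = width // regions
    bounds = [(r * strip, width if r == regions - 1 else (r + 1) * strip, height)
              for r in range(regions)]
    temp = 2.0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for p in range(passes):
            snapshot = dict(pos)   # every region sees the same stale view
            jobs = [pool.submit(anneal_region, reg, snapshot, nets, nets_of,
                                temp, seed * 1_000_003 + p * regions + r, moves)
                    for r, reg in enumerate(bounds)]
            for job in jobs:       # commit in fixed region order
                pos.update(job.result())
            temp *= 0.8
    return pos
```

Because regions own disjoint cells, their moves cannot conflict, and calling place with workers=1 or workers=4 on the same inputs is expected to return the same placement. The sketch keeps the strips fixed, so blocks never leave their starting strip, and it omits timing analysis entirely. Python threads also give no real speedup here, so the sketch shows the determinism argument only.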

All items in cIRcle are protected by copyright, with all rights reserved.
