News

Minghua Shen Successfully Defended his Ph.D. Thesis

2017-06-06

Minghua Shen successfully defended his Ph.D. thesis at CECA on June 5, 2017. Congratulations!
For more information (in Chinese), please refer to: 
http://ceca.pku.edu.cn/news.php?action=detail&article_id=689

 

Abstract: With the slowdown of Moore's Law, the computing landscape is becoming increasingly parallel and heterogeneous, consisting of a larger number of cores and customized accelerators. FPGAs are particularly promising as an acceleration technology thanks to their reconfigurability and customizability, because they can provide performance and energy improvements across a broad range of applications. For example, Microsoft's large-scale FPGA-based cluster has been used to accelerate the Bing web search engine and deep neural network processing. Compared with other competitive accelerators such as GPUs, FPGAs usually offer much better energy efficiency and can still deliver high performance for datacenter computing infrastructures. However, the increasingly lengthy compilation time of FPGA computer-aided design (CAD) algorithms has been a severe limitation to broader adoption of this technology. Routing is undoubtedly the most tedious and time-consuming step in the FPGA design flow. As multi-core and many-core processors become more prevalent, parallelization has the potential to accelerate the routing process.

 

The contributions and innovations of this thesis are summarized as follows:
Coarse-grained distributed parallel routing.
In this study, we design a coarse-grained parallel recursive partitioning method to accelerate FPGA routing, and we explore how much quality degradation must be accepted to achieve a given speedup. For parallelization, we partition the nets into three subsets: the first consists of potentially conflicting nets, while the other two consist of potentially conflict-free nets. The two conflict-free subsets are routed in parallel after the first subset has been routed, and all subsets are recursively partitioned in the same way. Furthermore, we show that the estimated runtime of recursive bisection is close to the optimal estimated runtime of the optimal recursive partitioning, which can be found in polynomial time. The parallel router is implemented using the Message Passing Interface (MPI). Experimental results show that our parallel router ParRoute+ achieves a 7.06x speedup over the VPR 7.0 router, a 3.36x improvement over a recent coarse-grained parallel router.
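A minimal sketch of this partition-and-route idea is given below (not the thesis implementation): nets whose bounding boxes cross a cut line are treated as potentially conflicting and routed first, and the two remaining conflict-free halves are routed in parallel and partitioned recursively. The Net type, the route_net stub, the 1-D bounding boxes, and the use of Python threads instead of MPI processes are all illustrative assumptions.

from collections import namedtuple
from concurrent.futures import ThreadPoolExecutor

Net = namedtuple("Net", ["name", "xmin", "xmax"])  # 1-D bounding box for brevity

def route_net(net):
    # Stand-in for the actual serial routing of one net (e.g., PathFinder-style).
    print("routing", net.name)

def route_recursive(nets, lo, hi, pool, depth=0):
    # Stop partitioning when the net list is small or the recursion is deep enough.
    if len(nets) <= 1 or depth >= 3:
        for net in nets:
            route_net(net)
        return
    cut = (lo + hi) / 2.0
    crossing = [n for n in nets if n.xmin < cut < n.xmax]  # potentially conflicting nets
    left = [n for n in nets if n.xmax <= cut]              # potentially conflict-free half 1
    right = [n for n in nets if n.xmin >= cut]             # potentially conflict-free half 2
    for net in crossing:       # route the potentially conflicting nets first, serially
        route_net(net)
    halves = [                 # the two halves are independent, so route them in parallel
        pool.submit(route_recursive, left, lo, cut, pool, depth + 1),
        pool.submit(route_recursive, right, cut, hi, pool, depth + 1),
    ]
    for h in halves:
        h.result()

if __name__ == "__main__":
    nets = [Net("a", 0, 3), Net("b", 5, 9), Net("c", 2, 7), Net("d", 6, 8)]
    with ThreadPoolExecutor(max_workers=16) as pool:
        route_recursive(nets, 0, 10, pool)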

 

Fine-grained GPU-accelerated parallel routing.
FPGAs are increasingly popular as application-specific accelerators because they strike a good balance between flexibility and energy efficiency compared with CPUs and ASICs. However, the long routing time imposes a barrier on FPGA computing and significantly hinders design productivity. Existing attempts to parallelize FPGA routing either do not fully exploit the available parallelism or suffer from an excessive quality loss. Massive parallelism on GPUs has the potential to solve this issue but faces non-trivial challenges. To cope with these challenges, this work presents a fine-grained GPU-accelerated FPGA routing method. The method makes a GPU-friendly shortest-path algorithm applicable to FPGA routing by reducing the problem size, limiting the search to routing subgraphs. We maintain convergence after this problem size reduction through dynamic expansion of the routing resource subgraphs. In addition, we explore fine-grained single-net parallelism and propose a hybrid approach that combines topology-driven and data-driven parallelism on the GPU. To exploit coarse-grained multi-net parallelism, we propose an effective method that parallelizes multi-net routing while preserving routing results equivalent to those of the original single-net routing. Experimental results show that our method achieves an average speedup of 18.72x on GPU with a tolerable loss in routing quality and sustains a scalable speedup on large-scale routing graphs. To our knowledge, this is the first work to demonstrate the effectiveness of GPU-accelerated FPGA routing.
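The following rough sketch (pure Python, no actual GPU code) illustrates two of the ideas described above under assumed data structures: the shortest-path search is restricted to a routing subgraph (here, nodes inside a bounding box), and the box is dynamically expanded when the sink cannot be reached, which keeps the search convergent. The frontier-based relaxation loop is the part that would map onto data-parallel GPU threads; the graph layout and cost model are hypothetical stand-ins, not the thesis code.

import math

def route_in_subgraph(adj, cost, coord, source, sink, box, margin, max_expansions=5):
    # adj: node -> list of neighbour nodes; cost: (u, v) -> edge cost;
    # coord: node -> (x, y); box: (xmin, ymin, xmax, ymax) initial search window.
    for _ in range(max_expansions):
        inside = {v for v, (x, y) in coord.items()
                  if box[0] <= x <= box[2] and box[1] <= y <= box[3]}
        dist = {v: math.inf for v in inside}
        dist[source] = 0.0
        frontier = {source}
        while frontier:                    # data-driven Bellman-Ford-style relaxation:
            next_frontier = set()          # on a GPU, each frontier node would be
            for u in frontier:             # relaxed by its own thread
                for v in adj.get(u, ()):
                    if v in inside and dist[u] + cost[(u, v)] < dist[v]:
                        dist[v] = dist[u] + cost[(u, v)]
                        next_frontier.add(v)
            frontier = next_frontier
        if dist.get(sink, math.inf) < math.inf:
            return dist                    # sink reached inside the current subgraph
        box = (box[0] - margin, box[1] - margin,
               box[2] + margin, box[3] + margin)   # dynamic expansion, then retry

if __name__ == "__main__":
    # Tiny usage example on a 2x2 grid of routing resource nodes.
    coord = {0: (0, 0), 1: (1, 0), 2: (0, 1), 3: (1, 1)}
    adj = {0: [1, 2], 1: [3], 2: [3], 3: []}
    cost = {(0, 1): 1, (0, 2): 1, (1, 3): 1, (2, 3): 2}
    print(route_in_subgraph(adj, cost, coord, 0, 3, (0, 0, 1, 1), margin=1))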

 

Sequential-equivalent parallel routing framework.
Qualitative changes in FPGAs have been driven by the quantitative effects of Moore's Law, and future performance increases of CAD tools will come primarily from parallel techniques. Sequential equivalence has become increasingly important for parallel CAD algorithms, as it eases regression verification and customer support in industry. In this work we introduce a universal sequential-equivalent parallel routing framework for FPGAs, named Crown. Crown explores routing parallelism on different hardware platforms and resorts to an optimal dependency-aware scheduling algorithm to maintain the sequential equivalence of parallel routing algorithms. Specifically, Crown enables coarse-grained multi-net parallelization to strive for significant speedup. In net-level parallelization, Crown parallelizes only mutually independent nets under the constraint of sequential equivalence, while in node-level parallelization, Crown leverages dynamic parallelism to accelerate single-net routing. Experimental results show that Crown effectively maintains the sequential equivalence of parallel routing algorithms, always giving the same answer as the serial version of the algorithm. Moreover, Crown provides an average speedup of 19.68x on GPU with a tolerable loss in routing quality. To our knowledge, it is the first parallel routing framework with a sequential equivalence guarantee.
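Below is a small sketch of what a dependency-aware schedule with a sequential-equivalence guarantee can look like, under assumed inputs: nets are kept in their original serial order, and a net is placed in a later wave than every earlier net whose routing resources it shares, so nets within a wave are mutually independent and can be routed concurrently without changing the serial result. The net names, resource sets, and conflict test are illustrative, not Crown's actual data structures.

def build_schedule(nets, resources):
    # nets: list in the original serial order; resources: net -> set of routing
    # resources it may touch. Returns a list of waves; nets inside a wave are
    # mutually independent and can be routed concurrently.
    wave_of = {}
    waves = []
    for i, net in enumerate(nets):
        # A net must wait for every earlier net whose resources it shares.
        deps = [nets[j] for j in range(i) if resources[net] & resources[nets[j]]]
        level = 1 + max((wave_of[d] for d in deps), default=-1)
        wave_of[net] = level
        if level == len(waves):
            waves.append([])
        waves[level].append(net)
    return waves

if __name__ == "__main__":
    order = ["n1", "n2", "n3", "n4"]
    resources = {"n1": {1, 2}, "n2": {3}, "n3": {2, 4}, "n4": {5}}
    # n2 and n4 conflict with nothing; n3 shares resource 2 with n1, so it is
    # scheduled one wave after n1, matching the serial order n1 -> n3.
    print(build_schedule(order, resources))   # [['n1', 'n2', 'n4'], ['n3']]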

 

Accelerating the routing process is a challenging problem, and we found parallelization to be an effective approach. In this thesis, we present a series of parallel techniques for FPGA routing. They provide valuable solutions and point to important research directions, and they have been validated by experiments on real designs. The research outputs of this thesis have been published at several well-known international conferences in this field, and they are expected to advance the development of datacenter design.