High-level partitioning of discrete signal transforms for distributed hardware architectures
Arce Nazario, Rafael A.
MetadataShow full item record
Discrete signal transforms (DSTs) have numerous applications in a wide spectrum of scientific fields. To attain superior performance, the size and composition of these algorithms frequently require implementation to architectures involving more than one dedicated hardware device. Even though hardware implementations of signal processing algorithms are known to be orders of magnitude faster than most other generic computing platforms, they are not commonplace mainly because of the increased complexity involved in partitioning and mapping such algorithms onto distributed hardware platforms. Automated methods and tools to aid in the design and exploration of distributed implementations shall encourage adoption of dedicated hardware platforms for high-performance applications. Traditionally, partitioning to distributed hardware architectures (DHAs) has been done either manually, or at various stages of the design process, predominantly at the behavioral and structural levels. Although these schemes have produced acceptable implementations, they do not necessarily exploit the functional properties of algorithms. Structural level techniques handle the design at an abstraction level too low for the algorithm functionality to be effectively interpreted. Most proposed higher-level strategies target generic partitioning problems through local optimization techniques, which miss out on alternate formulations that become apparent only on a higher level of abstraction. This dissertation presents an automated methodology specifically designed for partitioning DSTs onto DHAs. The methodology takes advantage of DST features at two levels of abstraction: the graph and algorithmic levels. At the algorithmic level, an exploration is conducted in search of equivalent transform formulations that are more suitable for the target topology. At the graph level, a series of DSTspecific structural considerations are made to improve the partitioning heuristic. The developed strategy integrates several algorithms which allow exploring partitioning solutions at both abstraction levels. A Kronecker products algebra (KPA) to dataflow graph conversion (DFG) tool was developed to allow straightforward conversion and structural visualization of KPA expressed formulations. The DFG generated by this tool is partitioned using a k-way deterministic algorithm with structural considerations derived from common DST features. A scheduler estimates latency of the partition solution, constrained by the available computational resources, as determined by an area estimator. Solution cost at the graph level is used to transform the current DST formulation, and thus explore alternative formulations as part of the partition optimization process. Given the exponential size of the space of equivalent formulations, a greedy, polynomialtime exploration heuristic was designed. Results from the application of this methodology to a range of sizes of Discrete Fourier Transforms and Discrete Cosine Transforms evidence the advantages of making the partition methodology DST-aware. Latency and run-time reductions of up to 34% and 99%, respectively were obtained with respect to previously proposed, stochastic high-level partitioning approaches.