Workshops

Fast Methods 1: Automatic Tuning for Parallel FFTs

Daisuke Takahashi

2013-03-28
09:30:00 - 09:55:00

101, Mathematics Research Center Building (ori. New Math. Bldg.)



This talk presents an automatic performance tuning method for parallel fast Fourier transforms (FFTs). A blocking algorithm for parallel FFTs uses cache memory effectively. Since the optimal depth of recursion may depend on the problem size, a method is proposed to determine the optimal block size, that is, the one that minimizes the number of cache misses. In addition, automatic tuning of all-to-all communication is implemented. Performance results for parallel FFTs with automatic performance tuning on clusters of multi-core processors are reported.
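
As an illustration only, the idea of selecting a block size automatically can be sketched as an empirical search: run a blocked kernel with each candidate block size and keep the cheapest one. The sketch below is not the speaker's method. It assumes a blocked matrix transpose as a stand-in kernel and uses wall-clock time as a proxy for cache misses, and the names `blocked_transpose`, `measure`, and `tune_block_size` are hypothetical. In pure Python the cache effect is weak, so the code shows only the structure of the search.

```python
import time


def blocked_transpose(a, n, block):
    """Transpose an n x n matrix stored row-major in a flat list, tile by tile."""
    out = [0] * (n * n)
    for ii in range(0, n, block):
        for jj in range(0, n, block):
            # Work on one block x block tile so reads and writes stay local.
            for i in range(ii, min(ii + block, n)):
                for j in range(jj, min(jj + block, n)):
                    out[j * n + i] = a[i * n + j]
    return out


def measure(block, n=256, repeats=3):
    """Best-of-`repeats` wall-clock time for one blocked transpose (assumed cost proxy)."""
    a = list(range(n * n))
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        blocked_transpose(a, n, block)
        best = min(best, time.perf_counter() - start)
    return best


def tune_block_size(candidates, cost=measure):
    """Return the candidate block size with the smallest cost."""
    return min(candidates, key=cost)
```

A call such as `tune_block_size([8, 16, 32, 64])` times each candidate and returns the fastest on the current machine. Passing a different `cost` function (for example, one that reads a hardware cache-miss counter) would change the selection criterion without changing the search.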