QR Factorization 3: Performance Evaluation and Tuning of Tall Skinny Type QR Factorization on the K Computer


Takeshi Fukaya

14:30:00 - 14:55:00

101, Mathematics Research Center Building (ori. New Math. Bldg.)

QR factorizations of tall and skinny matrices appear in several matrix computations, so efficient algorithms for current and future parallel machines are required. The Householder QR algorithm, one of the conventional algorithms, can be parallelized straightforwardly at the BLAS level, but this parallelism is so fine-grained that communication cost becomes a performance bottleneck in massively parallel computing. On the other hand, the TSQR (tall skinny QR) algorithm proposed by Demmel et al., whose parallelism is more coarse-grained than that of the Householder QR algorithm, has attracted the interest of many researchers. Against this background, we evaluated the performance of the TSQR algorithm and the Householder QR algorithm on the K computer. Our experiments verified that the TSQR algorithm is useful in some cases but not in others. In this talk, we report the results of these experiments and analyze the obtained performance using performance models. We then explain our approach, including auto-tuning techniques, to improving the performance of this computation.
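
To make the coarse-grained structure of the TSQR algorithm concrete, the following is a minimal single-process sketch in Python with NumPy. It is an illustration only, not the implementation evaluated in the talk: the function name `tsqr`, the parameter `num_blocks`, and the use of a single stacked factorization in place of a reduction tree across processes are all assumptions made for this example.

```python
import numpy as np


def tsqr(A, num_blocks=4):
    """One-level TSQR sketch (hypothetical helper): returns Q, R with A = Q @ R."""
    # Split the tall matrix into row blocks; in a parallel setting each
    # block would live on a different process.
    blocks = np.array_split(A, num_blocks, axis=0)

    # Step 1: factor every block independently (no communication needed).
    local = [np.linalg.qr(B) for B in blocks]

    # Step 2: stack the small R factors and factor them once. This stands in
    # for the reduction step, the only place where data would be exchanged.
    R_stack = np.vstack([R_i for _, R_i in local])
    Q2, R = np.linalg.qr(R_stack)

    # Step 3: combine each local Q with its slice of Q2 to get the global Q.
    Q_parts = []
    offset = 0
    for Q_i, R_i in local:
        k = R_i.shape[0]
        Q_parts.append(Q_i @ Q2[offset:offset + k])
        offset += k
    return np.vstack(Q_parts), R


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((1000, 8))  # tall and skinny test matrix
    Q, R = tsqr(A)
    print("reconstruction error:", np.linalg.norm(Q @ R - A))
    print("orthogonality error:", np.linalg.norm(Q.T @ Q - np.eye(8)))
```

In a distributed setting, the stacked factorization in step 2 would typically be replaced by a reduction tree over the processes, so that only small R factors are communicated; the shape of that tree is one natural candidate for tuning.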