QR Factorization 2: Auto-tuning for CPU-GPU QR Factorization with Statistical Offline Models and Online Monitor


Weichung Wang

14:05:00 - 14:30:00

101, Mathematics Research Center Building (ori. New Math. Bldg.)

QR factorization is a core computational kernel in scientific computing. How can the latest computer architectures be used to accelerate this task? We address this question by proposing a QR factorization algorithm with adaptive block sizes for a hybrid system that contains a CPU and a GPU. To maximize utilization of both the CPU and the GPU, we develop an auto-tuning scheme that chooses the block size at each iteration. The auto-tuning decision is based on statistical surrogate models of performance, built offline, and on an online monitor that guards against unexpected, occasional performance drops. Numerical results suggest that our approach is efficient and leads to near-optimal block sizes. The proposed algorithm can be extended to other one-sided factorizations, such as LU and Cholesky.
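
To make the idea of per-iteration block-size selection concrete, the following is a minimal sketch and not the speaker's implementation. It runs on the CPU only with NumPy, so the CPU/GPU split is indicated in comments rather than performed. The candidate list `CANDIDATES`, the cost formula in `predicted_cost_per_column` (a toy stand-in for a statistical model fitted offline), the `slack` threshold, and the rule of skipping a block size for one iteration after a slow step are all assumptions made for illustration.

```python
import time
import numpy as np

CANDIDATES = (8, 16, 32, 64)  # hypothetical candidate block sizes


def predicted_cost_per_column(b, rows, cols_left):
    """Toy surrogate (assumption): stands in for an offline statistical model."""
    panel = rows * b * b                        # panel work (CPU in a hybrid setup)
    update = rows * b * (cols_left - b) / 20.0  # trailing update (GPU, assumed faster)
    overhead = 5.0e4                            # assumed fixed cost per iteration
    return (panel + update + overhead) / b


def choose_block_size(rows, cols_left, avoid):
    """Pick the candidate with the lowest predicted cost, skipping flagged sizes."""
    options = [b for b in CANDIDATES if b not in avoid] or list(CANDIDATES)
    best = min(options,
               key=lambda b: predicted_cost_per_column(min(b, cols_left), rows, cols_left))
    return min(best, cols_left)


def adaptive_blocked_qr(A, slack=3.0):
    """Blocked QR of an m-by-n matrix (m >= n) with a block size chosen per iteration."""
    R = np.array(A, dtype=float)
    m, n = R.shape
    if m < n:
        raise ValueError("this sketch assumes m >= n")
    Q = np.eye(m)
    history, avoid, sizes = [], set(), []
    k = 0
    while k < n:
        b = choose_block_size(m - k, n - k, avoid)
        start = time.perf_counter()
        # Panel factorization (would run on the CPU in a hybrid setup).
        Qp, Rp = np.linalg.qr(R[k:, k:k + b], mode="complete")
        R[k:, k:k + b] = Rp
        # Trailing-matrix update (would be offloaded to the GPU).
        R[k:, k + b:] = Qp.T @ R[k:, k + b:]
        Q[:, k:] = Q[:, k:] @ Qp
        per_col = (time.perf_counter() - start) / b
        # Online monitor (assumed rule): if this step was much slower than the
        # median of earlier steps, skip this block size on the next iteration.
        slow = len(history) >= 3 and per_col > slack * np.median(history)
        avoid = {b} if slow else set()
        history.append(per_col)
        sizes.append(b)
        k += b
    return Q, R, sizes


rng = np.random.default_rng(0)
A = rng.standard_normal((200, 120))
Q, R, sizes = adaptive_blocked_qr(A)
print("block sizes used:", sizes)
print("reconstruction error:", np.linalg.norm(Q @ R - A))
```

The factorization stays correct whichever block sizes are chosen; the surrogate and the monitor only affect speed. In a real hybrid code the panel step and the trailing update would overlap on the two devices, which is what makes the choice of block size matter for load balance.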