QR Factorization 1: Accelerating Determinant Quantum Monte Carlo Simulations on GPU


Che-Rung Lee

13:40:00 - 14:05:00

101, Mathematics Research Center Building (ori. New Math. Bldg.)

Recent studies of complex materials have brought new computational challenges to Determinant Quantum Monte Carlo simulations. In this talk, we present how to redesign high-level numerical algorithms to accelerate the simulation on Graphics Processing Units (GPUs). Specifically, we discuss the two most time-consuming numerical kernels in the simulation. The first is pivoted QR decomposition, which is used to stabilize the computation. Currently, no efficient implementation of pivoted QR decomposition is available on GPUs. We propose a new algorithm, called Block Structure Orthogonal Factorization, which does not rely on pivoted QR or any stratification method but still achieves the same stability. The second kernel is matrix-matrix multiplication in which one of the matrices is a matrix exponential. Because the original matrix is sparse, we apply the checkerboard method, which preserves the sparsity of the original matrix in its matrix exponential. With the checkerboard method, the time complexity of the matrix-matrix multiplication can be reduced to O(N^2). Several performance optimization methods for the checkerboard method on GPUs will be addressed.
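To make the O(N^2) claim for the checkerboard method concrete, here is a minimal CPU sketch in NumPy. It is not the GPU implementation from the talk. It assumes a one-dimensional ring of N sites (N even, N >= 4) with nearest-neighbor hopping, and all function names are hypothetical. The bonds are split into two groups of disjoint pairs, so the exponential of each group is block diagonal with 2x2 blocks [[cosh a, sinh a], [sinh a, cosh a]]. Applying one group to an N x N matrix touches each row once, which costs O(N^2) in total. The product of the two group exponentials approximates the full matrix exponential with an error of order a^2.

```python
import numpy as np

def ring_bonds(n):
    """Split the bonds of an n-site ring (n even, n >= 4) into two groups.

    Within each group no site appears twice, so the bonds commute.
    """
    even = [(i, i + 1) for i in range(0, n, 2)]
    odd = [(i, (i + 1) % n) for i in range(1, n, 2)]
    return even, odd

def apply_bond_group(m, bonds, a):
    """Return exp(a * A_group) @ m without forming the exponential.

    A_group is the adjacency matrix of the disjoint bonds. Each bond (i, j)
    mixes only rows i and j, so the cost is O(N) per bond and O(N^2) overall.
    """
    c, s = np.cosh(a), np.sinh(a)
    i = np.array([b[0] for b in bonds])
    j = np.array([b[1] for b in bonds])
    out = m.copy()
    out[i] = c * m[i] + s * m[j]
    out[j] = s * m[i] + c * m[j]
    return out

def checkerboard_multiply(m, n, a):
    """Approximate exp(a * A) @ m as exp(a * A_even) @ exp(a * A_odd) @ m."""
    even, odd = ring_bonds(n)
    return apply_bond_group(apply_bond_group(m, odd, a), even, a)

def adjacency(n, bonds):
    """Dense symmetric adjacency matrix of a bond list (reference only)."""
    a_mat = np.zeros((n, n))
    for i, j in bonds:
        a_mat[i, j] = a_mat[j, i] = 1.0
    return a_mat

def dense_expm_sym(a_mat):
    """Dense exponential of a symmetric matrix via eigendecomposition, O(N^3)."""
    w, v = np.linalg.eigh(a_mat)
    return (v * np.exp(w)) @ v.T

if __name__ == "__main__":
    n, a = 8, 0.01  # a plays the role of (time step) x (hopping amplitude)
    even, odd = ring_bonds(n)
    exact = dense_expm_sym(a * adjacency(n, even + odd))
    approx = checkerboard_multiply(np.eye(n), n, a)
    print("max abs error of checkerboard product:", np.abs(approx - exact).max())
```

Each bond group maps naturally onto independent row-pair updates, which is the kind of regular, data-parallel work a GPU kernel can exploit. The dense reference exponential is included only to check the approximation.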