BLAS 2: BLAS3-level Matrix Multiply Automatic Tuning Implemented within Strassen


Teruo Tanaka

14:05:00 - 14:30:00

101 , Mathematics Research Center Building (ori. New Math. Bldg.)

Strassen recursive algorithm is efficient for reduction of computational complexity of matrix multiply, by which computational complexity of O(n3) can be reduced to O(nlog7). However, although Strassen algorithm reduces number of matrix multiplications, it increases matrix additions. Therefore, Strassen algorithm is not beneficial in practice for small matrices. The efficiency of Strassen algorithm depends on the specific implementation and hardware (computational environment). It is important to adjust the number of recursion of the algorithm according to the matrix size and computational environment. We implemented matrix multiply auto-tuning at installation phase, whose performance parameter is number of recursion of Strassen algorithm. We used ATLAS for basic matrix multiply. The result of our evaluation shows that a performance parameter was selected by this auto-tuning, which achieves over 20% speedup against ATLAS original routines.