Studying Triton One Kernel at a Time: Matrix Multiplication

multiplication is undoubtedly the most typical operation carried out by GPUs. It’s the elementary constructing block of linear algebra and reveals up throughout a large spectrum of various fields equivalent to graphics, physics simulations and scientific computing whereas being ubiquitous in machine studying.

In in the present day’s article, we’ll break down the conceptual implementation of normal matrix-matrix multiplication (GEMM) whereas introducing a number of optimisation ideas equivalent to tiling and reminiscence coalescing. Lastly, we’ll implement GEMM in Triton!

This text is the second of a sequence on Triton and GPU kernels, If you’re not accustomed to Triton or want a refresher on GPU fundamentals, try the earlier article! All of the code showcased on this article is on the market on GitHub.