On this companion article, I’ll present my implementation for coaching from scratch a GPT-like mannequin, in Rust. No GPUs, solely CPUs, with a efficiency 30 occasions higher than the native C code.
In my final article, I launched the issue of matrix multiplication, how the eye algorithm makes use of matrix multiplication to carry out an averaging course of, and how you can effectively implement — or at the least, for me — a matrix multiplication perform in Rust with Blas.
On this new article, I need to present my first constructing block for implementing llm.c in Rust, particularly, coaching a GPT-like mannequin from scratch utilizing Rust. This has been my means of studying an increasing number of concerning the Rust ecosystem and understanding how comparable is with C. Particularly, I would like my code to have the ability to practice a GPT-like mannequin, ranging from GPT weights, utilizing solely CPUs— so no GPUs or TPUs. My intention is to know how a lot we will push these fashions on easy laptops, and the way a lot the Rust ecosystem can be utilized for this. Finally, this code can also be helpful to fine-tune GPT fashions with a given enter corpus.
All of the related items of code may be discovered right here.