18 May, 2025
Introduction
This week the CUTLASS team released a new version of CUTLASS which introduces CuTeDSL a python interface which gives the user access to * core concepts such as layouts, tensors, hardware atoms, and full control over the hardware thread and data hierarchy* as can be read in their dedicated documentation.
In this blogpost we aim to give a basic understanding of some principles of CuTe layout algebra. We will explain some basic concepts like the layout function, coalescing and complementation. This can be helpful for a deeper understanding of the CuTeDSL.
Layout
We define a Layout as
L=(M0,M1,...,Mn):(d0,d1,...,dn)here
S=(M0,M1,...,Mn) is the shape where M=M0·...·Mn is the size of the layout.
D=(d0,d1,...,dn) is the stride.
The pairs
(Mi):(di) are called modes and can be considered layouts of length 1. In general we define the length to be the number n+1 above.
Layout function
Let
ι(x)=(xmodM0,⌊xM0⌋modM1,...,⌊xM0·...·Mn−1⌋modMn)=(ι0,ι1,...,ιn)which is an isomorphism [0,M)≅[0,M0)×...×[0,Mn). It maps a 1d index to a multidimensional index.
The layout function is than given as the mapping which applies the isomorphism to a number x (if needed), multiplies each vector entry with the corresponding stride entry and sums these up, i.e.
fL(x)=ι0d0+ι1d1+...+ιndn
Let's consider as an example the layout (2,4):(2,2).
Let's calculate fL(3).
ι(x)=(3mod2,⌊32⌋mod4)=(1,1)from here we get
fL(3)=1·2+1·2=2+2=4We can calculate this in CuTeDSL:
This will print:
Sorted layouts
We define a sorted layout such that all the strides are increasing, i.e. di−1≤di.
Sorting doesn't leave a layout invariant. Cosider the following representations:
L1=(2,2):(3,1)L2=sorted(L1)=(2,2):(1,3)the corresponding layout functions don't coincide, which we can easily verify with CuTeDSL:
which will print:
which we could have of course also verified by hand for this simple example. We may also understand that by looking at the above example and seeing that L1 is row major while L2 is column major.
Coalescing
Coalescing is an operation that
- Preserves the size of the layout
- Preserves the associated layout function.
Let's take a simple example:
L=(2,1):(3,1)By looking at the above definition of ι we can recognize that ι2 will always be 0, i.e. it won't contribute to the layout function for any value x, i.e.
coalesce(L)=2:3We can verify this:
which will print
Complementation
Admissability
Let L be a layout and K be a positive integer. (L,K) is admissable for complementation if the following conditions are satisfied:
- Mi−1di−1 divides di
- Mndn divides K
Complement
If (L,K) is admissable we define the complement operation as follows:
complement(L,K)=(d0,d1M0d0,...,dnMn−1dn−1,KMndn):(1,M0d0,...,Mndn)Let's take simple example:
L=(2,4):(1,2), K=16.
(L,K) is admissable.
We can calculate the complement:
complement(L,K)=(1,1,2):(1,2,8)using same argument as above we can coalesce this to
A=2:8We can calculate this as follows:
which will give the same result as above:
We can interpret the complement as follows:
Let A be the concatenation of the layout and it's complement. The concatenation can be simply formed by combining all the modes into one layout. For the above example that is:
A=(2,4,2):(1,2,8)we can proove that this gives us a bijection fA:[0,M)→[0,M). Please see proposition 2.7 of this blogpost.
We can verify this using CuTeDSL:
Note that the bijection not necessarily is the identity.
For example take
B=8:2 and K=32.
(B,K) is admissable because 8·2=16 divides 32.
complement(B,K)=(2,2):(1,16)The concatenated layout is
A=(8,2,2):(2,1,16)and printing out the values of the layout function will give us a bijection unequal to the identity:
Conclusion
I hope this blogpost can serve as an easy intro to the CuTeDSL by connecting mathematical concepts with programming. For a deeper mathematical explanation of the concepts see Lei Maos blogposts and Jay Shahs note on CuTe layout algebra.
For further examples of CuTeDSL see the CUTLASS repo.