Some CUDA code examples with READMEs


Welcome to CUDA Examples! This project intends to serve as a practical landing page for CUDA C++ development. If you see anything present or missing that might make it more useful, open an issue.

The repository is organized into a few top-level directories by example type:

  • SetupAndInitExamples - examples of what happens at the beginning of a CUDA program (not installation).
  • MemoryAndStructureExmaples - examples of ways to allocate memory, launch kernels, or structure code so it works well with CUDA. These can and do leverage code / kernel samples, but their theme is more about "good ways to do things" and "considerations" when writing CUDA programs.
  • KernelAndLibExamples - kernels, core libraries, thrust, etc. Just general examples of how to actually load and process data on the GPU.
  • ProfilingExamples - examples to profile or benchmark CUDA code.
  • PerformanceChecklist - examples which cover the standard CUDA performance checklist.
  • TensorParallelFromScratch - follows my blog series on implementing tensor parallelism in CUDA from scratch!

I love contributions because I get to learn from you and the project grows to help more people. With that in mind, contributions should meet the following characteristics:

  • Novel - no existing examples like them
  • Documented - there should be a clear explanation of what your addition is conveying
  • Correct - Self-explanatory

I imagine that CUDA kernel samples, thrust samples, and other core library examples will fill up the most quickly under KernelAndLibExamples, which means that one will eventually be the hardest to contribute to. When forming a contribution, PLEASE ensure that you are showing something novel. I do not want to reject any PRs because they are redundant - you spent time on that! But alas, I will if they don't show new concepts.

As sort of a sub-section of contributions, I'd like to cover conventions to use:

  • Each example should have its own subdirectory under the appropriate section
  • If your example produces output, it should run as a bash script and the output files should be added to .gitignore
  • All executables should be called main (this is also for .gitignore)
  • All examples should work with C++20.
  • Each example should include utils/utils.cuh, and each CUDA call that returns a cudaError_t should be wrapped with a cudaCheckError like cudaCheckError(::cudaDeviceSynchronize());.
  • Each CUDA API call should be prefixed with :: to inform the reader that it is an external API call. See example above.
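To illustrate the last two conventions, a cudaCheckError wrapper in utils/utils.cuh might look something like the sketch below. This is a hedged sketch of the pattern, not the actual contents of the repository's utils/utils.cuh, which may differ:

```cuda
// Sketch of a cudaCheckError macro matching the conventions above;
// the real utils/utils.cuh in this repository may differ.
#pragma once

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Wraps any CUDA call that returns cudaError_t. On failure it prints
// the failing expression, file, line, and error string, then aborts.
#define cudaCheckError(call)                                         \
    do {                                                             \
        cudaError_t err__ = (call);                                  \
        if (err__ != cudaSuccess) {                                  \
            std::fprintf(stderr, "CUDA error %s at %s:%d: %s\n",     \
                         #call, __FILE__, __LINE__,                  \
                         ::cudaGetErrorString(err__));               \
            std::exit(EXIT_FAILURE);                                 \
        }                                                            \
    } while (0)

// Usage, with the :: prefix marking an external API call:
//   cudaCheckError(::cudaDeviceSynchronize());
```

The do/while(0) wrapper lets the macro be used like an ordinary statement (including inside unbraced if/else) without dangling-semicolon surprises.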

The CUDA_ARCHITECTURES defaults to 86. To run with the default:

    mkdir build && cd build
    cmake ..
    make -j$(nproc)

If you require a single different architecture (and run make with 8 parallel jobs):

    mkdir build && cd build
    cmake -DCUDA_ARCHITECTURES="80" ..
    make -j8

If you'd like to pass multiple architectures:

    mkdir build && cd build
    cmake -DCUDA_ARCHITECTURES="80;86;90" ..
    make -j8
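The -DCUDA_ARCHITECTURES pattern above, with its default of 86, could be supported by a top-level CMakeLists.txt along these lines. This is a sketch under assumptions (the project name and file structure are mine), not the repository's actual build file:

```cmake
# Sketch: honor -DCUDA_ARCHITECTURES from the command line, defaulting to 86.
cmake_minimum_required(VERSION 3.18)
project(cuda_examples LANGUAGES CXX CUDA)

# The repo's conventions require C++20.
set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

# Use the user-provided list (e.g. "80;86;90") if given, else 86.
if(NOT DEFINED CUDA_ARCHITECTURES)
    set(CUDA_ARCHITECTURES "86")
endif()
set(CMAKE_CUDA_ARCHITECTURES ${CUDA_ARCHITECTURES})
```

CMAKE_CUDA_ARCHITECTURES accepts a semicolon-separated list, which is why the multi-architecture invocation quotes "80;86;90".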

To build a particular example:

    cd <ExampleDir>/<example>
    make -j$(nproc)

The Makefiles are less robust and may need to be modified. Feel free to submit a PR to fix them!
