Introduction to CUDA Programming
This repository provides an introduction to CUDA programming in C. It covers the fundamentals of parallel programming on NVIDIA’s CUDA platform, including GPU architecture, memory management, kernel functions, and performance optimization. The materials are designed for beginners and include step-by-step tutorials, practical examples, and exercises to help you write and run your first CUDA programs in C.
Note: This project is under active development.
Contents
- Prerequisite
- Learning Outcomes
- Tutorial
- Graphics Processing Unit (GPU)
- Exercise 1
- GPU Execution Model
- GPU Workflow
- Exercise 2
- Understanding Warps in CUDA
- Asynchronous CUDA Calls
- Exercise 3
- Shared Memory in CUDA
- CUDA Events
- Exercise 4
- Unified Memory
- Exercise 5
- Dynamic Parallelism in CUDA
- Exercise 6
- Finding Optimal GPU Occupancy
- Exercise 7
- CUDA Graphs
- Exercise 8
- Memory Pool
- Exercise 9
- cuBLAS
- Exercise 10
- Multi-GPU Workflow
- Exercise 11
- Profile using Nsight Systems
- Exercise 12
- Debug using cuda-gdb
- Exercise 13
- Reference
- Contributors