Multi-GPU Workflow

Overview

  • Tutorial: 20 min

  1. Learn how to use multi-GPU programming.

  2. Understand the setup and synchronization of multiple GPUs.

cudaGetDeviceCount

cudaGetDeviceCount is used to determine how many CUDA-capable GPUs are available on your system.

int deviceCount = 0;
cudaGetDeviceCount(&deviceCount);
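
The call above ignores the return value. As a minimal sketch, the query can be wrapped so that a failure (for example, no driver or no GPU) is reported as zero devices, and each device found can then be described with cudaGetDeviceProperties. The helper name countDevices is hypothetical and only used for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns the number of CUDA-capable GPUs,
// or 0 if the query itself fails (e.g. no driver or no GPU present).
int countDevices() {
    int deviceCount = 0;
    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                     cudaGetErrorString(err));
        return 0;
    }
    return deviceCount;
}

int main() {
    int deviceCount = countDevices();
    std::printf("Found %d CUDA-capable GPU(s)\n", deviceCount);

    // Print the name and global memory size of each device.
    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) == cudaSuccess) {
            std::printf("  GPU %d: %s, %.1f GiB global memory\n", dev, prop.name,
                        prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
        }
    }
    return 0;
}
```

Valid device indices run from 0 to deviceCount - 1, which is the range accepted by cudaSetDevice.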

cudaSetDevice

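cudaSetDevice selects the GPU that subsequent CUDA runtime calls from the calling host thread (memory allocations, kernel launches) will target. Device indices start at 0, and device 0 is the current device by default.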

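cudaSetDevice(0);                     // make GPU 0 the current device
cudaMalloc((void**)&d_A, dataSize);   // d_A is allocated on GPU 0

cudaSetDevice(1);                     // switch the current device to GPU 1
cudaMalloc((void**)&d_B, dataSize);   // d_B is allocated on GPU 1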

In the code above, d_A is allocated on GPU 0 and d_B on GPU 1, because each cudaMalloc call targets whichever device was most recently selected with cudaSetDevice.
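
The same pattern extends to any number of GPUs, and it also shows where synchronization fits in. The sketch below is one possible arrangement, not the only one: a first loop selects each device, allocates a buffer, and launches a kernel (launches are asynchronous, so the loop does not wait for one GPU before starting the next); a second loop selects each device again and calls cudaDeviceSynchronize to wait for its work. The kernel fill, the helper runOnAllDevices, and the buffer size are hypothetical names and values chosen for illustration.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: fills a buffer with a constant value.
__global__ void fill(float *data, int n, float value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = value;
}

// Hypothetical helper: runs `fill` on every visible GPU and returns the
// number of GPUs whose result was verified (0 if no GPU is available).
int runOnAllDevices(int n) {
    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess) return 0;

    std::vector<float*> buffers(deviceCount, nullptr);

    // Phase 1: allocate and launch on each GPU without waiting.
    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaSetDevice(dev);
        if (cudaMalloc((void**)&buffers[dev], n * sizeof(float)) != cudaSuccess) {
            buffers[dev] = nullptr;  // skip this device in phase 2
            continue;
        }
        fill<<<(n + 255) / 256, 256>>>(buffers[dev], n, (float)dev);
    }

    // Phase 2: wait for each GPU, check its first element, and clean up.
    int verified = 0;
    for (int dev = 0; dev < deviceCount; ++dev) {
        if (buffers[dev] == nullptr) continue;
        cudaSetDevice(dev);
        cudaError_t err = cudaDeviceSynchronize();  // waits for the current device only
        float first = -1.0f;
        if (err == cudaSuccess) {
            err = cudaMemcpy(&first, buffers[dev], sizeof(float),
                             cudaMemcpyDeviceToHost);
        }
        if (err == cudaSuccess && first == (float)dev) ++verified;
        cudaFree(buffers[dev]);
    }
    return verified;
}

int main() {
    std::printf("Verified results on %d GPU(s)\n", runOnAllDevices(1 << 20));
    return 0;
}
```

Splitting the work into a launch loop and a synchronize loop lets the GPUs run concurrently; synchronizing inside the first loop would serialize them.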

cudaMemcpyDeviceToDevice

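Passing the cudaMemcpyDeviceToDevice flag to cudaMemcpy copies data from one device buffer to another. With unified virtual addressing, the source and destination buffers may reside on different GPUs. The copy can go directly between the GPUs when peer access is enabled; otherwise it is typically staged through host memory.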

cudaMemcpy(d_C1, d_C0, dataSize, cudaMemcpyDeviceToDevice);
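
As a hedged sketch of how such a copy might be set up between two GPUs: the code below assumes d_C0 lives on GPU 0 and d_C1 on GPU 1 (the original snippet does not say), queries peer capability with cudaDeviceCanAccessPeer, enables it in both directions with cudaDeviceEnablePeerAccess where supported, and then performs the copy. The helper name copyBetweenGpus is hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: copies `bytes` bytes from a buffer on GPU 0 to a
// buffer on GPU 1. Returns false if fewer than two GPUs are present or
// if any CUDA call fails.
bool copyBetweenGpus(size_t bytes) {
    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess || deviceCount < 2) {
        return false;
    }

    float *d_C0 = nullptr, *d_C1 = nullptr;

    // Assumption: d_C0 is the source on GPU 0, d_C1 the destination on GPU 1.
    cudaSetDevice(0);
    if (cudaMalloc((void**)&d_C0, bytes) != cudaSuccess) return false;
    cudaMemset(d_C0, 0, bytes);

    cudaSetDevice(1);
    if (cudaMalloc((void**)&d_C1, bytes) != cudaSuccess) {
        cudaFree(d_C0);
        return false;
    }

    // Peer access is granted per direction, from the current device to the peer.
    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, 1, 0);           // can GPU 1 access GPU 0?
    if (canAccess) cudaDeviceEnablePeerAccess(0, 0);     // current device is 1; flags must be 0

    cudaSetDevice(0);
    canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, 0, 1);           // can GPU 0 access GPU 1?
    if (canAccess) cudaDeviceEnablePeerAccess(1, 0);     // current device is 0

    // Destination first, then source, as in the snippet above.
    cudaError_t err = cudaMemcpy(d_C1, d_C0, bytes, cudaMemcpyDeviceToDevice);
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // make sure the copy has finished

    cudaFree(d_C0);
    cudaFree(d_C1);
    return err == cudaSuccess;
}

int main() {
    bool ok = copyBetweenGpus(1 << 20);
    std::printf("GPU 0 -> GPU 1 copy %s\n",
                ok ? "succeeded" : "skipped or failed");
    return 0;
}
```

The runtime also provides cudaMemcpyPeer(dst, dstDevice, src, srcDevice, count), which names the two devices explicitly and may be preferable when the owning device of each pointer should be stated in the call itself.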

Key Points

  • cudaGetDeviceCount() finds how many CUDA-capable GPUs are available on the system.

  • cudaSetDevice(device_id) selects the GPU on which subsequent operations (memory allocation, kernel launches) will occur.

  • The cudaMemcpyDeviceToDevice flag tells cudaMemcpy to copy from one device buffer to another, including between buffers that reside on different GPUs.