Multi-GPU Workflow

Overview

Tutorial: 20 min

Learn how to write CUDA programs that use more than one GPU.

Understand the setup and synchronization of multiple GPUs.

cudaGetDeviceCount

cudaGetDeviceCount reports how many CUDA-capable GPUs are visible to your program. It writes the count into the int you pass by pointer and, like other CUDA runtime calls, returns a cudaError_t.

int deviceCount = 0;
cudaGetDeviceCount(&deviceCount);
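In practice it is worth checking the return code and inspecting each device before choosing one. The sketch below uses only the CUDA runtime API. The helper name visibleDeviceCount is our own and is not part of CUDA. It lists every visible GPU with cudaGetDeviceProperties. Note that the CUDA_VISIBLE_DEVICES environment variable can hide GPUs from the process, so the count may be smaller than the number physically installed.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: number of visible CUDA devices, or 0 if the query fails
// (for example, when no NVIDIA driver or no GPU is present).
int visibleDeviceCount() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 0;
    }
    return count;
}

int main() {
    int deviceCount = visibleDeviceCount();
    std::printf("Found %d CUDA device(s)\n", deviceCount);

    // Valid device IDs run from 0 to deviceCount - 1; these are the IDs cudaSetDevice expects.
    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        std::printf("  GPU %d: %s, compute capability %d.%d, %.1f GiB\n",
                    dev, prop.name, prop.major, prop.minor,
                    prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
    }
    return 0;
}
```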

cudaSetDevice

cudaSetDevice selects the GPU that subsequent CUDA runtime calls from the calling host thread (memory allocations, kernel launches, copies) will target.

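const size_t N = 1 << 20;                  // elements per buffer (illustrative value)
size_t dataSize = N * sizeof(float);       // float element type is an assumption
float *d_A = nullptr, *d_B = nullptr;

cudaSetDevice(0);                          // later calls target GPU 0
cudaMalloc((void**)&d_A, dataSize);        // d_A lives on GPU 0

cudaSetDevice(1);                          // switch to GPU 1
cudaMalloc((void**)&d_B, dataSize);        // d_B lives on GPU 1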

In the code above, d_A is allocated on GPU 0 and d_B on GPU 1. The selected device is tracked per host thread and stays in effect until the next cudaSetDevice call.
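A common way to put several GPUs to work is to give each one a slice of the data, launch a kernel on each, and then wait for every device. The following is a minimal sketch of that setup-and-synchronize pattern, not a prescribed method: the kernel scale, the helper chunkSize, the total size N and the factor 2.0 are all illustrative assumptions. Because kernel launches return immediately, GPU 0 can already be computing while the host prepares GPU 1.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Illustrative kernel: multiply each element of a device buffer by `factor`.
__global__ void scale(float *data, size_t n, float factor) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: elements assigned to device `dev` when n elements are split
// as evenly as possible across numDevices GPUs (the first n % numDevices get one extra).
size_t chunkSize(size_t n, int numDevices, int dev) {
    size_t base = n / numDevices;
    size_t extra = n % numDevices;
    return base + ((size_t)dev < extra ? 1 : 0);
}

int main() {
    int numDevices = 0;
    if (cudaGetDeviceCount(&numDevices) != cudaSuccess || numDevices == 0) {
        std::fprintf(stderr, "No CUDA device available\n");
        return 1;
    }

    const size_t N = 1 << 20;                     // total elements (illustrative)
    std::vector<float> h(N, 1.0f);
    std::vector<float*> d(numDevices, nullptr);

    // Phase 1: for each GPU, allocate its chunk, copy it in, and launch the kernel.
    size_t offset = 0;
    for (int dev = 0; dev < numDevices; ++dev) {
        size_t n = chunkSize(N, numDevices, dev);
        cudaSetDevice(dev);
        cudaMalloc((void**)&d[dev], n * sizeof(float));
        cudaMemcpy(d[dev], h.data() + offset, n * sizeof(float), cudaMemcpyHostToDevice);
        int threads = 256;
        int blocks = (int)((n + threads - 1) / threads);
        scale<<<blocks, threads>>>(d[dev], n, 2.0f); // asynchronous: returns immediately
        offset += n;
    }
    assert(offset == N);                          // every element was assigned to a GPU

    // Phase 2: wait for each GPU in turn, then copy its chunk back and free it.
    offset = 0;
    for (int dev = 0; dev < numDevices; ++dev) {
        size_t n = chunkSize(N, numDevices, dev);
        cudaSetDevice(dev);
        cudaDeviceSynchronize();                  // explicit wait; the blocking cudaMemcpy below would also wait
        cudaMemcpy(h.data() + offset, d[dev], n * sizeof(float), cudaMemcpyDeviceToHost);
        cudaFree(d[dev]);
        offset += n;
    }

    std::printf("h[0] = %.1f, h[N-1] = %.1f\n", h[0], h[N - 1]);  // both should be 2.0
    return 0;
}
```

Splitting by a computed chunk size keeps every GPU's share within one element of the others, which is a reasonable default when all devices are the same model.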

cudaMemcpyDeviceToDevice

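Passing cudaMemcpyDeviceToDevice as the copy kind lets cudaMemcpy copy between buffers on two different GPUs. With unified virtual addressing (UVA, available in 64-bit applications on all current GPUs), the runtime determines each buffer's device from its pointer. The data travels directly between the GPUs only when peer access is supported and has been enabled with cudaDeviceEnablePeerAccess; otherwise the copy is staged through host memory.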

// Copy dataSize bytes from d_C0 into d_C1 (the destination comes first)
cudaMemcpy(d_C1, d_C0, dataSize, cudaMemcpyDeviceToDevice);
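Where the hardware allows it, you can ask the runtime for a direct GPU-to-GPU path and name the devices explicitly with cudaMemcpyPeer. The sketch below assumes at least two GPUs and places d_C0 on GPU 0 and d_C1 on GPU 1 (an assumption based on the names in the snippet above); the helper enablePeerAccess and the buffer size are hypothetical. Cross-device copies can return before the data has arrived, so the example synchronizes both GPUs before freeing the buffers.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: try to let `device` access memory that lives on `peer`.
// Returns true if direct access is now enabled, false if it is unsupported
// (cross-GPU copies still work in that case but are staged through host memory).
bool enablePeerAccess(int device, int peer) {
    int canAccess = 0;
    if (cudaDeviceCanAccessPeer(&canAccess, device, peer) != cudaSuccess || !canAccess)
        return false;
    cudaSetDevice(device);                           // access is enabled from the current device
    cudaError_t err = cudaDeviceEnablePeerAccess(peer, 0);  // flags must be 0
    if (err == cudaErrorPeerAccessAlreadyEnabled) {
        cudaGetLastError();                          // clear the recorded error; access is already on
        return true;
    }
    return err == cudaSuccess;
}

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count < 2) {
        std::fprintf(stderr, "This example needs at least two GPUs\n");
        return 1;
    }

    const size_t dataSize = 1 << 20;                 // bytes to copy (illustrative)
    void *d_C0 = nullptr, *d_C1 = nullptr;
    cudaSetDevice(0);
    cudaMalloc(&d_C0, dataSize);
    cudaMemset(d_C0, 0, dataSize);                   // give the source buffer defined contents
    cudaSetDevice(1);
    cudaMalloc(&d_C1, dataSize);

    bool p2p = enablePeerAccess(0, 1) && enablePeerAccess(1, 0);
    std::printf("Peer access: %s\n", p2p ? "enabled" : "not available, copies go through the host");

    cudaSetDevice(1);
    // Explicit form: destination pointer and device, then source pointer and device.
    cudaMemcpyPeer(d_C1, 1, d_C0, 0, dataSize);
    // With UVA the original call also works; the runtime infers devices from the pointers.
    cudaMemcpy(d_C1, d_C0, dataSize, cudaMemcpyDeviceToDevice);

    // Wait on both GPUs so the copies are complete before the buffers are used or freed.
    cudaError_t err = cudaSuccess;
    for (int dev = 0; dev < 2; ++dev) {
        cudaSetDevice(dev);
        cudaError_t e = cudaDeviceSynchronize();
        if (e != cudaSuccess) err = e;
    }
    if (err != cudaSuccess)
        std::fprintf(stderr, "copy failed: %s\n", cudaGetErrorString(err));

    cudaSetDevice(0);
    cudaFree(d_C0);
    cudaSetDevice(1);
    cudaFree(d_C1);
    return err == cudaSuccess ? 0 : 1;
}
```

Peer access is enabled in both directions here because either GPU may later read the other's memory; if only one direction is ever needed, enabling just that one is enough.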

Key Points

cudaGetDeviceCount() reports how many CUDA-capable GPUs are visible to the program.

cudaSetDevice(device_id) selects the GPU on which subsequent operations from the calling host thread (memory allocation, kernel launches, copies) will occur.

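The cudaMemcpyDeviceToDevice copy kind lets cudaMemcpy move data between buffers on different GPUs; enabling peer access with cudaDeviceEnablePeerAccess lets that copy bypass host memory.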