Multi-GPU Workflow
Overview
Tutorial: 20 min
Learn how to program multiple GPUs with the CUDA runtime API.
Understand the setup and synchronization of multiple GPUs.
cudaGetDeviceCount
cudaGetDeviceCount is used to determine how many CUDA-capable GPUs are available on your system.
int deviceCount = 0;
cudaGetDeviceCount(&deviceCount);
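The call returns a cudaError_t, so checking it is a reasonable habit: on a machine with no usable GPU or driver it reports an error instead of a count. The following is a minimal sketch, not part of the original lesson. The helper name countDevices is hypothetical, and the property printout uses cudaGetDeviceProperties only to make each device visible.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not from the lesson): returns the number of
// CUDA-capable devices, or 0 if the runtime reports an error.
int countDevices() {
    int deviceCount = 0;
    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                     cudaGetErrorString(err));
        return 0;
    }
    return deviceCount;
}

int main() {
    int n = countDevices();
    std::printf("Found %d CUDA-capable device(s)\n", n);

    // Print the name and global memory size of each device.
    for (int dev = 0; dev < n; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) == cudaSuccess) {
            std::printf("  GPU-%d: %s, %zu MiB global memory\n",
                        dev, prop.name, prop.totalGlobalMem >> 20);
        }
    }
    return 0;
}
```

Device IDs run from 0 to deviceCount - 1, which is the range later passed to cudaSetDevice.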
cudaSetDevice
cudaSetDevice selects the GPU on which subsequent operations issued by the calling host thread (memory allocations, kernel launches) will occur.
cudaSetDevice(0);                      // make GPU-0 the current device
cudaMalloc((void**)&d_A, dataSize);

cudaSetDevice(1);                      // make GPU-1 the current device
cudaMalloc((void**)&d_B, dataSize);
In the code above, d_A is allocated on GPU-0 and d_B on GPU-1.
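The same pattern extends to kernel launches and synchronization: a launch goes to whichever device is current, and cudaDeviceSynchronize waits only for the current device. The sketch below is one possible way to put this together and is not taken from the lesson. The kernel fill and the helper fillOnEachDevice are hypothetical names, and error handling is kept deliberately light.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Illustrative kernel: writes `value` into every element of `data`.
__global__ void fill(float* data, int n, float value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = value;
}

// Hypothetical helper: fills one buffer of n floats (n > 0) on every GPU
// and returns how many GPUs produced the expected value.
int fillOnEachDevice(int n) {
    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess) return 0;

    std::vector<float*> buffers(deviceCount, nullptr);

    // Phase 1: queue work on every GPU. Kernel launches are asynchronous
    // with respect to the host, so the loop does not wait for each kernel.
    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaSetDevice(dev);
        if (cudaMalloc((void**)&buffers[dev], n * sizeof(float)) != cudaSuccess) {
            buffers[dev] = nullptr;  // skip this device in phase 2
            continue;
        }
        fill<<<(n + 255) / 256, 256>>>(buffers[dev], n, (float)dev);
    }

    // Phase 2: wait for each GPU in turn and check the last element.
    int verified = 0;
    for (int dev = 0; dev < deviceCount; ++dev) {
        if (buffers[dev] == nullptr) continue;
        cudaSetDevice(dev);
        float last = -1.0f;
        bool ok = cudaDeviceSynchronize() == cudaSuccess &&
                  cudaMemcpy(&last, buffers[dev] + (n - 1), sizeof(float),
                             cudaMemcpyDeviceToHost) == cudaSuccess;
        if (ok && last == (float)dev) ++verified;
        cudaFree(buffers[dev]);
    }
    return verified;
}

int main() {
    int verified = fillOnEachDevice(1 << 20);
    std::printf("%d GPU(s) completed and verified\n", verified);
    return 0;
}
```

Splitting the work into a launch loop and a separate synchronization loop lets all GPUs run concurrently. Synchronizing inside the first loop would serialize them.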
cudaMemcpyDeviceToDevice
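Passing the cudaMemcpyDeviceToDevice flag to cudaMemcpy copies data from one device buffer to another. On systems with unified virtual addressing, the two buffers may reside on different GPUs. Whether the transfer goes directly between the GPUs or is staged through host memory depends on whether peer access is available and enabled. In the call below, the destination d_C1 comes first and the source d_C0 second.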
cudaMemcpy(d_C1, d_C0, dataSize, cudaMemcpyDeviceToDevice);
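For a fuller picture, here is a sketch of a GPU-0 to GPU-1 copy that can be checked on the host. It swaps in cudaMemcpyPeer, which names the source and destination devices explicitly, in place of the cudaMemcpy call shown above. It also tries to enable direct peer access with cudaDeviceCanAccessPeer and cudaDeviceEnablePeerAccess. The helper name copyBetweenGpus and the choice of devices 0 and 1 are assumptions made for illustration.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: copies n floats from GPU-0 to GPU-1 and verifies
// the result on the host. Returns false if fewer than two GPUs are
// present or if any CUDA call fails.
bool copyBetweenGpus(size_t n) {
    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess || deviceCount < 2)
        return false;

    const size_t bytes = n * sizeof(float);
    std::vector<float> h_src(n), h_dst(n, 0.0f);
    for (size_t i = 0; i < n; ++i) h_src[i] = (float)i;

    float *d_C0 = nullptr, *d_C1 = nullptr;
    bool ok = true;

    // Source buffer lives on GPU-0.
    ok = ok && cudaSetDevice(0) == cudaSuccess;
    ok = ok && cudaMalloc((void**)&d_C0, bytes) == cudaSuccess;
    ok = ok && cudaMemcpy(d_C0, h_src.data(), bytes,
                          cudaMemcpyHostToDevice) == cudaSuccess;

    // Destination buffer lives on GPU-1.
    ok = ok && cudaSetDevice(1) == cudaSuccess;
    ok = ok && cudaMalloc((void**)&d_C1, bytes) == cudaSuccess;

    if (ok) {
        // Optional: let GPU-1 (the current device) access GPU-0 directly.
        // If this is unsupported, the copy below still works.
        int canAccess = 0;
        cudaDeviceCanAccessPeer(&canAccess, 1, 0);
        if (canAccess) cudaDeviceEnablePeerAccess(0, 0);  // result ignored
    }

    // Explicit peer copy: (dst, dstDevice, src, srcDevice, byte count).
    ok = ok && cudaMemcpyPeer(d_C1, 1, d_C0, 0, bytes) == cudaSuccess;
    ok = ok && cudaMemcpy(h_dst.data(), d_C1, bytes,
                          cudaMemcpyDeviceToHost) == cudaSuccess;

    // Release both buffers, each with its owning device current.
    if (d_C1) { cudaSetDevice(1); cudaFree(d_C1); }
    if (d_C0) { cudaSetDevice(0); cudaFree(d_C0); }

    return ok && h_dst == h_src;
}

int main() {
    bool ok = copyBetweenGpus(1 << 20);
    std::printf("GPU-0 to GPU-1 copy %s\n",
                ok ? "verified" : "skipped or failed");
    return 0;
}
```

On a machine with a single GPU the helper returns false without attempting the copy.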
Key Points
cudaGetDeviceCount() finds how many CUDA-capable GPUs are available on the system.
cudaSetDevice(device_id) selects the GPU on which subsequent operations (memory allocations, kernel launches) will occur.
The cudaMemcpyDeviceToDevice flag tells cudaMemcpy to copy data from one device buffer to another, including between GPUs.