CUDA Events

Overview

  • Time: 30 min

  1. Learn about CUDA events and their usage.

  2. Understand how to measure elapsed time using CUDA events.

CUDA events are markers that you record into a stream. They let you measure the time taken by operations on the GPU, synchronize work between different streams, and profile the performance of your CUDA applications. Events are lightweight, and recording one does not block the CPU: you can check for completion later, either with a non-blocking query or with an explicit wait.

CUDA Event Basics

CUDA Events

CUDA events mark points in time in your CUDA application. An event is created with cudaEventCreate, recorded into a stream with cudaEventRecord, and released with cudaEventDestroy. An event completes once all work submitted to that stream before the record call has finished. You can check completion without blocking with cudaEventQuery, or block the CPU until completion with cudaEventSynchronize. Once two events have completed, cudaEventElapsedTime returns the time between them in milliseconds, which is useful for performance profiling.

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Record the start event in the default stream
    cudaEventRecord(start, 0);

    // Perform some GPU operations here

    // Record the stop event in the default stream
    cudaEventRecord(stop, 0);

    // Block the CPU until the stop event has completed
    cudaEventSynchronize(stop);

    float milliseconds = 0;
    cudaEventElapsedTime(&milliseconds, start, stop);
    printf("Elapsed time: %f ms\n", milliseconds);

    // Cleanup
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
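
The paragraph above mentions that events can be queried as well as waited on. The following is a minimal sketch of that pattern, not a definitive implementation: it polls cudaEventQuery in a loop instead of calling cudaEventSynchronize, so the CPU stays free between polls. The kernel busyKernel and the helper pollUntilDone are hypothetical names introduced only for this illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel (illustration only): keeps the GPU busy for a while.
__global__ void busyKernel(float *data, int n, int iters) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < iters; ++k) v = v * 1.0001f + 0.5f;
        data[i] = v;
    }
}

// Hypothetical helper: launches the kernel, then polls the event instead of
// blocking on it. Returns the number of polls, or -1 on a CUDA error.
long pollUntilDone(int n, int iters) {
    float *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) return -1;
    cudaMemset(d_data, 0, n * sizeof(float));

    cudaEvent_t done;
    cudaEventCreate(&done);

    busyKernel<<<(n + 255) / 256, 256>>>(d_data, n, iters);
    cudaEventRecord(done, 0);  // completes once busyKernel has finished

    long polls = 0;
    cudaError_t status;
    // cudaEventQuery returns cudaErrorNotReady while the work is pending
    // and cudaSuccess once the event has completed. It never blocks.
    while ((status = cudaEventQuery(done)) == cudaErrorNotReady) {
        ++polls;  // the CPU could do unrelated work here
    }

    cudaEventDestroy(done);
    cudaFree(d_data);
    return status == cudaSuccess ? polls : -1;
}

int main() {
    long polls = pollUntilDone(1 << 20, 1000);
    if (polls < 0) {
        fprintf(stderr, "CUDA error while polling the event\n");
        return 1;
    }
    printf("Kernel finished after %ld polls\n", polls);
    return 0;
}
```

The number of polls depends on the GPU and the kernel workload, so it varies from run to run. In real code the loop body would do useful CPU work rather than just counting.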

Cross-Stream Synchronization

CUDA events can also be used for cross-stream synchronization. An event recorded in one stream captures all work submitted to that stream before the record call, and another stream can be made to wait on it. This lets you order operations across different streams on the GPU without blocking the CPU.

    cudaStream_t stream1, stream2;
    cudaStreamCreate(&stream1);
    cudaStreamCreate(&stream2);

    cudaEvent_t event;
    cudaEventCreate(&event);

    // Perform some operations in stream1
    kernel1<<<blocks, threads, 0, stream1>>>(...);

    // Record the event in stream1 after kernel1, so that it marks kernel1's completion
    cudaEventRecord(event, stream1);

    // Make stream2 wait for the event (the call itself returns immediately on the CPU)
    cudaStreamWaitEvent(stream2, event, 0);

    // Perform some operations in stream2 that depend on the completion of kernel1
    kernel2<<<blocks, threads, 0, stream2>>>(...);

    // Cleanup
    cudaEventDestroy(event);
    cudaStreamDestroy(stream1);
    cudaStreamDestroy(stream2);

Explanation

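cudaStreamWaitEvent makes a stream wait for an event recorded in another stream: all work submitted to the waiting stream after the call is held back until the event has completed. The call returns immediately on the host; the waiting happens on the GPU. For the dependency to be meaningful, the event must be recorded after the work it is supposed to represent.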
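
To make the dependency concrete, here is a self-contained sketch under stated assumptions: the kernels produce and consume and the helper runProducerConsumer are hypothetical names introduced for this illustration. The consumer in stream2 reads a buffer that the producer in stream1 writes, so the result is only correct if stream2 really waits for the event.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical producer kernel: fills the buffer with its own indices.
__global__ void produce(int *buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = i;
}

// Hypothetical consumer kernel: doubles what the producer wrote.
__global__ void consume(const int *in, int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * 2;
}

// Returns true if the consumer observed the producer's complete output.
bool runProducerConsumer(int n) {
    size_t bytes = n * sizeof(int);
    int *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return false; }

    cudaStream_t stream1, stream2;
    cudaStreamCreate(&stream1);
    cudaStreamCreate(&stream2);

    // The event is used only for ordering, so timing can be disabled.
    cudaEvent_t produced;
    cudaEventCreateWithFlags(&produced, cudaEventDisableTiming);

    int threads = 256, blocks = (n + threads - 1) / threads;

    produce<<<blocks, threads, 0, stream1>>>(d_in, n);
    cudaEventRecord(produced, stream1);          // marks the end of produce
    cudaStreamWaitEvent(stream2, produced, 0);   // stream2 waits on the GPU
    consume<<<blocks, threads, 0, stream2>>>(d_in, d_out, n);

    std::vector<int> h_out(n);
    cudaMemcpyAsync(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost, stream2);
    cudaError_t status = cudaStreamSynchronize(stream2);

    bool ok = (status == cudaSuccess);
    for (int i = 0; ok && i < n; ++i) ok = (h_out[i] == 2 * i);

    cudaEventDestroy(produced);
    cudaStreamDestroy(stream1);
    cudaStreamDestroy(stream2);
    cudaFree(d_in);
    cudaFree(d_out);
    return ok;
}

int main() {
    bool ok = runProducerConsumer(1 << 20);
    printf("Cross-stream dependency %s\n", ok ? "respected" : "violated or CUDA error");
    return ok ? 0 : 1;
}
```

Creating the event with cudaEventDisableTiming is a design choice for events that are never passed to cudaEventElapsedTime; it avoids the overhead of recording timestamps.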

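Timing events across streams

To time work that spans streams, record the start event in the first stream, make the second stream wait on an event that marks the end of the work in the first stream (using cudaStreamWaitEvent), and then record the stop event in the second stream. Without that dependency, the stop event can complete before the work it is meant to time has finished. The ordering is enforced on the GPU; the CPU blocks only in the final cudaEventSynchronize call.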

    cudaStream_t stream1, stream2;
    cudaStreamCreate(&stream1);
    cudaStreamCreate(&stream2);

    cudaEvent_t start, kernel1Done, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&kernel1Done);
    cudaEventCreate(&stop);

    // Record the start event in stream1
    cudaEventRecord(start, stream1);

    // Perform some operations in stream1
    kernel1<<<blocks, threads, 0, stream1>>>(...);

    // Mark the end of kernel1 and make stream2 wait for it
    cudaEventRecord(kernel1Done, stream1);
    cudaStreamWaitEvent(stream2, kernel1Done, 0);

    // Record the stop event in stream2; it cannot complete before kernel1 has finished
    cudaEventRecord(stop, stream2);

    // Block the CPU until the stop event (and therefore the start event) has completed
    cudaEventSynchronize(stop);

    float milliseconds = 0;
    cudaEventElapsedTime(&milliseconds, start, stop);
    printf("Elapsed time across streams: %f ms\n", milliseconds);

Important

  1. You cannot use an event recorded in one stream to measure the execution time of operations that occur in another stream unless you enforce proper synchronization.

  2. If streams are non-blocking and concurrent, incorrect usage may lead to race conditions or invalid timings.
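
One way to stay clear of both pitfalls is to give each stream its own start/stop pair, recorded in the stream whose work is being timed. The sketch below illustrates that idea under stated assumptions: the kernel scale and the helper timeTwoStreams are hypothetical names introduced for this example, and the measured values depend on the hardware.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel (illustration only): multiplies each element by a factor.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Times one kernel per stream, each with its own event pair recorded in the
// same stream as the kernel. Returns true on success.
bool timeTwoStreams(int n, float *ms1, float *ms2) {
    size_t bytes = n * sizeof(float);
    float *d_a = nullptr, *d_b = nullptr;
    if (cudaMalloc(&d_a, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_b, bytes) != cudaSuccess) { cudaFree(d_a); return false; }
    cudaMemset(d_a, 0, bytes);
    cudaMemset(d_b, 0, bytes);

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    cudaEvent_t start1, stop1, start2, stop2;
    cudaEventCreate(&start1);
    cudaEventCreate(&stop1);
    cudaEventCreate(&start2);
    cudaEventCreate(&stop2);

    int threads = 256, blocks = (n + threads - 1) / threads;

    // Stream 1: start, kernel, and stop are all ordered within s1.
    cudaEventRecord(start1, s1);
    scale<<<blocks, threads, 0, s1>>>(d_a, n, 2.0f);
    cudaEventRecord(stop1, s1);

    // Stream 2: an independent event pair, ordered within s2.
    cudaEventRecord(start2, s2);
    scale<<<blocks, threads, 0, s2>>>(d_b, n, 3.0f);
    cudaEventRecord(stop2, s2);

    // Both stop events must have completed before the times are read.
    cudaEventSynchronize(stop1);
    cudaEventSynchronize(stop2);

    cudaError_t e1 = cudaEventElapsedTime(ms1, start1, stop1);
    cudaError_t e2 = cudaEventElapsedTime(ms2, start2, stop2);

    cudaEventDestroy(start1);
    cudaEventDestroy(stop1);
    cudaEventDestroy(start2);
    cudaEventDestroy(stop2);
    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(d_a);
    cudaFree(d_b);
    return e1 == cudaSuccess && e2 == cudaSuccess;
}

int main() {
    float ms1 = 0.0f, ms2 = 0.0f;
    if (!timeTwoStreams(1 << 22, &ms1, &ms2)) {
        fprintf(stderr, "CUDA error while timing the streams\n");
        return 1;
    }
    printf("stream1: %f ms, stream2: %f ms\n", ms1, ms2);
    return 0;
}
```

Because the two kernels may overlap on the GPU, the per-stream times need not add up to the total wall-clock time of the program.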

Key Points

  1. CUDA events are used to measure elapsed time and synchronize operations across streams.

  2. Events can be created, recorded, and queried to determine the completion of operations.

  3. Cross-stream synchronization is achieved using cudaStreamWaitEvent.

  4. Timing events across streams requires careful synchronization to ensure accurate measurements.

  5. Events are lightweight, and recording or querying them does not block the CPU (only cudaEventSynchronize does), making them suitable for performance profiling.