Profile using Nsight Systems

Overview

  • Time: 30 min

  • Learn how to use nsys for performance analysis.

  • Learn how to generate detailed reports for optimization.

NVIDIA Nsight Systems (nsys)

nsys is the command-line interface to Nsight Systems, NVIDIA’s system-wide performance analysis tool. It profiles CUDA applications and shows how the GPU and CPU interact, providing timeline-based visualization and runtime statistics for:

  • GPU kernel launches and memory transfers

  • CPU threads and system calls

  • CUDA runtime/API calls

To profile the code, first compile it with line information:

nvcc -O2 -lineinfo -o 17_vector_add_unified 17_vector_add_unified.cu

Then profile the executable:

nsys profile \
    --stats=true \
    --trace=cuda,nvtx,osrt \
    --cuda-memory-usage=true \
    -o 17_vector_add_unified_profile \
    ./17_vector_add_unified

Explanation of nsys profile flags

Option                      Description

--stats=true
    Prints a summary of profiling statistics (e.g., kernel execution time, memory transfers) in the terminal after profiling.

--trace=cuda,nvtx,osrt
    Enables tracing for the selected domains:
      • cuda: CUDA API calls and kernel activity
      • nvtx: user-defined NVTX ranges and markers
      • osrt: OS runtime library calls (e.g., thread and I/O functions)

--cuda-memory-usage=true
    Tracks CUDA memory usage, including allocations and deallocations made by the application. Requires CUDA tracing to be enabled.

-o ${exe}_profile
    Sets the base name for output files, where ${exe} stands for the executable name (17_vector_add_unified in the command above). Generates:
      • ${exe}_profile.nsys-rep: profiling report (open in nsys-ui); older Nsight Systems releases write ${exe}_profile.qdrep instead
      • ${exe}_profile.sqlite: structured performance data, exported when --stats=true is set

./${exe}
    Runs the compiled executable being profiled.
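
The ${exe} placeholder used in the table can be made concrete with a shell variable. The following is a minimal sketch, not part of the original lesson: the variable names exe and report, and the EXE environment override, are illustrative assumptions, and the script assumes the source file is named ${exe}.cu and sits in the current directory. If nvcc, nsys, or the source file is missing, it only prints a message.

```shell
# Hypothetical wrapper: compile and profile one CUDA source file.
# exe and report are illustrative names, not nsys options.
exe="${EXE:-17_vector_add_unified}"   # executable base name; source assumed to be ${exe}.cu
report="${exe}_profile"               # base name passed to -o

if command -v nvcc >/dev/null 2>&1 && command -v nsys >/dev/null 2>&1 && [ -f "${exe}.cu" ]; then
    # Same compile and profile commands as above, with the name factored out.
    nvcc -O2 -lineinfo -o "$exe" "${exe}.cu" &&
    nsys profile \
        --stats=true \
        --trace=cuda,nvtx,osrt \
        --cuda-memory-usage=true \
        -o "$report" \
        "./$exe"
else
    echo "Skipping: need nvcc, nsys, and ${exe}.cu in the current directory" >&2
fi
```

Keeping the report name derived from the executable name makes it easy to tell which report belongs to which build when several examples are profiled in the same directory.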

Key Points

  • NVIDIA Nsight Systems (nsys) is a tool for profiling CUDA applications, providing insights into GPU and CPU interactions.

  • Compile your CUDA code with -lineinfo so that profiling tools can relate device code to source lines.

  • Use nsys profile with appropriate flags to collect execution statistics, trace CUDA and system activity, and report memory usage.

  • Profiling results include summary statistics and detailed reports for performance analysis and optimization.
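
Once a report exists, the summary and the exported data can be inspected again without re-running the application. The sketch below assumes the report base name used earlier in this lesson and that the sqlite3 command-line tool is installed. The report extension depends on the Nsight Systems release, so the script checks for both, and it does nothing if the files or tools are absent.

```shell
# Base name given to -o when the profile was collected.
report="17_vector_add_unified_profile"

# Recent releases write .nsys-rep, older ones .qdrep; use whichever exists.
rep_file=""
for ext in nsys-rep qdrep; do
    if [ -f "${report}.${ext}" ]; then
        rep_file="${report}.${ext}"
        break
    fi
done

if [ -n "$rep_file" ] && command -v nsys >/dev/null 2>&1; then
    # Re-print the summary tables from the saved report.
    nsys stats "$rep_file"
fi

if [ -f "${report}.sqlite" ] && command -v sqlite3 >/dev/null 2>&1; then
    # List the tables in the exported database; table names vary between nsys versions.
    sqlite3 "${report}.sqlite" ".tables"
fi
```

The report file itself can also be opened in nsys-ui for the timeline view.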