Profile using Nsight Systems
Overview
Time: 30 min
Learn how to use nsys for performance analysis.
Learn how to generate detailed reports for optimization.
NVIDIA Nsight Systems (nsys)
nsys is NVIDIA’s system-wide performance analysis tool used to profile CUDA applications and
understand how your GPU and CPU interact. It provides timeline-based visualization and runtime
statistics for:
GPU kernel launches and memory transfers
CPU threads and system calls
CUDA runtime/API calls
To profile the code, first compile it with line information:
nvcc -O2 -lineinfo -o 17_vector_add_unified 17_vector_add_unified.cu
Then profile the executable:
nsys profile \
  --stats=true \
  --trace=cuda,nvtx,osrt \
  --cuda-memory-usage=true \
  -o 17_vector_add_unified_profile \
  ./17_vector_add_unified
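| Option | Description |
|---|---|
| `--stats=true` | Prints a summary of profiling statistics (e.g., kernel execution time, memory transfers) in the terminal after profiling. |
| `--trace=cuda,nvtx,osrt` | Enables tracing for the selected domains: `cuda` (CUDA API calls and GPU activity), `nvtx` (NVTX annotation ranges), and `osrt` (OS runtime library calls). |
| `--cuda-memory-usage=true` | Reports CUDA memory usage, including allocation and deallocation across the application. |
| `-o 17_vector_add_unified_profile` | Sets the base name for output files. Generates `${exe}_profile.qdrep`, the profiling report (open in nsys-ui; recent Nsight Systems releases write `.nsys-rep` instead), and `${exe}_profile.sqlite`, the structured performance data. |
| `./17_vector_add_unified` | The compiled executable to profile. |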
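The `.sqlite` file can be queried directly when the terminal summary is not enough. The following is a minimal sketch, not an official recipe: it assumes the export contains a `CUPTI_ACTIVITY_KIND_KERNEL` table with `start`, `end`, and `shortName` columns and a `StringIds` table mapping name ids to strings, with timestamps in nanoseconds. These names may differ between Nsight Systems versions, so inspect the schema of your own file first. The function name `kernel_summary` is hypothetical.

```python
import sqlite3
import sys


def kernel_summary(conn):
    """Return (kernel name, launch count, total time in ns) rows, slowest first.

    Assumes the table and column names described above; if the query
    fails, check the schema of your own .sqlite file.
    """
    query = """
        SELECT s.value                  AS name,
               COUNT(*)                 AS launches,
               SUM(k."end" - k."start") AS total_ns
        FROM CUPTI_ACTIVITY_KIND_KERNEL AS k
        JOIN StringIds AS s ON s.id = k.shortName
        GROUP BY s.value
        ORDER BY total_ns DESC
    """
    return conn.execute(query).fetchall()


# Only runs when a database path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    conn = sqlite3.connect(sys.argv[1])
    for name, launches, total_ns in kernel_summary(conn):
        print(f"{name}: {launches} launch(es), {total_ns / 1e6:.3f} ms total")
    conn.close()
```

Saved under a name of your choice (for example `kernel_summary.py`, a hypothetical file name), it could be run as `python kernel_summary.py 17_vector_add_unified_profile.sqlite`. If the program launched no kernels, the kernel table may be absent and the query will raise `sqlite3.OperationalError`.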
Key Points
NVIDIA Nsight Systems (nsys) is a tool for profiling CUDA applications, providing insights into GPU and CPU interactions.
Compile your CUDA code with -lineinfo for detailed profiling information.
Use nsys profile with appropriate flags to collect execution statistics, trace CUDA and system activity, and report memory usage.
Profiling results include summary statistics and detailed reports for performance analysis and optimization.