Memory Pool

Overview

  • Tutorial: 30 min

  1. Understand the concept of memory pools in CUDA.

  2. Learn how to create and manage memory pools using the CUDA API.

  3. Explore the benefits of using memory pools for performance optimization.

In CUDA, memory pools are a way to efficiently manage device memory allocations by reducing the overhead of frequent cudaMalloc and cudaFree calls. The memory pool gives you more control over how memory is allocated and reused on the GPU.

Traditional cudaMalloc and cudaFree are expensive operations. In workloads with frequent allocations and deallocations this overhead can become a performance bottleneck. A memory pool is a pre-allocated chunk of memory from which smaller allocations are served. Instead of going to the OS or CUDA driver each time, memory requests are fulfilled from this pool. It allows for

  • Faster memory allocation/deallocation

  • Better memory reuse

  • Control over memory fragmentation

The mian differnce in implementation is that * cudaMallocAsync() replaces cudaMalloc() * cudaFreeAsync() replaces cudaFree()

Default Memory Pool

The default memory pool in CUDA does not have a fixed size. Instead, it grows and shrinks dynamically based on allocation needs, up to the limits of the available device memory.

  • The default pool starts empty.

  • When you call cudaMallocAsync(), the pool requests memory from the system as needed.

  • It can reuse memory from previous allocations if available.

  • The pool will continue to grow until:
    • There is no more available device memory, or

    • A soft limit (like the release threshold) is reached and enforced by your settings.

When memory is freed using cudaFreeAsync(), it is returned to the pool for reuse, rather than being returned to the system immediately. This allows for faster subsequent allocations from the memory pool.

 1cudaStream_t stream;
 2cudaStreamCreate(&stream);
 3
 4float* d_ptr;
 5size_t size = 1024 * sizeof(float);
 6
 7// Asynchronous allocation
 8cudaMallocAsync((void**)&d_ptr, size, stream);
 9
10// Use d_ptr in a kernel...
11
12// Asynchronous deallocation
13cudaFreeAsync(d_ptr, stream);
14
15cudaStreamDestroy(stream);

We can get the attributes of the default memory pool using the following code:

1cudaMemPoolGetAttribute(pool, cudaMemPoolAttrReservedMemCurrent, &current);
2cudaMemPoolGetAttribute(pool, cudaMemPoolAttrReservedMemHigh, &high);

where cudaMemPoolAttrReservedMemCurrent gives the current size of the reserved memory in the pool, and cudaMemPoolAttrReservedMemHigh gives the maximum size of the reserved memory at any point in the pool.

Custom Memory Pool

Custom memory pools allow you to create and manage your own memory allocator instead of relying on the default memory pool. This gives you more control over how and when memory is reserved and reused, which is useful in performance-critical or memory-constrained applications.

Custom memory pools allows to:

  • Set allocation limits

  • Track memory usage independently

  • Control release thresholds

  • Isolate subsystems or tasks using separate allocators

Explanation

Control release thresholds in CUDA memory pools refer to settings that determine when the pool should release unused memory back to the system (device allocator).

Create a memory pool

1cudaMemPool_t myPool;
2cudaMemPoolProps props = {}; //struct that specifies the properties of the memory pool
3props.allocType = cudaMemAllocationTypePinned;
4props.handleTypes = cudaMemHandleTypeNone;
5props.location.type = cudaMemLocationTypeDevice;
6props.location.id = 0; // device ID
7
8cudaMemPoolCreate(&myPool, &props);

The cudaMemPoolProps structure defines the properties for a custom CUDA memory pool. Below is a detailed explanation of each field:

Field

Description

allocType

Specifies the type of memory to allocate. Options include:

  • cudaMemAllocationTypeDevice: Device memory (GPU global memory).

  • cudaMemAllocationTypePinned: Pinned host memory, page-locked.

  • cudaMemAllocationTypeManaged: Unified memory accessible by both host and device.

handleTypes

Specifies how memory handles can be shared across processes. Options include:

  • cudaMemHandleTypeNone: No inter-process sharing.

  • cudaMemHandleTypePosixFd: Shareable via POSIX file descriptors (Linux).

  • cudaMemHandleTypeWin32: Shareable via Windows handles.

location.type

Indicates the location type of the memory. Commonly set to:

  • cudaMemLocationTypeDevice: Memory pool is tied to a specific GPU.

  • cudaMemLocationTypeHost: Host-based memory pool (rare).

location.id

Specifies the device or host ID. For device memory pools, this is the GPU ID (e.g., 0 for cudaSetDevice(0)).

Set attributes (optional)

Attribute

1cudaMemPoolSetAttribute(myPool, cudaMemPoolAttrReleaseThreshold, 1024 * 1024); // 1 MB threshold
2cudaMemPoolSetAttribute(myPool, cudaMemPoolAttrReservedMemCurrent, 512 * 1024 * 1024); // 512 MB reserved
3cudaMemPoolSetAttribute(myPool, cudaMemPoolAttrReservedMemHigh, 1024 * 1024 * 1024); // 1 GB high limit

The following attributes are configured for a custom memory pool using cudaMemPoolSetAttribute. Each attribute influences the behavior of memory allocation, reuse, and release.

Attribute

Description

cudaMemPoolAttrReleaseThreshold = 1024 * 1024

Sets the maximum number of unused bytes (1 MB) the memory pool can retain before it begins releasing memory back to the system.

cudaMemPoolAttrReservedMemCurrent = 512 * 1024 * 1024

(Optional/Advanced) Suggests setting the current reserved memory to 512 MB. Not always user-configurable—used more for querying.

cudaMemPoolAttrReservedMemHigh = 1024 * 1024 * 1024

Sets a soft cap (1 GB) for the high watermark of memory usage within the pool, useful for monitoring purposes.

Explanation

  • The high watermark is the highest amount of memory the pool has ever allocated or reserved at any point in time.

  • This attribute records or sets a soft limit of 1 GB as that peak usage.

  • It doesn’t enforce a strict limit but serves as a reference point to monitor or track how much memory the pool is using at its peak.

  • This can help developers understand memory usage patterns and detect if memory consumption approaches or exceeds expected values.

Use the memory pool

cudaMallocFromPoolAsync() allocates memory from the specified memory pool instead of the default device memory allocator.

1cudaStream_t stream;
2cudaStreamCreate(&stream);
3
4void* ptr;
5cudaMallocFromPoolAsync(&ptr, size, myPool, stream);

Pool trimming

Pool trimming in CUDA memory management refers to the process where the memory pool releases unused memory back to the operating system or underlying system allocator.

  • Over time, the pool can accumulate unused memory chunks that are no longer needed by the application.

  • Pool trimming is the act of freeing these unused memory chunks, reducing the memory footprint of the pool.

  • This helps in controlling memory usage and preventing the application from holding excessive unused memory.

1cudaMemPoolTrimTo(myPool, releaseThreshold); // Trim the pool to release memory below the threshold of 1 MB

Explanation

The releaseThreshold is amount of unused memory (in bytes) you want the pool to release back to the system. This controls how aggressively the pool trims unused allocations

Feature

Default Memory Pool

Custom Memory Pool

Global shared pool

Yes

No (per-instance)

Automatically initialized

Yes

No

Can be configured

Partially (release only)

Fully (limits, thresholds)

Used by cudaMallocAsync

Yes

No (must use cudaMallocFromPoolAsync)

Lifetime

Tied to context/device

You manage it

Key Points

  • Memory pools in CUDA allow for efficient memory management by reducing allocation overhead.

  • The default memory pool grows dynamically and reuses memory for faster allocations.

  • Custom memory pools provide more control over allocation limits, reuse policies, and release thresholds.

  • Use cudaMallocFromPoolAsync() to allocate from a custom memory pool.

  • Pool trimming helps manage memory usage by releasing unused chunks back to the system.