Commit 58d667b

[LRT][clr] Update changelog for 7.12 (#4020)
## Motivation
Add more changelog entries for 7.12 in LRT HIP.

## Technical Details
Summarized all features and bug fixes in the 7.12 branch for LRT.

## JIRA ID
https://amd-hub.atlassian.net/browse/AIRUNTIME-52

## Test Plan
No testing needed for a documentation-only change.

## Test Result
N/A

## Submission Checklist
Covers all features and bug fixes for 7.12 LRT.
1 parent 8f03623 commit 58d667b

1 file changed (+23, -3 lines)

projects/clr/CHANGELOG.md

@@ -7,24 +7,44 @@ Full documentation for HIP is available at [rocm.docs.amd.com](https://rocm.docs
### Added

* New HIP APIs
  - Library Management (for parity with the corresponding CUDA APIs)
    * `hipKernelSetAttribute` sets an attribute for a kernel.
    * `hipKernelGetAttribute` returns information about a kernel.
    * `hipKernelGetFunction` returns a function handle for a kernel.
  - Memory Management
    * `hipMipmappedArrayGetMemoryRequirements` returns the memory requirements of a HIP mipmapped array, for parity with the corresponding CUDA API.
  - Cooperative Groups
    * Added support for the barrier APIs `barrier_arrive` and `barrier_wait` on both `grid_group` and `thread_block`, enabling finer‑grained synchronization within cooperative groups.
    * Added support for `block_rank` in the `grid_group` class; it returns the rank of the calling thread's block within the grid.
  - Dynamic logging (no matching CUDA APIs exist)
    * `hipExtEnableLogging` enables HIP runtime logging.
    * `hipExtDisableLogging` disables HIP runtime logging.
    * `hipExtSetLoggingParams` sets HIP runtime logging parameters.

* New HIP device attributes
  - `hipDeviceAttributeExpertSchedMode` has been added to `hipDeviceAttribute_t` to indicate whether expert scheduling mode is supported on AMD GPUs.
  - `hipDeviceAttributeDmaBufSupported` is now supported; it indicates whether the device supports buffer sharing through the Linux DMA-BUF mechanism.
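
Splitting the barrier into `barrier_arrive` and `barrier_wait` lets a thread announce its arrival, continue with work that does not depend on other threads, and block only when it actually needs their results. The following host-side C++ model is a minimal sketch of those split-phase semantics, built on a mutex and condition variable. `SplitBarrier` and `run_demo` are hypothetical names, the phase-counter token is an assumption of this model, and it is not how HIP implements the device-side barrier.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Host-side model of split-phase barrier semantics (hypothetical; not HIP's
// device implementation). arrive() records the caller's arrival and returns
// at once with a phase token; wait(token) blocks until that phase completes.
class SplitBarrier {
public:
    explicit SplitBarrier(std::size_t participants) : expected_(participants) {
        assert(participants > 0);
    }

    // Marks the caller as arrived and returns the current phase as its token.
    std::size_t arrive() {
        std::lock_guard<std::mutex> lock(mu_);
        const std::size_t token = phase_;
        if (++arrived_ == expected_) {
            arrived_ = 0;  // reset the count for the next phase
            ++phase_;      // the last arrival completes the phase
            cv_.notify_all();
        }
        return token;
    }

    // Blocks until the phase identified by `token` has completed.
    void wait(std::size_t token) {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [&] { return phase_ != token; });
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    const std::size_t expected_;
    std::size_t arrived_ = 0;
    std::size_t phase_ = 0;
};

// Each worker publishes a value, arrives, may do independent work, and waits
// only before it reads the values published by the other workers.
std::vector<int> run_demo(std::size_t workers) {
    SplitBarrier barrier(workers);
    std::vector<int> published(workers, 0);
    std::vector<int> totals(workers, 0);
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < workers; ++i) {
        threads.emplace_back([&, i] {
            published[i] = static_cast<int>(i) + 1;      // work before the barrier
            const std::size_t token = barrier.arrive();  // announce arrival, do not block
            // ... work that does not depend on other workers could run here ...
            barrier.wait(token);                         // block until everyone arrived
            int sum = 0;
            for (int v : published) sum += v;            // all writes are now visible
            totals[i] = sum;
        });
    }
    for (std::thread& t : threads) t.join();
    return totals;
}
```

CUDA's versions of these calls pass an arrival token from `barrier_arrive` to `barrier_wait` in a similar way; check the HIP cooperative groups headers for the exact HIP signatures before porting the pattern to device code.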

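The rank returned by `block_rank` can be read as the linear index of the calling thread's block in the grid, with x varying fastest: `blockIdx.x + blockIdx.y * gridDim.x + blockIdx.z * gridDim.x * gridDim.y`. The sketch below assumes HIP uses the same linearization as CUDA's `grid_group::block_rank`; `Dim3` and `linear_block_rank` are hypothetical host-side names, not HIP APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side stand-in for HIP's dim3 (blockIdx / gridDim).
struct Dim3 {
    std::uint32_t x, y, z;
};

// Linear rank of block `idx` in a grid of `grid` blocks, x varying fastest:
// rank = x + y * grid.x + z * grid.x * grid.y
// (assumed to match grid_group::block_rank, as in CUDA).
std::uint64_t linear_block_rank(Dim3 idx, Dim3 grid) {
    assert(idx.x < grid.x && idx.y < grid.y && idx.z < grid.z);
    return static_cast<std::uint64_t>(idx.x) +
           static_cast<std::uint64_t>(idx.y) * grid.x +
           static_cast<std::uint64_t>(idx.z) * grid.x * grid.y;
}
```

For a 4 × 3 × 2 grid, block (3, 2, 1) has rank 3 + 2·4 + 1·12 = 23, the last of the 24 blocks.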
### Resolved issues

* Fixed an error that occurred during HIP graph stream capture in thread‑local capture mode. The HIP runtime's validation logic has been updated so that captures running on other streams in other threads no longer invalidate or block the thread‑local capture in the current thread.
* Fixed a segmentation fault during HIP graph capture. The HIP runtime's large‑graph handling has been updated to prevent stack overflow.
* Fixed incorrect return codes from `hipEventQuery` and `hipEventSynchronize` when they are invoked under mixed stream‑capture modes. The HIP runtime now correctly handles capture‑mode restrictions for event operations.
* Fixed a segmentation fault when retrieving an allocation handle with `hipMemRetainAllocationHandle`. The HIP runtime now correctly retains the generic allocation object, preventing memory‑management issues.
* Resolved a graph node scheduling issue in multistream execution that, in some cases, caused unnecessary kernel‑execution stalls.

### Optimized

* HIP log-level control: The HIP runtime adds dynamic logging functionality, enabling applications to programmatically enable, disable, and configure logging at runtime without modifying environment variables or restarting the application. This gives more precise control over diagnostic output, making it easier to debug targeted code paths or to minimize log noise during performance‑critical execution.
* HIP Graph Segmented Execution: Graph nodes are grouped into segments and dispatched across multiple GPU streams to enable parallel execution.
  - Batching: Each stream receives a single `AccumulateCommand` that aggregates all of its kernel dispatches and submits them as one batch.
  - Synchronization: When a segment depends on work running on another stream, a hardware wait is inserted. At completion, all parallel streams synchronize back to the launch stream.
  - Signaling: Segments emit hardware signals only when downstream segments require them, typically at fork points or when executing in parallel with other segments.

  This approach reduces dispatch overhead and improves GPU utilization by overlapping independent graph work across streams while preserving correct execution order.
* Optimized graph stream synchronization by eliminating duplicate marker creation when syncing streams back to the launch stream. The runtime now tracks synchronized dependency segments to avoid redundant synchronization markers.
* Improved the performance of `hipMemcpyBatchAsync` by refactoring its core implementation and introducing new data structures.
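
As a rough illustration of the design the log-level control item describes, the following C++ sketch keeps logging settings in atomics that the application can change while it runs, so a disabled logger costs a single relaxed load per call. `RuntimeLog` and its `enable`, `disable`, and `set_params` methods are hypothetical: they only loosely mirror `hipExtEnableLogging`, `hipExtDisableLogging`, and `hipExtSetLoggingParams`, whose actual signatures and parameters are not shown here.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical runtime-adjustable log gate (not the HIP runtime's implementation).
// Settings are atomics, so they can change while the program runs instead of
// being read once from an environment variable at startup.
class RuntimeLog {
public:
    void enable() { enabled_.store(true, std::memory_order_relaxed); }
    void disable() { enabled_.store(false, std::memory_order_relaxed); }

    // Hypothetical parameters: maximum verbosity level and a component bitmask.
    void set_params(int max_level, std::uint32_t component_mask) {
        assert(max_level >= 0);
        max_level_.store(max_level, std::memory_order_relaxed);
        mask_.store(component_mask, std::memory_order_relaxed);
    }

    // Hot-path check: when logging is disabled, && short-circuits after one load.
    bool should_log(int level, std::uint32_t component) const {
        return enabled_.load(std::memory_order_relaxed) &&
               level <= max_level_.load(std::memory_order_relaxed) &&
               (component & mask_.load(std::memory_order_relaxed)) != 0;
    }

    void log(int level, std::uint32_t component, const std::string& msg) {
        if (should_log(level, component)) {
            lines_.push_back(msg);  // stand-in for writing to a real log sink
        }
    }

    const std::vector<std::string>& lines() const { return lines_; }

private:
    std::atomic<bool> enabled_{false};
    std::atomic<int> max_level_{0};
    std::atomic<std::uint32_t> mask_{0};
    std::vector<std::string> lines_;  // not synchronized; single-threaded sketch
};
```

In this model, an application would enable the logger and narrow it to one component just around the code path being debugged, then disable it again before a performance-critical section.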

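The signaling and synchronization rules of segmented graph execution can be modeled on the host. The C++ sketch below is a simplified, hypothetical planner, not the HIP runtime's code. It assumes segments are already assigned to streams and listed in topological order. It marks a segment as needing a signal only when a segment on a different stream depends on it, records a wait for each such cross-stream edge, and collects each non-launch stream once for the final join, in the spirit of the duplicate-marker elimination item. Batching into an `AccumulateCommand` is not modeled. `Segment`, `Plan`, and `plan_segments` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical model: each segment is already assigned to a stream and lists
// the indices of the segments it depends on.
struct Segment {
    int stream;                     // stream the segment is dispatched on
    std::vector<std::size_t> deps;  // indices of producer segments
};

struct Plan {
    std::vector<bool> needs_signal;                          // segment must emit a signal
    std::vector<std::pair<std::size_t, std::size_t>> waits;  // (waiter, producer) cross-stream waits
    std::set<int> streams_to_join;                           // parallel streams joined once each
};

// Assumes `segs` is in a valid topological order.
Plan plan_segments(const std::vector<Segment>& segs, int launch_stream) {
    Plan plan;
    plan.needs_signal.assign(segs.size(), false);
    for (std::size_t i = 0; i < segs.size(); ++i) {
        for (std::size_t d : segs[i].deps) {
            assert(d < i);  // a producer must come earlier in topological order
            if (segs[d].stream != segs[i].stream) {
                // Cross-stream dependency: the producer signals, the consumer waits.
                plan.needs_signal[d] = true;
                plan.waits.emplace_back(i, d);
            }
            // Same-stream dependencies are ordered by the stream itself: no signal.
        }
        if (segs[i].stream != launch_stream) {
            // std::set deduplicates, so each parallel stream is joined only once.
            plan.streams_to_join.insert(segs[i].stream);
        }
    }
    return plan;
}
```

For a diamond graph in which segment 0 forks to segment 1 (launch stream 0) and segment 2 (stream 1), and segment 3 on stream 0 joins them, only segments 0 and 2 need signals, and stream 1 is joined back to the launch stream once.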
## HIP 7.11 for ROCm 7.11