## Motivation
Add more changelog entries for 7.12 in LRT HIP.
## Technical Details
Summarized all features and bug fixes in the 7.12 branch for LRT.
## JIRA ID
https://amd-hub.atlassian.net/browse/AIRUNTIME-52
## Test Plan
No testing needed; this is a documentation-only change.
## Test Result
N/A
## Submission Checklist
Covers all features and bug fixes for 7.12 LRT.
## Changes
`projects/clr/CHANGELOG.md`: 23 additions, 3 deletions
The updated changelog excerpt (full HIP documentation is available at [rocm.docs.amd.com](https://rocm.docs.amd.com)):

### Added

* New HIP APIs
  - Library Management

    Support for the following APIs for parity with the corresponding CUDA APIs:
    * `hipKernelSetAttribute` sets an attribute for a kernel
    * `hipKernelGetAttribute` returns information about a kernel
    * `hipKernelGetFunction` returns a function handle
  - Memory Management
    * Added support for `hipMipmappedArrayGetMemoryRequirements`, which returns memory requirements for HIP mipmapped arrays and ensures parity with CUDA APIs.
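As an illustration of the new library-management entry points, a hypothetical sketch follows. The changelog does not give prototypes, so the parameter order and the `hipKernel_t` handle type below are assumptions based on parity with the CUDA driver counterparts (`cuKernelGetAttribute`, `cuKernelGetFunction`); consult the HIP 7.12 headers for the real signatures.

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

// Hypothetical sketch: parameter order and the hipKernel_t handle type are
// assumptions based on CUDA driver-API parity, not confirmed signatures.
void inspect_kernel(hipKernel_t kernel, hipDevice_t device) {
    int max_threads = 0;
    // Assumed to query a per-device attribute of a loaded kernel.
    hipKernelGetAttribute(&max_threads,
                          HIP_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK,
                          kernel, device);
    printf("max threads per block: %d\n", max_threads);

    // Assumed to convert the kernel handle into a launchable hipFunction_t.
    hipFunction_t fn = nullptr;
    hipKernelGetFunction(&fn, kernel);
}
```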
  - Cooperative Groups
    * Support for `barrier` APIs `barrier_arrive` and `barrier_wait` has been added for both `grid_group` and `thread_block` to enable finer-grained synchronization within cooperative groups.
    * Support for `block_rank` in the class `grid_group`, which returns the rank of the calling thread's block within the grid.
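A split barrier lets a thread signal completion early and overlap independent work before waiting. The token-based interface in this sketch assumes parity with the CUDA cooperative-groups `barrier_arrive`/`barrier_wait` API of the same name.

```cpp
#include <hip/hip_runtime.h>
#include <hip/hip_cooperative_groups.h>
#include <utility>

namespace cg = cooperative_groups;

// Sketch assuming the token-based split-barrier interface mirrors the
// CUDA cooperative-groups API.
__global__ void producer_consumer(float* buf) {
    cg::thread_block block = cg::this_thread_block();

    buf[threadIdx.x] = static_cast<float>(threadIdx.x);  // produce
    auto token = block.barrier_arrive();   // signal "my write is done"
    // ... independent per-thread work can overlap the barrier here ...
    block.barrier_wait(std::move(token));  // wait until all writes are visible
    float neighbor = buf[(threadIdx.x + 1) % blockDim.x];  // consume
    (void)neighbor;
}
```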
  - Dynamic logging (no matching CUDA APIs exist)
    * `hipExtEnableLogging` enables HIP runtime logging
    * `hipExtDisableLogging` disables HIP runtime logging
    * `hipExtSetLoggingParams` sets HIP runtime logging parameters
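For example, logging could be scoped to just a suspect code path. The changelog does not document the argument lists of these extension APIs, so the zero-argument calls below are placeholders, not confirmed prototypes.

```cpp
#include <hip/hip_runtime.h>

// Hypothetical sketch: the argument lists are placeholders; the changelog
// does not document the prototypes of these extension APIs.
void run_suspect_path() {
    hipExtEnableLogging();   // assumed: switch runtime logging on
    // ... the code path under investigation ...
    hipExtDisableLogging();  // assumed: switch runtime logging back off
    // hipExtSetLoggingParams would adjust logging parameters; its arguments
    // are not documented here, so it is not called in this sketch.
}
```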
* New HIP device attributes
  - `hipDeviceAttributeExpertSchedMode` has been added to `hipDeviceAttribute_t` to indicate whether expert scheduling mode is supported on AMD GPUs.
  - `hipDeviceAttributeDmaBufSupported` is now supported, enabling buffer sharing.
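Both attributes are queried through the existing `hipDeviceGetAttribute` entry point; a minimal sketch (device 0 is assumed present):

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
    int dmabuf = 0, expert = 0;
    // hipDeviceGetAttribute is the long-standing query entry point; only
    // the enum values are new in 7.12.
    if (hipDeviceGetAttribute(&dmabuf, hipDeviceAttributeDmaBufSupported, 0) == hipSuccess)
        printf("dma-buf sharing supported: %d\n", dmabuf);
    if (hipDeviceGetAttribute(&expert, hipDeviceAttributeExpertSchedMode, 0) == hipSuccess)
        printf("expert scheduling mode supported: %d\n", expert);
    return 0;
}
```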
### Resolved issues

* An error that occurred during HIP graph stream capture in thread-local capture mode has been fixed. The HIP runtime now updates its validation logic to ensure that captures running in other threads on different streams no longer invalidate or block the thread-local capture in the current thread.
* A segmentation fault that occurred during HIP graph capture has been fixed. The HIP runtime has updated its large-graph handling mechanism to prevent stack overflow.
* Incorrect return codes from `hipEventQuery` and `hipEventSynchronize` when invoked under mixed stream-capture modes have been corrected. The HIP runtime now correctly handles capture-mode restrictions for event operations.
* A segmentation fault that occurred when retrieving an allocation handle with `hipMemRetainAllocationHandle` has been fixed. The HIP runtime now correctly retains the generic allocation object to prevent memory-management issues.
* Resolved a graph node scheduling issue in multistream execution that, in some cases, led to unnecessary kernel-execution stalls.
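The thread-local capture fix applies to captures started with `hipStreamCaptureModeThreadLocal`; a minimal capture sketch (error handling omitted):

```cpp
#include <hip/hip_runtime.h>

// With thread-local mode, captures running in other threads on different
// streams no longer invalidate this capture (the fix described above).
hipGraph_t capture_on(hipStream_t stream) {
    hipGraph_t graph = nullptr;
    hipStreamBeginCapture(stream, hipStreamCaptureModeThreadLocal);
    // ... enqueue async work (kernels, memcpys) on `stream` here ...
    hipStreamEndCapture(stream, &graph);
    return graph;
}
```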
### Optimized

* HIP log-level control: the HIP runtime adds dynamic logging functionality, enabling applications to programmatically enable, disable, and configure logging at runtime without modifying environment variables or restarting the application. The result is more precise control over diagnostic output, making it easier to debug targeted code paths or minimize log noise during performance-critical execution.
* HIP Graph Segmented Execution: graph nodes are grouped into segments and dispatched across multiple GPU streams to enable parallel execution.
  - Batching: each stream receives a single `AccumulateCommand` that aggregates all kernel dispatches and submits them efficiently as one batch.
  - Synchronization: when a segment depends on work running on another stream, a hardware wait is inserted. At completion, all parallel streams synchronize back to the launch stream.
  - Signaling: segments emit hardware signals only when downstream segments require them, typically at fork points or when executing in parallel with other segments.

  This approach reduces dispatch overhead and improves GPU utilization by overlapping independent graph work across streams while preserving correct execution order.
* Optimized graph stream synchronization by eliminating duplicate marker creation when syncing streams back to the launch stream. The runtime now tracks synchronized dependency segments to avoid redundant synchronization markers.
* Optimized `hipMemcpyBatchAsync` with refactored code, new data structures, and an improved core implementation for better performance.