
[rocprofiler-sdk] Optimize HSA queue write interceptor and async signal handler#4276

Open
jrmadsen wants to merge 5 commits into develop from users/jrmadsen/optimize-hsa-write-interceptor

Conversation

@jrmadsen
Contributor

Motivation

Rewrites how rocprofiler-sdk handles signal creation and async signal handlers during queue interception.

Technical Details

  • Creates an initial batch of 4096 signals and allocates new batches of 4096 as needed.
  • Assigns the async signal handler only to the last packet in a batch of packets.
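The batching idea above can be sketched in a few lines (a hypothetical `batched_pool` type, not the PR's actual pool, which also handles thread safety and constructs real HSA signals; handles are indices so batch growth never invalidates an outstanding handle):

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical sketch of a batched pool: pre-create a full batch of
// objects up front and grow by another whole batch only when the free
// list is exhausted, amortizing creation cost over BatchSize uses.
template <typename T, std::size_t BatchSize = 4096>
class batched_pool
{
public:
    batched_pool() { add_batch(); }  // initial batch created eagerly

    std::size_t acquire()
    {
        if(m_free.empty()) add_batch();  // grow by a whole batch, not one object
        std::size_t idx = m_free.front();
        m_free.pop_front();
        return idx;
    }

    void release(std::size_t idx) { m_free.push_back(idx); }

    T&          get(std::size_t idx) { return m_objects.at(idx); }
    std::size_t capacity() const { return m_objects.size(); }

private:
    void add_batch()
    {
        std::size_t base = m_objects.size();
        m_objects.resize(base + BatchSize);
        for(std::size_t i = 0; i < BatchSize; ++i)
            m_free.push_back(base + i);
    }

    std::vector<T>          m_objects;
    std::deque<std::size_t> m_free;
};
```

With this shape, the common path (acquire/release) touches only the free list; the expensive creation work happens once per 4096 acquisitions.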

JIRA ID

Test Plan

  • Ideally, this just improves performance and any breakages will be detected in the existing tests.
  • Developing a test to prevent a performance regression will be difficult.

Test Result

Submission Checklist

@jrmadsen jrmadsen requested review from a team as code owners March 20, 2026 23:37
Copilot AI review requested due to automatic review settings March 20, 2026 23:37
@jrmadsen jrmadsen changed the title Users/jrmadsen/optimize hsa write interceptor [rocprofiler-sdk] Optimize HSA queue write interceptor and async signal handler Mar 20, 2026

Copilot AI left a comment


Pull request overview

This PR refactors rocprofiler-sdk’s HSA queue write interception and async signal handling to reduce per-dispatch overhead by batching per-packet state and introducing pooled/batched HSA signals.

Changes:

  • Introduces packet_data_t and updates completion callbacks to operate on per-packet data rather than session-wide fields.
  • Adds a pooled signal infrastructure (pool/pool_object) and rewires queue interception to allocate/reuse signals in batches.
  • Adds a new HIP test binary (hip-graph-bubbles) intended to create many graph-based kernel dispatches.

Reviewed changes

Copilot reviewed 31 out of 34 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
projects/rocprofiler-sdk/tests/bin/hip-graph-bubbles/hip-graph-bubbles.cpp New test program that builds/launches a HIP graph repeatedly with roctx ranges.
projects/rocprofiler-sdk/tests/bin/hip-graph-bubbles/CMakeLists.txt Build rules for the new hip-graph-bubbles test binary.
projects/rocprofiler-sdk/tests/bin/CMakeLists.txt Adds hip-graph-bubbles subdirectory to the test build.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tracing/fwd.hpp Changes external correlation map type to a small_vector-backed container.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/thread_trace/core.hpp Updates post_kernel_call signature to take packet_data_t.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/thread_trace/core.cpp Threads packet_data_t.user_data through post-dispatch data iteration.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/pc_sampling/tests/pc_sampling_internals.hpp Updates session type name references for completion callback signatures.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/pc_sampling/hsa_adapter.cpp Adapts to renamed session type and small_vector external correlation map.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/kernel_dispatch/tracing.hpp Updates dispatch tracing APIs to use queue_info_session_t + packet_data_t.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/kernel_dispatch/tracing.cpp Moves dispatch callback inputs from session-wide to per-packet storage.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/hsa/signal.hpp Adds signal_t wrapper used by pooled signal objects.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/hsa/queue_info_session.hpp Introduces packet_data_t and refactors session to hold a small_vector of packet data.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/hsa/queue.hpp Updates async completion callback signature and adds pooled-signal APIs.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/hsa/queue.cpp Core refactor: batching packet data, pooled signals, and async handler changes.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/hsa/memory_allocation.cpp Switches external correlation map alias to the new small_vector-backed type.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/hsa/async_copy.cpp Switches external correlation map alias to the new small_vector-backed type.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/hsa/CMakeLists.txt Adds signal.hpp to installed/compiled HSA headers list.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/counters/tests/core.cpp Updates tests for renamed session type and new completed_cb signature.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/counters/sample_processing.hpp Plumbs packet_data_t into callback processing params.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/counters/sample_processing.cpp Reads dispatch info/user_data/external corr IDs from packet_data_t.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/counters/dispatch_handlers.hpp Updates completed callback signature to include packet_data_t.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/counters/dispatch_handlers.cpp Passes packet_data_t through to sample processing.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/counters/core.cpp Updates controller callback wiring for new completed callback signature.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/code_object/code_object.cpp Switches external correlation map alias to the new small_vector-backed type.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/buffer.cpp Optimizes get_buffer lookup from linear scan to direct indexing.
projects/rocprofiler-sdk/source/lib/common/utility.hpp Generalizes get_val to work with containers providing find (incl. small_vector pairs).
projects/rocprofiler-sdk/source/lib/common/mpl.hpp Extends pair detection trait to expose first_type / second_type.
projects/rocprofiler-sdk/source/lib/common/container/static_vector.hpp Simplifies emplace_back assignment path.
projects/rocprofiler-sdk/source/lib/common/container/stable_vector.hpp Initializes members to defaults to avoid uninitialized state.
projects/rocprofiler-sdk/source/lib/common/container/small_vector.hpp Adds map-like helpers for small_vector-of-pairs (find, at, emplace).
projects/rocprofiler-sdk/source/lib/common/container/record_header_buffer.cpp Uses memset to clear only the used header range; adds <cstring>.
projects/rocprofiler-sdk/source/lib/common/container/pool_object.hpp New pooled object wrapper with acquire/release semantics.
projects/rocprofiler-sdk/source/lib/common/container/pool.hpp New pool implementation for batched reusable objects (used for signals).
projects/rocprofiler-sdk/source/lib/common/container/CMakeLists.txt Adds new pool headers to the common container header list.


Comment on lines +854 to +864
if(auto* pool = get_signal_pool(); use_pool && pool && attribute == 0)
{
auto& _signal = pool->acquire(construct_hsa_signal, 0, 0, nullptr, attribute);
ROCP_FATAL_IF(!_signal.in_use()) << "Acquired signal from pool that is not in use";
*signal = _signal.get().value;
// ROCP_INFO << fmt::format("acquired signal {} from pool: hsa_signal_t{{.handle={}}}",
// _signal.index(),
// _signal.get().value.handle);
get_core_table()->hsa_signal_store_screlease_fn(_signal.get().value, 1);
return &_signal;
}

Copilot AI Mar 20, 2026


The pooled-signal path calls pool->acquire(construct_hsa_signal, ...), which creates a brand new HSA signal each time the pooled object is acquired. That overwrites the previous hsa_signal_t handle stored in the pool object without destroying it, effectively leaking HSA signals and defeating reuse.

For pooled signals, create the HSA signal once when the pool batch is constructed and only reset its value on reuse (e.g., via hsa_signal_store_*), or explicitly destroy the previous handle before re-creating it.
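The create-once/reset-on-reuse pattern suggested here can be illustrated with a pure-C++ stand-in (the `fake_signal` type and its counter are hypothetical, used only to show that construction happens once per slot; in real code the constructor would call `hsa_signal_create` and `reset` would call an `hsa_signal_store_*` function):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an HSA signal: counts how many times the
// expensive "create" happens vs. the cheap "reset" on reuse.
struct fake_signal
{
    static inline int creates = 0;
    int               value   = 0;
    fake_signal() { ++creates; }      // expensive: stands in for hsa_signal_create
    void reset(int v) { value = v; }  // cheap: stands in for hsa_signal_store_*
};

class signal_slot_pool
{
public:
    explicit signal_slot_pool(std::size_t n)
    : m_slots(n)  // signals constructed ONCE here, when the batch is built
    {
        for(std::size_t i = 0; i < n; ++i) m_free.push_back(i);
    }

    std::size_t acquire()
    {
        std::size_t idx = m_free.back();
        m_free.pop_back();
        m_slots.at(idx).reset(1);  // reuse path: only reset, never re-create
        return idx;
    }

    void release(std::size_t idx) { m_free.push_back(idx); }

    fake_signal& get(std::size_t idx) { return m_slots.at(idx); }

private:
    std::vector<fake_signal> m_slots;
    std::vector<std::size_t> m_free;
};
```

Acquiring the same slot repeatedly never re-runs the constructor, which is the property the review comment says the `pool->acquire(construct_hsa_signal, ...)` path loses.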

Comment on lines +305 to 309
ROCP_INFO << fmt::format("WriteInterceptor called with pkt_count={}", pkt_count);

using callback_record_t = packet_data_t::callback_record_t;
using packet_vector_t = common::container::small_vector<rocprofiler_packet, 512>;


Copilot AI Mar 20, 2026


ROCP_INFO logging inside WriteInterceptor will run on every intercepted queue write and can add significant overhead / log spam in performance-sensitive code paths.

Consider removing this log or demoting it to ROCP_TRACE / VLOG guarded by a debug flag.

Comment on lines +884 to +886
ROCP_INFO << fmt::format("released signal {}: hsa_signal_t{{.handle={}}}",
signal->index(),
signal->get().value.handle);

Copilot AI Mar 20, 2026


Queue::release_signal emits an ROCP_INFO log for every pooled signal release. In the new batching design this can be thousands of logs per interceptor call and may negate the intended performance improvement.

Consider removing this log, lowering it to ROCP_TRACE, or guarding it behind a verbose/debug option.

Suggested change
ROCP_INFO << fmt::format("released signal {}: hsa_signal_t{{.handle={}}}",
signal->index(),
signal->get().value.handle);
ROCP_TRACE << fmt::format("released signal {}: hsa_signal_t{{.handle={}}}",
signal->index(),
signal->get().value.handle);

Comment on lines +915 to 917

if(get_signal_pool()) get_signal_pool()->report_reuse();
}

Copilot AI Mar 20, 2026


pool::report_reuse() logs at WARNING level and Queue::sync() calls it unconditionally. This will likely produce warnings during normal operation and can be noisy.

Consider lowering this to ROCP_TRACE/VLOG, or only reporting reuse under an explicit diagnostics flag.

queue_info_session.kernel_pkt.ext_amd_aql_pm4.completion_signal);
}
hsa::get_core_table()->hsa_signal_store_screlease_fn(packet.interrupt_signal, -1);
ROCP_FATAL << "Destroying interrupt signal";

Copilot AI Mar 20, 2026


ROCP_FATAL << "Destroying interrupt signal" will unconditionally terminate the process whenever an interrupt signal is present. This looks like leftover debug logging and prevents normal cleanup.

Switch this to a non-fatal log (or remove it) so the interrupt signal can be destroyed without aborting.

Suggested change
ROCP_FATAL << "Destroying interrupt signal";
ROCP_DEBUG << "Destroying interrupt signal";

Comment on lines 263 to 283
template <typename Integral>
constexpr Integral
bit_extract(Integral x, int first, int last)
{
return (x >> first) & bit_mask<Integral>(0, last - first);
static_assert(std::is_integral<Integral>::value, "Integral type required");

auto&& bit_mask = [](int _first, int _last) {
ROCP_FATAL_IF(!(_last >= _first)) << fmt::format(
"[queue::bit_extract::bit_mask] -> invalid argument. last (={}) is not >= first (={})",
_last,
_first);

size_t num_bits = _last - _first + 1;
return ((num_bits >= sizeof(Integral) * 8) ? ~Integral{0}
/* num_bits exceed the size of Integral */
: ((Integral{1} << num_bits) - 1))
<< _first;
};

return (x >> first) & bit_mask(0, last - first);
}

Copilot AI Mar 20, 2026


bit_extract is declared constexpr but its body calls ROCP_FATAL_IF / fmt::format via the bit_mask lambda. That makes the function non-constexpr-suitable and can cause compilation errors.

Either drop the constexpr qualifier here, or rework this into a truly constexpr-safe implementation without runtime logging.
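One constexpr-safe shape for this (a sketch, not the PR's code: validation is reduced to a compile-time-friendly form instead of runtime logging, with an invalid range yielding an empty mask; a real implementation might assert in a separate non-constexpr path):

```cpp
#include <cstdint>
#include <type_traits>

// constexpr-safe bit mask: no logging or formatting, so it remains usable
// in constant expressions. An inverted range simply yields zero here.
template <typename Integral>
constexpr Integral
bit_mask(int first, int last)
{
    static_assert(std::is_integral<Integral>::value, "Integral type required");
    if(last < first) return Integral{0};
    const int num_bits = last - first + 1;
    return (num_bits >= static_cast<int>(sizeof(Integral) * 8)
                ? ~Integral{0}  // mask covers the whole type
                : static_cast<Integral>((Integral{1} << num_bits) - 1))
           << first;
}

template <typename Integral>
constexpr Integral
bit_extract(Integral x, int first, int last)
{
    return (x >> first) & bit_mask<Integral>(0, last - first);
}

// Being genuinely constexpr, it can be verified at compile time:
static_assert(bit_extract<std::uint32_t>(0xABCD1234u, 8, 15) == 0x12, "byte 1");
static_assert(bit_extract<std::uint64_t>(~0ull, 0, 63) == ~0ull, "full width");
```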

bool expected = true;
auto val = m_in_use.compare_exchange_strong(expected, false);

if(m_pool) m_pool->release(m_index);

Copilot AI Mar 20, 2026


pool_object::release() calls m_pool->release(m_index) even when the compare_exchange_strong fails (i.e., the object wasn't actually in use). That can push the same index into the available queue multiple times and lead to multiple threads acquiring the same pooled object.

Only return the index to the pool when the state transition from in-use -> free succeeds.

Suggested change
if(m_pool) m_pool->release(m_index);
if(val && m_pool) m_pool->release(m_index);
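The double-release hazard and the guarded fix can be shown with a minimal stand-alone sketch (hypothetical `fake_pool` type; the real `pool_object` also carries the pooled value itself):

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal stand-in for the pool: just records which indices were returned.
struct fake_pool
{
    std::vector<std::size_t> available;
    void release(std::size_t idx) { available.push_back(idx); }
};

struct pool_object
{
    std::atomic<bool> m_in_use{true};
    std::size_t       m_index = 0;
    fake_pool*        m_pool  = nullptr;

    void release()
    {
        bool expected = true;
        bool val      = m_in_use.compare_exchange_strong(expected, false);
        // Guard on the CAS result: without `val &&`, a second release()
        // would push m_index into the free list again, letting two
        // threads acquire the same pooled object.
        if(val && m_pool) m_pool->release(m_index);
    }
};
```

Only the thread that wins the in-use → free transition returns the index; a redundant `release()` is a no-op.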

Comment on lines +146 to +147
auto _read_lk = std::shared_lock<std::shared_mutex>{m_available_mtx};
auto& _obj = m_pool.at(_idx.value());

Copilot AI Mar 20, 2026


pool::acquire() reads from m_pool without taking m_pool_mtx (it only holds m_available_mtx). Another thread can enter the "add a new batch" path and mutate m_pool under m_pool_mtx, causing a data race / invalid reference.

Take a shared lock on m_pool_mtx when accessing m_pool (and similarly in release() when calling m_pool.at(idx)), so reads and batch growth are properly synchronized.

Suggested change
auto _read_lk = std::shared_lock<std::shared_mutex>{m_available_mtx};
auto& _obj = m_pool.at(_idx.value());
// Protect read access to m_pool with a shared lock on m_pool_mtx
auto _pool_read_lk = std::shared_lock<std::shared_mutex>{m_pool_mtx};
auto& _obj = m_pool.at(_idx.value());

Comment on lines +909 to +913
_core_api.hsa_signal_wait_relaxed_fn(_active_kernels,
HSA_SIGNAL_CONDITION_EQ,
0,
timeout_hint.count(),
HSA_WAIT_STATE_BLOCKED);

Copilot AI Mar 20, 2026


Queue::sync() now waits with a 1s timeout and ignores the return value from hsa_signal_wait_relaxed_fn. If the wait times out, sync() will return while kernels are still active, and the destructor may proceed to destroy resources prematurely.

Use an infinite/very large timeout (as before) or loop until the signal reaches the expected value, and handle the timeout/error case explicitly.

Suggested change
_core_api.hsa_signal_wait_relaxed_fn(_active_kernels,
HSA_SIGNAL_CONDITION_EQ,
0,
timeout_hint.count(),
HSA_WAIT_STATE_BLOCKED);
while(true)
{
const auto current_value = _core_api.hsa_signal_wait_relaxed_fn(
_active_kernels,
HSA_SIGNAL_CONDITION_EQ,
0,
timeout_hint.count(),
HSA_WAIT_STATE_BLOCKED);
// hsa_signal_wait_relaxed_fn returns the signal value. If it is zero,
// the condition HSA_SIGNAL_CONDITION_EQ 0 has been satisfied and all
// active kernels have completed.
if(current_value == 0)
{
break;
}
// Timeout or early wakeup occurred while kernels are still active.
// Log and continue waiting to avoid destroying resources prematurely.
ROCP_WARNING << fmt::format(
"Timeout while waiting for queue sync: {} kernels still active", current_value);
}

@powderluv

I pulled this PR into a clean local worktree and tried it against the same HIP graph kernel-trace repro cases we have been using for the queue/signal issue.

One build caveat first: on my ROCm 7.13 / TheRock venv, the PR head (9113c23e9d) does not build as-is because it is missing the separate fmt.hpp memory-copy-op compatibility fix for ROCm 7.13. I applied only that minimal compatibility patch locally, with no queue/signal behavior changes on top of the PR, so I could test the runtime behavior.

With that single compatibility patch added, I still could not get the PR branch to pass the HIP graph repro:

  • 1000 x 300 with --kernel-trace segfaulted very early, before any CSV output was written.
  • 256 x 200 with --kernel-trace also segfaulted before any CSV output was written.

I put the exact compatibility patch and the two crash logs into a secret gist here:

https://gist.github.com/powderluv/f65f4560fe338effd090fd7dd57d833d

Files in the gist:

  • README.md
  • pr4276_rocm713_compat.patch
  • pr4276_k1000_i300_run.log
  • pr4276_k256_i200_run.log

So at least on this setup, this alternative implementation is not yet passing the existing HIP graph test cases.

@powderluv

I pulled this into a clean workspace and iterated on top of the PR head locally. The updated branch is here:

Local commit stack on top of the PR branch:

  • d33b45eda3 rocprofiler-sdk: handle ROCm 7.13 memory copy op layouts
  • d88004b100 rocprofiler-sdk: avoid host-thread state on async queue callbacks
  • 5e6cd16418 rocprofiler-sdk: prearm queue completion callbacks for hip graphs

What changed at a high level:

  • stopped using host-thread-only state on ROCr async-doorbell callbacks
  • skipped tool-side kernel rename / HIP stream external-correlation setup when there is no host correlation id
  • switched the queue completion path to fresh one-shot pre-armed slots
  • kept pre-armed handlers alive until a real queue session is attached
  • changed the queue completion async-handler condition from EQ -1 to LT 1, which was the turning point for actually draining dispatch completions on this HIP graph case
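The EQ -1 → LT 1 change can be seen by modeling the two wait conditions as plain predicates on the signal value (hypothetical values; HSA evaluates `HSA_SIGNAL_CONDITION_EQ`/`HSA_SIGNAL_CONDITION_LT` against the signal's current value, and a completion signal typically counts down from 1 to 0):

```cpp
#include <cstdint>

// Models of the two async-handler conditions as predicates on the signal
// value. LT 1 also fires when the signal stops at 0, so completions that
// only decrement to 0 still wake the handler; EQ -1 requires an extra
// decrement past zero before anything drains.
constexpr bool fires_eq_minus1(std::int64_t v) { return v == -1; }  // old: EQ -1
constexpr bool fires_lt_1(std::int64_t v)      { return v < 1; }    // new: LT 1

static_assert(!fires_eq_minus1(0) && fires_lt_1(0),
              "a signal that stops at 0 wakes LT 1 but never EQ -1");
static_assert(fires_eq_minus1(-1) && fires_lt_1(-1),
              "both fire once the value reaches -1");
static_assert(!fires_eq_minus1(1) && !fires_lt_1(1),
              "neither fires while a dispatch is still pending");
```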

Validation on the HIP graph reproducer (--kernel-trace, CSV output):

  • 256 x 20: passes, 5120 rows / 5120 unique dispatch ids
  • 256 x 200: passes, 51200 rows / 51200 unique dispatch ids
  • 1000 x 200: passes, 200000 rows / 200000 unique dispatch ids
  • 2000 x 200: passes, 400000 rows / 400000 unique dispatch ids

This is materially different from the original state I tested earlier on this machine, where the branch either failed to build on the ROCm 7.13 venv or segfaulted / failed to emit profiler output on the same HIP graph kernel-trace cases.

@powderluv

I added a local hotspot pass on the current pr4276-based branch using the HIP graph reproducer with queue-signal timing enabled.

Method:

  • staged local rocprofv3 from the current pr4276 workspace
  • ROCPROFILER_QUEUE_SIGNAL_TRACE=1
  • ROCPROFILER_QUEUE_SIGNAL_TRACE_PERIOD=65536
  • compared the first ~65536 traced-dispatch summary on two shapes:
    • 2000 x 300
    • 3000 x 200

The main result is that the async completion callback is not the dominant performance hotspot.

At the first summary window:

  • 2000 x 300
    • dispatch_setup_avg_us=32.306
    • completion_avg_us=1.265
    • create_avg_us=0.641
    • register_avg_us=1.262
    • enqueue_latency_avg_us=4152.152
    • direct_create_calls=24406
  • 3000 x 200
    • dispatch_setup_avg_us=33.673
    • completion_avg_us=1.384
    • create_avg_us=0.696
    • register_avg_us=1.202
    • enqueue_latency_avg_us=4457.345
    • direct_create_calls=23946

Interpretation:

  • enqueue-side WriteInterceptor work is roughly 24x-26x larger than the async completion callback work
  • hsa_amd_signal_create and hsa_amd_signal_async_handler are visible, but neither is the dominant cost by itself
  • callback subphases are small:
    • get_dispatch_avg_us ~ 0.116-0.125
    • dispatch_complete_avg_us ~ 0.367-0.406
    • callback_avg_us ~ 0.070-0.072
  • the queue is still accumulating noticeable completion lag (enqueue_latency_avg_us ~ 4.1-4.5 ms), but the direct callback body is not expensive enough to explain the overall slowdown
  • the prearmed slot path still falls back to direct creates frequently (~24k misses in the first ~65k dispatches), so slot availability is still part of the picture

The next useful step is finer instrumentation inside WriteInterceptor itself, especially around:

  • correlation / external-correlation work
  • tracing enter/exit callback overhead
  • queue callback fanout on enqueue
  • packet transformation / serialization path
  • slot-acquire miss path versus ready-slot hit path

So the current evidence says: optimize enqueue-side setup first, not async callback execution.

@powderluv

Follow-up hotspot note from a second local instrumentation pass on the HIP graph repro.

I split the enqueue-side dispatch_setup_avg_us bucket into non-overlapping pieces on the current local pr4276 worktree and sampled the first ~65536 traced dispatches of two shapes:

  • 2000 x 300

    • dispatch_setup_avg_us=45.849
    • dispatch_packet_avg_us=0.180
    • dispatch_signal_avg_us=44.611
      • dispatch_signal_create_avg_us=44.392
      • dispatch_signal_arm_avg_us=0.219
    • completion_avg_us=0.849
    • enqueue_latency_avg_us=5814.085
    • direct_create_calls=8834 / 65532
  • 3000 x 200

    • dispatch_setup_avg_us=54.235
    • dispatch_packet_avg_us=0.488
    • dispatch_signal_avg_us=52.278
      • dispatch_signal_create_avg_us=52.044
      • dispatch_signal_arm_avg_us=0.233
    • completion_avg_us=1.618
    • enqueue_latency_avg_us=6893.844
    • direct_create_calls=1064 / 65521

Takeaway:

  • The main enqueue-side hotspot is the completion-signal acquisition / creation stage in WriteInterceptor, not packet building and not async-handler arm/register.
  • Packet build is sub-0.5 us here.
  • Arm/register is only about 0.22-0.23 us.
  • Completion callback work is still small (<2 us).
  • The wider graph shape (3000x200) is slower mainly because the signal-create/acquire stage grows, and enqueue latency grows with it.

One nuance: the raw create_avg_us counter for hsa_amd_signal_create itself is still sub-1 us, so this larger dispatch_signal_create_avg_us bucket is measuring the broader completion-signal acquisition path, not just the raw runtime call in isolation. That points more toward ready-slot acquisition / fallback / surrounding queue bookkeeping than the async callback path.

@powderluv

Follow-up after cleaning up the local diff and updating the comparison branch.

I pushed a cleaned queue-only commit on top of users/powderluv/pr4276-hip-graph-fix:

  • b0db72c610 rocprofiler-sdk: use a ready queue for prearmed signals

What changed in this cleanup:

  • kept only the ready-queue optimization for prearmed completion slots
  • dropped the temporary hotspot instrumentation
  • kept the header-side async_signal_* type placement needed for a clean rebuild in this branch layout

Validated from a clean rebuild/stage in the venv-backed environment at:

  • /data/anush/github/bubble/SWDEV-583475/stage/rocprofiler-sdk-pr4276-push

Wide HIP graph kernel-trace reruns on the cleaned stage:

  • 3000 x 200: passed on rerun, full CSV written
    • log: /data/anush/github/bubble/SWDEV-583475/logs/hip-graph-cleanpush-k3000-i200-rerun-20260322T080944Z/run.log
    • csv: /data/anush/github/bubble/SWDEV-583475/profiles/hip-graph-cleanpush-k3000-i200-rerun-20260322T080944Z/rocprofv3/trace_kernel_trace.csv
    • result: 600000 rows / 600000 unique Dispatch_Id
  • 2000 x 300: passed, full CSV written
    • log: /data/anush/github/bubble/SWDEV-583475/logs/hip-graph-cleanpush-k2000-i300-20260322T080957Z/run.log
    • csv: /data/anush/github/bubble/SWDEV-583475/profiles/hip-graph-cleanpush-k2000-i300-20260322T080957Z/rocprofv3/trace_kernel_trace.csv
    • result: 600000 rows / 600000 unique Dispatch_Id

One caveat: the first fresh 3000 x 200 run after the clean rebuild hit a one-off hip::stream::get_stream_id segfault:

  • /data/anush/github/bubble/SWDEV-583475/logs/hip-graph-cleanpush-k3000-i200-20260322T080840Z/run.log

That fault did not reproduce on the immediate rerun above, and the second wide case also passed. So the ready-queue throughput fix is on the branch now, but there is still some residual instability outside the queue ready-queue path that may need a separate follow-up.

