alge: fix SIGSEGV in cs_sles_solve_ccc_fv with extended matrix columns #165
Open
diskdog wants to merge 1 commit into
When n_cols_ext > n_cells_with_ghosts, cs_sles_solve_ccc_fv allocates
extended _vx and _rhs buffers using cs_alloc_mode_device, which resolves
to CS_ALLOC_DEVICE (device-only) in a standard CUDA build. Those pointers
are then passed into cs_sles_solve, which reads the residual on the host
during convergence checking. The result is a SIGSEGV.
The fix is to use CS_ALLOC_HOST_DEVICE_SHARED for these two buffers.
They are not pure-device scratch; the solver needs host-readable
convergence data from them. The GPU dispatch is unaffected: ctx still
runs on the GPU via set_use_gpu(true), and the unified-memory backing
is fast enough on all tested sm_7x+ devices.
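The failure mode and the fix can be modelled in a few lines of host-only C++. This is a minimal sketch, not the code_saturne implementation: `AllocMode`, `WorkBuffers`, `plan_buffers`, and `convergence_check_is_safe` are hypothetical names introduced for illustration, and the non-extended branch assumes the caller's arrays are already host-readable.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the allocation modes named above;
// these are not the code_saturne definitions.
enum class AllocMode { Host, Device, HostDeviceShared };

// A host-side read is only valid when the memory is mapped on the host.
constexpr bool host_readable(AllocMode m) {
  return m != AllocMode::Device;
}

// Hypothetical description of the work buffers handed to the solver.
struct WorkBuffers {
  bool extended;       // true when separate extended buffers are allocated
  std::size_t n_vals;  // number of values per buffer
  AllocMode mode;      // how the buffers are backed
};

// Extended buffers are only needed when n_cols_ext > n_cells_with_ghosts.
// Otherwise the caller's arrays are used directly (assumed host-readable).
WorkBuffers plan_buffers(std::size_t n_cells_with_ghosts,
                         std::size_t n_cols_ext,
                         AllocMode mode_for_extended) {
  if (n_cols_ext > n_cells_with_ghosts)
    return {true, n_cols_ext, mode_for_extended};
  return {false, n_cells_with_ghosts, AllocMode::HostDeviceShared};
}

// The convergence check reads the residual on the host, so it is only
// safe when the buffers are host-readable.
bool convergence_check_is_safe(const WorkBuffers &b) {
  return host_readable(b.mode);
}
```

With 100 cells (ghosts included) and 120 extended columns, `plan_buffers(100, 120, AllocMode::Device)` yields buffers the host cannot read, which corresponds to the crash path. Passing `AllocMode::HostDeviceShared` makes the same check pass without changing where the kernels run.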
The existing workaround (CS_CUDA_ALLOC_DEVICE_UVM=1) happens to fix
this by globally remapping cs_alloc_mode_device, but the global remap
affects unrelated allocations and masks the root cause here.
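The difference in scope between the workaround and the fix can be sketched as follows. This is an illustrative model only: `Request`, `mode_with_global_remap`, `mode_with_local_fix`, and the `scratch_a`/`scratch_b` entries are hypothetical, and the sketch assumes the global remap targets a host-readable shared mode.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical two-mode model; not the code_saturne enum.
enum class AllocMode { Device, HostDeviceShared };

// One allocation request and whether the host ever reads the buffer.
struct Request {
  std::string name;
  bool read_on_host;
};

// Workaround model: a single global switch changes every request.
AllocMode mode_with_global_remap(const Request &, bool remap_enabled) {
  return remap_enabled ? AllocMode::HostDeviceShared : AllocMode::Device;
}

// Fix model: only buffers that the host reads become shared.
AllocMode mode_with_local_fix(const Request &r) {
  return r.read_on_host ? AllocMode::HostDeviceShared : AllocMode::Device;
}

// Count how many requests end up in shared memory under a given policy.
template <typename F>
std::size_t count_shared(const std::vector<Request> &reqs, F mode_of) {
  std::size_t n = 0;
  for (const Request &r : reqs)
    if (mode_of(r) == AllocMode::HostDeviceShared)
      ++n;
  return n;
}

// Two solver buffers plus two made-up device-only scratch arrays.
std::vector<Request> example_requests() {
  return {{"_vx", true},
          {"_rhs", true},
          {"scratch_a", false},
          {"scratch_b", false}};
}
```

Under the global remap all four example requests become shared, while the local fix changes only `_vx` and `_rhs` and leaves the scratch arrays device-only.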
Tested on sm_75, CUDA 13.1, channel-flow case with CS_MATRIX_NATIVE.
Fixes #164.