
@igchor
Contributor

@igchor igchor commented Mar 28, 2025

The implementation incorrectly assumed that hDevice is always set for memory types other than host, which resulted in a nullptr dereference in the ur_discrete_buffer_handle_t ctor.

The assumption is not true for shared allocations.

Implement a separate handle type to handle shared allocations: the implementation will just use the allocation directly.
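
To illustrate the idea, here is a minimal, self-contained sketch of the failure mode and the fix. All names in it (`MemType`, `Device`, `BufferHandle`, `DiscreteBuffer`, `SharedBuffer`, `wrapNative`) are hypothetical stand-ins and not the actual Unified Runtime types or functions; host memory is left out for brevity. It only shows why a constructor that reads through a device pointer cannot be used for shared allocations, and how dispatching on the memory type avoids that.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for illustration; not the real Unified Runtime types.
enum class MemType { Device, Shared };

struct Device {
  int id;
};

// Minimal common interface for a buffer handle.
struct BufferHandle {
  virtual ~BufferHandle() = default;
  virtual MemType type() const = 0;
  virtual void *ptr() const = 0;
};

// Device-backed buffer. The constructor reads through the device pointer,
// so passing nullptr here would be a null dereference.
class DiscreteBuffer : public BufferHandle {
public:
  DiscreteBuffer(Device *device, void *ptr)
      : deviceId_(device->id), ptr_(ptr) {}
  MemType type() const override { return MemType::Device; }
  void *ptr() const override { return ptr_; }
  int deviceId() const { return deviceId_; }

private:
  int deviceId_;
  void *ptr_;
};

// Separate handle for shared allocations: no device is required and the
// allocation pointer is used directly.
class SharedBuffer : public BufferHandle {
public:
  explicit SharedBuffer(void *ptr) : ptr_(ptr) {}
  MemType type() const override { return MemType::Shared; }
  void *ptr() const override { return ptr_; }

private:
  void *ptr_;
};

// Picks the handle type from the memory type instead of assuming that a
// device is always present.
inline std::unique_ptr<BufferHandle> wrapNative(MemType type, Device *device,
                                                void *ptr) {
  if (type == MemType::Shared) {
    return std::make_unique<SharedBuffer>(ptr);
  }
  assert(device != nullptr && "device allocations need a device");
  return std::make_unique<DiscreteBuffer>(device, ptr);
}
```

In this sketch, `wrapNative(MemType::Shared, nullptr, p)` is valid and returns a handle whose `ptr()` is `p`, while the device path still requires a non-null `Device`.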

@igchor igchor requested a review from a team as a code owner March 28, 2025 00:55
@igchor igchor temporarily deployed to WindowsCILock March 28, 2025 00:56 — with GitHub Actions Inactive
@igchor igchor temporarily deployed to WindowsCILock March 28, 2025 01:16 — with GitHub Actions Inactive
@igchor
Contributor Author

igchor commented Mar 31, 2025

@intel/llvm-gatekeepers this is ready to be merged

@aelovikov-intel aelovikov-intel merged commit 5987b9a into intel:sycl Mar 31, 2025
41 of 43 checks passed
@sarnex
Contributor

sarnex commented Mar 31, 2025

@igchor @npmiller @pbalcer I don't think it was this PR, but any idea what could be causing this postcommit Windows build failure?

2025-03-31T16:12:52.0244390Z FAILED: lib/coarse.lib 
2025-03-31T16:12:52.0245789Z C:\WINDOWS\system32\cmd.exe /C "cd . && C:\PROGRA~2\Intel\oneAPI\compiler\latest\bin\icx.exe /nologo _deps\unified-memory-framework-build\src\coarse\CMakeFiles\coarse.dir\coarse.c.obj _deps\unified-memory-framework-build\src\coarse\CMakeFiles\coarse.dir\__\__\__\unified-memory-framework-src\src\ravl\ravl.c.obj  -fuse-ld=llvm-lib -o lib\coarse.lib /Qoption,link,/machine:x64   && cd ."
2025-03-31T16:12:52.0254738Z _deps\unified-memory-framework-build\src\coarse\CMakeFiles\coarse.dir\coarse.c.obj: no such file or directory
2025-03-31T16:12:52.0255215Z 
2025-03-31T16:12:52.0255438Z icx: error: linker command failed with exit code 1 (use -v to see invocation)
2025-03-31T16:12:52.0255713Z 
2025-03-31T16:12:52.0256129Z [563/4963] Building CXX object lib\DebugInfo\CodeView\CMakeFiles\LLVMDebugInfoCodeView.dir\DebugSymbolsSubsection.cpp.obj
2025-03-31T16:12:52.0266390Z [564/4963] Building CXX object lib\DebugInfo\CodeView\CMakeFiles\LLVMDebugInfoCodeView.dir\DebugCrossImpSubsection.cpp.obj
2025-03-31T16:12:52.0281825Z [565/4963] Building CXX object lib\DebugInfo\CodeView\CMakeFiles\LLVMDebugInfoCodeView.dir\RecordName.cpp.obj
2025-03-31T16:12:52.0304209Z [566/4963] Building CXX object lib\DebugInfo\CodeView\CMakeFiles\LLVMDebugInfoCodeView.dir\CVTypeVisitor.cpp.obj
2025-03-31T16:12:52.0322986Z [567/4963] Building CXX object lib\DebugInfo\CodeView\CMakeFiles\LLVMDebugInfoCodeView.dir\MergingTypeTableBuilder.cpp.obj
2025-03-31T16:12:52.0339554Z [568/4963] Building RC object tools\llvm-foreach\CMakeFiles\llvm-foreach.dir\__\__\resources\windows_version_resource.rc.res
2025-03-31T16:12:52.0360274Z [569/4963] Building CXX object utils\TableGen\Basic\CMakeFiles\obj.LLVMTableGenBasic.dir\RISCVTargetDefEmitter.cpp.obj
2025-03-31T16:12:52.0384793Z [570/4963] Building CXX object tools\clang\utils\TableGen\CMakeFiles\clang-tblgen.dir\ClangCommentHTMLNamedCharacterReferenceEmitter.cpp.obj
2025-03-31T16:12:52.0398886Z [571/4963] Building CXX object lib\TableGen\CMakeFiles\LLVMTableGen.dir\JSONBackend.cpp.obj
2025-03-31T16:12:52.0411563Z [572/4963] Building CXX object lib\Extensions\CMakeFiles\LLVMExtensions.dir\Extensions.cpp.obj
2025-03-31T16:12:52.0428865Z [573/4963] Building CXX object tools\clang\utils\TableGen\CMakeFiles\clang-tblgen.dir\ASTTableGen.cpp.obj
2025-03-31T16:12:52.0438093Z [574/4963] Linking C static library lib\hwloc.lib
2025-03-31T16:12:52.0438461Z FAILED: lib/hwloc.lib 
2025-03-31T16:12:52.0455282Z C:\WINDOWS\system32\cmd.exe /C "cd . && C:\PROGRA~2\Intel\oneAPI\compiler\latest\bin\icx.exe /nologo _deps\hwloc_targ-build\CMakeFiles\hwloc.dir\__\hwloc_targ-src\hwloc\topology.c.obj _deps\hwloc_targ-build\CMakeFiles\hwloc.dir\__\hwloc_targ-src\hwloc\traversal.c.obj _deps\hwloc_targ-build\CMakeFiles\hwloc.dir\__\hwloc_targ-src\hwloc\distances.c.obj _deps\hwloc_targ-build\CMakeFiles\hwloc.dir\__\hwloc_targ-src\hwloc\memattrs.c.obj _deps\hwloc_targ-build\CMakeFiles\hwloc.dir\__\hwloc_targ-src\hwloc\cpukinds.c.obj _deps\hwloc_targ-build\CMakeFiles\hwloc.dir\__\hwloc_targ-src\hwloc\components.c.obj _deps\hwloc_targ-build\CMakeFiles\hwloc.dir\__\hwloc_targ-src\hwloc\bind.c.obj _deps\hwloc_targ-build\CMakeFiles\hwloc.dir\__\hwloc_targ-src\hwloc\bitmap.c.obj _deps\hwloc_targ-build\CMakeFiles\hwloc.dir\__\hwloc_targ-src\hwloc\pci-common.c.obj _deps\hwloc_targ-build\CMakeFiles\hwloc.dir\__\hwloc_targ-src\hwloc\diff.c.obj _deps\hwloc_targ-build\CMakeFiles\hwloc.dir\__\hwloc_targ-src\hwloc\shmem.c.obj _deps\hwloc_targ-build\CMakeFiles\hwloc.dir\__\hwloc_targ-src\hwloc\misc.c.obj _deps\hwloc_targ-build\CMakeFiles\hwloc.dir\__\hwloc_targ-src\hwloc\base64.c.obj _deps\hwloc_targ-build\CMakeFiles\hwloc.dir\__\hwloc_targ-src\hwloc\topology-noos.c.obj _deps\hwloc_targ-build\CMakeFiles\hwloc.dir\__\hwloc_targ-src\hwloc\topology-synthetic.c.obj _deps\hwloc_targ-build\CMakeFiles\hwloc.dir\__\hwloc_targ-src\hwloc\topology-xml.c.obj _deps\hwloc_targ-build\CMakeFiles\hwloc.dir\__\hwloc_targ-src\hwloc\topology-xml-nolibxml.c.obj _deps\hwloc_targ-build\CMakeFiles\hwloc.dir\__\hwloc_targ-src\hwloc\topology-windows.c.obj _deps\hwloc_targ-build\CMakeFiles\hwloc.dir\__\hwloc_targ-src\hwloc\topology-x86.c.obj  -fuse-ld=llvm-lib -o lib\hwloc.lib /Qoption,link,/machine:x64   && cd ."
2025-03-31T16:12:52.0461426Z _deps\hwloc_targ-build\CMakeFiles\hwloc.dir\__\hwloc_targ-src\hwloc\topology.c.obj: no such file or directory
2025-03-31T16:12:52.0461851Z 
2025-03-31T16:12:52.0462073Z icx: error: linker command failed with exit code 1 (use -v to see invocation)

https://github.com/intel/llvm/actions/runs/14175023208/job/39707709728

@igchor
Contributor Author

igchor commented Mar 31, 2025

@sarnex hm, this looks like a problem with UMF, but the UMF version was bumped a while ago, and there were no failures on merge. @lukaszstolarczuk @PatKamin have you seen this error before?

@igchor
Contributor Author

igchor commented Mar 31, 2025

@sarnex looking at the logs, I also see some failures during level-zero compilation. Also, there are some CMake warnings that weren't present before:

Policy CMP0175 is not set: add_custom_command() rejects invalid arguments.
  Run "cmake --help-policy CMP0175" for policy details.  Use the cmake_policy
  command to set the policy and suppress this warning.

Perhaps there was some infrastructure update that caused this?

@igchor igchor deleted the fix_v2_mem_with_native branch March 31, 2025 17:32
@sarnex
Contributor

sarnex commented Mar 31, 2025

@igchor Can you send me a log from where it passed?

@igchor
Contributor Author

igchor commented Mar 31, 2025

@sarnex I was looking at this job, just after UMF bump merge: https://github.com/intel/llvm/actions/runs/13990348416/job/39172707168

@sarnex
Contributor

sarnex commented Mar 31, 2025

So I installed CUDA/HIP/OCL CPU on this runner, but I can't imagine that's related, and the logs don't seem to suggest so. I do see:

2025-03-31T16:05:09.1587685Z See "git help gc" for manual housekeeping.
2025-03-31T16:07:15.8357232Z error: bad packed object CRC for 15ad4cf46b6875c1b0a2a89c6521c63b9f84fe12
2025-03-31T16:07:15.8357973Z fatal: object 15ad4cf46b6875c1b0a2a89c6521c63b9f84fe12 cannot be read
2025-03-31T16:07:16.1134101Z fatal: failed to run repack
2025-03-31T16:07:16.1167737Z error: task 'gc' failed
2025-03-31T16:07:20.0686701Z Updating 5987b9af0410..9ff48ff91855
2025-03-31T16:07:20.0687168Z Fast-forward
2025-03-31T16:07:20.1123809Z Auto packing the repository for optimum performance.
2025-03-31T16:07:20.1124283Z See "git help gc" for manual housekeeping.
2025-03-31T16:09:15.7347739Z error: bad packed object CRC for 071dea7362b59e4d7576932d3455be02a3f8f804
2025-03-31T16:09:15.7358599Z fatal: object 071dea7362b59e4d7576932d3455be02a3f8f804 cannot be read
2025-03-31T16:09:16.0143526Z fatal: failed to run repack
2025-03-31T16:09:16.0180113Z error: task 'gc' failed

so maybe the git repo cache is corrupted, let me try deleting it

@igchor
Contributor Author

igchor commented Mar 31, 2025

@sarnex Actually, this seems to be the last passing job: https://github.com/intel/llvm/actions/runs/14162041878/job/39668990919

@sarnex
Contributor

sarnex commented Mar 31, 2025

that one ran on a runner that i didn't touch, so maybe it is just this runner, let me try some stuff

@sarnex
Contributor

sarnex commented Mar 31, 2025

ok i see the bad case is using upstream cmake and the good is using cmake from msvc, let me try removing the upstream one

@sarnex
Contributor

sarnex commented Mar 31, 2025

ok looks like the problem was a perl version bump i did on the runner, weird. anyway seems fixed now and not related to any actual code, sorry for the trouble

KornevNikita pushed a commit that referenced this pull request May 27, 2025