Skip to content

Draft: Support Win32 long path#1061

Open
t-mat wants to merge 10 commits intoCyan4973:devfrom
t-mat:win32-long-path
Open

Draft: Support Win32 long path#1061
t-mat wants to merge 10 commits intoCyan4973:devfrom
t-mat:win32-long-path

Conversation

@t-mat
Copy link
Copy Markdown
Contributor

@t-mat t-mat commented Aug 21, 2025

This PR fixes #1060.

  • Add new function XSUM_widenStringAsUncPath() in cli/xsum_os_specific.c.
    • XSUM_fopen() and XSUM_stat() use this new function.
    • It always returns absolute UNC path with \\?\ prefix.
      • UTF-8 path is converted to wchar_t path.
      • Relative path is converted to absolute path with \\?\ prefix.
    • \\?\ prefix automatically avoids classic DOS device quirks (e.g. COM1, NUL, etc)
      • I think the \\?\ prefix is supported from Windows 2000. But I can't find any source from Microsoft :(
  • Add new test scripts in tests/windows/

t-mat added 3 commits August 21, 2025 23:14
Add XSUM_widenStringAsUncPath which converts UTF-8 path to absolute UNC path with "\\?\" prefix.
@t-mat
Copy link
Copy Markdown
Contributor Author

t-mat commented Aug 21, 2025

For some reason, GH actions for MinGW fails at the following line of the Makefile

xxHash/Makefile

Line 207 in fb93aa4

$(RUN_ENV) ./xxhsum$(EXT) xxhash.*

Error log

OK. (passes 49948 tests)
5ffb01494ce73724  stdin
Error: unable to open input
Error: Could not open 'xxhash.c': No such file or directory. 
Error: unable to open input
Error: Could not open 'xxhash.h': No such file or directory. 
mingw32-make: *** [Makefile:207: check] Error 1
Error: Process completed with exit code 2.

@t-mat
Copy link
Copy Markdown
Contributor Author

t-mat commented Aug 21, 2025

I can't reproduce above error in my local msys environment. But here is a different issue from my local environment:

# CFLAGS=-Werror make -C tests/collisions check

Testing xxh3 algorithm (64-bit)
This program will allocate a lot of memory,
generate 3200000 64-bit hashes from samples of 256 bytes,
and attempt to produce 0 collisions.

 Creating filter (0 GB)
 Generate 3200000 hashes from samples of 256 bytes
 Generation and filter completed in 0s , detected up to 240270 candidates
 Storing hash candidates (1 MB)
assertion "checkPtr == hashCandidates" failed: file "main.c", line 772, function: search_collisions

Is this line fine?

assert(checkPtr == hashCandidates); /* simplification: since we are reducing the size,

Allocation size of hashCandidates is tableSize.

size_t const tableSize = (size_t)((maxNbH+1) * hashByteSize);
...
void* const hashCandidates = malloc(tableSize);

But size for realloc() is nbCandidates * hashByteSize.

if (nbCandidates < maxNbH) {
  ...
  void* const checkPtr = realloc(hashCandidates, nbCandidates * hashByteSize);
  ...
  assert(checkPtr == hashCandidates);  /* simplification: since we are reducing the size, ... */
}

Since nbCandidates < maxNbH, argument of relloc() is always less than its original allocation size.
It means realloc() may or may not reallocate (move) its memory region (or not).
Therefore, after this realloc(), we should not use hashCandidates.

Also, when I put printf("%p\n", hashCandidates) for debug, gcc-15 reports the following warning/error.

error: pointer 'hashCandidates' may be used after 'realloc' [-Werror=use-after-free]

@29039
Copy link
Copy Markdown

29039 commented Aug 22, 2025

This is great but also I think that Windows requires that there specifically be a manifest in the exe file. Looking into it, I think that we can simply make a file called xxhsum.manifest alongside the xxhsum.c and then cmake will automatically pick that up and link it in: https://cmake.org/cmake/help/v3.4/release/3.4.html#other

Suggested xxhsum.manifest:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0" xmlns:asmv3="urn:schemas-microsoft-com:asm.v3">
  <asmv3:application>
    <windowsSettings xmlns:ws2="http://schemas.microsoft.com/SMI/2016/WindowsSettings">
        <ws2:longPathAware>true</ws2:longPathAware>
    </windowsSettings>
  </asmv3:application>
  <trustInfo xmlns="urn:schemas-microsoft-com:asm.v3">
    <security>
      <requestedPrivileges>
        <requestedExecutionLevel level="asInvoker"/>
      </requestedPrivileges>
    </security>
  </trustInfo>
  <compatibility xmlns="urn:schemas-microsoft-com:compatibility.v1">
    <application>
      <!--The ID below indicates application support for Windows Vista -->
      <supportedOS Id="{e2011457-1546-43c5-a5fe-008deee3d3f0}"/>
      <!--The ID below indicates application support for Windows 7 -->
      <supportedOS Id="{35138b9a-5d96-4fbd-8e2d-a2440225f93a}"/>
      <!--The ID below indicates application support for Windows 8 -->
      <supportedOS Id="{4a2f28e3-53b9-4441-ba9c-d69d4a4a6e38}"/>
      <!--The ID below indicates application support for Windows 8.1 -->
      <supportedOS Id="{1f676c76-80e1-4239-95bb-83d0f6d0da78}"/> 
      <!--The ID below indicates application support for Windows 10 -->
      <supportedOS Id="{8e0f7a12-bfb3-4fe8-b9a5-48fd50a15a9a}"/> 
    </application>
  </compatibility>
</assembly>

@t-mat
Copy link
Copy Markdown
Contributor Author

t-mat commented Aug 22, 2025

@29039 While a manifest file is convenient, my goal is to provide broader compatibility.
For example, I’d like to support “bad” Win32 filenames such as my_file. (filename with a trailing period). Since Linux/WSL (and NTFS) accepts this as a valid filename, broader compatibility could benefit other users as well. (My current implementation is incorrect, so it doesn’t work yet.)

Please also note that this patch works correctly even without a manifest file.

t-mat added 2 commits August 23, 2025 00:18
Note: So far it contains #pragma comment(lib) and pathcch.lib which only supports Windows 8+ or Windows Server 2012+.  This issue must be fixed.

This function supports 3 types of absolute path
- Classic DOS path (C:\...)
- Extended length path (\\?\...)
- UNC path (\\...)

Also relative path is properly converted to extended length path (\\?\...)
Now WIN32 CLI supports a file which ends with '.'.  E.g. "my_file."
test_trailing_period.bat tests it.
@t-mat t-mat changed the title Support Win32 long path Draft: Support Win32 long path Aug 22, 2025
@t-mat
Copy link
Copy Markdown
Contributor Author

t-mat commented Aug 22, 2025

WIP. But now this PR supports long-path, UNC path and trailing period. e.g. my_trailing_period_file.
tests/windows/00-test-all.bat builds and tests all.

TODO:

  • Ask Cyan: We can allow to use pathcch.lib to simplifiy the code. Though it's only supported Windows8+ or Windows Server 2012+.
  • Fix CMakefile.txt for mingw which doesn't support #pragma comment(lib)

t-mat added 4 commits August 23, 2025 08:35
To keep compatibility, now CMakeFile.txt detects MSVC + Win10+ automatically.
If user want to use MinGW, set XXHASH_WIN_TARGET_WIN10 to cmake.
They set XXHSUM_WIN32_LONGPATH for C preprocessor.
@t-mat
Copy link
Copy Markdown
Contributor Author

t-mat commented Aug 23, 2025

Added cmake option XXHASH_WIN_TARGET_WIN10 for MinGW / cross-compile environment.

  • cmake + MSVC user doesn't have to care about this option.
  • MinGW user who need long-path support, they should pass XXHASH_WIN_TARGET_WIN10 to cmake.

When cmake uses MSVC, it automatically host environment is Windows 10+ or not.
But in the MinGW/MSYS environment, I don't know how to detect it.
Therefore, I added expcilit option for it.

This switch enables long-path code via preprocessor macro XXHSUM_WIN32_LONGPATH.

@t-mat
Copy link
Copy Markdown
Contributor Author

t-mat commented Aug 23, 2025

PATHCCH_DO_NOT_NORMALIZE_SEGMENTS (support trailing period and space) and PATHCCH_ENSURE_IS_EXTENDED_LENGTH_PATH (ensure \\?\ prefix) are implemented since Windows 10 RS2 1703 as operation system DLL and its SDK. Unfortunately, earlier version of Windows just ignore these flags.

Therefore, its result depends on Win10 version. (xxhsum my_file. works fine with Win10RS2>, but earlier versions of Win10 reports error "file not found")

So we need to check runtime Windows version for it. But it may be over-engineering for this small utility. Possible solutions are:

  1. Re-imprement them by hand. Actually, reactos have them. It also provides wider compatibility as a bonus. (It works with all NTFS Windows).
  2. Detect runtime Windows version and use proper flag and logic.
  3. Fallback to manifest as @29039 suggested.
  4. Just implement it and say "It's Microsoft spec, not our fault 😏 ".

@Cyan4973
Copy link
Copy Markdown
Owner

Cyan4973 commented Sep 3, 2025

Sorry, just noticed this question:

Ask Cyan: We can allow to use pathcch.lib to simplifiy the code. Though it's only supported Windows8+ or Windows Server 2012+.

I think that's reasonable

@t-mat
Copy link
Copy Markdown
Contributor Author

t-mat commented Sep 8, 2025

@Cyan4973 , thanks. I forgot to add @ to the question 😅

Copy link
Copy Markdown
Owner

@Cyan4973 Cyan4973 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the extensive test suite delivered with this patch.
Great PR @t-mat !

@Cyan4973
Copy link
Copy Markdown
Owner

Cyan4973 commented Sep 11, 2025

This PR is still tagged as "Draft".
But at a quick glance, it already looks pretty clean.

So at which stage is it ?
Still some work to do ? Or ready for review and merge ?

cc @t-mat

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

xxhsum.exe Long MAX_PATH support over 260 characters in the path

3 participants