Skip to content

Conversation

@enh-google
Copy link
Contributor

POSIX says "If no such wide-character code is found, the current token extends to the end of the wide-character string pointed to by ws1, and subsequent searches for a token shall return a null pointer", but the current implementation only returns nullptr the first time. This failed an existing bionic test when I tried to switch over to llvm-libc wcstok().

You're allowed to keep calling wcstok() after the first nullptr, and should keep getting nullptr.
@llvmbot llvmbot added the libc label Jul 31, 2025
@llvmbot
Copy link
Member

llvmbot commented Jul 31, 2025

@llvm/pr-subscribers-libc

Author: None (enh-google)

Changes

POSIX says "If no such wide-character code is found, the current token extends to the end of the wide-character string pointed to by ws1, and subsequent searches for a token shall return a null pointer", but the current implementation only returns nullptr the first time. This failed an existing bionic test when I tried to switch over to llvm-libc wcstok().


Full diff: https://github.com/llvm/llvm-project/pull/151589.diff

2 Files Affected:

  • (modified) libc/src/wchar/wcstok.cpp (+9-4)
  • (modified) libc/test/src/wchar/wcstok_test.cpp (+8)
diff --git a/libc/src/wchar/wcstok.cpp b/libc/src/wchar/wcstok.cpp
index ed4f0aad08ea5..85513a6ecfb93 100644
--- a/libc/src/wchar/wcstok.cpp
+++ b/libc/src/wchar/wcstok.cpp
@@ -27,17 +27,22 @@ LLVM_LIBC_FUNCTION(wchar_t *, wcstok,
   wchar_t *tok_start = str;
   while (*tok_start != L'\0' && internal::wcschr(delims, *tok_start))
     ++tok_start;
+  if (*tok_start == L'\0') {
+    *context = nullptr;
+    return nullptr;
+  }
 
   wchar_t *tok_end = tok_start;
   while (*tok_end != L'\0' && !internal::wcschr(delims, *tok_end))
     ++tok_end;
 
-  if (*tok_end != L'\0') {
+  if (*tok_end == L'\0') {
+    *context = nullptr;
+  } else {
     *tok_end = L'\0';
-    ++tok_end;
+    *context = tok_end + 1;
   }
-  *context = tok_end;
-  return *tok_start == L'\0' ? nullptr : tok_start;
+  return tok_start;
 }
 
 } // namespace LIBC_NAMESPACE_DECL
diff --git a/libc/test/src/wchar/wcstok_test.cpp b/libc/test/src/wchar/wcstok_test.cpp
index 7106e9f2fab5e..3bb1014aff3ab 100644
--- a/libc/test/src/wchar/wcstok_test.cpp
+++ b/libc/test/src/wchar/wcstok_test.cpp
@@ -19,6 +19,8 @@ TEST(LlvmLibcWCSTokReentrantTest, NoTokenFound) {
     // Another call to ensure that 'reserve' is not in a bad state.
     ASSERT_EQ(LIBC_NAMESPACE::wcstok(empty, L"", &reserve), nullptr);
     ASSERT_EQ(LIBC_NAMESPACE::wcstok(nullptr, L"", &reserve), nullptr);
+    // Subsequent searches still return nullptr.
+    ASSERT_EQ(LIBC_NAMESPACE::wcstok(nullptr, L"", &reserve), nullptr);
   }
   { // Empty source and single character delimiter string.
     wchar_t empty[] = L"";
@@ -27,6 +29,8 @@ TEST(LlvmLibcWCSTokReentrantTest, NoTokenFound) {
     // Another call to ensure that 'reserve' is not in a bad state.
     ASSERT_EQ(LIBC_NAMESPACE::wcstok(empty, L"_", &reserve), nullptr);
     ASSERT_EQ(LIBC_NAMESPACE::wcstok(nullptr, L"_", &reserve), nullptr);
+    // Subsequent searches still return nullptr.
+    ASSERT_EQ(LIBC_NAMESPACE::wcstok(nullptr, L"_", &reserve), nullptr);
   }
   { // Same character source and delimiter string.
     wchar_t single[] = L"_";
@@ -35,6 +39,8 @@ TEST(LlvmLibcWCSTokReentrantTest, NoTokenFound) {
     // Another call to ensure that 'reserve' is not in a bad state.
     ASSERT_EQ(LIBC_NAMESPACE::wcstok(single, L"_", &reserve), nullptr);
     ASSERT_EQ(LIBC_NAMESPACE::wcstok(nullptr, L"_", &reserve), nullptr);
+    // Subsequent searches still return nullptr.
+    ASSERT_EQ(LIBC_NAMESPACE::wcstok(nullptr, L"_", &reserve), nullptr);
   }
   { // Multiple character source and single character delimiter string.
     wchar_t multiple[] = L"1,2";
@@ -51,6 +57,8 @@ TEST(LlvmLibcWCSTokReentrantTest, NoTokenFound) {
     ASSERT_TRUE(tok[2] == L'2');
     ASSERT_TRUE(tok[3] == L'\0');
     ASSERT_EQ(LIBC_NAMESPACE::wcstok(nullptr, L":", &reserve), nullptr);
+    // Subsequent searches still return nullptr.
+    ASSERT_EQ(LIBC_NAMESPACE::wcstok(nullptr, L":", &reserve), nullptr);
   }
 }
 

Copy link
Contributor

@michaelrj-google michaelrj-google left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@enh-google enh-google merged commit b36f05c into llvm:main Aug 1, 2025
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants