- 
                Notifications
    You must be signed in to change notification settings 
- Fork 8k
Support musl gettext (libintl) functions in tests #17168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| @arnaud-lb the Alpine CI should run without GNU gettext installed after this | 
        
          
                ext/gettext/gettext.c
              
                Outdated
          
        
      | * say that MUSL_LOCPATH will be ignored in the | ||
| * scenarios where secure_getenv() will return | ||
| * NULL rather than the value of the variable. */ | ||
| musl_locpath = secure_getenv("MUSL_LOCPATH"); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will need to be guarded for MacOS and Windows to build, ideally move the decls for btd_result and musl_locpath into this guard to prevent unused warnings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a ./configure check for secure_getenv and the corresponding #if. It won't matter on musl, but now a fallback to getenv() is there in case in case someone write a new libc with this bug that is also missing secure_getenv().
| Most failures in #13925 were caused by too much reliance on Glibc behavior. In most cases Musl was not at fault, was actually Posix compliant, and the fix improved code quality. In this case however, Musl is diverging from the standard. Can some of these issues be fixed in Musl instead? This would be better, IMHO. Otherwise, can we update the Alpine package so that it depends on  | 
| 
 It's only in one of the three bullet points that musl is obviously wrong, when  The additional  
 I think if you want to go this route, it would be better to have  | 
| FWIW the only test that fails if we ignore  var_dump(is_string(bindtextdomain('foo', null)));in  | 
| 
 Right. My first impression was that   
 You are right that we need to handle the  Regarding  | 
        
          
                ext/gettext/config.m4
              
                Outdated
          
        
      | dnl If libintl.h is provided by libc, it's possible that libc is musl. | ||
| dnl The gettext family of functions under musl ignores the codeset | ||
| dnl suffix on directories like "en_US.UTF-8"; instead they look only | ||
| dnl in "en_US". To accomodate that, we copy some test cases from one | ||
| dnl to the other. | ||
| AC_MSG_NOTICE([copying en_US.UTF-8 messages to en_US in case you are on musl]) | ||
| mkdir -p "${srcdir%/}"/ext/gettext/tests/locale/en_US | ||
| cp -r "${srcdir%/}"/ext/gettext/tests/locale/en_US.UTF-8/* "${srcdir%/}"/ext/gettext/tests/locale/en_US/ | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One possible issue with this, is that new files in en_US.UTF-8/ will not be reflected to en_US/ until ./configure is executed again. This may be confusing during test development.
An alternative would be to move this to a Makefile rule, but I'm not sure that we can efficiently trigger the copy only if the contents of en_US.UTF-8/ changed.
An other one would be to symlink en_US/ to en_US.UTF-8/. I don't know how well symlinks are supported on various systems, but since the en_US/ path will be used only on Musl this might be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Easiest of all would be to symlink them in the source tree and commit the symlink, but I don't have any Windows machines around to test on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An other alternative would to be create the symlink in phpt files directly, and only on Musl. This way we avoid committing a symlink as well as potential Windows issues. We can create a .inc file in ext/opcache/tests that we include in all relevant tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the latest force-push, I changed the cp -r to an ln -s. That should avoid issues with newly-added test files, and is probably safe: the ln -s is only performed if your libc provides gettext(), and I doubt that will be true on any system where symlinks don't work.
| 
 When no binding is found, I think it falls back to the current locale, and the current locale is mapped to a file under  I'll rewrite this later to just return false on musl without trying to fix the  | 
| This is a bugfix, please target 8.3. | 
394fe12    to
    4334502      
    Compare
  
    | I just force-pushed a new version that returns  $expected = [true, true, false, "UTF-8", "UTF-8"];
$expected_musl = [false, true, false, false, false];so that one  | 
| 
 You want me to change the target branch? (I guess you forward-port bug fixes from older branches rather than back-porting them from master?) | 
| 
 Yes and yes | 
According to POSIX, bindtextdomain() returns "the implementation-
defined default directory pathname used by the gettext family of
functions" when its second parameter is NULL (i.e. when you are
querying the directory corresponding to some text domain and that
directory has not yet been set). Its PHP counterpart is feeding
that result direclty to RETURN_STRING, but this can go wrong in
two ways:
  1. If an error occurs, even POSIX-compliant implementations
     may return NULL.
  2. At least one non-compliant implementation (musl) lacks
     a default directory and returns NULL whenever the domain
     has not yet been bound.
In either of those cases, PHP segfaults on the NULL string. In this
commit we check for the NULL, and RETURN_FALSE when it happens rather
than crashing.
This partially addresses GH php#13696
    Musl has two quirks that are leading to failed internationalization tests. First is that the return value of bindtextdomain(..., NULL) will always be false, rather than an "implementation-defined default directory," because musl does not have an implementation-defined default directory. One test needs a special case for this. Second is that the musl implementation of bind_textdomain_codeset() always returns NULL. The POSIX-correctness of this is debatable, but it is roughly equivalent to correct, because musl only support UTF-8, so the NULL value indicating that the codeset is unchanged from the locale's codeset (UTF-8) is accurate. PHP's bind_textdomain_codeset() function however treats NULL as failure, unconditionally: * php/doc-en#4311 * php#17163 This unfortunately causes false to be returned consistently on musl -- even when nothing unexpected has happened -- and naturally this is affecting several tests. For now we change two tests to accept "false" in addition to "UTF-8" so that they may pass on musl. If PHP's bind_textdomain_codeset() is updated to differentiate between NULL and NULL-with-errno-set, these tests can also be updated once again to reject the NULL-with-errno result. This partially addresses GH php#13696
The gettext() family of functions under musl does not support codeset
suffixes like ".UTF-8", because the only codeset it understands is
UTF-8. (Yes, it is annoying that it doesn't support the suffix for the
codeset that it does understand; no, I am not in charge.) Thanks to
this, we have six failing tests on musl,
  * FAIL Gettext basic test with en_US locale that should be on nearly
    every system
    [ext/gettext/tests/gettext_basic-enus.phpt]
  * FAIL Test if bindtextdomain() returns string id if no directory path
    is set( if directory path is 'null')
    [ext/gettext/tests/gettext_bindtextdomain-cwd.phpt]
  * FAIL Test dcgettext() functionality
    [ext/gettext/tests/gettext_dcgettext.phpt]
  * FAIL Test dgettext() functionality
    [ext/gettext/tests/gettext_dgettext.phpt]
  * FAIL Test if dngettext() returns the correct translations
    (optionally plural).
    [ext/gettext/tests/gettext_dngettext-plural.phpt]
  * FAIL Test ngettext() functionality
    [ext/gettext/tests/gettext_ngettext.phpt]
These are all fixed by symlinking the en_US.UTF-8 message data to en_US,
where musl is able to find it.
This does not make the situation any better for developers (who don't
know what libc their users will be running), but that problem is
inhereted from C and is not the fault of the gettext extension.
This partially addresses GH php#13696
    If gettext() et al. come from musl libc, ./configure will symlink ext/gettext/tests/locale/en_US.UTF-8 to en_US (without the ".UTF-8" suffix). We want to ignore the symlink.
4334502    to
    385a91f      
    Compare
  
    | 
 Done. There are merge conflicts porting back to master, but nothing serious. | 
* PHP-8.3: NEWS for GH-17168 ext/gettext/config.m4: symlink en_US.UTF-8 test bits to en_US for musl ext/gettext/tests: fix libintl return values under musl ext/gettext/gettext.c: handle NULLs from bindtextdomain()
* PHP-8.4: NEWS for GH-17168 ext/gettext/config.m4: symlink en_US.UTF-8 test bits to en_US for musl ext/gettext/tests: fix libintl return values under musl ext/gettext/gettext.c: handle NULLs from bindtextdomain()
Work around several musl quirks to get the test suite passing:
bindtextdomain(..., NULL)returnsNULLrather than a default path if no binding existsbind_textdomain_codeset(..., NULL)always returnsNULLen_USworks, and noten_US.UTF-8.The commit messages go into more detail about each workaround. Copying the message data from
en_US.UTF-8toen_USin./configureis pretty hacky but it gets the job done (suggestions for improvement are welcome).Refs: