mb_{detect/convert}_encoding. Again. Or maybe pdo? #17931

@gzhegow1991

Description

(All tests were run with the online tool https://onlinephp.io/)

The following code:

<?php

// > this is a \PDOException message in Russian that roughly means `Server is not responding` (in other words, changing the server configuration is not a solution)
$str = base64_decode('U1FMU1RBVEVbSFkwMDBdIFsyMDAyXSDP7uTq6/735e3o5SDt5SDz8fLg7e7i6+Xt7iwg8i7qLiDq7u3l9+376SDq7uzv/P7y5fAg7vLi5fDjIOfg7/Du8SDt4CDv7uTq6/735e3o5S4NCiAoU1FMOiBTRVQgRk9SRUlHTl9LRVlfQ0hFQ0tTPTA7KQ==');

// > and it looks like
// ###
// Warning: Your output contains characters that could not be displayed. Make sure you encode the output when working with special characters or binary data. [Click here for an example on how to do this](https://onlinephp.io/code/utf8-in-the-sandbox)
// SQLSTATE[HY000] [2002] ����������� �� �����������, �.�. �������� ��������� ������ ������ �� �����������.
//  (SQL: SET FOREIGN_KEY_CHECKS=0;)
// ###


$mbListEncodings = mb_list_encodings();


$detect = mb_detect_encoding($str);

// PHP_VERSION_ID < 80300 -> 'UTF-8'
// PHP_VERSION_ID >= 80300 -> 'ASCII'
var_dump($detect);


$detect2 = mb_detect_encoding($str, $mbListEncodings, true);
// $detect2 = mb_detect_encoding($str, $mbListEncodings); // > almost the same result, but not quite: without $strict = true it may return 'ASCII' when none of the results below is an option; with $strict = true it returns false in that case

// PHP_VERSION_ID < 80100 -> 'ISO-8859-1'
// PHP_VERSION_ID >= 80100 -> 'Windows-1252'
var_dump($detect2);


// > this happens to WORK here, but only when PHP_VERSION_ID >= 80100
array_unshift($mbListEncodings, 'CP1251');
array_unshift($mbListEncodings, 'Windows-1251');
$detect3 = mb_detect_encoding($str, $mbListEncodings, true);

// PHP_VERSION_ID < 80100 -> 'ISO-8859-1' // > !!! looks like an old bug
// PHP_VERSION_ID >= 80100 -> 'Windows-1251'
var_dump($detect3);


$cpDetectedWrong = 'Windows-1252';

$converted = mb_convert_encoding($str, 'UTF-8', $cpDetectedWrong);
$converted_b64 = base64_encode($converted);

var_dump($converted); // string(207) "SQLSTATE[HY000] [2002] Ïîäêëþ÷åíèå íå óñòàíîâëåíî, ò.ê. êîíå÷íûé êîìïüþòåð îòâåðã çàïðîñ íà ïîäêëþ÷åíèå.
//  (SQL: SET FOREIGN_KEY_CHECKS=0;)"

var_dump($converted_b64); // string(276) "U1FMU1RBVEVbSFkwMDBdIFsyMDAyXSDDj8Ouw6TDqsOrw77Dt8Olw63DqMOlIMOtw6Ugw7PDscOyw6DDrcOuw6LDq8Olw63Driwgw7Iuw6ouIMOqw67DrcOlw7fDrcO7w6kgw6rDrsOsw6/DvMO+w7LDpcOwIMOuw7LDosOlw7DDoyDDp8Ogw6/DsMOuw7Egw63DoCDDr8Ouw6TDqsOrw77Dt8Olw63DqMOlLg0KIChTUUw6IFNFVCBGT1JFSUdOX0tFWV9DSEVDS1M9MDsp"

But I expected this output instead:

<?php

$detect = mb_detect_encoding($str, mb_list_encodings(), true);
var_dump($detect); // 'Windows-1251'

$cpDetectedCorrect = 'Windows-1251';

$converted = mb_convert_encoding($str, 'UTF-8', $cpDetectedCorrect);
$converted_b64 = base64_encode($converted);

var_dump($converted); // string(207) "SQLSTATE[HY000] [2002] Подключение не установлено, т.к. конечный компьютер отверг запрос на подключение.
//  (SQL: SET FOREIGN_KEY_CHECKS=0;)"

var_dump($converted_b64); // string(276) "U1FMU1RBVEVbSFkwMDBdIFsyMDAyXSDQn9C+0LTQutC70Y7Rh9C10L3QuNC1INC90LUg0YPRgdGC0LDQvdC+0LLQu9C10L3Qviwg0YIu0LouINC60L7QvdC10YfQvdGL0Lkg0LrQvtC80L/RjNGO0YLQtdGAINC+0YLQstC10YDQsyDQt9Cw0L/RgNC+0YEg0L3QsCDQv9C+0LTQutC70Y7Rh9C10L3QuNC1Lg0KIChTUUw6IFNFVCBGT1JFSUdOX0tFWV9DSEVDS1M9MDsp"

I have tried mb_check_encoding(). I have spent a few hours playing with mb_detect_order() and mb_list_encodings(). I even tried splitting the known encodings into groups by their first letters or slugs and applying mb_convert_encoding() to each group for better detection.
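
One possible reason these attempts cannot settle on a single answer, offered as an assumption rather than a confirmed diagnosis: the bytes in the message are valid in more than one single-byte code page, so a validity check alone cannot tell Windows-1251 from Windows-1252. A minimal sketch with a short sample string (the bytes of "Привет" in Windows-1251, chosen here only for illustration):

```php
<?php

// Sample bytes: "Привет" encoded as Windows-1251 (illustrative, not from the report).
$bytes = "\xCF\xF0\xE8\xE2\xE5\xF2";

// The bytes are not valid UTF-8 ...
$validUtf8 = mb_check_encoding($bytes, 'UTF-8');        // false

// ... but they are valid in both single-byte code pages.
$valid1251 = mb_check_encoding($bytes, 'Windows-1251'); // true
$valid1252 = mb_check_encoding($bytes, 'Windows-1252'); // true

// Both conversions succeed; only one yields readable Russian.
$as1251 = mb_convert_encoding($bytes, 'UTF-8', 'Windows-1251'); // "Привет"
$as1252 = mb_convert_encoding($bytes, 'UTF-8', 'Windows-1252'); // "Ïðèâåò"
```

Since both candidates pass the check, choosing between them needs outside knowledge such as the server locale.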

None of this works, so it has to be worked around like this:

<?php
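set_exception_handler(function ($e) {
    $phpMessage = $e->getMessage();

    if ($e instanceof \PDOException) {
        // > preg_match() with the /u modifier fails on invalid UTF-8, so this tells whether the message is already UTF-8
        $isUtf8 = preg_match('//u', $phpMessage) === 1;
        if (! $isUtf8) {
            $isWindows = (strtoupper(substr(PHP_OS, 0, 3)) === 'WIN');
            if ($isWindows) {
                // > CP1251 is hardcoded: this assumes the message comes from a Russian-locale Windows host
                $phpMessage = mb_convert_encoding($phpMessage, 'UTF-8', 'CP1251');
            }
        }
    }

    /// ...code
});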
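
The same workaround could be pulled into a small reusable function. This is only a sketch: `messageToUtf8()` and its `$fallbackEncoding` parameter are hypothetical names, not part of PHP or PDO, and the default of Windows-1251 is an assumption about the host's legacy code page.

```php
<?php

// Hypothetical helper: returns $message as UTF-8.
// $fallbackEncoding is an assumption about which legacy code page the host uses.
function messageToUtf8(string $message, string $fallbackEncoding = 'Windows-1251'): string
{
    // Valid UTF-8 (which includes plain ASCII) is returned unchanged.
    if (mb_check_encoding($message, 'UTF-8')) {
        return $message;
    }

    // Otherwise treat the bytes as the assumed legacy code page.
    return mb_convert_encoding($message, 'UTF-8', $fallbackEncoding);
}

$raw   = "\xCF\xF0\xE8\xE2\xE5\xF2"; // "Привет" in Windows-1251
$fixed = messageToUtf8($raw);        // converted to UTF-8
$again = messageToUtf8($fixed);      // already UTF-8, so left as-is
```

Checking for valid UTF-8 first makes the helper safe to call twice on the same message, which the unconditional conversion would not be.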

PHP Version

PHP 8.4

Operating System

Windows 10
