PT-2377 - fixed pt-table-sync for JSON utf8 strings #861
svetasmirnova merged 1 commit into percona:3.x from hpoettker:PT-2377_table_sync_with_utf8_json
I updated the PR such that the change is also included in
I adjusted the added code to also handle The unit test for
Thanks so much for the massive effort to adjust the code base to MySQL 8.4! I've rebased the changes on the latest commit in 3.x.
svetasmirnova left a comment
Test t/pt-table-sync/pt-2377.t does not pass with 5.7. Please add a skip call to this test so that it is skipped when run on 5.7 or an earlier version.
There is also a note at https://perldoc.perl.org/utf8:
$flag = utf8::is_utf8($string)
(Since Perl 5.8.1) Test whether $string is marked internally as encoded in UTF-8. Functionally the same as Encode::is_utf8($string).
Typically only necessary for debugging and testing, if you need to dump the internals of an SV, Devel::Peek's Dump() provides more detail in a compact form.
If you still think you need this outside of debugging, testing or dealing with filenames, you should probably read perlunitut and "What is "the UTF8 flag"?" in perlunifaq.
Don't use this flag as a marker to distinguish character and binary data: that should be decided for each variable when you write your code.
To force unicode semantics in code portable to perl 5.8 and 5.10, call utf8::upgrade($string) unconditionally.
Is there any other way to perform the same check?
The MySQL driver DBD::mysql does not decode JSON values as utf8 although MySQL uses utf8mb4 for all JSON strings. This change decodes JSON values as utf8 (when not already done) such that SQL statements are generated correctly.
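To illustrate the kind of decoding described above, here is a minimal sketch, not the code from this PR. The helper name decode_json_value and the sample value are hypothetical; only the core function utf8::decode() is real. It assumes the driver hands back the JSON value as raw UTF-8 octets.

```perl
use strict;
use warnings;

# Hypothetical helper (not the PR's actual code): return a character string
# for a value that may have arrived as raw UTF-8 octets.
sub decode_json_value {
    my ($value) = @_;
    return $value unless defined $value;
    my $copy = $value;
    # utf8::decode() converts in place and returns false when the string is
    # not well-formed UTF-8; in that case the original value is returned.
    return utf8::decode($copy) ? $copy : $value;
}

# Sample JSON text with "é" encoded as the two UTF-8 octets C3 A9,
# as an undecoded column value might look (assumed for illustration).
my $octets = "{\"name\": \"caf\xC3\xA9\"}";
my $chars  = decode_json_value($octets);

printf "%d octets -> %d characters\n", length($octets), length($chars);
# prints "17 octets -> 16 characters"
```

One caveat of this approach: a string that is already decoded but whose characters happen to form valid UTF-8 when read as octets would be decoded a second time, so a sketch like this is only safe where the input is known to be undecoded.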
I've removed the check, added a skip for versions below 8.0 in If I understand correctly, the check There certainly is a bug in
The PR is intended to resolve this issue: https://perconadev.atlassian.net/browse/PT-2377
The tests added with pt-2377.t both fail with the current code base as they do not generate the expected DML statements with non-ASCII characters correctly. With the proposed change, they run successfully. They test that both REPLACE and UPDATE statements are generated correctly by pt-table-sync when applied to tables with JSON columns that contain non-ASCII characters.
The code is my own creation and it can be distributed under the GPL2 licence.