[Bug] MySQL CDC connector gets unexpected bytes if database encoding mismatched #1972
Looking for help with MySQL encoding issues. Any comments are appreciated! We have a legacy MySQL 5.7 database that uses latin1 encoding but is used as a UTF-8 one (MySQL stores the original UTF-8 bytes unchanged, so it works anyway). However, this does not work with Flink CDC: it returns corrupted strings when reading these wrongly encoded values. Comparing the bytes, I found that Flink CDC reads different bytes from the ones JDBC reads with the same parameters (JDBC returns exactly the bytes stored in MySQL). The DDLs look like this:
Environment:
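The byte difference described in the question is consistent with a reader that trusts the column's latin1 label. Each stored UTF-8 byte is decoded as one latin1 character, and the result is re-encoded as UTF-8, so every byte at or above 0x80 grows into two bytes. The Java sketch below reproduces that effect in isolation. It is an assumption that the connector's decoding path behaves this way. The sketch also uses Java's ISO-8859-1 as a stand-in for MySQL's latin1, which MySQL actually implements as cp1252. The class and method names (`Latin1Mislabel`, `decodeAsLatin1`, `repair`) are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class Latin1Mislabel {
    /** Formats bytes as space-separated lowercase hex, e.g. "63 61 66". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /** Decodes each stored byte as one ISO-8859-1 character, as a reader trusting a latin1 label would (assumption). */
    static String decodeAsLatin1(byte[] stored) {
        return new String(stored, StandardCharsets.ISO_8859_1);
    }

    /** Undoes the mis-decoding: ISO-8859-1 maps each character back to its original byte, then the bytes are read as UTF-8. */
    static String repair(String misdecoded) {
        byte[] original = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A UTF-8 client wrote "cafe" with an acute e (U+00E9) into a latin1 column; U+00E9 is stored as c3 a9.
        byte[] stored = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        // Mis-decode as latin1, then emit UTF-8 downstream: c3 and a9 each become two bytes.
        String misdecoded = decodeAsLatin1(stored);
        byte[] emitted = misdecoded.getBytes(StandardCharsets.UTF_8);

        System.out.println("stored : " + hex(stored));  // stored : 63 61 66 c3 a9
        System.out.println("emitted: " + hex(emitted)); // emitted: 63 61 66 c3 83 c2 a9
        System.out.println("repaired ok: " + repair(misdecoded).equals("caf\u00e9")); // repaired ok: true
    }
}
```

Because ISO-8859-1 maps every byte value to exactly one character, this particular mis-decoding is lossless, and `repair` recovers the original text.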
Replies: 1 comment
The issue you are experiencing could be due to a character encoding mismatch between MySQL and Flink CDC. MySQL 5.7 uses the "latin1" character set for text columns by default unless another character set is specified explicitly, whereas the Flink CDC connector uses "UTF-8" to encode and decode text fields.
One potential solution is to change the character set of the MySQL table to UTF-8 explicitly with an ALTER TABLE statement. For example, the following statement converts the "mysql_table" table to UTF-8:
ALTER TABLE mysql_table CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
This will upda…
Be aware that CONVERT TO transcodes the stored bytes from latin1 into the new character set. For a column that already holds UTF-8 bytes under a latin1 label, as described in the question, that double-encodes the data. The MySQL manual describes a different route for such columns: change the column to a binary type (VARBINARY or BLOB) first, then to a character type with the target character set, so the bytes are relabeled rather than converted.
After making this change, you should also update the Flink CDC table definition to use the same character set as follows:
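For the ALTER TABLE approach above, note that MySQL 5.7's `utf8` character set is an alias for `utf8mb3`. That character set stores at most three bytes per character, so it cannot hold characters outside the Basic Multilingual Plane, such as emoji; `utf8mb4` is needed for those. Before relabeling latin1 columns that contain UTF-8 bytes, it may be worth checking whether any value needs four bytes. The Java sketch below performs that check on raw byte arrays. `Utf8mb3Check` and its methods are hypothetical names, and the sketch assumes the stored bytes are well-formed UTF-8.

```java
import java.nio.charset.StandardCharsets;

public class Utf8mb3Check {
    /** Number of bytes the given code point occupies in UTF-8 (1 to 4). */
    static int utf8Length(int codePoint) {
        return new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8).length;
    }

    /**
     * True if the UTF-8 bytes decode to characters that each fit in 3 bytes,
     * i.e. they could be stored in MySQL's utf8 (utf8mb3). Invalid sequences are
     * replaced with U+FFFD by the decoder, so this assumes well-formed input.
     */
    static boolean fitsUtf8mb3(byte[] utf8Bytes) {
        String text = new String(utf8Bytes, StandardCharsets.UTF_8);
        return text.codePoints().allMatch(cp -> utf8Length(cp) <= 3);
    }

    public static void main(String[] args) {
        byte[] accented = "caf\u00e9".getBytes(StandardCharsets.UTF_8);    // U+00E9 takes 2 bytes
        byte[] emoji = "ok \uD83D\uDE00".getBytes(StandardCharsets.UTF_8); // U+1F600 takes 4 bytes
        System.out.println(fitsUtf8mb3(accented)); // true
        System.out.println(fitsUtf8mb3(emoji));    // false
    }
}
```

Code points above U+FFFF are exactly the ones that need four bytes in UTF-8, so any `false` result points at data that only `utf8mb4` can hold.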