
Commit 2c4cfc3

Fix UTF-8 expansion/truncation error during fetch
On IBM i, we bind character strings to UTF-8, which breaks the code's assumption that 1 character = 1 code unit. LUW uses UTF-16 instead, which as far as I can tell doesn't have this problem (I can't find any encoding that maps fewer than 4 bytes to a Unicode code point above U+FFFF).

As a result, the buffer may be too small to hold the entire converted value and the value is truncated; however, the indicator is still set to the length of the total data. To make matters worse, the code assumes the indicator value is less than the size of the buffer and reads that many bytes. When truncation occurs, this assumption is wrong, causing a buffer over-read and an attempt to decode arbitrary bytes as UTF-8. If those bytes are not valid UTF-8, this can cause an error similar to the following:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb2 in position 22: invalid start byte

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/QOpenSys/pkgs/lib/python3.6/site-packages/ibm_db_dbi.py", line 1472, in _fetch_helper
    row = ibm_db.fetch_tuple(self.stmt_handler)
SystemError: <built-in function fetch_tuple> returned a result with an error set

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "example.py", line 14, in <module>
    cur.fetchone()
  File "/QOpenSys/pkgs/lib/python3.6/site-packages/ibm_db_dbi.py", line 1492, in fetchone
    row_list = self._fetch_helper(1)
  File "/QOpenSys/pkgs/lib/python3.6/site-packages/ibm_db_dbi.py", line 1476, in _fetch_helper
    raise self.messages[-1]
ibm_db_dbi.Error: ibm_db_dbi::Error: SystemError('<built-in function fetch_tuple> returned a result with an error set',)
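For illustration only, here is a standalone toy C sketch of the failure mode described above; it is not part of the driver, and the 5-character column value and byte counts are hypothetical. A value whose UTF-8 form is longer than the column's declared character count gets truncated into a buffer sized at one byte per character, while the indicator still reports the full converted length; reading "indicator" bytes then runs past the buffer, whereas stopping at the NUL terminator stays inside it:

#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Hypothetical 5-character column value whose UTF-8 conversion takes
     * 7 bytes ("héllö": é and ö are 2 bytes each in UTF-8). */
    const char *converted = "h\xC3\xA9ll\xC3\xB6";
    size_t indicator = strlen(converted);   /* 7: the length of the total data */

    /* Old sizing: one byte per declared character, plus a terminator. */
    char buf[5 + 1];
    size_t fits = sizeof(buf) - 1;          /* only 5 bytes fit: truncation */
    memcpy(buf, converted, fits);
    buf[fits] = '\0';

    /* Decoding 'indicator' (7) bytes from 'buf' would read past the end of
     * the allocation and hand arbitrary bytes to the UTF-8 decoder; decoding
     * only up to the NUL terminator cannot over-read. */
    printf("indicator=%zu, bytes actually in buffer=%zu\n",
           indicator, strlen(buf));
    return 0;
}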
1 parent 8bc7086 commit 2c4cfc3

File tree: 1 file changed (+20, -1 lines)

IBM_DB/ibm_db/ibm_db.c

Lines changed: 20 additions & 1 deletion
@@ -1056,7 +1056,26 @@ static int _python_ibm_db_bind_column_helper(stmt_handle *stmt_res)
 		case SQL_GRAPHIC:
 		case SQL_VARGRAPHIC:
 		case SQL_LONGVARGRAPHIC:
+#ifndef __PASE__
+			// Assume that no matter the source encoding, a
+			// character encoded in fewer than 4 bytes will map to
+			// a Unicode code point below U+10000 and thus maps to
+			// 2-bytes in UTF-16. A source character encoded in
+			// 4 bytes may map to a Unicode code point above U+FFFF,
+			// leading to a UTF-16 surrogate pair, but this would
+			// not mean any expansion.
 			in_length = stmt_res->column_info[i].size+1;
+#else
+			// Assume the worst-case of 1 byte in the source
+			// encoding maps to 4-bytes encoded in UTF-8.
+			//
+			// NOTE: We could do some heuristics to limit the amount
+			// of memory we allocate, but the maximum record length
+			// is 32KiB, so the max we could allocate for all
+			// columns would not exceed 128KiB, which is tiny and
+			// not worth bothering with.
+			in_length = stmt_res->column_info[i].size*4 + 1;
+#endif
 			row_data->w_val = (SQLTCHAR *) ALLOC_N(SQLTCHAR, in_length);
 			rc = SQLBindCol((SQLHSTMT)stmt_res->hstmt, (SQLUSMALLINT)(i+1),
 				SQL_C_TCHAR, row_data->w_val, in_length * sizeof(SQLTCHAR),

@@ -8357,7 +8376,7 @@ static PyObject *_python_ibm_db_bind_fetch_helper(PyObject *args, int op)
 		case SQL_VARGRAPHIC:
 		case SQL_LONGVARGRAPHIC:
 			tmp_length = stmt_res->column_info[column_number].size;
-			value = getSQLTCharAsPyUnicodeObject(row_data->w_val, out_length);
+			value = getSQLTCharAsPyUnicodeObject(row_data->w_val, SQL_NTS);
 			break;

 		case SQL_LONGVARCHAR:
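A note on the second hunk: SQL_NTS is the standard CLI/ODBC length sentinel meaning "the string is null-terminated, measure it yourself". As far as one can tell from the diff, passing it to getSQLTCharAsPyUnicodeObject makes the helper derive the length from the buffer contents rather than trust the (possibly too-large) indicator. A minimal sketch of that decision, using a hypothetical helper and a stand-in constant rather than the driver's actual code:

#include <stdio.h>
#include <string.h>

#define EXAMPLE_SQL_NTS (-3L)   /* stand-in for the CLI's SQL_NTS sentinel */

/* How many bytes of 'val' to decode. Trusting a caller-supplied length
 * (such as the truncation indicator) can exceed what the buffer holds;
 * EXAMPLE_SQL_NTS limits the decode to the data actually present. */
static size_t units_to_decode(const char *val, long length)
{
    if (length == EXAMPLE_SQL_NTS)
        return strlen(val);
    return (size_t)length;
}

int main(void)
{
    char buf[6] = "h\xC3\xA9ll";   /* truncated copy: 5 bytes plus NUL */
    printf("trusting the indicator: %zu bytes\n", units_to_decode(buf, 7));
    printf("trusting the buffer:    %zu bytes\n",
           units_to_decode(buf, EXAMPLE_SQL_NTS));
    return 0;
}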
