Skip to content

Commit aee5cd9

Browse files
devanshu0987Devanshu
andauthored
fix(functions): Make translate function postgres compatible (#19630)
## Which issue does this PR close? - Closes #. ## Rationale for this change In Postgres `translate` function implementation, the duplicates in the `from` argument are ignored, and the first occurrence wins. A similar implementation is also present in DuckDB. If the character already exists, the index/mapping is not updated. https://github.com/duckdb/duckdb/blob/4b7a6b7bd0f8c968bfecab08e801cdc1f0a5cdfd/extension/core_functions/scalar/string/translate.cpp#L45 Before the change in DataFusion ``` > SELECT translate('abcabc', 'aa', 'de'); +-------------------------------------------------+ | translate(Utf8("abcabc"),Utf8("aa"),Utf8("de")) | +-------------------------------------------------+ | ebcebc | +-------------------------------------------------+ 1 row(s) fetched. Elapsed 0.001 seconds. ``` While DuckDB returns ``` D SELECT translate('abcabc', 'aa', 'de'); ┌─────────────────────────────────┐ │ translate('abcabc', 'aa', 'de') │ │ varchar │ ├─────────────────────────────────┤ │ dbcdbc │ └─────────────────────────────────┘ ``` Postgres returns ``` SELECT translate('abcabc', 'aa', 'de') Output: translate ----------- dbcdbc (1 row) ``` ## What changes are included in this PR? - If there are duplicate characters present in `from`, the first occurrence wins. - SLT Tests update. The earlier tests `from` value was `foo` which contained duplicates and was mapping `o` -> `r` instead of Postgress compatible `o` -> `a` ## Are these changes tested? - New Unit Tests are added, which test this behaviour. ## Are there any user-facing changes? This is a contract change in some sense. If someone has taken dependency on this behaviour, they will encounter a change. However, I am unsure and need help to properly categorize this. --------- Co-authored-by: Devanshu <[email protected]>
1 parent ff38480 commit aee5cd9

File tree

2 files changed

+19
-6
lines changed

2 files changed

+19
-6
lines changed

datafusion/functions/src/unicode/translate.rs

Lines changed: 14 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -170,7 +170,8 @@ where
170170
// Build from_map using reusable buffer
171171
from_graphemes.extend(from.graphemes(true));
172172
for (index, c) in from_graphemes.iter().enumerate() {
173-
from_map.insert(*c, index);
173+
// Ignore characters that already exist in from_map, else insert
174+
from_map.entry(*c).or_insert(index);
174175
}
175176

176177
// Build to_graphemes
@@ -259,6 +260,18 @@ mod tests {
259260
Utf8,
260261
StringArray
261262
);
263+
test_function!(
264+
TranslateFunc::new(),
265+
vec![
266+
ColumnarValue::Scalar(ScalarValue::from("abcabc")),
267+
ColumnarValue::Scalar(ScalarValue::from("aa")),
268+
ColumnarValue::Scalar(ScalarValue::from("de"))
269+
],
270+
Ok(Some("dbcdbc")),
271+
&str,
272+
Utf8,
273+
StringArray
274+
);
262275
test_function!(
263276
TranslateFunc::new(),
264277
vec![

datafusion/sqllogictest/test_files/string/string_query.slt.part

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -528,7 +528,7 @@ FROM test_basic_operator;
528528
Andrew
529529
Xiangpeng
530530
Raphael
531-
under_scrre
531+
under_scare
532532
percent
533533
(empty)
534534
(empty)
@@ -542,10 +542,10 @@ SELECT
542542
TRANSLATE(unicode_1, 'foo', 'bar') as c
543543
FROM test_basic_operator;
544544
----
545-
databusirn📊🔥
546-
databusirn数据融合
547-
databusirnДатаФусион
548-
un iść crre
545+
databusian📊🔥
546+
databusian数据融合
547+
databusianДатаФусион
548+
un iść care
549549
pan Tadeusz ma iść w kąt
550550
(empty)
551551
(empty)

0 commit comments

Comments
 (0)