Commit 0768fad

wangyum authored and gatorsmile committed
[SPARK-28126][SQL] Support TRIM(trimStr FROM str) syntax
## What changes were proposed in this pull request?

[PostgreSQL](https://github.com/postgres/postgres/blob/7c850320d8cfa5503ecec61c2559661b924f7595/src/test/regress/sql/strings.sql#L624) supports another trim pattern, `TRIM(trimStr FROM str)`:

Function | Return Type | Description | Example | Result
--- | --- | --- | --- | ---
trim([leading \| trailing \| both] [characters] from string) | text | Remove the longest string containing only characters from characters (a space by default) from the start, end, or both ends (both is the default) of string | trim(both 'xyz' from 'yxTomxx') | Tom

This PR adds support for this trim pattern. After this PR, we support all standard syntax except `TRIM(FROM str)`, because it conflicts with our Literals:

```sql
Literals of type 'FROM' are currently not supported.(line 1, pos 12)

== SQL ==
SELECT TRIM(FROM ' SPARK SQL ')
```

PostgreSQL, Vertica and MySQL support this pattern. Teradata, Oracle, DB2, SQL Server, Hive and Presto

**PostgreSQL**:
```
postgres=# SELECT substr(version(), 0, 16), trim('xyz' FROM 'yxTomxx');
     substr      | btrim
-----------------+-------
 PostgreSQL 11.3 | Tom
(1 row)
```

**Vertica**:
```
dbadmin=> SELECT version(), trim('xyz' FROM 'yxTomxx');
              version               | btrim
------------------------------------+-------
 Vertica Analytic Database v9.1.1-0 | Tom
(1 row)
```

**MySQL**:
```
mysql> SELECT version(), trim('xyz' FROM 'yxTomxx');
+-----------+----------------------------+
| version() | trim('xyz' FROM 'yxTomxx') |
+-----------+----------------------------+
| 5.7.26    | yxTomxx                    |
+-----------+----------------------------+
1 row in set (0.00 sec)
```

More details: https://www.postgresql.org/docs/11/functions-string.html

## How was this patch tested?

unit tests

Closes apache#24924 from wangyum/SPARK-28075-2.

Authored-by: Yuming Wang <[email protected]>
Signed-off-by: gatorsmile <[email protected]>
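The PostgreSQL and Vertica outputs in the commit message return `Tom`, while the MySQL output returns `yxTomxx` for the same call. A likely reading is that the first two treat `trimStr` as a set of characters, whereas MySQL strips the whole string as a prefix or suffix. The following is a minimal plain-Scala sketch of the two behaviours, not Spark's implementation: `TrimSemanticsSketch`, `trimCharSet` and `trimWholeString` are hypothetical names, and the code works on UTF-16 `Char` values for simplicity.

```scala
object TrimSemanticsSketch {
  // Character-set trim (assumed PostgreSQL/Vertica/Spark-style behaviour):
  // drop leading and trailing characters while they belong to trimChars.
  def trimCharSet(str: String, trimChars: String): String = {
    val set = trimChars.toSet
    val withoutLeading = str.dropWhile(set.contains)
    withoutLeading.reverse.dropWhile(set.contains).reverse
  }

  // Whole-string trim (assumed MySQL-style behaviour):
  // repeatedly strip remStr only when it matches exactly as a prefix or suffix.
  def trimWholeString(str: String, remStr: String): String = {
    if (remStr.isEmpty) {
      str
    } else {
      var result = str
      while (result.startsWith(remStr)) {
        result = result.substring(remStr.length)
      }
      while (result.endsWith(remStr)) {
        result = result.substring(0, result.length - remStr.length)
      }
      result
    }
  }

  def main(args: Array[String]): Unit = {
    println(trimCharSet("yxTomxx", "xyz"))     // Tom
    println(trimWholeString("yxTomxx", "xyz")) // yxTomxx
  }
}
```

With a single-character `trimStr` such as `'x'`, both functions agree (`xxxbarxxx` becomes `bar`), so the difference only shows up for multi-character arguments.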
1 parent 870f972 commit 0768fad

6 files changed: 24 additions and 13 deletions

sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4

Lines changed: 1 addition & 1 deletion
@@ -703,7 +703,7 @@ primaryExpression
 | EXTRACT '(' field=identifier FROM source=valueExpression ')' #extract
 | (SUBSTR | SUBSTRING) '(' str=valueExpression (FROM | ',') pos=valueExpression
 ((FOR | ',') len=valueExpression)? ')' #substring
-| TRIM '(' trimOption=(BOTH | LEADING | TRAILING) (trimStr=valueExpression)?
+| TRIM '(' trimOption=(BOTH | LEADING | TRAILING)? (trimStr=valueExpression)?
 FROM srcStr=valueExpression ')' #trim
 ;

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala

Lines changed: 9 additions & 3 deletions
@@ -611,11 +611,15 @@ object StringTrim {

 _FUNC_(TRAILING FROM str) - Removes the trailing space characters from `str`.

-_FUNC_(BOTH trimStr FROM str) - Remove the leading and trailing `trimStr` characters from `str`
+_FUNC_(trimStr, str) - Remove the leading and trailing `trimStr` characters from `str`.

-_FUNC_(LEADING trimStr FROM str) - Remove the leading `trimStr` characters from `str`
+_FUNC_(trimStr FROM str) - Remove the leading and trailing `trimStr` characters from `str`.

-_FUNC_(TRAILING trimStr FROM str) - Remove the trailing `trimStr` characters from `str`
+_FUNC_(BOTH trimStr FROM str) - Remove the leading and trailing `trimStr` characters from `str`.
+
+_FUNC_(LEADING trimStr FROM str) - Remove the leading `trimStr` characters from `str`.
+
+_FUNC_(TRAILING trimStr FROM str) - Remove the trailing `trimStr` characters from `str`.
 """,
 arguments = """
 Arguments:
@@ -640,6 +644,8 @@ object StringTrim {
 SparkSQL
 > SELECT _FUNC_('SL', 'SSparkSQLS');
 parkSQ
+> SELECT _FUNC_('SL' FROM ' SparkSQL ');
+parkSQ
 > SELECT _FUNC_(BOTH 'SL' FROM 'SSparkSQLS');
 parkSQ
 > SELECT _FUNC_(LEADING 'SL' FROM 'SSparkSQLS');

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala

Lines changed: 1 addition & 1 deletion
@@ -1408,7 +1408,7 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
 override def visitTrim(ctx: TrimContext): Expression = withOrigin(ctx) {
 val srcStr = expression(ctx.srcStr)
 val trimStr = Option(ctx.trimStr).map(expression)
-ctx.trimOption.getType match {
+Option(ctx.trimOption).map(_.getType).getOrElse(SqlBaseParser.BOTH) match {
 case SqlBaseParser.BOTH =>
 StringTrim(srcStr, trimStr)
 case SqlBaseParser.LEADING =>
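
Because the grammar now makes `trimOption` optional, `ctx.trimOption` can be `null`, and the `Option(...).getOrElse(SqlBaseParser.BOTH)` change above falls back to `BOTH` in that case. Below is a self-contained sketch of that null-safe defaulting pattern, assuming simplified stand-ins: `Token`, the integer constants, and the `Trim`/`TrimLeft`/`TrimRight` case classes are hypothetical and only mimic the shape of the parser token and Catalyst expressions. The final `case other` branch is also an assumption, since the diff does not show how the real method ends.

```scala
object TrimOptionSketch {
  // Hypothetical stand-ins for the generated parser's token type constants.
  val BOTH = 1
  val LEADING = 2
  val TRAILING = 3

  // Hypothetical stand-in for an ANTLR token; null models an omitted trimOption.
  final case class Token(getType: Int)

  // Hypothetical stand-ins for the resulting expressions.
  sealed trait TrimExpr
  final case class Trim(src: String, trimStr: Option[String]) extends TrimExpr
  final case class TrimLeft(src: String, trimStr: Option[String]) extends TrimExpr
  final case class TrimRight(src: String, trimStr: Option[String]) extends TrimExpr

  def build(trimOption: Token, trimStr: String, src: String): TrimExpr = {
    // Option(null) is None, so a missing trimStr becomes None instead of a null reference.
    val chars = Option(trimStr)
    // A missing trimOption defaults to BOTH, mirroring the one-line change in the diff.
    Option(trimOption).map(_.getType).getOrElse(BOTH) match {
      case BOTH => Trim(src, chars)
      case LEADING => TrimLeft(src, chars)
      case TRAILING => TrimRight(src, chars)
      case other =>
        // Assumed error handling for unknown token types.
        throw new IllegalArgumentException(s"Unsupported trim option: $other")
    }
  }
}
```

Under these assumptions, `build(null, "xyz", "yxTomxx")` and `build(Token(BOTH), "xyz", "yxTomxx")` yield the same `Trim` value, which matches the shape of the new `PlanParserSuite` case expecting `StringTrim(Literal("yxTomxx"), Some(Literal("xyz")))`.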

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala

Lines changed: 5 additions & 0 deletions
@@ -737,6 +737,11 @@ class PlanParserSuite extends AnalysisTest {
 "SELECT TRIM(TRAILING FROM ' bunch o blanks ')",
 StringTrimRight(Literal(" bunch o blanks "), None)
 )
+
+assertTrimPlans(
+"SELECT TRIM('xyz' FROM 'yxTomxx')",
+StringTrim(Literal("yxTomxx"), Some(Literal("xyz")))
+)
 }

 test("precedence of set operations") {

sql/core/src/test/resources/sql-tests/inputs/string-functions.sql

Lines changed: 2 additions & 2 deletions
@@ -40,8 +40,8 @@ SELECT substring('Spark SQL' from -3);
 SELECT substring('Spark SQL' from 5 for 1);

 -- trim/ltrim/rtrim
-SELECT trim('yxTomxx', 'xyz'), trim(BOTH 'xyz' FROM 'yxTomxx');
-SELECT trim('xxxbarxxx', 'x'), trim(BOTH 'x' FROM 'xxxbarxxx');
+SELECT trim('yxTomxx', 'xyz'), trim(BOTH 'xyz' FROM 'yxTomxx'), trim('xyz' FROM 'yxTomxx');
+SELECT trim('xxxbarxxx', 'x'), trim(BOTH 'x' FROM 'xxxbarxxx'), trim('x' FROM 'xxxbarxxx');
 SELECT ltrim('zzzytest', 'xyz'), trim(LEADING 'xyz' FROM 'zzzytest');
 SELECT ltrim('zzzytestxyz', 'xyz'), trim(LEADING 'xyz' FROM 'zzzytestxyz');
 SELECT ltrim('xyxXxyLAST WORD', 'xy'), trim(LEADING 'xy' FROM 'xyxXxyLAST WORD');

sql/core/src/test/resources/sql-tests/results/string-functions.sql.out

Lines changed: 6 additions & 6 deletions
@@ -205,19 +205,19 @@ k


 -- !query 25
-SELECT trim('yxTomxx', 'xyz'), trim(BOTH 'xyz' FROM 'yxTomxx')
+SELECT trim('yxTomxx', 'xyz'), trim(BOTH 'xyz' FROM 'yxTomxx'), trim('xyz' FROM 'yxTomxx')
 -- !query 25 schema
-struct<trim(yxTomxx, xyz):string,trim(yxTomxx, xyz):string>
+struct<trim(yxTomxx, xyz):string,trim(yxTomxx, xyz):string,trim(yxTomxx, xyz):string>
 -- !query 25 output
-Tom Tom
+Tom Tom Tom


 -- !query 26
-SELECT trim('xxxbarxxx', 'x'), trim(BOTH 'x' FROM 'xxxbarxxx')
+SELECT trim('xxxbarxxx', 'x'), trim(BOTH 'x' FROM 'xxxbarxxx'), trim('x' FROM 'xxxbarxxx')
 -- !query 26 schema
-struct<trim(xxxbarxxx, x):string,trim(xxxbarxxx, x):string>
+struct<trim(xxxbarxxx, x):string,trim(xxxbarxxx, x):string,trim(xxxbarxxx, x):string>
 -- !query 26 output
-bar bar
+bar bar bar


 -- !query 27
