[fix](hive) Fix null timezone dereference in datetime cast with timezone info #60231

Closed
suxiaogang223 wants to merge 2 commits into apache:master from suxiaogang223:fix-null-timezone-cast

@suxiaogang223
Contributor

@suxiaogang223 suxiaogang223 commented Jan 26, 2026

What problem does this PR solve?

Summary

  • Fix null local_time_zone dereference when parsing timezone-suffixed datetime strings.
  • Keep existing datetimev2 timezone parsing semantics (convert from parsed timezone to session local timezone), instead of rejecting timezone suffixes.
  • Add/keep Hive Text external table regression coverage to verify that the engine returns a parse failure (no core dump) when timezone context is unavailable.

Background / Root Cause

  • In the timezone parsing branches, some paths dereferenced *local_time_zone without a null check.
  • This can happen in text/serde scan paths where the parser is called without a timezone object.
  • An earlier revision of this PR avoided the crash by rejecting timezone suffixes for datetimev2, but that changed existing cast behavior.

Changes

  1. datetimev2 parser behavior
    • File: be/src/vec/functions/cast/cast_to_datetimev2_impl.hpp
    • Restore existing behavior for DATE_TIME (datetimev2) with timezone suffix:
      • parse source timezone
      • convert to session local timezone
    • Keep TIMESTAMP_TZ conversion behavior unchanged.
    • Add explicit null check before local timezone conversion.
  2. datetime parser null-guard
    • File: be/src/vec/functions/cast/cast_to_date_or_datetime_impl.hpp
    • Add explicit local_time_zone != nullptr check before timezone conversion in both strict and non-strict parsing paths.
  3. Regression test
    • File: regression-test/suites/external_table_p0/hive/ddl/test_hive_ddl_text_format.groovy
    • Cover timezone-suffixed text data in Hive Text external table and verify parse failure is returned instead of process crash.
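
The null guard described above can be illustrated with a minimal, self-contained sketch. This is not the code from cast_to_datetimev2_impl.hpp: the types `SessionTimeZone` and `ParsedDateTime` and the function `to_session_local` are hypothetical names introduced only for illustration, and timezones are simplified to fixed UTC offsets. The point is the shape of the check: when no timezone object is supplied, the conversion reports a parse failure rather than dereferencing a null pointer.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the session timezone; NOT the Doris type.
// Simplified to a fixed offset from UTC (e.g. +08:00 -> 28800 seconds).
struct SessionTimeZone {
    int32_t utc_offset_seconds;
};

// Hypothetical result of parsing a timezone-suffixed datetime string.
struct ParsedDateTime {
    int64_t wall_clock_seconds;    // wall-clock time as written in the string
    int32_t source_offset_seconds; // offset taken from the suffix
};

// Converts the parsed value to the session-local wall clock.
// Returns std::nullopt (treated as a parse failure) when the caller
// provides no timezone context, instead of dereferencing a null pointer.
std::optional<int64_t> to_session_local(const ParsedDateTime& parsed,
                                        const SessionTimeZone* local_time_zone) {
    if (local_time_zone == nullptr) {
        return std::nullopt; // graceful failure path
    }
    // Source wall clock -> UTC -> session-local wall clock.
    const int64_t utc_seconds =
            parsed.wall_clock_seconds - parsed.source_offset_seconds;
    return utc_seconds + local_time_zone->utc_offset_seconds;
}
```

In this sketch a caller on a text/serde scan path that passes `nullptr` gets an empty optional and can map it to a parse error, while a caller with a session timezone still gets the converted value.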

Compatibility / Behavior

  • Existing cast('...+08:00' as datetimev2) conversion behavior is preserved.
  • Unsafe null-timezone paths now fail gracefully (parse error) instead of SIGSEGV.

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes. Null-timezone crash is changed to parse failure; datetimev2 timezone conversion semantics are preserved.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Jan 26, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

zclllyybb previously approved these changes Jan 26, 2026
Contributor

@zclllyybb zclllyybb left a comment

LGTM

@github-actions github-actions bot added the "approved" label (Indicates a PR has been approved by one committer.) Jan 26, 2026
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@suxiaogang223 suxiaogang223 force-pushed the fix-null-timezone-cast branch from d439a83 to 0bc5924 Compare January 26, 2026 07:57
@github-actions github-actions bot removed the "approved" label (Indicates a PR has been approved by one committer.) Jan 26, 2026
@suxiaogang223
Contributor Author

run external

@suxiaogang223
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 30198 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e558531be45622817f3a26cd5f7fa3d536c2a33d, data reload: false

------ Round 1 ----------------------------------
q1	17591	4460	4303	4303
q2	2040	349	248	248
q3	10175	1315	728	728
q4	10210	805	311	311
q5	7512	2269	1973	1973
q6	206	180	144	144
q7	881	706	612	612
q8	9285	1399	1195	1195
q9	4987	4631	4551	4551
q10	6859	1940	1552	1552
q11	468	267	232	232
q12	402	371	220	220
q13	17822	4052	3222	3222
q14	232	239	215	215
q15	929	800	811	800
q16	673	671	618	618
q17	698	766	547	547
q18	6747	5755	5667	5667
q19	1407	991	632	632
q20	508	496	379	379
q21	2505	1832	1802	1802
q22	327	283	247	247
Total cold run time: 102464 ms
Total hot run time: 30198 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4405	4301	4351	4301
q2	260	343	275	275
q3	2087	2664	2247	2247
q4	1350	1716	1315	1315
q5	4294	4189	4235	4189
q6	222	175	137	137
q7	1893	1830	1649	1649
q8	2447	2720	2535	2535
q9	7619	7387	7613	7387
q10	2875	3024	2604	2604
q11	519	433	431	431
q12	745	790	657	657
q13	3868	4300	3567	3567
q14	302	319	296	296
q15	877	839	810	810
q16	677	711	670	670
q17	1176	1471	1350	1350
q18	8189	8067	7737	7737
q19	874	825	830	825
q20	2039	2158	2038	2038
q21	4834	4289	4160	4160
q22	526	479	433	433
Total cold run time: 52078 ms
Total hot run time: 49613 ms

@doris-robot

ClickBench: Total hot run time: 28.17 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit e558531be45622817f3a26cd5f7fa3d536c2a33d, data reload: false

query1	0.05	0.05	0.04
query2	0.09	0.05	0.04
query3	0.25	0.09	0.08
query4	1.61	0.11	0.10
query5	0.27	0.25	0.26
query6	1.16	0.67	0.66
query7	0.02	0.02	0.02
query8	0.05	0.04	0.04
query9	0.56	0.48	0.48
query10	0.57	0.55	0.54
query11	0.14	0.10	0.11
query12	0.14	0.10	0.11
query13	0.63	0.62	0.63
query14	1.07	1.05	1.04
query15	0.88	0.86	0.87
query16	0.41	0.40	0.39
query17	1.17	1.15	1.12
query18	0.25	0.22	0.21
query19	2.06	1.98	1.99
query20	0.02	0.01	0.02
query21	15.39	0.28	0.15
query22	4.88	0.06	0.05
query23	15.81	0.28	0.10
query24	1.28	0.25	0.18
query25	0.07	0.08	0.05
query26	0.14	0.14	0.13
query27	0.09	0.08	0.06
query28	3.32	1.15	0.97
query29	12.54	3.97	3.19
query30	0.27	0.16	0.11
query31	2.82	0.65	0.41
query32	3.23	0.59	0.48
query33	3.17	3.19	3.21
query34	15.82	5.40	4.77
query35	4.79	4.77	4.77
query36	0.66	0.50	0.49
query37	0.11	0.07	0.07
query38	0.07	0.04	0.04
query39	0.05	0.03	0.03
query40	0.18	0.17	0.15
query41	0.09	0.03	0.03
query42	0.05	0.03	0.04
query43	0.05	0.04	0.04
Total cold run time: 96.28 s
Total hot run time: 28.17 s

@suxiaogang223
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 30756 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 058f09f84dd84ba0d2c1cb717e398fbb3de1f966, data reload: false

------ Round 1 ----------------------------------
q1	17649	4542	4293	4293
q2	2028	392	235	235
q3	10114	1293	781	781
q4	10189	788	313	313
q5	7470	2147	1987	1987
q6	195	177	143	143
q7	885	752	612	612
q8	9269	1391	1162	1162
q9	4855	4622	4687	4622
q10	6834	1929	1542	1542
q11	471	256	249	249
q12	343	379	221	221
q13	17787	4093	3225	3225
q14	243	241	227	227
q15	890	802	805	802
q16	686	683	613	613
q17	704	847	524	524
q18	6726	6021	6188	6021
q19	1222	1038	634	634
q20	612	545	402	402
q21	2817	2062	1893	1893
q22	367	293	255	255
Total cold run time: 102356 ms
Total hot run time: 30756 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4606	4556	4739	4556
q2	267	339	258	258
q3	2410	2860	2449	2449
q4	1601	1842	1436	1436
q5	4707	4505	4462	4462
q6	216	178	135	135
q7	1944	1925	1726	1726
q8	2559	2442	2506	2442
q9	7610	7562	7539	7539
q10	2887	3076	2637	2637
q11	513	442	406	406
q12	659	742	671	671
q13	3967	4481	3449	3449
q14	274	287	263	263
q15	834	783	773	773
q16	649	694	631	631
q17	1079	1239	1278	1239
q18	7509	7340	7372	7340
q19	798	812	802	802
q20	1963	2023	1913	1913
q21	4493	4245	4113	4113
q22	497	452	414	414
Total cold run time: 52042 ms
Total hot run time: 49654 ms

@doris-robot

ClickBench: Total hot run time: 28.62 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 058f09f84dd84ba0d2c1cb717e398fbb3de1f966, data reload: false

query1	0.06	0.05	0.04
query2	0.10	0.05	0.04
query3	0.26	0.08	0.08
query4	1.61	0.12	0.11
query5	0.26	0.24	0.25
query6	1.16	0.68	0.68
query7	0.03	0.03	0.02
query8	0.05	0.04	0.04
query9	0.58	0.50	0.49
query10	0.55	0.54	0.54
query11	0.14	0.09	0.11
query12	0.14	0.10	0.11
query13	0.63	0.61	0.63
query14	1.06	1.07	1.05
query15	0.88	0.86	0.88
query16	0.39	0.40	0.40
query17	1.20	1.12	1.14
query18	0.23	0.21	0.22
query19	1.98	1.98	2.02
query20	0.02	0.01	0.02
query21	15.40	0.28	0.14
query22	4.86	0.08	0.05
query23	15.73	0.29	0.11
query24	0.93	0.64	0.98
query25	0.16	0.06	0.16
query26	0.14	0.13	0.13
query27	0.07	0.06	0.08
query28	5.10	1.13	0.97
query29	12.54	4.02	3.25
query30	0.28	0.13	0.11
query31	2.82	0.66	0.40
query32	3.24	0.59	0.50
query33	3.26	3.20	3.27
query34	16.04	5.38	4.72
query35	4.84	4.71	4.80
query36	0.65	0.50	0.49
query37	0.11	0.07	0.08
query38	0.08	0.04	0.03
query39	0.05	0.03	0.03
query40	0.19	0.17	0.14
query41	0.09	0.04	0.03
query42	0.05	0.04	0.03
query43	0.05	0.04	0.04
Total cold run time: 98.01 s
Total hot run time: 28.62 s

zclllyybb previously approved these changes Feb 12, 2026
Contributor

@zclllyybb zclllyybb left a comment

LGTM

@github-actions github-actions bot added the "approved" label (Indicates a PR has been approved by one committer.) Feb 12, 2026
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@doris-robot

BE UT Coverage Report

Increment line coverage 0.00% (0/6) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.74% (19475/36923)
Line Coverage 36.25% (181478/500661)
Region Coverage 32.62% (140728/431371)
Branch Coverage 33.66% (61041/181319)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 0.00% (0/6) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.74% (25953/36178)
Line Coverage 54.41% (271744/499422)
Region Coverage 51.91% (226215/435750)
Branch Coverage 53.35% (97101/182023)

@suxiaogang223
Contributor Author

run buildall

@github-actions github-actions bot removed the "approved" label (Indicates a PR has been approved by one committer.) Feb 12, 2026
@doris-robot

TPC-H: Total hot run time: 30635 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit f6015a14ac9c8da2740a3f3c9309591a240a0377, data reload: false

------ Round 1 ----------------------------------
q1	17580	4600	4308	4308
q2	2105	358	233	233
q3	10113	1311	744	744
q4	10197	771	305	305
q5	7516	2203	1936	1936
q6	200	175	141	141
q7	884	738	600	600
q8	9284	1412	1133	1133
q9	5028	4700	4664	4664
q10	6848	1935	1535	1535
q11	447	273	228	228
q12	404	379	220	220
q13	17786	4126	3188	3188
q14	238	243	226	226
q15	907	802	809	802
q16	701	695	616	616
q17	719	856	517	517
q18	6740	5900	6209	5900
q19	2341	1104	746	746
q20	530	518	423	423
q21	2757	1967	1917	1917
q22	340	292	253	253
Total cold run time: 103665 ms
Total hot run time: 30635 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4665	4570	4750	4570
q2	273	364	271	271
q3	2295	2799	2451	2451
q4	1497	1971	1518	1518
q5	4815	4576	4609	4576
q6	229	180	135	135
q7	1964	1901	1733	1733
q8	2552	2446	2412	2412
q9	7471	7724	7518	7518
q10	2780	3125	2675	2675
q11	515	452	426	426
q12	670	749	627	627
q13	3801	4052	3196	3196
q14	268	284	255	255
q15	816	787	777	777
q16	644	686	635	635
q17	1091	1259	1327	1259
q18	7349	7522	7418	7418
q19	842	794	816	794
q20	1982	2026	1879	1879
q21	4535	4339	4073	4073
q22	510	456	419	419
Total cold run time: 51564 ms
Total hot run time: 49617 ms

@doris-robot

TPC-DS: Total hot run time: 189119 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit f6015a14ac9c8da2740a3f3c9309591a240a0377, data reload: false

query5	4328	625	494	494
query6	339	215	195	195
query7	4227	464	265	265
query8	340	239	237	237
query9	8740	2726	2759	2726
query10	513	396	335	335
query11	17201	16951	16825	16825
query12	184	118	126	118
query13	1253	469	347	347
query14	5822	3170	2987	2987
query14_1	2778	2759	2796	2759
query15	221	193	172	172
query16	1003	476	473	473
query17	1090	700	603	603
query18	2451	454	343	343
query19	210	203	182	182
query20	142	130	133	130
query21	228	144	126	126
query22	4954	5123	4680	4680
query23	17289	16912	16739	16739
query23_1	16881	16793	16735	16735
query24	7197	1612	1253	1253
query24_1	1216	1222	1250	1222
query25	545	457	410	410
query26	1247	270	159	159
query27	2750	461	299	299
query28	4526	1895	1885	1885
query29	818	569	475	475
query30	324	257	213	213
query31	875	750	638	638
query32	87	85	85	85
query33	536	376	297	297
query34	914	913	575	575
query35	648	693	624	624
query36	1119	1128	1035	1035
query37	140	103	89	89
query38	2964	2925	2946	2925
query39	853	859	855	855
query39_1	802	826	832	826
query40	224	141	125	125
query41	72	72	67	67
query42	112	108	106	106
query43	392	380	355	355
query44	1368	744	725	725
query45	202	196	183	183
query46	895	982	602	602
query47	2140	2118	2068	2068
query48	320	316	234	234
query49	621	448	365	365
query50	678	272	212	212
query51	4158	4129	4126	4126
query52	108	114	99	99
query53	306	330	277	277
query54	314	285	275	275
query55	92	90	83	83
query56	317	329	330	329
query57	1387	1333	1261	1261
query58	301	290	284	284
query59	2542	2676	2546	2546
query60	377	352	335	335
query61	149	147	168	147
query62	624	597	546	546
query63	309	277	272	272
query64	4804	1227	948	948
query65	4601	4512	4504	4504
query66	1437	462	350	350
query67	16482	16528	16272	16272
query68	2437	1078	735	735
query69	408	320	289	289
query70	1035	976	957	957
query71	346	318	307	307
query72	2874	2834	2485	2485
query73	528	560	320	320
query74	9592	9577	9474	9474
query75	2832	2752	2462	2462
query76	2298	1078	661	661
query77	377	377	321	321
query78	10899	11072	10541	10541
query79	2838	912	599	599
query80	1696	570	486	486
query81	555	279	253	253
query82	983	146	114	114
query83	321	301	246	246
query84	253	122	99	99
query85	881	495	413	413
query86	409	315	296	296
query87	3120	3129	2983	2983
query88	3611	2670	2659	2659
query89	434	387	359	359
query90	1869	187	181	181
query91	162	158	130	130
query92	80	76	69	69
query93	1245	888	488	488
query94	649	316	270	270
query95	589	347	386	347
query96	654	533	227	227
query97	2487	2502	2412	2412
query98	234	221	214	214
query99	1015	984	932	932
Total cold run time: 262853 ms
Total hot run time: 189119 ms

@doris-robot

ClickBench: Total hot run time: 28.44 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit f6015a14ac9c8da2740a3f3c9309591a240a0377, data reload: false

query1	0.06	0.05	0.05
query2	0.10	0.05	0.04
query3	0.26	0.09	0.08
query4	1.60	0.11	0.12
query5	0.27	0.26	0.25
query6	1.17	0.67	0.68
query7	0.03	0.03	0.03
query8	0.05	0.03	0.04
query9	0.57	0.49	0.51
query10	0.55	0.55	0.55
query11	0.15	0.10	0.10
query12	0.15	0.11	0.10
query13	0.63	0.62	0.60
query14	1.08	1.06	1.05
query15	0.89	0.87	0.88
query16	0.40	0.39	0.40
query17	1.14	1.15	1.15
query18	0.23	0.21	0.21
query19	2.10	2.02	2.07
query20	0.02	0.01	0.02
query21	15.39	0.25	0.14
query22	5.32	0.06	0.06
query23	15.95	0.29	0.11
query24	1.36	0.70	0.36
query25	0.12	0.06	0.06
query26	0.14	0.14	0.14
query27	0.09	0.06	0.06
query28	4.82	1.13	0.96
query29	12.56	3.92	3.19
query30	0.29	0.13	0.12
query31	2.82	0.64	0.41
query32	3.23	0.60	0.51
query33	3.32	3.26	3.21
query34	15.98	5.40	4.65
query35	4.83	4.82	4.79
query36	0.66	0.49	0.49
query37	0.11	0.07	0.07
query38	0.08	0.04	0.04
query39	0.05	0.03	0.03
query40	0.20	0.15	0.15
query41	0.09	0.03	0.03
query42	0.05	0.03	0.03
query43	0.04	0.04	0.03
Total cold run time: 98.95 s
Total hot run time: 28.44 s

@doris-robot

BE UT Coverage Report

Increment line coverage 0.00% (0/6) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.78% (19489/36928)
Line Coverage 36.27% (181598/500690)
Region Coverage 32.65% (140834/431378)
Branch Coverage 33.69% (61086/181339)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 0.00% (0/6) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.55% (26611/36183)
Line Coverage 56.67% (283063/499451)
Region Coverage 54.18% (236076/435757)
Branch Coverage 56.00% (101944/182043)

@suxiaogang223
Contributor Author

suxiaogang223 commented Feb 13, 2026

Verification update:

I re-validated this with the dirty Hive text data. On current master, the behavior is:

  • count(*) = 4
  • count(tbrq) = 0
  • sum(tbrq is null) = 4

So the rows are preserved and invalid datetime values are converted to NULL; no core dump is reproduced in this path.
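
The three counts above follow from standard SQL aggregate semantics: count(*) counts rows, while count(column) skips NULLs. A minimal sketch of that relationship, assuming a column where every value failed to parse and was stored as NULL; `ColumnCounts` and `summarize` are hypothetical names, and the four-row input in the usage note mirrors the reported numbers rather than the actual test data:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical summary of one nullable column; not a Doris structure.
struct ColumnCounts {
    std::size_t count_star;   // count(*): every row, NULL or not
    std::size_t count_column; // count(col): non-NULL values only
    std::size_t null_count;   // sum(col is null)
};

// Models a nullable datetime column as optional integers, where
// std::nullopt stands for a value that failed to parse and became NULL.
ColumnCounts summarize(const std::vector<std::optional<long long>>& column) {
    const std::size_t non_null = static_cast<std::size_t>(
            std::count_if(column.begin(), column.end(),
                          [](const std::optional<long long>& v) {
                              return v.has_value();
                          }));
    return ColumnCounts{column.size(), non_null, column.size() - non_null};
}
```

Calling `summarize` on a four-element vector of `std::nullopt` yields 4, 0 and 4 for the three fields, matching the pattern reported above: no rows are dropped, and every invalid value surfaces as NULL.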

The null-timezone dereference risk is already guarded in PR #56646 (feature support timestamptz type), which added local_time_zone != nullptr checks in cast_to_datetimev2_impl.hpp.

Given this validation, this PR is being closed as already fixed by #56646.

@suxiaogang223
Contributor Author

Closing as already fixed by #56646 after re-validation on current master.
