@sollhui
Contributor

@sollhui sollhui commented Dec 27, 2025

What problem does this PR solve?

#52515 introduced VCG (Virtual Compute Group) for multi-availability-zone disaster recovery.

However, routine load jobs do not work well with it: if a cluster in an availability zone crashes, VCG provides disaster recovery, but the job is not resumed automatically. This PR therefore removes the `dead BE count` calculation from the `isNeedAutoSchedule` check.
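
As a rough illustration of the change described above, the sketch below contrasts an auto-resume check that consults a dead-BE count with one that does not. The Doris source is not shown in this conversation, so every name here (`AutoScheduleSketch`, `needAutoScheduleWithDeadBeCheck`, `maxTolerableDeadBe`, and the parameters) is hypothetical, and the real `isNeedAutoSchedule` very likely checks more conditions than the paused state.

```java
// Hypothetical sketch only; this is NOT the actual Doris implementation.
final class AutoScheduleSketch {
    private AutoScheduleSketch() {
    }

    // Assumed earlier behaviour: a paused job is auto-resumed only while the
    // number of dead BEs stays within a tolerated limit (hypothetical parameter).
    static boolean needAutoScheduleWithDeadBeCheck(boolean paused, int deadBeCount, int maxTolerableDeadBe) {
        if (!paused) {
            return false;
        }
        return deadBeCount <= maxTolerableDeadBe;
    }

    // Assumed behaviour after this PR: the dead BE count is no longer consulted,
    // so BEs lost in a crashed availability zone do not block auto-resume.
    static boolean needAutoSchedule(boolean paused) {
        return paused;
    }
}
```

Under these assumptions, a paused job with three dead BEs and a tolerance of zero would stay paused with the first check, while the second check would let it resume once VCG has failed over.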

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@sollhui
Contributor Author

sollhui commented Dec 27, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 35319 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
TPC-H sf100 test result on commit 6ed701db285b6757e856c75f032db781c5102475, data reload: false

------ Round 1 ----------------------------------
q1	17629	4322	4125	4125
q2	2035	344	256	256
q3	10155	1311	736	736
q4	10236	911	320	320
q5	7537	2195	1907	1907
q6	183	163	136	136
q7	1041	846	712	712
q8	9341	1461	1214	1214
q9	6918	5325	5441	5325
q10	6844	2382	1995	1995
q11	547	334	305	305
q12	663	730	574	574
q13	17788	3658	3016	3016
q14	290	300	287	287
q15	608	509	517	509
q16	698	687	627	627
q17	685	800	593	593
q18	7390	7096	6993	6993
q19	1163	962	627	627
q20	403	377	249	249
q21	4222	3956	3845	3845
q22	1033	980	968	968
Total cold run time: 107409 ms
Total hot run time: 35319 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4190	4114	4111	4111
q2	339	399	318	318
q3	2163	2641	2340	2340
q4	1310	1759	1296	1296
q5	4222	4609	4759	4609
q6	240	174	129	129
q7	2065	1987	1814	1814
q8	2687	2600	2515	2515
q9	7567	7544	7681	7544
q10	3095	3254	2856	2856
q11	626	542	533	533
q12	689	917	636	636
q13	3521	4052	3476	3476
q14	313	326	279	279
q15	573	508	521	508
q16	676	681	653	653
q17	1203	1379	1440	1379
q18	7835	7961	8662	7961
q19	943	873	916	873
q20	2019	1974	1975	1974
q21	4844	4400	4155	4155
q22	1110	1046	1070	1046
Total cold run time: 52230 ms
Total hot run time: 51005 ms

@doris-robot

TPC-DS: Total hot run time: 180070 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 6ed701db285b6757e856c75f032db781c5102475, data reload: false

query5	4400	596	449	449
query6	327	251	237	237
query7	4234	499	292	292
query8	320	261	248	248
query9	8762	2559	2560	2559
query10	499	391	341	341
query11	15709	15087	15091	15087
query12	186	119	119	119
query13	1270	535	393	393
query14	5533	3077	2802	2802
query14_1	2706	2695	2691	2691
query15	207	200	178	178
query16	833	458	444	444
query17	1128	728	624	624
query18	2436	454	358	358
query19	237	247	215	215
query20	122	120	117	117
query21	226	143	119	119
query22	3939	4044	3917	3917
query23	16722	16217	15944	15944
query23_1	16110	16090	16134	16090
query24	7389	1680	1264	1264
query24_1	1252	1260	1252	1252
query25	602	514	464	464
query26	1257	279	169	169
query27	2734	471	309	309
query28	4487	2139	2123	2123
query29	846	591	486	486
query30	322	244	220	220
query31	817	693	620	620
query32	82	70	71	70
query33	553	360	304	304
query34	917	921	556	556
query35	774	856	726	726
query36	879	909	840	840
query37	136	96	80	80
query38	3033	3135	2967	2967
query39	751	730	732	730
query39_1	702	724	726	724
query40	240	144	124	124
query41	69	63	61	61
query42	109	116	112	112
query43	437	429	411	411
query44	1367	753	749	749
query45	195	188	181	181
query46	904	1000	633	633
query47	1667	1727	1634	1634
query48	316	328	258	258
query49	657	431	357	357
query50	685	308	224	224
query51	3876	3809	3819	3809
query52	109	113	105	105
query53	322	350	297	297
query54	290	256	241	241
query55	83	76	73	73
query56	290	316	312	312
query57	1205	1183	1116	1116
query58	275	257	255	255
query59	2372	2464	2371	2371
query60	325	316	306	306
query61	177	161	161	161
query62	776	697	658	658
query63	339	302	307	302
query64	5000	1305	994	994
query65	4028	3974	4015	3974
query66	1492	439	313	313
query67	15258	14926	14865	14865
query68	8226	1006	729	729
query69	510	343	310	310
query70	1089	1018	954	954
query71	360	309	287	287
query72	5762	4776	4811	4776
query73	664	561	306	306
query74	9015	9023	8713	8713
query75	3190	3192	2822	2822
query76	3848	1159	748	748
query77	542	413	296	296
query78	9397	9631	8868	8868
query79	1293	885	629	629
query80	712	641	552	552
query81	503	273	237	237
query82	205	139	107	107
query83	264	265	244	244
query84	264	130	112	112
query85	897	513	470	470
query86	355	297	303	297
query87	3108	3228	3090	3090
query88	3365	2315	2296	2296
query89	490	433	408	408
query90	2280	175	165	165
query91	172	163	153	153
query92	83	66	66	66
query93	1548	934	561	561
query94	468	305	274	274
query95	580	375	312	312
query96	602	491	212	212
query97	2232	2292	2232	2232
query98	230	199	193	193
query99	1325	1345	1353	1345
Total cold run time: 260104 ms
Total hot run time: 180070 ms

@doris-robot

ClickBench: Total hot run time: 27.26 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 6ed701db285b6757e856c75f032db781c5102475, data reload: false

query1	0.06	0.04	0.04
query2	0.10	0.05	0.05
query3	0.26	0.09	0.08
query4	1.61	0.12	0.10
query5	0.26	0.25	0.26
query6	1.16	0.65	0.63
query7	0.03	0.03	0.03
query8	0.05	0.04	0.04
query9	0.57	0.51	0.50
query10	0.53	0.55	0.56
query11	0.16	0.11	0.11
query12	0.14	0.11	0.11
query13	0.62	0.60	0.61
query14	1.00	0.99	0.99
query15	0.81	0.79	0.81
query16	0.37	0.40	0.41
query17	1.00	1.01	1.12
query18	0.24	0.21	0.22
query19	1.83	1.94	1.86
query20	0.02	0.02	0.01
query21	15.43	0.30	0.14
query22	4.98	0.05	0.04
query23	16.06	0.27	0.10
query24	1.25	0.24	0.72
query25	0.09	0.06	0.06
query26	0.15	0.14	0.15
query27	0.05	0.06	0.06
query28	4.29	1.23	1.02
query29	12.57	3.93	3.16
query30	0.28	0.14	0.11
query31	2.80	0.62	0.39
query32	3.23	0.53	0.44
query33	2.97	3.01	3.02
query34	16.77	5.15	4.59
query35	4.56	4.61	4.55
query36	0.65	0.50	0.48
query37	0.12	0.07	0.06
query38	0.07	0.05	0.04
query39	0.05	0.03	0.03
query40	0.17	0.15	0.14
query41	0.09	0.03	0.03
query42	0.05	0.03	0.03
query43	0.04	0.04	0.04
Total cold run time: 97.54 s
Total hot run time: 27.26 s

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage `` 🎉
Increment coverage report
Complete coverage report

Contributor

@liaoxin01 liaoxin01 left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Dec 27, 2025
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

Contributor

@dataroaring dataroaring left a comment

LGTM

@dataroaring dataroaring merged commit 2458657 into apache:master Dec 27, 2025
28 of 30 checks passed
github-actions bot pushed a commit that referenced this pull request Dec 27, 2025
github-actions bot pushed a commit that referenced this pull request Dec 27, 2025
yiguolei pushed a commit that referenced this pull request Dec 29, 2025
Labels

approved Indicates a PR has been approved by one committer. dev/3.1.x dev/4.0.3-merged reviewed
