[fix](be) Disable dict encoding for row store columns #63441

Open: eldenmoon wants to merge 1 commit into apache:master from eldenmoon:codex/fix-row-store-plain-encoding-20260520

@eldenmoon
Member

What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Row store columns contain serialized row data and should stay on plain pages. Dictionary encoding can introduce dictionary pages for the hidden row store column, so force row store column metadata to use the configured plain binary encoding.

Release note

Fix row store columns to use plain encoding instead of dictionary encoding.

Check List (For Author)

  • Test:
    • Unit Test: ./run-be-ut.sh --run --filter='ColumnMetaAccessorTest.RowStoreColumnDoesNotUseDictEncoding'
    • Manual test: build-support/check-format.sh
  • Behavior changed: Yes. Hidden row store columns now use plain encoding instead of dictionary encoding.
  • Does this need documentation: No

Copilot AI review requested due to automatic review settings May 20, 2026 08:18
@eldenmoon
Member Author

run buildall

Contributor

Copilot AI left a comment

Pull request overview

This PR fixes BE segment writing for the hidden row-store column (__DORIS_ROW_STORE_COL__) by explicitly forcing plain binary encoding for that column, preventing dictionary encoding from introducing dictionary pages for already-serialized row blobs.

Changes:

  • Force row-store column ColumnMetaPB.encoding to PLAIN_ENCODING / PLAIN_ENCODING_V2 (based on TabletSchema::binary_plain_encoding_default_impl()), in both SegmentWriter and VerticalSegmentWriter.
  • Add a BE unit test that builds a real segment containing a row-store column and asserts its footer metadata encoding is plain (not dict).
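The override described in the changes above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the PR: `Encoding`, `ColumnMetaSketch`, `force_plain_for_row_store`, and the `use_plain_v2` flag are hypothetical stand-ins for `ColumnMetaPB.encoding` and the result of `TabletSchema::binary_plain_encoding_default_impl()`, whose real signatures are not shown in this conversation.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for illustration only; these are NOT the Doris types.
enum class Encoding { kDefault, kDict, kPlain, kPlainV2 };

struct ColumnMetaSketch {
    std::string name;
    Encoding encoding;
};

// Name of the hidden row store column, as quoted in the review overview.
static const char* const kRowStoreCol = "__DORIS_ROW_STORE_COL__";

// Assumption: the schema-level setting is modeled as a bool that selects
// between the two plain variants.
inline Encoding plain_encoding_for(bool use_plain_v2) {
    return use_plain_v2 ? Encoding::kPlainV2 : Encoding::kPlain;
}

// Overrides the encoding only for the row store column; every other column
// keeps whatever encoding was already chosen for it.
inline void force_plain_for_row_store(ColumnMetaSketch& meta, bool use_plain_v2) {
    if (meta.name == kRowStoreCol) {
        meta.encoding = plain_encoding_for(use_plain_v2);
    }
}

// Small self-check of the sketch; call it from any test or main().
inline void check_row_store_sketch() {
    ColumnMetaSketch row{kRowStoreCol, Encoding::kDict};
    force_plain_for_row_store(row, true);
    assert(row.encoding == Encoding::kPlainV2);
}
```

In this sketch the same helper would be called from both writer paths, which mirrors the PR applying the override in `SegmentWriter` and `VerticalSegmentWriter` alike.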

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| be/src/storage/segment/segment_writer.cpp | Forces row-store column metadata encoding to plain when creating the column writer. |
| be/src/storage/segment/vertical_segment_writer.cpp | Mirrors the same row-store encoding override in the vertical writer path. |
| be/test/storage/segment/column_meta_accessor_test.cpp | Adds a segment-build regression test asserting row-store column footer encoding is plain (v2). |

@hello-stephen
Contributor

TPC-H: Total hot run time: 31081 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 02d5ada8c34c31d6ed72b6a109ee9c63ed468c5f, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17727	3913	3953	3913
q2	q3	10759	1344	777	777
q4	4682	471	346	346
q5	7637	2253	2113	2113
q6	260	176	139	139
q7	950	768	652	652
q8	9372	1756	1733	1733
q9	6906	4991	4902	4902
q10	6454	2096	1812	1812
q11	448	274	239	239
q12	683	427	299	299
q13	18178	3330	2786	2786
q14	274	256	236	236
q15	q16	817	768	719	719
q17	961	970	934	934
q18	7067	5745	5519	5519
q19	1227	1355	1107	1107
q20	501	400	260	260
q21	5689	2647	2294	2294
q22	430	358	301	301
Total cold run time: 101022 ms
Total hot run time: 31081 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4191	4134	4128	4128
q2	q3	4505	4881	4303	4303
q4	2108	2199	1417	1417
q5	4398	4264	4259	4259
q6	234	177	133	133
q7	2229	1810	1664	1664
q8	2425	2090	2038	2038
q9	7861	7751	7763	7751
q10	4589	4509	4107	4107
q11	560	535	406	406
q12	717	730	516	516
q13	3226	3613	3025	3025
q14	310	317	286	286
q15	q16	706	751	650	650
q17	1380	1319	1312	1312
q18	7800	7271	7084	7084
q19	1138	1115	1103	1103
q20	2208	2231	1934	1934
q21	5310	4590	4466	4466
q22	539	464	416	416
Total cold run time: 56434 ms
Total hot run time: 50998 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 169417 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 02d5ada8c34c31d6ed72b6a109ee9c63ed468c5f, data reload: false

query5	4359	663	513	513
query6	350	224	205	205
query7	4264	573	310	310
query8	329	242	219	219
query9	8817	4018	4040	4018
query10	467	330	297	297
query11	5784	2410	2227	2227
query12	183	135	133	133
query13	1275	649	418	418
query14	5924	5386	5041	5041
query14_1	4353	4391	4349	4349
query15	208	204	180	180
query16	1022	463	424	424
query17	1140	713	580	580
query18	2512	488	354	354
query19	222	210	169	169
query20	138	135	129	129
query21	211	136	115	115
query22	13599	13503	13362	13362
query23	17314	16430	16184	16184
query23_1	16119	16270	16215	16215
query24	7397	1791	1332	1332
query24_1	1311	1331	1330	1330
query25	610	514	455	455
query26	1311	331	183	183
query27	2680	528	358	358
query28	4445	1956	1953	1953
query29	1035	657	527	527
query30	309	251	204	204
query31	1116	1064	954	954
query32	97	80	76	76
query33	546	354	307	307
query34	1172	1121	640	640
query35	783	782	690	690
query36	1358	1338	1241	1241
query37	156	110	100	100
query38	3238	3184	3088	3088
query39	940	942	900	900
query39_1	885	886	905	886
query40	247	154	132	132
query41	72	70	68	68
query42	121	116	116	116
query43	330	350	293	293
query44	
query45	214	209	197	197
query46	1109	1225	736	736
query47	2340	2349	2189	2189
query48	405	408	299	299
query49	641	527	406	406
query50	989	350	257	257
query51	4311	4446	4204	4204
query52	109	106	99	99
query53	263	287	211	211
query54	331	300	273	273
query55	97	91	85	85
query56	326	329	311	311
query57	1422	1409	1302	1302
query58	308	287	276	276
query59	1554	1626	1498	1498
query60	345	381	324	324
query61	158	153	159	153
query62	671	632	564	564
query63	239	205	209	205
query64	2457	812	638	638
query65	
query66	1722	486	357	357
query67	30535	30169	29347	29347
query68	
query69	450	347	307	307
query70	1044	961	1007	961
query71	322	276	270	270
query72	2990	2656	2463	2463
query73	842	743	438	438
query74	5090	4894	4760	4760
query75	2688	2615	2297	2297
query76	2285	1145	758	758
query77	386	425	335	335
query78	12176	12070	11593	11593
query79	1472	1059	743	743
query80	1228	560	474	474
query81	510	283	240	240
query82	1370	159	120	120
query83	346	281	251	251
query84	260	140	118	118
query85	944	542	487	487
query86	432	350	320	320
query87	3477	3370	3234	3234
query88	3483	2622	2625	2622
query89	448	383	339	339
query90	1799	183	187	183
query91	184	176	141	141
query92	76	78	75	75
query93	1447	1416	822	822
query94	646	342	318	318
query95	679	400	432	400
query96	987	804	315	315
query97	2693	2691	2594	2594
query98	247	237	236	236
query99	1123	1096	1020	1020
Total cold run time: 254525 ms
Total hot run time: 169417 ms

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 50.00% (4/8) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 53.53% (20736/38735) |
| Line Coverage | 37.17% (196156/527663) |
| Region Coverage | 33.49% (153665/458872) |
| Branch Coverage | 34.52% (66992/194062) |

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (8/8) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 71.85% (27260/37938) |
| Line Coverage | 55.25% (290794/526311) |
| Region Coverage | 52.33% (242436/463294) |
| Branch Coverage | 53.63% (104470/194789) |
