[feat](paimon) integrate paimon-cpp reader #60676

Merged
BiteTheDDDDt merged 2 commits into apache:master from xylaaaaa:feature/paimoncpp-pr-clean
Feb 21, 2026

@xylaaaaa xylaaaaa commented Feb 11, 2026

What problem does this PR solve?

Issue Number: #56005

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

Copilot AI review requested due to automatic review settings February 11, 2026 08:16
Thearas commented Feb 11, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Copilot AI left a comment

Pull request overview

This PR adds an optional “paimon-cpp” execution path so Doris BE can read Paimon splits using a native C++ reader instead of the existing JNI-based path.

Changes:

  • Adds a new session/query option (enable_paimon_cpp_reader) and wires it through FE → Thrift → BE.
  • Introduces BE-side PaimonCppReader, predicate conversion, and a Doris-backed paimon-cpp filesystem implementation.
  • Extends FE Paimon split encoding to support paimon-cpp compatible split serialization and passes table location metadata to BE.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 6 comments.

File	Description
gensrc/thrift/PaloInternalService.thrift	Adds enable_paimon_cpp_reader to TQueryOptions for FE→BE transport.
fe/fe-core/src/main/java/org/apache/doris/qe/SessionVariable.java	Adds session variable + Thrift wiring; extends IgnoreSplitType.
fe/fe-core/src/main/java/org/apache/doris/datasource/paimon/source/PaimonSource.java	Exposes table location to support paimon-cpp reader initialization.
fe/fe-core/src/main/java/org/apache/doris/datasource/paimon/source/PaimonScanNode.java	Encodes splits differently for paimon-cpp and sets table location into scan ranges.
fe/fe-core/src/main/java/org/apache/doris/datasource/paimon/PaimonUtil.java	Adds native DataSplit serialization (Base64) for paimon-cpp compatibility.
fe/be-java-extensions/paimon-scanner/.../PaimonUtils.java	Adjusts Base64 decoding behavior in the JNI scanner extension.
be/src/vec/exec/scan/file_scanner.cpp	Chooses between PaimonJniReader and new PaimonCppReader; adds predicate conversion hook.
be/src/vec/exec/format/table/paimon_predicate_converter.{h,cpp}	New converter from Doris conjuncts to paimon-cpp predicates.
be/src/vec/exec/format/table/paimon_doris_file_system.{h,cpp}	New paimon-cpp filesystem implementation backed by Doris FileSystem.
be/src/vec/exec/format/table/paimon_cpp_reader.{h,cpp}	New paimon-cpp based reader implementation.
be/CMakeLists.txt	Adds paimon-cpp build option and linkage setup.

Comment on lines +576 to +609
# Paimon static library dependencies
set(paimon_fmt_lib "${PAIMON_HOME}/lib64/paimon_deps/libfmtd.a")
if (NOT EXISTS "${paimon_fmt_lib}")
    set(paimon_fmt_lib "${PAIMON_HOME}/lib64/paimon_deps/libfmt.a")
endif()
add_library(paimon_fmt STATIC IMPORTED)
set_target_properties(paimon_fmt PROPERTIES IMPORTED_LOCATION "${paimon_fmt_lib}")
set(paimon_tbb_lib "${PAIMON_HOME}/lib64/paimon_deps/libtbb_debug.a")
if (NOT EXISTS "${paimon_tbb_lib}")
    set(paimon_tbb_lib "${PAIMON_HOME}/lib64/paimon_deps/libtbb.a")
endif()
add_library(paimon_tbb STATIC IMPORTED)
set_target_properties(paimon_tbb PROPERTIES IMPORTED_LOCATION "${paimon_tbb_lib}")
add_library(paimon_roaring STATIC IMPORTED)
set_target_properties(paimon_roaring PROPERTIES IMPORTED_LOCATION "${PAIMON_HOME}/lib64/paimon_deps/libroaring_bitmap.a")
add_library(paimon_xxhash STATIC IMPORTED)
set_target_properties(paimon_xxhash PROPERTIES IMPORTED_LOCATION "${PAIMON_HOME}/lib64/paimon_deps/libxxhash.a")
add_library(paimon_arrow_dataset STATIC IMPORTED)
set_target_properties(paimon_arrow_dataset PROPERTIES IMPORTED_LOCATION "${PAIMON_HOME}/lib64/paimon_deps/libarrow_dataset.a")
add_library(paimon_arrow_acero STATIC IMPORTED)
set_target_properties(paimon_arrow_acero PROPERTIES IMPORTED_LOCATION "${PAIMON_HOME}/lib64/paimon_deps/libarrow_acero.a")
add_library(paimon_arrow STATIC IMPORTED)
set_target_properties(paimon_arrow PROPERTIES IMPORTED_LOCATION "${PAIMON_HOME}/lib64/paimon_deps/libarrow.a")
set(paimon_arrow_bundled_deps_lib "${PAIMON_HOME}/lib64/paimon_deps/libarrow_bundled_dependencies.a")
if (EXISTS "${paimon_arrow_bundled_deps_lib}")
    add_library(paimon_arrow_bundled_dependencies STATIC IMPORTED)
    set_target_properties(paimon_arrow_bundled_dependencies PROPERTIES IMPORTED_LOCATION "${paimon_arrow_bundled_deps_lib}")
endif()
add_library(paimon_parquet STATIC IMPORTED)
set_target_properties(paimon_parquet PROPERTIES IMPORTED_LOCATION "${PAIMON_HOME}/lib64/paimon_deps/libparquet.a")
# Always use paimon-cpp bundled ORC to avoid conflicts with Doris ORC.
add_library(paimon_orc STATIC IMPORTED)
set_target_properties(paimon_orc PROPERTIES IMPORTED_LOCATION "${PAIMON_HOME}/lib64/paimon_deps/liborc.a")

Copilot AI Feb 11, 2026

The paimon-cpp linkage setup hardcodes a ${PAIMON_HOME}/lib64/paimon_deps/... layout and creates a number of imported static libs manually. This is brittle across platforms/install layouts (lib vs lib64, debug vs release naming) and can easily break packaging. Prefer consuming targets exported by PaimonConfig.cmake (and its transitive deps), or at least probe both lib and lib64 and fail with a clearer message showing which paths were tried.

Comment on lines +559 to +569
if (ENABLE_PAIMON_CPP)
    if (NOT PAIMON_HOME)
        set(_paimon_default_home "${THIRDPARTY_DIR}/paimon-cpp")
        if (EXISTS "${_paimon_default_home}/lib/cmake/Paimon/PaimonConfig.cmake")
            set(PAIMON_HOME "${_paimon_default_home}" CACHE PATH "" FORCE)
        endif()
        unset(_paimon_default_home)
    endif()
    if (NOT PAIMON_HOME)
        message(FATAL_ERROR "ENABLE_PAIMON_CPP=ON but PAIMON_HOME is not set")
    endif()

Copilot AI Feb 11, 2026

ENABLE_PAIMON_CPP defaults to ON, but if ${THIRDPARTY_DIR}/paimon-cpp is not present and PAIMON_HOME isn’t set, the build hard-fails (FATAL_ERROR). In this repo thirdparty/paimon-cpp does not exist, so a default build will break. Consider defaulting this option to OFF (or auto-disabling it when the config isn’t found) and failing only when the user explicitly enables it.

Comment on lines +70 to +72
#include "vec/exec/format/table/paimon_cpp_reader.h"
#include "vec/exec/format/table/paimon_jni_reader.h"
#include "vec/exec/format/table/paimon_predicate_converter.h"

Copilot AI Feb 11, 2026

Even with ENABLE_PAIMON_CPP=OFF, the current build still compiles all be/src/vec/**/*.cpp via file(GLOB_RECURSE VEC_FILES *.cpp), and file_scanner.cpp now unconditionally includes paimon-cpp headers. This means disabling the option won’t actually prevent compilation failures when paimon-cpp headers/libs aren’t available. To make the option effective, gate the includes/implementation with a compile definition (e.g., #ifdef ENABLE_PAIMON_CPP) and/or exclude the paimon-cpp source files from the Vec target when the option is OFF.

Suggested change
-#include "vec/exec/format/table/paimon_cpp_reader.h"
-#include "vec/exec/format/table/paimon_jni_reader.h"
-#include "vec/exec/format/table/paimon_predicate_converter.h"
+#ifdef ENABLE_PAIMON_CPP
+#include "vec/exec/format/table/paimon_cpp_reader.h"
+#include "vec/exec/format/table/paimon_predicate_converter.h"
+#endif
+#include "vec/exec/format/table/paimon_jni_reader.h"

Comment on lines 1297 to 1308
 public enum IgnoreSplitType {
     NONE,
     IGNORE_JNI,
-    IGNORE_NATIVE
+    IGNORE_NATIVE,
+    IGNORE_PAIMON_CPP
 }

 public static final String IGNORE_SPLIT_TYPE = "ignore_split_type";
 @VariableMgr.VarAttr(name = IGNORE_SPLIT_TYPE,
         checker = "checkIgnoreSplitType",
-        options = {"NONE", "IGNORE_JNI", "IGNORE_NATIVE"},
+        options = {"NONE", "IGNORE_JNI", "IGNORE_NATIVE", "IGNORE_PAIMON_CPP"},
         description = {"忽略指定类型的 split", "Ignore splits of the specified type"})

Copilot AI Feb 11, 2026

IGNORE_PAIMON_CPP was added to IgnoreSplitType and exposed as an option, but there’s no corresponding handling in the split generation logic (e.g., PaimonScanNode.getSplits only checks IGNORE_NATIVE and IGNORE_JNI). As-is, setting ignore_split_type=IGNORE_PAIMON_CPP will have no effect. Either implement the behavior (skip paimon-cpp/JNI-style splits when paimon-cpp is enabled) or remove the option to avoid misleading users.
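
One way the missing handling could look is sketched below. This is a minimal illustration only, not the code in PaimonScanNode: the ReaderKind enum, the shouldSkip rule, and the keep helper are hypothetical names introduced here, and the assumption that each split can be classified by the reader that would consume it is not confirmed by the PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names come from the Doris code base,
// except the IgnoreSplitType values, which mirror the enum in the diff.
class IgnoreSplitSketch {
    enum IgnoreSplitType { NONE, IGNORE_JNI, IGNORE_NATIVE, IGNORE_PAIMON_CPP }

    // Assumed classification of which reader would consume a split on BE.
    enum ReaderKind { NATIVE, JNI, PAIMON_CPP }

    // Returns true when the session setting asks to drop splits of this kind.
    static boolean shouldSkip(IgnoreSplitType ignore, ReaderKind kind) {
        switch (ignore) {
            case IGNORE_JNI:
                return kind == ReaderKind.JNI;
            case IGNORE_NATIVE:
                return kind == ReaderKind.NATIVE;
            case IGNORE_PAIMON_CPP:
                return kind == ReaderKind.PAIMON_CPP;
            default:
                return false; // NONE keeps every split
        }
    }

    // Filters a list of classified splits, preserving their order.
    static List<ReaderKind> keep(IgnoreSplitType ignore, List<ReaderKind> splits) {
        List<ReaderKind> kept = new ArrayList<>();
        for (ReaderKind kind : splits) {
            if (!shouldSkip(ignore, kind)) {
                kept.add(kind);
            }
        }
        return kept;
    }
}
```

With a rule of this shape, every enum value listed in the options array has an observable effect, which is the property the comment above asks for.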

try {
    decoded = URL_DECODER.decode(encodedStr.getBytes(java.nio.charset.StandardCharsets.UTF_8));
} catch (IllegalArgumentException e) {
    // Fallback to standard Base64 for splits encoded by native Paimon serialization.

Copilot AI Feb 11, 2026

The fallback comment is inaccurate: decoding with standard Base64 doesn’t make the payload compatible with InstantiationUtil.deserializeObject. That method still requires Java-serialized bytes, not “native Paimon serialization”. Please reword the comment to reflect that this fallback only supports alternative Base64 alphabets/encodings for the same Java-serialized payload.

Suggested change
-// Fallback to standard Base64 for splits encoded by native Paimon serialization.
+// Fallback to standard Base64 for splits encoded with a non-URL-safe Base64 alphabet.
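
The distinction can be shown with the JDK alone. The sketch below is illustrative and is not the code in PaimonUtils: the class and method names are hypothetical. It tries the URL-safe alphabet first and falls back to the standard alphabet, and in both cases it only recovers the raw bytes; what those bytes contain (Java-serialized or otherwise) is unaffected by the choice of alphabet.

```java
import java.util.Base64;

// Hypothetical helper, named here for illustration only.
class SplitBase64Sketch {
    // Decodes with the URL-safe alphabet ('-' and '_'); if the input contains
    // characters outside that alphabet (such as '+' or '/'), retries with the
    // standard alphabet. The decoded payload format is not changed by this.
    static byte[] decodeLenient(String encoded) {
        try {
            return Base64.getUrlDecoder().decode(encoded);
        } catch (IllegalArgumentException e) {
            return Base64.getDecoder().decode(encoded);
        }
    }
}
```

For example, the two bytes 0xFB 0xFF encode to `+/8=` in the standard alphabet and `-_8=` in the URL-safe one; decodeLenient returns the same two bytes for either string.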

}
// Set table location for paimon-cpp reader
String tableLocation = source.getTableLocation();
if (tableLocation != null) {

Copilot AI Feb 11, 2026

When enable_paimon_cpp_reader is on, BE will error out if paimon_table isn’t set. Here tableLocation can still be null/empty (e.g., non-FileStoreTable without a path option), in which case the query will fail later with a generic BE internal error. Consider validating tableLocation when paimon-cpp is enabled and throwing a clear FE-side exception (or ensuring a non-empty location is always populated).

Suggested change
-if (tableLocation != null) {
+if (sessionVariable.isEnablePaimonCppReader() && split instanceof DataSplit
+        && (tableLocation == null || tableLocation.isEmpty())) {
+    throw new RuntimeException(
+            "Paimon table location is empty while paimon-cpp reader is enabled. "
+                    + "Please ensure the Paimon table path is configured or disable paimon-cpp reader.");
+}
+if (tableLocation != null && !tableLocation.isEmpty()) {

@xylaaaaa
Contributor Author

run buildall

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.32% (1795/2263)
Line Coverage 64.81% (31951/49300)
Region Coverage 65.51% (15938/24330)
Branch Coverage 56.00% (8470/15124)

@xylaaaaa
Contributor Author

run buildall

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.32% (1795/2263)
Line Coverage 64.81% (31949/49300)
Region Coverage 65.53% (15944/24330)
Branch Coverage 56.01% (8471/15124)

@xylaaaaa
Contributor Author

run buildall

@doris-robot

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.32% (1795/2263)
Line Coverage 64.81% (31951/49300)
Region Coverage 65.50% (15937/24330)
Branch Coverage 55.99% (8468/15124)

@morningman morningman force-pushed the feature/paimoncpp-pr-clean branch from 208a1b7 to 5eb2ab9 Compare February 20, 2026 01:49
xylaaaaa and others added 2 commits February 20, 2026 13:11
fix(be): auto-detect paimon home in CI layouts

fix(ci): support paimon lib64 config path

fix(ci): default PAIMON_HOME to THIRDPARTY_DIR

fix(build): switch paimoncpp BE linkage to thirdparty static libs

be: force-link paimon factory registration libs

be: narrow paimon whole-archive and link arrow dataset deps

be: make paimon arrow_dataset linkage optional

[fix](paimon) resolve parquet static link deps in be

[fix](paimon) drop redundant explicit arrow feature flags

@morningman morningman force-pushed the feature/paimoncpp-pr-clean branch from 71f67aa to 1038ee8 Compare February 20, 2026 05:12
@morningman
Contributor

run buildall

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.33% (1796/2264)
Line Coverage 64.91% (32062/49393)
Region Coverage 65.59% (15992/24383)
Branch Coverage 56.08% (8506/15168)

@doris-robot

TPC-H: Total hot run time: 29419 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 1038ee88d16a41125000d70c0bfa2a7a96219c4c, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17631	4497	4347	4347
q2	q3	10730	823	521	521
q4	4719	375	265	265
q5	8084	1215	1011	1011
q6	230	173	146	146
q7	804	849	689	689
q8	10915	1503	1325	1325
q9	6771	4814	4770	4770
q10	6855	1878	1639	1639
q11	472	261	237	237
q12	750	575	481	481
q13	17771	4210	3474	3474
q14	244	234	220	220
q15	939	804	804	804
q16	767	723	691	691
q17	943	862	439	439
q18	5987	5501	5258	5258
q19	1244	1048	868	868
q20	619	615	471	471
q21	4577	1919	1504	1504
q22	359	321	259	259
Total cold run time: 101411 ms
Total hot run time: 29419 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4682	4579	4666	4579
q2	q3	1808	2268	1795	1795
q4	854	1205	769	769
q5	4045	4587	4386	4386
q6	192	176	144	144
q7	1748	1641	1503	1503
q8	2474	2724	2595	2595
q9	8046	7460	7387	7387
q10	2787	2854	2413	2413
q11	511	435	441	435
q12	508	582	447	447
q13	3890	4398	3681	3681
q14	309	317	291	291
q15	843	791	816	791
q16	711	757	710	710
q17	1194	1514	1285	1285
q18	7020	6939	6610	6610
q19	887	925	933	925
q20	2060	2182	2067	2067
q21	3939	3516	3389	3389
q22	506	469	424	424
Total cold run time: 49014 ms
Total hot run time: 46626 ms

@doris-robot

TPC-DS: Total hot run time: 184205 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 1038ee88d16a41125000d70c0bfa2a7a96219c4c, data reload: false

query5	4502	640	512	512
query6	336	216	212	212
query7	4235	497	265	265
query8	346	252	233	233
query9	8803	2777	2804	2777
query10	531	425	342	342
query11	17013	17563	17233	17233
query12	223	132	132	132
query13	1543	477	335	335
query14	6585	3315	3166	3166
query14_1	3008	2887	2925	2887
query15	215	207	177	177
query16	1075	478	468	468
query17	1434	760	689	689
query18	2756	432	338	338
query19	213	212	189	189
query20	138	132	127	127
query21	216	143	117	117
query22	4814	4895	4869	4869
query23	17203	16807	16581	16581
query23_1	16671	16727	16721	16721
query24	7172	1625	1213	1213
query24_1	1217	1241	1234	1234
query25	556	451	404	404
query26	1236	258	148	148
query27	2799	468	304	304
query28	4501	1873	1865	1865
query29	794	575	468	468
query30	329	241	210	210
query31	856	736	656	656
query32	78	69	76	69
query33	521	339	289	289
query34	932	897	574	574
query35	618	697	598	598
query36	1062	1146	973	973
query37	139	88	82	82
query38	2996	2951	2810	2810
query39	898	868	842	842
query39_1	874	828	812	812
query40	229	147	136	136
query41	61	59	59	59
query42	104	105	104	104
query43	387	393	354	354
query44	
query45	198	190	182	182
query46	883	990	596	596
query47	2107	2148	2089	2089
query48	308	322	233	233
query49	628	457	378	378
query50	675	279	210	210
query51	4120	4132	4074	4074
query52	107	106	96	96
query53	293	337	284	284
query54	306	268	267	267
query55	88	82	85	82
query56	304	303	314	303
query57	1366	1332	1250	1250
query58	290	281	266	266
query59	2601	2743	2564	2564
query60	342	337	323	323
query61	148	145	142	142
query62	608	583	527	527
query63	305	270	282	270
query64	4857	1259	989	989
query65	
query66	1386	457	361	361
query67	16478	16395	16287	16287
query68	
query69	419	323	300	300
query70	961	993	986	986
query71	349	308	301	301
query72	2937	2862	2616	2616
query73	551	564	333	333
query74	9990	9935	9806	9806
query75	2869	2769	2494	2494
query76	2299	1038	691	691
query77	388	439	322	322
query78	11187	11350	10732	10732
query79	1179	800	587	587
query80	1393	628	543	543
query81	571	274	246	246
query82	965	150	113	113
query83	327	264	240	240
query84	251	118	105	105
query85	899	489	421	421
query86	421	301	289	289
query87	3131	3095	2975	2975
query88	3558	2695	2639	2639
query89	428	370	365	365
query90	1990	185	177	177
query91	166	159	133	133
query92	78	74	72	72
query93	993	840	498	498
query94	652	314	288	288
query95	586	398	320	320
query96	634	510	229	229
query97	2467	2488	2383	2383
query98	229	226	213	213
query99	1020	1014	911	911
Total cold run time: 253860 ms
Total hot run time: 184205 ms

@morningman morningman changed the title feat(paimon): integrate paimon-cpp reader [feat](paimon): integrate paimon-cpp reader Feb 20, 2026
@doris-robot

BE UT Coverage Report

Increment line coverage 3.27% (43/1315) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.56% (19544/37181)
Line Coverage 36.14% (182182/504093)
Region Coverage 32.48% (141312/435029)
Branch Coverage 33.46% (61268/183102)

@morningman morningman changed the title [feat](paimon): integrate paimon-cpp reader [feat](paimon) integrate paimon-cpp reader Feb 20, 2026
@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 4.03% (53/1315) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.29% (26707/36440)
Line Coverage 56.48% (283994/502854)
Region Coverage 53.99% (237234/439414)
Branch Coverage 55.69% (102370/183806)

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 16.00% (4/25) 🎉
Increment coverage report
Complete coverage report

@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added approved Indicates a PR has been approved by one committer. reviewed labels Feb 21, 2026
@github-actions
Contributor

PR approved by anyone and no changes requested.

@BiteTheDDDDt BiteTheDDDDt merged commit 7dcca9d into apache:master Feb 21, 2026
31 of 33 checks passed