[spi] Add Protobuf entry in RecordReaderFactory #17601
Open
+10
−1
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Issue(s)
Inconsistency between
FileFormatenum andPluginManagerinput format mappings causing protobuf record reader lookup failuresDescription
This pull request fixes a discrepancy between two parallel record reader registration mechanisms in Pinot.
Background - Two Registration Systems:
Pinot has two places that maintain record reader class name mappings:
RecordReaderFactory(usesFileFormatenum with UPPERCASE keys)SegmentIndexCreationDriverImpl,RecordReaderFileConfig,IngestionUtils, CLI tools (CreateSegmentCommand), and most test classesRecordReaderinstantiation viagetRecordReader(FileFormat, ...)PluginManager(uses lowercase string keys like"protobuf")SegmentGenerationAndPushTaskGenerator,FileIngestionTaskConfigUtils,BatchTableSampler,DeltaTableIngestionTaskExecutorThe Problem:
FileFormatenumPROTOPluginManager"protobuf".toLowerCase()RecordReaderFactory"PROTO"(from enum).toUpperCase()When code receives input format as
"protobuf"or"PROTOBUF"(following PluginManager's convention) and tries to convert it toFileFormatviaFileFormat.valueOf(), it fails becauseFileFormatonly hadPROTOas the enum constant.This caused issues in components like
BatchTableSamplerwhere the input format string comes from task configs (using PluginManager's"protobuf"convention) but needs to be converted toFileFormatenum.Solution:
Added
PROTOBUFas an enum constant inFileFormatand registered it inRecordReaderFactoryto ensure consistency withPluginManager's naming convention.Changes:
PROTOBUFtoFileFormatenumPROTOBUFinRecordReaderFactorystatic initializer mapping toProtoBufRecordReaderandProtoBufRecordReaderConfig