Avoid creating LZO indexes on files not spread on several blocks #82
killerwhile wants to merge 1 commit into twitter:master
Actually, I was wondering: since it may look strange to a user that running DistributedLzoIndexer creates no lzo.index file, shouldn't this be a feature that can be enabled or disabled via a parameter (like lzo.skip.useless.indexes=true)? WDYT?
Making it configurable sounds better. I wouldn't say it is completely useless (sometimes you might want to split even a 500 MB file across multiple mappers; our block size is 512 MB). The option could be 'lzo.indexer.skip.small.files'.
@rangadi don't we already skip index creation somewhere? I know we don't create them for small files (don't recall if small == block size).
LZO indexes for files stored in one single block are useless. Simply avoid the creation when the file is smaller than the block size.
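To make the discussion concrete, here is a minimal sketch of the check, combined with the opt-in flag suggested in the review. It is not the code from this PR: the class `LzoIndexPolicy` and the method `shouldIndex` are hypothetical, plain `java.util.Properties` stands in for the Hadoop configuration, and the option name `lzo.indexer.skip.small.files` is only the one proposed above. Defaulting the flag to `false`, and treating a file exactly one block long as "small", are assumptions as well.

```java
import java.util.Properties;

// Hypothetical helper, for illustration only; not part of hadoop-lzo.
final class LzoIndexPolicy {
    // Option name proposed in the review thread (assumption: not an existing setting).
    static final String SKIP_SMALL_FILES_KEY = "lzo.indexer.skip.small.files";

    // Returns true when an index should be created for the file.
    static boolean shouldIndex(long fileLength, long blockSize, Properties conf) {
        // Assumption: the flag defaults to false, so every file is indexed unless the user opts in.
        boolean skipSmall =
            Boolean.parseBoolean(conf.getProperty(SKIP_SMALL_FILES_KEY, "false"));
        if (!skipSmall) {
            return true;
        }
        // A file no longer than one block is stored in a single block,
        // so it is skipped when the flag is set.
        return fileLength > blockSize;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(SKIP_SMALL_FILES_KEY, "true");

        long blockSize = 512L * 1024 * 1024;       // 512 MB block, as in the thread
        long smallFile = 500L * 1024 * 1024;       // 500 MB file, fits in one block
        long largeFile = 2L * 1024 * 1024 * 1024;  // 2 GB file, spans several blocks

        System.out.println("500 MB file indexed: " + shouldIndex(smallFile, blockSize, conf));
        System.out.println("2 GB file indexed:   " + shouldIndex(largeFile, blockSize, conf));
    }
}
```

With the flag left unset, the 500 MB file from the example in the review would still get an index, which covers the case where someone wants to split a sub-block file across several mappers.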