Description
Hi, I've been trying out the codebase by fine-tuning models on math, and I consistently get this error after a couple hundred steps. It happens with both the default OpenReasoner data and NuminaMath-CoT from HuggingFace.
[preprocessor]: 2025-12-17 23:45:18,733 - pipelinerl.utils - ERROR - Exception in preprocess: 'utf-8' codec can't decode byte 0xe2 in position 0: unexpected end of data
[preprocessor]: 2025-12-17 23:45:18,738 - pipelinerl.utils - ERROR - Traceback: Traceback (most recent call last):
  File "/home/k/ksareen/PipelineRL/pipelinerl/utils.py", line 317, in better_crashing
    yield
  File "/home/k/ksareen/PipelineRL/pipelinerl/entrypoints/run_preprocess.py", line 9, in preprocess_hydra_entry_point
    run_preprocessing_loop(cfg)
  File "/home/k/ksareen/PipelineRL/pipelinerl/preprocess.py", line 510, in run_preprocessing_loop
    raise raw_chunk
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe2 in position 0: unexpected end of data
Do you have any advice? So far I have been streaming to the filesystem, but I will try again with the Redis server. In the meantime, I'm trying this inside FileStreamReader, adapting the fix from the comment in #113:
except (json.JSONDecodeError, UnicodeDecodeError) as e:
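For context on the error message itself: 0xe2 is the lead byte of a three-byte UTF-8 sequence, and "unexpected end of data" means the bytes being decoded stopped before that sequence was complete. One plausible cause (an assumption, not confirmed here) is that the reader picked up a line while the writer was still partway through it. Below is a minimal sketch of a reader that only decodes complete, newline-terminated lines and holds back any partial tail. The names `split_complete_lines` and `decode_records` are hypothetical helpers for illustration, not part of PipelineRL, and the sketch assumes newline-delimited JSON records.

```python
import json
from typing import Iterable, Iterator


def split_complete_lines(buffer: bytes) -> tuple[list[bytes], bytes]:
    """Split raw bytes into complete lines and a leftover tail.

    Everything before the last newline is returned as complete lines.
    The tail (possibly empty) may end mid-line or mid-character.
    """
    *lines, tail = buffer.split(b"\n")
    return lines, tail


def decode_records(chunks: Iterable[bytes]) -> Iterator[dict]:
    """Yield JSON records from byte chunks (hypothetical helper).

    Bytes are decoded only once a full line has arrived, so a chunk
    that ends inside a multi-byte UTF-8 character is never decoded.
    """
    pending = b""
    for chunk in chunks:
        lines, pending = split_complete_lines(pending + chunk)
        for line in lines:
            if line.strip():
                yield json.loads(line.decode("utf-8"))
    # Whatever remains in `pending` is an incomplete line. It is dropped
    # here, but a long-running reader would keep it for the next read.


if __name__ == "__main__":
    record = {"text": "a \u2019 b"}  # U+2019 encodes to bytes e2 80 99
    data = json.dumps(record, ensure_ascii=False).encode("utf-8") + b"\n"
    cut = data.index(b"\xe2") + 1  # cut right after the lead byte

    # Decoding the lone lead byte fails in the same way as the traceback.
    try:
        data[cut - 1:cut].decode("utf-8")
    except UnicodeDecodeError as exc:
        print(exc)

    # Buffering across the cut recovers the record intact.
    print(list(decode_records([data[:cut], data[cut:]])))
```

Catching `UnicodeDecodeError` next to `json.JSONDecodeError` and retrying on the next read should have a similar effect, provided the reader then re-reads the partial line from its start and does not skip ahead.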