Description
2026-01-04 02:54:32 | INFO | data_engine.utils.logger_utils:144 - Create logger ID 3 with loglevel: INFO, export to /data/dataflow/arXiv2jsontools4_9b73afdf-f297-431b-ad23-b157a648a50f/output/log/tool_raw_arxiv_to_jsonl_preprocess_internal_time_20260104025432.txt
2026-01-04 02:54:32 | INFO | data_engine.core.executor_tools:52 - Preparing tool...
2026-01-04 02:54:32 | INFO | data_engine.tools.base_tool:44 - Setting up data ingester...
2026-01-04 02:54:32 | INFO | data_engine.ingester.csghub_ingester:30 - Using dataset_path: /data/dataflow/arXiv2jsontools4_9b73afdf-f297-431b-ad23-b157a648a50f/input, repo:longrui/tools4, branch:main
2026-01-04 02:54:32 | INFO | data_engine.tools.base_tool:55 - Preparing exporter...
2026-01-04 02:54:32 | INFO | data_engine.core.executor_tools:59 - Launching tool...
2026-01-04 02:54:32 | INFO | data_engine.ingester.csghub_ingester:41 - model_id:longrui/tools4
2026-01-04 02:54:32 | INFO | data_engine.ingester.csghub_ingester:43 - endpoint:http://modelhub.cmr-co.com
2026-01-04 02:54:32 | INFO | data_engine.ingester.csghub_ingester:44 - Input parameters: repo_id:longrui/tools4, repo_type:dataset, revision:main, cache_dir:/data/dataflow/arXiv2jsontools4_9b73afdf-f297-431b-ad23-b157a648a50f/input, endpoint:http://modelhub.cmr-co.com, token:b2bc8452d426461d8e4aac51b82fdebc
Downloading .gitattributes: 0%| | 0.00/2.25k [00:00<?, ?B/s]
Downloading .gitattributes: 100%|##########| 2.25k/2.25k [00:00<00:00, 2.76MB/s]
Downloading README.md: 0%| | 0.00/25.0 [00:00<?, ?B/s]
Downloading README.md: 100%|##########| 25.0/25.0 [00:00<00:00, 24.9kB/s]
Downloading open.tar.gz: 0%| | 0.00/1.78M [00:00<?, ?B/s]
Downloading open.tar.gz: 100%|##########| 1.78M/1.78M [00:00<00:00, 61.9MB/s]
2026-01-04 02:54:33 | INFO | data_engine.ingester.csghub_ingester:54 - result: /data/dataflow/arXiv2jsontools4_9b73afdf-f297-431b-ad23-b157a648a50f/input, _src_path: /data/dataflow/arXiv2jsontools4_9b73afdf-f297-431b-ad23-b157a648a50f/input
2026-01-04 02:54:33 | INFO | data_engine.tools.base_tool:95 - Data ingested from /data/dataflow/arXiv2jsontools4_9b73afdf-f297-431b-ad23-b157a648a50f/input
_accelerator 5555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555
2026-01-04 02:54:33 | DEBUG | data_engine.tools.base_tool:137 - Op [raw_arxiv_to_jsonl_preprocess_internal] running with number of procs:3
2026-01-04 02:54:33 | INFO | data_engine.tools.base_tool:109 - Processing tool...
_accelerator -5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5
2026-01-04 02:54:33 | INFO | data_engine.tools.base_tool:114 - Tool are done in 0.128s.
2026-01-04 02:54:33 | INFO | data_engine.tools.base_tool:121 - Exporting dataset to somewhere...
2026-01-04 02:54:33 | INFO | data_engine.exporter.csghub_exporter:94 - The target dir is empty, no need to upload anything, abort.
2026-01-04 02:54:33 | WARNING | data_server.job.JobExecutor:127 - Job 119 still in PROCESSING state in finally block, marking as FAILED
Follow-up dataset attachment: