Commit 41b4aee

🎨 Auto-format code with pre-commit
1 parent c93f3cb commit 41b4aee

4 files changed: +11 -14 lines changed
dingo/config/__init__.py

Lines changed: 2 additions & 2 deletions
@@ -1,2 +1,2 @@
-from dingo.config.input_args import (DatasetArgs, DatasetCsvArgs, DatasetExcelArgs, DatasetFieldArgs, DatasetHFConfigArgs, DatasetParquetArgs, DatasetS3ConfigArgs, DatasetSqlArgs, EvalPipline, # noqa E402.
-EvalPiplineConfig, EvaluatorLLMArgs, EvaluatorRuleArgs, ExecutorArgs, ExecutorResultSaveArgs, InputArgs)
+from dingo.config.input_args import (DatasetArgs, DatasetCsvArgs, DatasetExcelArgs, DatasetFieldArgs, DatasetHFConfigArgs, DatasetParquetArgs, DatasetS3ConfigArgs, DatasetSqlArgs, # noqa E402.
+EvalPipline, EvalPiplineConfig, EvaluatorLLMArgs, EvaluatorRuleArgs, ExecutorArgs, ExecutorResultSaveArgs, InputArgs)

docs/dataset/parquet.md

Lines changed: 8 additions & 9 deletions
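@@ -6,12 +6,12 @@ Dingo now supports streaming reads of Parquet files, providing efficient columnar data
 
 ## Key Features
 
-**Streaming reads** - Uses the PyArrow engine and processes data in batches; suitable for large files
-**Columnar storage** - Supports reading only the specified columns, greatly reducing memory usage
-**High performance** - Built on Apache Arrow for fast reads
-**Batch control** - Batch size is configurable to balance performance and memory
-**Rich type support** - Supports many data types (int, float, bool, string, None, etc.)
-**Compression support** - Supports compression formats such as Snappy, Gzip, and LZ4
+**Streaming reads** - Uses the PyArrow engine and processes data in batches; suitable for large files
+**Columnar storage** - Supports reading only the specified columns, greatly reducing memory usage
+**High performance** - Built on Apache Arrow for fast reads
+**Batch control** - Batch size is configurable to balance performance and memory
+**Rich type support** - Supports many data types (int, float, bool, string, None, etc.)
+**Compression support** - Supports compression formats such as Snappy, Gzip, and LZ4
 
 ## Configuration Parameters
 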
@@ -132,7 +132,7 @@ After reading a Parquet file, Dingo converts each row to JSON format:
 Read from disk only:
 Column id: [1, 2]
 Column name: [张三, 李四]
-
+
 Skipped: the age and city columns (saves I/O and memory)
 ```
 
@@ -292,7 +292,7 @@ pip install pyarrow
 ```
 ImportError: No module named 'pyarrow'
 ```
-**Solution:**
+**Solution:**
 ```bash
 pip install pyarrow
 ```
@@ -420,4 +420,3 @@ input_data = {
 - [Excel reading documentation](excel.md)
 - [Dataset configuration documentation](../config.md)
 - [Evaluator configuration documentation](../rules.md)
-

examples/dataset/example_parquet.py

Lines changed: 0 additions & 1 deletion
@@ -38,4 +38,3 @@
 executor = Executor.exec_map["local"](input_args)
 result = executor.execute()
 print(result)
-

test/scripts/dataset/test_parquet_dataset.py

Lines changed: 1 addition & 2 deletions
@@ -44,8 +44,8 @@ def create_test_parquet_file(file_path: str, num_rows: int = 100):
 def create_test_parquet_with_special_types(file_path: str):
     """创建包含特殊类型的测试 Parquet 文件"""
     try:
-        import pandas as pd
         import numpy as np
+        import pandas as pd
     except ImportError:
         print("⚠ pandas 或 numpy 未安装")
         return False
@@ -506,4 +506,3 @@ def test_parquet_comprehensive():
 print("║" + " " * 18 + "所有测试完成!" + " " * 23 + "║")
 print("╚" + "═" * 58 + "╝")
 print("\n")
-
