Commit f078cd9

fix(partition, csv): increase csv field limit (#4046)

Authored by Filip Knefel (ds-filipknefel)

Increase the csv field limit to support partitioning of files with large data in fields.

Co-authored-by: Filip Knefel <[email protected]>

1 parent 8a9abdd

File tree

3 files changed: +12 -1 lines changed

CHANGELOG.md

Lines changed: 9 additions & 0 deletions

@@ -1,3 +1,12 @@
+## 0.18.4
+
+### Enhancements
+
+### Features
+
+### Fixes
+- **Increase CSV field limit** Addresses failures in partition for csv files with large fields
+
 ## 0.18.3
 
 ### Enhancements

unstructured/__version__.py

Lines changed: 1 addition & 1 deletion

@@ -1 +1 @@
-__version__ = "0.18.3"  # pragma: no cover
+__version__ = "0.18.4"  # pragma: no cover

unstructured/partition/csv.py

Lines changed: 2 additions & 0 deletions

@@ -14,6 +14,7 @@
 from unstructured.utils import is_temp_file_path, lazyproperty
 
 DETECTION_ORIGIN: str = "csv"
+CSV_FIELD_LIMIT = 10 * 1048576  # 10MiB
 
 
 @apply_metadata(FileType.CSV)
@@ -54,6 +55,7 @@ def partition_csv(
         infer_table_structure=infer_table_structure,
     )
 
+    csv.field_size_limit(CSV_FIELD_LIMIT)
     with ctx.open() as file:
         dataframe = pd.read_csv(file, header=ctx.header, sep=ctx.delimiter, encoding=ctx.encoding)
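The failure this commit fixes can be reproduced with the standard library alone: Python's `csv` module rejects any field larger than its global size limit (131072 bytes by default) with `csv.Error`. The sketch below assumes only the stdlib; the `10 * 1048576` value mirrors the new `CSV_FIELD_LIMIT` constant, and the 200,000-character field is an arbitrary example large enough to trip the default limit.

```python
import csv
import io

# A single field larger than the default 128 KiB limit.
big_field = "x" * 200_000
data = io.StringIO(f'col\n"{big_field}"\n')

# With the default limit, parsing this file raises
# csv.Error: field larger than field size limit.
try:
    list(csv.reader(data))
    parsed_with_default = True
except csv.Error:
    parsed_with_default = False

# Raising the limit, as this commit does, lets the row parse.
data.seek(0)
csv.field_size_limit(10 * 1048576)  # 10 MiB, matching CSV_FIELD_LIMIT
rows = list(csv.reader(data))
```

Note that `csv.field_size_limit` is process-global state, which is why a single call before parsing is enough to cover the subsequent read.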
