frictionless transform unhandled exception on 200 line csv, but not 197 line csv #1622

@richardt-engineb

Overview

I am seeing a very strange error when running a transform (either from Python code or via the command-line tool). Depending on the size of the input file, the transform either succeeds or throws an "I/O operation on closed file" exception. The number of lines required to trigger it varies, even between execution environments.

On an M1 Mac Mini, a 198-line file currently crashes and a 197-line file passes. On a Gitpod instance (Ubuntu), the threshold was about the same yesterday, but today it is closer to 150. In our own code, it can take 10k+ lines. But there is always a size above which this fails, and that size is far short of limits such as settings.FIELD_SIZE_LIMIT.

Example Command Line

% frictionless transform data/crash-transform/data.csv --pipeline data/crash-transform/pipeline.json
╭─ Error ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ [step-error] Step is not valid: "cell_replace" raises "[step-error] Step is not valid: "table_normalize" raises "[source-error] The data source has not supported or has         │
│ inconsistent contents: I/O operation on closed file. " "                                                                                                                         │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

pipeline.json

{
  "steps": [
    { "type": "table-normalize" },
    { "type": "cell-replace", "pattern": "BLANK", "replace": "" },
    { "type": "cell-replace", "pattern": "blank", "replace": "" },
    { "type": "cell-replace", "pattern": "NULL", "replace": "" },
    { "type": "cell-replace", "pattern": "null", "replace": "" },
    {
      "name": "NewSumField",
      "type": "field-add",
      "formula": "Field1 + Field2"
    },
    { "name": "NewConstantField", "type": "field-add", "value": "NewValue" }
  ]
}

data.csv

Field1,Field2,Random1,Random2,Random3,Random4,Random5,Random6,Random7,Random8,Random9
0,0,BLANK,blank,NULL,null,5val0,6val0,7val0,8val0,9val0
1,10,1val1,2val1,3val1,4val1,5val1,6val1,7val1,8val1,9val1
2,20,1val2,2val2,3val2,4val2,5val2,6val2,7val2,8val2,9val2
3,30,BLANK,2val3,3val3,4val3,5val3,6val3,7val3,8val3,9val3
... extend as needed ...
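
To make "extend as needed" concrete, here is a minimal sketch of a generator for files of arbitrary length. It reproduces the four sample rows above exactly. How the rows continue beyond them is an assumption: row 0 carries all four placeholder values, and every later row whose index is a multiple of 3 gets BLANK in Random1. The names `make_row` and `make_csv` are hypothetical helpers, not part of frictionless.

```python
import csv
import io

# Header matches the sample: Field1, Field2, Random1..Random9
HEADER = ["Field1", "Field2"] + [f"Random{k}" for k in range(1, 10)]


def make_row(i):
    """Build data row i following the pattern of the sample rows."""
    cells = [f"{k}val{i}" for k in range(1, 10)]
    if i == 0:
        # Row 0 in the sample holds all four placeholder values.
        cells[:4] = ["BLANK", "blank", "NULL", "null"]
    elif i % 3 == 0:
        # Assumption: BLANK recurs in Random1 every third row (row 3 in the sample).
        cells[0] = "BLANK"
    return [str(i), str(10 * i)] + cells


def make_csv(n_rows):
    """Return CSV text with one header line followed by n_rows data rows."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(HEADER)
    for i in range(n_rows):
        writer.writerow(make_row(i))
    return buf.getvalue()
```

Writing the result of `make_csv(n)` to data/crash-transform/data.csv for different values of `n` gives inputs on either side of the failure threshold.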

Sample files

data.csv
pipeline.json
