feat: parquet datasource #320

@adrienaury

Description

Add a new datasource type supported by lino: parquet.

The initial implementation has the following scope and limitations:

  • no ingress description definition: by default, lino pull follows the parquet schema
  • a single table per file, named "data" by convention
  • no update of an existing file: lino push overwrites the existing file

Use the provided example file to test the parquet features: test.zip

lino dataconnector

$ lino dc add myfile 'parquet://path/to/test.parquet'
successfully added dataconnector
$ cat dataconnector.yaml
version: v1
dataconnectors:
  - name: myfile
    url: parquet://path/to/test.parquet
    readonly: false

lino table

$ lino table extract myfile
lino finds 1 table
$ cat tables.yaml
version: v1
tables:
  - name: data
    columns:
      - name: col_type_binary_string
      - name: col_type_int_32
      - name: col_type_boolean

lino pull

$ lino pull myfile
{"col_type_binary_string": "6FBFXK4V82XQA4EZ0UCY8PZZN2B256", "col_type_int_32": 324442164, "col_type_boolean": true}
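
To check how the parquet column types surface on the consumer side, the pulled rows can be decoded with a few lines of Python. This is a minimal sketch, assuming lino pull writes one JSON object per row on standard output, as the sample above suggests. The helper names parse_rows and column_types are hypothetical and not part of lino, and the input is the sample line copied from the output above rather than a live lino call.

```python
import json

# Sample line copied from the `lino pull myfile` output shown above.
# Assumption: lino pull emits one JSON object per row, one row per line.
SAMPLE_OUTPUT = (
    '{"col_type_binary_string": "6FBFXK4V82XQA4EZ0UCY8PZZN2B256", '
    '"col_type_int_32": 324442164, "col_type_boolean": true}\n'
)


def parse_rows(text):
    """Decode JSON Lines text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def column_types(row):
    """Map each column name to the name of the Python type it decoded to."""
    return {name: type(value).__name__ for name, value in row.items()}


rows = parse_rows(SAMPLE_OUTPUT)
print(column_types(rows[0]))
# {'col_type_binary_string': 'str', 'col_type_int_32': 'int', 'col_type_boolean': 'bool'}
```

Against a real file, the same functions could be fed sys.stdin.read() with the output of lino pull myfile piped in. Only the three column types present in the example file are covered here.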
