On how to contribute to your project. #755

@roland-schwarz-enovos

Dear developers / project owners,

Up front: this is not an issue, but rather a request about how I can contribute to your project.
I created this account with my company email address; we are based in the south of Luxembourg and work in the energy business.
We are currently working on improving data quality along the flow of data items through our system (this might sound a bit like buzzword bingo ;)).
We have various data sources in different formats: CSV, JSON, CSV disguised as Excel, and some mythical XML files. They all arrive in S3 or on FTP infrastructure.
We have committed to using ODCS for our contracts, which model the expected format and quality at each step of the process.
We have run some test cases with the datacontract-cli framework via Python, integrated it into our Airflow infrastructure, run tests, and generated notifications from the results.

While using the framework to validate files, I ran into a few issues. Two of them are:

  • Translation from ODCS to the DCS format: not all keywords are translated.
    Example: we have this definition for a field:
    logicalTypeOptions:
      minLength: 36
      maxLength: 36
      pattern: '^[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}$'
    These elements are not yet handled in the odcs_v3_importer (don't get me wrong, that's no criticism ;)).
    I've implemented three lines that take care of this and work in our environment; it would be a shame to keep them to myself.

  • Running tests on CSV files with some dirt, such as # as a comment symbol.
    CSV reading is handled through DuckDB, and the automatic format detection fails if the file contains a comment.
    DuckDB can handle this through options of the read_csv function, but in this case something goes wrong; I haven't figured out what yet.
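
To make the first point more concrete, here is a minimal sketch of the kind of mapping I mean. It is not the actual change in odcs_v3_importer; the function name `apply_logical_type_options`, the plain-dict representation of the property and the field, and the assumption that the three keys can be copied over under the same names are all hypothetical and only for illustration.

```python
from typing import Any

# Keys taken from the example above. Assumption: the target field
# accepts them under the same names, so they can be copied 1:1.
PASSTHROUGH_KEYS = ("minLength", "maxLength", "pattern")


def apply_logical_type_options(
    odcs_property: dict[str, Any], field: dict[str, Any]
) -> dict[str, Any]:
    """Return a copy of `field` enriched with the supported logicalTypeOptions."""
    # A missing or empty logicalTypeOptions block simply means nothing to copy.
    options = odcs_property.get("logicalTypeOptions") or {}
    result = dict(field)
    for key in PASSTHROUGH_KEYS:
        if key in options:
            result[key] = options[key]
    return result


# Hypothetical property, modelled on the definition quoted above.
odcs_property = {
    "name": "id",
    "logicalType": "string",
    "logicalTypeOptions": {
        "minLength": 36,
        "maxLength": 36,
        "pattern": "^[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}$",
    },
}

field = apply_logical_type_options(odcs_property, {"type": "string"})
print(field)
```

The function returns a new dict instead of modifying the input, so a property without logicalTypeOptions comes back unchanged.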
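
For the second point, one possible workaround that is independent of DuckDB's own options is to strip the comment lines before the file reaches the CSV reader. The sketch below uses only the Python standard library; the helper name `strip_comment_lines` and the sample content are made up for illustration, and it assumes that comments always occupy a whole line.

```python
import csv
import io


def strip_comment_lines(text: str, comment_char: str = "#") -> str:
    """Drop every line whose first non-blank character is `comment_char`."""
    kept = [
        line
        for line in text.splitlines(keepends=True)
        if not line.lstrip().startswith(comment_char)
    ]
    return "".join(kept)


# Made-up sample: a header comment and a comment between data rows.
raw = "# export of some source system\nid,value\n1,a\n# manual note\n2,b\n"

cleaned = strip_comment_lines(raw)
rows = list(csv.DictReader(io.StringIO(cleaned)))
print(rows)
```

This is deliberately simple: a quoted multi-line field whose continuation line starts with # would be damaged by it, so it only fits files where that cannot happen.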

I wanted to keep it short, which didn't work that well. So, to get to the point: I would like to contribute to your project and coordinate with you on how to do this without disrupting any of your processes ;).
I'm also quite new to GitHub and have not worked on any projects yet; I understand that I can't just commit my changes to the main branch.
If you would like to reach me by email, my company address is in my profile. German is fine with me ;).
Thank you!

Greetings,
Roland
