Description
I've got JSON files taken from the GitHub API as my data. I want to define a schema to test the raw data against on its arrival. When I try to follow the steps in https://github.com/moj-analytical-services/etl_pipeline_example it doesn't really tell me how to create the schema in the first place.
When I try to use DatabaseMeta and TableMeta to define the schema with a call like
tab.add_column(
    name='authoredByCommitter',
    type='boolean',
    description='whether the author and committer are the same'
)
I get
ValueError: string provided must be lowercase
The issue is that my data comes out of GitHub's API with the field names in camel case. Obviously I could work through and rename them, but ought I to have to do this? If the data were in CSV I could probably drop all of the headers and redefine them (?), but for JSON with its nested structure this is potentially even more of a pain in the arse.
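For what it's worth, the renaming workaround I'd rather avoid would look roughly like the sketch below. It uses only the standard library and assumes the data is plain dicts and lists as returned by json.load. The helper names to_snake_case and rename_keys are my own and not part of any library, and the "userLogin" field is made up for illustration.

```python
import re

# Zero-width match at each boundary between a lowercase letter/digit and a capital
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")


def to_snake_case(name: str) -> str:
    """Convert a camelCase name to lowercase snake_case (hypothetical helper)."""
    return _CAMEL_BOUNDARY.sub("_", name).lower()


def rename_keys(value):
    """Recursively rename every dict key in a nested JSON-like structure."""
    if isinstance(value, dict):
        return {to_snake_case(k): rename_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [rename_keys(item) for item in value]
    return value  # scalars (str, int, bool, None) pass through unchanged


# "userLogin" is an invented nested field, used only to show the recursion
record = {"authoredByCommitter": True, "author": {"userLogin": "octocat"}}
renamed = rename_keys(record)
```

Here renamed["authored_by_committer"] is a lowercase name, which is what the error message asks for, but it means every incoming file has to go through this step before it can be checked against the schema.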