Inconsistent usage of table_name vs tap_stream_id #98
Description
Background:
- I am working on a source (DynamoDB) whose upstream table names can contain special characters. In our case they contain dashes, which this target parses in a special and significant way.
- To isolate the upstream table name from the downstream one, I researched the spec and found that, according to my interpretation of it, `table_name` is intended to describe the upstream source and `tap_stream_id` is intended to drive downstream behavior.
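To make that interpretation concrete, here is a minimal sketch. It assumes a catalog stream entry trimmed to the two fields discussed here, using the values from the sync shown in the log, and a hypothetical helper `target_table_name` (not part of any Singer or pipelinewise library) that names the downstream table from `tap_stream_id` alone. Upper-casing is an assumption chosen to mirror the `RAW_MES."EMPLOYEEASSESSMENT"` name that appears in the log.

```python
# Hypothetical catalog stream entry, trimmed to the two fields discussed in
# this issue; the values are the ones from the sync shown in the log.
catalog_stream = {
    "tap_stream_id": "employeeAssessment",
    "table_name": "dev_mes-employeeAssessment-table",
}


def target_table_name(stream: dict) -> str:
    """Hypothetical helper: derive the downstream table name from tap_stream_id.

    table_name is never parsed here, because under this interpretation it only
    describes the upstream source. Upper-casing mirrors the Snowflake table
    name seen in the log and is an assumption, not the target's actual rule.
    """
    return stream["tap_stream_id"].upper()


print(target_table_name(catalog_stream))  # EMPLOYEEASSESSMENT
```

Because the downstream name never touches `table_name`, dashes in the upstream name cannot change how the target table is named.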
I have created singer-io/tap-dynamodb#25 to resolve this on the tap side.
Problem:
When sending events through that tap, this target appears to be inconsistent about when it uses `table_name` and when it uses `tap_stream_id`. (Again, according to my understanding of the spec, `table_name` should be used by the tap, and `tap_stream_id` should govern naming in the target.)
The log below comes from a single-table sync operation. Note that the target first uses the correct table name and then uses the table name "TABLE", which is likely coming from parsing `table_name` instead of `tap_stream_id`.
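A minimal sketch of how parsing the dashed `table_name` could yield "TABLE". It assumes the target splits a stream name on `-` and keeps the last part as the table name. `split_stream_name` is a hypothetical stand-in written for illustration, not the target's actual code.

```python
def split_stream_name(stream_name: str, separator: str = "-") -> dict:
    """Hypothetical parser: treat the last separator-delimited part as the table.

    The part before it (if any) is treated as the schema name.
    """
    parts = stream_name.split(separator)
    return {
        "schema_name": parts[-2] if len(parts) > 1 else None,
        "table_name": parts[-1],
    }


# Parsing the upstream table_name: its dashes are treated as delimiters,
# so only the trailing "table" segment survives as the table name.
from_table_name = split_stream_name("dev_mes-employeeAssessment-table")
print(from_table_name["table_name"].upper())  # TABLE

# Parsing the tap_stream_id: it has no dashes, so the name stays intact.
from_stream_id = split_stream_name("employeeAssessment")
print(from_stream_id["table_name"].upper())  # EMPLOYEEASSESSMENT
```

Under this assumption, the same input gives two different target tables depending on which field a given code path reads, which matches the two table names that appear in the log.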
I'm planning to submit a PR but first wanted to post this to create awareness and promote discussion.
Thanks!
Here's the full log...
Note that in this example the upstream `table_name` is `dev_mes-employeeAssessment-table` and the `tap_stream_id` is `employeeAssessment`. The target first checks whether `employeeAssessment` exists, then (mistakenly) checks whether `TABLE` exists and loads the 17 rows into `RAW_MES."TABLE"`.
2020-09-04 16:06:16,334 - INFO - Beginning running command: tap-dynamodb --config /mnt/c/Files/Source/slalom-data-platform-core/data/taps/.secrets/tmp/tap-me-slalom-config.json --catalog ./.output/taps/me-slalom-catalog/me-slalom-employeeAssessment-catalog.json --state /tmp/tmpa9t8bkaj/me-slalom-employeeAssessment-state.json | target-snowflake --config /mnt/c/Files/Source/slalom-data-platform-core/data/taps/.secrets/tmp/target-snowflake-config-employeeAssessment.json > /tmp/tmpa9t8bkaj/me-slalom-employeeAssessment-state-new.json...
INFO Found credentials in shared credentials file: /mnt/c/Files/Source/slalom-data-platform-core/infra/dev/.secrets/aws-credentials
INFO Attempting to assume_role on RoleArn: arn:aws:iam::489003720472:role/TEST-AJ-DynamoDB-SingerExtracts-Role
INFO Starting sync.
INFO employeeAssessment: Starting sync
INFO Syncing full table for stream: dev_mes-employeeAssessment-table
INFO Scanning table dev_mes-employeeAssessment-table with params:
INFO TableName = dev_mes-employeeAssessment-table
INFO Limit = 1000
INFO employeeAssessment: Completed sync (17 rows)
INFO
+Sync Summary--------+--------------------+---------------+---------------------+
| table name | replication method | total records | write speed |
+--------------------+--------------------+---------------+---------------------+
| employeeAssessment | FULL_TABLE | 17 records | 19.3 records/second |
+--------------------+--------------------+---------------+---------------------+
INFO Done syncing.
time=2020-09-04 16:06:18 name=target_snowflake level=INFO message=Getting catalog objects from table cache...
time=2020-09-04 16:06:20 name=target_snowflake level=INFO message=Table 'RAW_MES."EMPLOYEEASSESSMENT"' does not exist. Creating...
time=2020-09-04 16:06:23 name=target_snowflake level=INFO message=Table 'RAW_MES."TABLE"' exists
time=2020-09-04 16:06:23 name=target_snowflake level=INFO message=Uploading 17 rows to external snowflake stage on S3
time=2020-09-04 16:06:23 name=target_snowflake level=INFO message=Target S3 bucket: dataplatformtest01-data-44635, local file: /tmp/records_2ph8vxjw.csv.gz, S3 key: data/raw/me-slalom/employeeAssessment/v1/pipelinewise_dev_mes-employeeAssessment-table_20200904-160623-191772.csv.gz
time=2020-09-04 16:06:24 name=target_snowflake level=INFO message=Loading 17 rows into 'RAW_MES."TABLE"'
time=2020-09-04 16:06:25 name=target_snowflake level=INFO message=Loading into RAW_MES."TABLE": {"inserts": 0, "updates": 17, "size_bytes": 119}
time=2020-09-04 16:06:25 name=target_snowflake level=INFO message=Emitting state {"bookmarks": {"employeeAssessment": {"last_replication_method": "FULL_TABLE"}, "dev_mes-employeeAssessment-table": {"version": 1599260777218, "initial_full_table_complete": true, "success_timestamp": "2020-09-04T23:06:18.099907Z"}}, "currently_syncing": "dev_mes-employeeAssessment-table"}
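The final "Emitting state" line shows the same two names side by side: its bookmarks are keyed under both `employeeAssessment` and `dev_mes-employeeAssessment-table`. A minimal sketch, using only the standard `json` module and the state JSON copied from that log line, that lists the bookmark keys and flags any that differ from the `tap_stream_id` (the `tap_stream_id` value is taken from the note above the log):

```python
import json

# State JSON copied from the "Emitting state" log line above.
state = json.loads(
    '{"bookmarks": {"employeeAssessment": {"last_replication_method": "FULL_TABLE"}, '
    '"dev_mes-employeeAssessment-table": {"version": 1599260777218, '
    '"initial_full_table_complete": true, '
    '"success_timestamp": "2020-09-04T23:06:18.099907Z"}}, '
    '"currently_syncing": "dev_mes-employeeAssessment-table"}'
)

tap_stream_id = "employeeAssessment"

# Any bookmark key other than the tap_stream_id was written under the other name.
mismatched = [key for key in state["bookmarks"] if key != tap_stream_id]

print(sorted(state["bookmarks"]))  # ['dev_mes-employeeAssessment-table', 'employeeAssessment']
print(mismatched)  # ['dev_mes-employeeAssessment-table']
print(state["currently_syncing"] == tap_stream_id)  # False
```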