This repository was archived by the owner on Sep 23, 2024. It is now read-only.

Inconsistent usage of table_name vs tap_stream_id #98

@aaronsteers

Background:

  • I am working with a source (DynamoDB) whose upstream table names can contain special characters. In our case they contain dashes, which this target parses in a special and significant way.
  • In an effort to isolate the upstream table name from the downstream one, I looked into the spec and found that (according to my interpretation of the spec here) table_name is intended to describe the upstream source, while tap_stream_id is intended to drive downstream behavior.
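
To make that interpretation concrete, here is a minimal sketch of how a tap could keep the two names separate. The catalog entry is trimmed down, its values are the ones from the example in this issue, and the helper names `upstream_table` and `downstream_stream` are hypothetical (they are not part of any Singer library).

```python
# Hypothetical, trimmed-down catalog entry using the names from this issue.
catalog_entry = {
    "tap_stream_id": "employeeAssessment",
    "table_name": "dev_mes-employeeAssessment-table",
    "schema": {"type": "object", "properties": {}},
}


def upstream_table(entry):
    """Name the tap would query at the source (assumed fallback: tap_stream_id)."""
    return entry.get("table_name", entry["tap_stream_id"])


def downstream_stream(entry):
    """Name the tap would emit as the stream name for the target to consume."""
    return entry["tap_stream_id"]


print(upstream_table(catalog_entry))
print(downstream_stream(catalog_entry))
```

Under this reading, only the second name should ever reach the target.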

I have created singer-io/tap-dynamodb#25 to resolve this on the tap side.

Problem:

When sending events through that tap now, there appears to be an inconsistency in when this target uses table_name and when it uses tap_stream_id. (Again, according to my understanding of the spec here, table_name should be used by the tap and tap_stream_id should govern naming in the target.)

The log below comes from a single table sync operation. Note that the target first uses the correct table name and then uses the table name "TABLE", which likely comes from parsing table_name instead of tap_stream_id.

I'm planning to submit a PR, but first wanted to post this to raise awareness and promote discussion.

Thanks!


Here's the full log...

Note that in this example the upstream table_name is dev_mes-employeeAssessment-table and the tap_stream_id is employeeAssessment. The target first checks whether employeeAssessment exists and then (mistakenly) whether TABLE exists.
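
As an illustration of how a name like "TABLE" could arise, here is a minimal sketch of dash-based name splitting. It is an assumption about the kind of parsing involved, not a copy of the target's code; the function name `split_stream_name` and the catalog/schema/table convention are hypothetical.

```python
def split_stream_name(stream_name, separator="-"):
    """Split a stream name into catalog, schema and table parts (assumed convention)."""
    parts = stream_name.split(separator)
    if len(parts) == 1:
        return {"catalog": None, "schema": None, "table": parts[0]}
    if len(parts) == 2:
        return {"catalog": None, "schema": parts[0], "table": parts[1]}
    # Three or more parts: everything after the second separator is the table.
    return {"catalog": parts[0], "schema": parts[1], "table": "_".join(parts[2:])}


# The tap_stream_id has no dashes, so it maps straight to a table name.
print(split_stream_name("employeeAssessment")["table"].upper())

# The upstream table_name has two dashes, so only its last segment survives.
print(split_stream_name("dev_mes-employeeAssessment-table")["table"].upper())
```

With these inputs the sketch yields EMPLOYEEASSESSMENT for the tap_stream_id and TABLE for the upstream table_name, which matches the two table names that appear in the log.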

2020-09-04 16:06:16,334 - INFO - Beginning running command: tap-dynamodb --config /mnt/c/Files/Source/slalom-data-platform-core/data/taps/.secrets/tmp/tap-me-slalom-config.json --catalog ./.output/taps/me-slalom-catalog/me-slalom-employeeAssessment-catalog.json --state /tmp/tmpa9t8bkaj/me-slalom-employeeAssessment-state.json | target-snowflake --config /mnt/c/Files/Source/slalom-data-platform-core/data/taps/.secrets/tmp/target-snowflake-config-employeeAssessment.json > /tmp/tmpa9t8bkaj/me-slalom-employeeAssessment-state-new.json...
INFO Found credentials in shared credentials file: /mnt/c/Files/Source/slalom-data-platform-core/infra/dev/.secrets/aws-credentials
INFO Attempting to assume_role on RoleArn: arn:aws:iam::489003720472:role/TEST-AJ-DynamoDB-SingerExtracts-Role
INFO Starting sync.
INFO employeeAssessment: Starting sync
INFO Syncing full table for stream: dev_mes-employeeAssessment-table
INFO Scanning table dev_mes-employeeAssessment-table with params:
INFO    TableName = dev_mes-employeeAssessment-table
INFO    Limit = 1000
INFO employeeAssessment: Completed sync (17 rows)
INFO
+Sync Summary--------+--------------------+---------------+---------------------+
| table name         | replication method | total records | write speed         |
+--------------------+--------------------+---------------+---------------------+
| employeeAssessment | FULL_TABLE         | 17 records    | 19.3 records/second |
+--------------------+--------------------+---------------+---------------------+
INFO Done syncing.
time=2020-09-04 16:06:18 name=target_snowflake level=INFO message=Getting catalog objects from table cache...
time=2020-09-04 16:06:20 name=target_snowflake level=INFO message=Table 'RAW_MES."EMPLOYEEASSESSMENT"' does not exist. Creating...
time=2020-09-04 16:06:23 name=target_snowflake level=INFO message=Table 'RAW_MES."TABLE"' exists
time=2020-09-04 16:06:23 name=target_snowflake level=INFO message=Uploading 17 rows to external snowflake stage on S3
time=2020-09-04 16:06:23 name=target_snowflake level=INFO message=Target S3 bucket: dataplatformtest01-data-44635, local file: /tmp/records_2ph8vxjw.csv.gz, S3 key: data/raw/me-slalom/employeeAssessment/v1/pipelinewise_dev_mes-employeeAssessment-table_20200904-160623-191772.csv.gz
time=2020-09-04 16:06:24 name=target_snowflake level=INFO message=Loading 17 rows into 'RAW_MES."TABLE"'
time=2020-09-04 16:06:25 name=target_snowflake level=INFO message=Loading into RAW_MES."TABLE": {"inserts": 0, "updates": 17, "size_bytes": 119}
time=2020-09-04 16:06:25 name=target_snowflake level=INFO message=Emitting state {"bookmarks": {"employeeAssessment": {"last_replication_method": "FULL_TABLE"}, "dev_mes-employeeAssessment-table": {"version": 1599260777218, "initial_full_table_complete": true, "success_timestamp": "2020-09-04T23:06:18.099907Z"}}, "currently_syncing": "dev_mes-employeeAssessment-table"}
