Commit 96a19d8

SAC-29830: Initial Commit with Singer tap generated code
1 parent dd95927

31 files changed, +1233 -146 lines changed

.github/copilot-instructions.md

Lines changed: 184 additions & 0 deletions

# Instructions for Building a Singer Tap/Target

This document provides guidance for implementing a high-quality Singer Tap (or Target) in compliance with the Singer specification and community best practices. Use it with GitHub Copilot or your preferred IDE.

---

## 1. Rate Limiting

- Respect API rate limits (e.g., daily quotas or per-second limits).
- For short-term rate limits, detect HTTP 429 (Too Many Requests) or similar errors and retry after a delay, honoring the `Retry-After` header when the API sends one.
- Use Singer's built-in rate-limiting utilities (e.g., the `singer.utils.ratelimit` decorator) where available.
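
A minimal, standard-library-only sketch of the retry step for short-term limits. `RateLimitError` and `call_with_retries` are hypothetical names; the sketch assumes the HTTP client raises `RateLimitError` on a 429 and passes along any `Retry-After` value in seconds. In a real tap, the `backoff` library mentioned in section 4 can replace the hand-written loop.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error raised by the HTTP client when the API returns HTTP 429."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("HTTP 429: rate limit exceeded")
        self.retry_after = retry_after  # seconds from a Retry-After header, if any


def call_with_retries(
    request: Callable[[], T],
    max_tries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, sleeping and retrying whenever it raises RateLimitError."""
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    for attempt in range(1, max_tries):  # bounded number of retries
        try:
            return request()
        except RateLimitError as err:
            # Prefer the server's hint; otherwise back off exponentially: 1s, 2s, 4s, ...
            if err.retry_after is not None:
                delay = err.retry_after
            else:
                delay = base_delay * 2 ** (attempt - 1)
            sleep(delay)
    return request()  # final attempt: a RateLimitError here propagates to the caller
```

Injecting `sleep` keeps the function testable without real waiting; in production the default `time.sleep` is used.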

## 2. Memory Efficiency

- Minimize RAM usage by streaming data instead of buffering whole responses.
  Example: use generators or iterators instead of loading entire datasets into memory.
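
A sketch of the generator approach, assuming a hypothetical `fetch_page(page)` callable that returns a dict with a `data` list and a `next_page` value (the same response shape as the loop-safety example in section 10). Records are handed downstream one at a time, so only the current page is held in memory.

```python
from typing import Callable, Dict, Iterable, Iterator, Optional


def iter_records(
    fetch_page: Callable[[int], Optional[Dict]], max_pages: int = 1000
) -> Iterator[Dict]:
    """Yield records page by page; only the current page is ever held in memory."""
    for page in range(1, max_pages + 1):  # bounded: at most max_pages requests
        response = fetch_page(page) or {}
        records = response.get("data") or []
        if not records:
            break
        yield from records  # hand records downstream one at a time
        if not response.get("next_page"):
            break


def sync(records: Iterable[Dict], write_record: Callable[[Dict], None]) -> int:
    """Write each record as soon as it arrives instead of collecting a list first."""
    count = 0
    for record in records:
        write_record(record)
        count += 1
    return count
```

In a tap, `write_record` could be a small wrapper such as `lambda r: singer.write_record(stream_name, r)`; the point is that nothing in the pipeline calls `list()` on the full result set.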

## 3. Consistent Date Handling

- Use RFC 3339 format (including the time zone offset). UTC (`Z`) is preferred. Examples:
  - Good: `2017-01-01T00:00:00Z`, `2017-01-01T00:00:00-05:00`
  - Bad: `2017-01-01 00:00:00` (no time zone offset)
- Use pytz for timezone-aware conversions.
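
A minimal sketch of normalizing timestamps to RFC 3339 UTC. It uses the standard library's `datetime.timezone` instead of pytz to stay dependency-free; `to_rfc3339_utc` is a hypothetical helper name, and it drops fractional seconds by design.

```python
from datetime import datetime, timezone


def to_rfc3339_utc(value: datetime) -> str:
    """Format an aware datetime as an RFC 3339 string in UTC (second precision)."""
    if value.tzinfo is None or value.utcoffset() is None:
        # A naive value like 2017-01-01 00:00:00 is ambiguous: refuse to guess its zone.
        raise ValueError(f"naive datetime has no time zone offset: {value!r}")
    return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

For example, `2017-01-01T00:00:00-05:00` becomes `2017-01-01T05:00:00Z`, while a naive `datetime(2017, 1, 1)` raises instead of being silently treated as local time.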

## 4. Logging & Exception Handling

- Log every API request (URL and parameters), omitting sensitive information such as API keys and tokens.
- Log progress updates (e.g., “Starting stream X”).
- On API errors, log the status code and response body.

For fatal errors:
- Log at CRITICAL (FATAL) level.
- Exit with a non-zero status.
- Omit the stack trace for known, user-triggered conditions.
- Include the full trace for unexpected exceptions.

For recoverable errors, implement retries with exponential backoff (e.g., using the backoff library).
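
A sketch of the request-logging and fatal-error rules, using only the standard `logging` module. `SENSITIVE_KEYS`, `UserConfigError`, and `run_or_exit` are hypothetical names; which parameter names count as secrets depends on the API.

```python
import logging
import sys
from typing import Any, Callable, Dict

LOGGER = logging.getLogger("tap_example")  # hypothetical logger name

# Hypothetical set of parameter names whose values must never be logged.
SENSITIVE_KEYS = {"key", "token", "api_key", "access_token", "access_token_secret", "consumer_secret"}


def redact(params: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `params` with secret values masked for logging."""
    return {k: "***" if k.lower() in SENSITIVE_KEYS else v for k, v in params.items()}


def log_request(url: str, params: Dict[str, Any]) -> None:
    LOGGER.info("GET %s params=%s", url, redact(params))


class UserConfigError(Exception):
    """Hypothetical error for known, user-fixable problems (e.g., bad credentials)."""


def run_or_exit(main: Callable[[], None]) -> None:
    """Run the tap; on fatal errors log at CRITICAL and exit with status 1."""
    try:
        main()
    except UserConfigError as err:
        LOGGER.critical("%s", err)  # known condition: message only, no stack trace
        sys.exit(1)
    except Exception:
        LOGGER.critical("Unexpected error", exc_info=True)  # include the full trace
        sys.exit(1)
```

Logging with `%s` placeholders (rather than f-strings) defers formatting until a handler actually emits the message, which is also what pylint recommends.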

## 5. Module Structure

- Organize code into a proper Python package (a directory with `__init__.py`), not a single script file.

## 6. Schema Management

- For static schemas, store them as .json files in a schemas/ directory, not as inline Python dicts.
- Prefer explicit schemas:
  - Avoid `additionalProperties: true` or vague typing.
  - Use clear field names and types.
  - Set `additionalProperties: false` when schemas must be strict.
- Be cautious when tightening schemas in new versions: it may require a major version bump under semantic versioning.
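
A sketch of loading static schemas from the schemas/ directory at runtime. It assumes (not stated in this commit) one file per stream named `<stream_name>.json` located next to the module; `load_schemas` is a hypothetical helper.

```python
import json
import os
from typing import Dict, Optional


def load_schemas(schemas_dir: Optional[str] = None) -> Dict[str, dict]:
    """Load every schemas/<stream_name>.json file into a {stream_name: schema} dict."""
    if schemas_dir is None:
        # Assumes schemas/ sits next to this module inside the package.
        schemas_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), "schemas")
    schemas = {}
    for filename in sorted(os.listdir(schemas_dir)):
        if filename.endswith(".json"):
            stream_name = filename[: -len(".json")]
            with open(os.path.join(schemas_dir, filename), encoding="utf-8") as handle:
                schemas[stream_name] = json.load(handle)
    return schemas
```

Remember to ship the directory with the package (for example via `package_data` in setup.py) so the files exist after installation.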

## 7. JSON Schema Guidelines

- All files under schemas/*.json must follow the JSON Schema standard.
- Any field named created_time or modified_time, or whose name ends in _time or _date, must use the date-time format.
- For any other field that looks like a date-time value, suggest validating that it also uses the date-time format.
- Avoid using additionalProperties at the root level; it is allowed in nested fields only.

Example:
{
  "type": "object",
  "properties": {
    "created_time": {
      "type": ["null", "string"],
      "format": "date-time"
    },
    "last_access_time": {
      "type": ["null", "string"],
      "format": "date-time"
    }
  }
}
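
The naming rule above is mechanical enough to check automatically. A minimal sketch, assuming schemas are plain dicts loaded from the JSON files; `find_date_time_violations` is a hypothetical helper, and it enforces only the suffix rule, not the "looks like a date-time" judgment call.

```python
from typing import Any, Dict, List

DATE_TIME_SUFFIXES = ("_time", "_date")  # also covers created_time and modified_time


def find_date_time_violations(schema: Dict[str, Any], path: str = "") -> List[str]:
    """Return dotted paths of *_time / *_date fields that lack format: date-time."""
    violations: List[str] = []
    for name, prop in schema.get("properties", {}).items():
        field_path = f"{path}.{name}" if path else name
        if name.endswith(DATE_TIME_SUFFIXES) and prop.get("format") != "date-time":
            violations.append(field_path)
        # Recurse into nested objects and into arrays of objects.
        if "properties" in prop:
            violations.extend(find_date_time_violations(prop, field_path))
        items = prop.get("items")
        if isinstance(items, dict) and "properties" in items:
            violations.extend(find_date_time_violations(items, f"{field_path}[]"))
    return violations
```

Running it over every loaded schema in a unit test turns the guideline into a failing check rather than a review comment.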

## 8. Validating Bookmarking

We use the singer.bookmarks module to read from and write to the bookmark state file.
To ensure correctness, always validate the structure of the bookmark state before processing or committing any changes.
- In abstract.py, we use get_bookmark() and write_bookmark() to manage bookmarks for streams.
- The write_bookmark() defined there overrides the one from the singer module to apply custom behavior.
- Always confirm that the state structure matches the expected format before writing.

Format Example:
{
  "bookmarks": {
    "stream_name": {
      "replication_key": "2024-01-01T00:00:00Z"
    }
  }
}

Optional validation function:
def is_valid_bookmark_state(state):
    return (
        isinstance(state, dict)
        and "bookmarks" in state
        and isinstance(state["bookmarks"], dict)
    )
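
A stricter variant that also checks each stream entry, plus a guarded write. Both functions are hypothetical (they are not the override in abstract.py); the write keeps the same `(state, stream, key, value)` argument order as singer's `write_bookmark()` and initializes an empty state, which is what a first run usually starts from.

```python
from typing import Any, Dict


def is_valid_bookmark_state_strict(state: Any) -> bool:
    """True if state looks like {"bookmarks": {stream_name: {key: value}}}."""
    if not isinstance(state, dict):
        return False
    bookmarks = state.get("bookmarks")
    if not isinstance(bookmarks, dict):
        return False
    return all(isinstance(name, str) and isinstance(entry, dict) for name, entry in bookmarks.items())


def write_bookmark_checked(state: Dict[str, Any], stream_name: str, key: str, value: Any) -> Dict[str, Any]:
    """Validate the state, then set state["bookmarks"][stream_name][key] = value."""
    if not isinstance(state, dict):
        raise ValueError(f"state must be a dict, got {type(state).__name__}")
    state.setdefault("bookmarks", {})  # a first run may start from an empty state
    if not is_valid_bookmark_state_strict(state):
        raise ValueError(f"malformed bookmark state: {state!r}")
    state["bookmarks"].setdefault(stream_name, {})[key] = value
    return state
```

Raising on a malformed state, rather than overwriting it, surfaces corrupted state early instead of silently resetting a stream to its start date.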

## 9. Code Quality

- Use pylint and aim for zero error-level messages.
- CI pipelines (e.g., CircleCI) should enforce linting.
- Fix warnings, or explicitly disable them where appropriate.

## 10. Loop Safety

- **Avoid `while True` loops.** Use explicit conditions instead (e.g., `while has_more_pages`).
- Every loop must have a clear exit condition that will eventually be satisfied.
- For pagination loops, ensure there's a termination condition (e.g., no next page, empty results, max iterations).
- Add safeguards like maximum iteration counts or timeouts for loops that depend on external API responses.

Good example (explicit condition):
```python
max_pages = 1000
current_page = 1
has_more_pages = True

while has_more_pages and current_page <= max_pages:
    response = fetch_page(current_page)
    if not response or not response.get('data'):
        # Empty or missing page: stop without processing
        has_more_pages = False
        break

    process_data(response['data'])

    next_page = response.get('next_page')
    if not next_page:
        has_more_pages = False
    else:
        current_page += 1
```

Bad example (using while True):
```python
while True:
    response = fetch_page()
    if not response:
        break  # Avoid this pattern
    process_data(response)
```
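
Cursor-based APIs add one more failure mode: a server that keeps returning the same cursor. A sketch of a bounded cursor loop that treats a repeated cursor and an exhausted page budget as errors rather than silently stopping; `fetch(cursor)` and the `next_cursor` field are hypothetical.

```python
from typing import Callable, Dict, Iterator, Optional


def paginate_by_cursor(
    fetch: Callable[[Optional[str]], Dict], max_pages: int = 1000
) -> Iterator[Dict]:
    """Yield pages until no next cursor is returned, guarding against endless loops."""
    cursor: Optional[str] = None
    seen_cursors = set()
    for _ in range(max_pages):  # hard upper bound on iterations
        page = fetch(cursor)
        yield page
        next_cursor = page.get("next_cursor")
        if not next_cursor:
            return  # normal termination: no more pages
        if next_cursor in seen_cursors:
            # The API handed back a cursor already followed: fail instead of looping forever.
            raise RuntimeError(f"pagination cursor repeated: {next_cursor!r}")
        seen_cursors.add(next_cursor)
        cursor = next_cursor
    raise RuntimeError(f"stopped after max_pages={max_pages}; more pages may remain")
```

Raising at the page limit, instead of ending quietly, is consistent with section 11: a truncated sync should be visible, not silent.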

## 11. Record Completeness

- **Never skip records during sync unless absolutely necessary.**
- If a record fails transformation or validation, log the error with full context; do not skip it silently.
- Only skip records if:
  - They fail schema validation and cannot be processed (log as WARNING or ERROR).
  - They are explicitly filtered by business logic (e.g., replication key filtering).
- **Never use `continue` to skip records on errors without logging.**

Good example (log and handle errors):
```python
for record in get_records():
    try:
        transformed = transformer.transform(record, schema, metadata)
        write_record(stream_name, transformed)
        counter.increment()
    except Exception as e:
        LOGGER.error("Failed to transform record %s: %s", record.get('id'), e)
        # Re-raise or handle based on severity
        raise
```

Bad example (silently skipping records):
```python
for record in get_records():
    try:
        transformed = transformer.transform(record, schema, metadata)
        write_record(stream_name, transformed)
    except Exception:
        continue  # Silently skips - BAD!
```

- For incremental streams, ensure bookmark filtering logic is correct to avoid data loss.
- If a record must be skipped, document why in the code and log it clearly.
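
A sketch of bookmark filtering that skips only by business logic and logs what it skipped. It assumes replication-key values are RFC 3339 strings (with `Z` or a numeric offset); `filter_by_bookmark` is a hypothetical helper. Values are parsed before comparing, because comparing raw strings goes wrong when one side has milliseconds (`...00.000Z`) and the other does not.

```python
import logging
from datetime import datetime
from typing import Dict, Iterable, Iterator

LOGGER = logging.getLogger("tap_example")  # hypothetical logger name


def parse_rfc3339(value: str) -> datetime:
    """Parse an RFC 3339 timestamp; 'Z' is rewritten because older fromisoformat rejects it."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def filter_by_bookmark(records: Iterable[Dict], replication_key: str, bookmark: str) -> Iterator[Dict]:
    """Yield records at or after the bookmark; log how many were skipped and why."""
    cutoff = parse_rfc3339(bookmark)
    skipped = 0
    for record in records:
        value = record.get(replication_key)
        if value is None:
            # Unexpected: emit the record anyway and warn, rather than dropping it.
            LOGGER.warning("Record %s has no %s; emitting it anyway", record.get("id"), replication_key)
            yield record
        elif parse_rfc3339(value) >= cutoff:  # inclusive: re-emitting beats losing a record
            yield record
        else:
            skipped += 1  # intentionally skipped: already synced in a previous run
    LOGGER.info("Skipped %d records older than bookmark %s", skipped, bookmark)
```

The comparison is inclusive (`>=`), so a record whose timestamp equals the bookmark is emitted again; duplicates are deduplicated downstream by key, while a missed record is lost for good.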

.pre-commit-config.yaml

Lines changed: 47 additions & 0 deletions

default_stages: [commit]
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v5.0.0
    hooks:
      - id: check-merge-conflict
      - id: check-docstring-first
      - id: debug-statements
      - id: trailing-whitespace
      - id: check-toml
      - id: end-of-file-fixer
      - id: check-yaml
      - id: sort-simple-yaml
      - id: check-json
      - id: pretty-format-json
        args: ['--autofix','--no-sort-keys']

  - repo: https://github.com/psf/black
    rev: 23.12.0
    hooks:
      - id: black

  - repo: https://github.com/pycqa/flake8
    rev: 7.1.2
    hooks:
      - id: flake8
        args: ["--ignore=W503,E501,C901"]
        additional_dependencies: [
            'flake8-print',
            'flake8-debugger',
          ]

  - repo: https://github.com/PyCQA/bandit
    rev: '1.7.10'
    hooks:
      - id: bandit

  - repo: https://github.com/PyCQA/docformatter
    rev: v1.7.5
    hooks:
      - id: docformatter
        args: [--in-place]

  - repo: https://github.com/codespell-project/codespell
    rev: v2.4.1
    hooks:
      - id: codespell

spike/tap-trello-config.json

Lines changed: 143 additions & 0 deletions

{
  "tap_name": "Trello",
  "required_config_keys": ["consumer_key", "consumer_secret", "access_token", "access_token_secret", "start_date"],
  "page_size": 1000,
  "next_page_key": "page",
  "pagination_key": "page",
  "headers": {"Accept": "application/json"},
  "params": {},
  "auth_header_key": "oauth1",
  "auth_config_key": {
    "consumer_key": "consumer_key",
    "consumer_secret": "consumer_secret",
    "access_token": "access_token",
    "access_token_secret": "access_token_secret"
  },
  "base_url": "https://api.trello.com/1",
  "streams": [
    {
      "name": "board_memberships",
      "key_properties": ["id"],
      "replication_method": "FULL_TABLE",
      "replication_keys": null,
      "data_key": null,
      "path": "/boards/{id}/memberships",
      "parent": "boards",
      "children": [],
      "doc_link": "https://developer.atlassian.com/cloud/trello/rest/api-group-boards/#api-boards-id-memberships-get"
    },
    {
      "name": "board_custom_fields",
      "key_properties": ["id"],
      "replication_method": "FULL_TABLE",
      "replication_keys": null,
      "data_key": null,
      "path": "/boards/{id}/customFields",
      "parent": null,
      "children": [],
      "doc_link": "https://developer.atlassian.com/cloud/trello/rest/api-group-boards/#api-boards-id-customfields-get"
    },
    {
      "name": "board_labels",
      "key_properties": ["id"],
      "replication_method": "FULL_TABLE",
      "replication_keys": null,
      "data_key": null,
      "path": "/boards/{id}/labels",
      "parent": "boards",
      "children": [],
      "doc_link": "https://developer.atlassian.com/cloud/trello/rest/api-group-boards/#api-boards-id-labels-get"
    },
    {
      "name": "card_attachments",
      "key_properties": ["id"],
      "replication_method": "FULL_TABLE",
      "replication_keys": null,
      "data_key": null,
      "path": "/cards/{id}/attachments",
      "parent": "cards",
      "children": [],
      "doc_link": "https://developer.atlassian.com/cloud/trello/rest/api-group-cards/#api-cards-id-attachments-get"
    },
    {
      "name": "card_custom_field_items",
      "key_properties": ["id"],
      "replication_method": "FULL_TABLE",
      "replication_keys": null,
      "data_key": null,
      "path": "/cards/{id}/customFieldItems",
      "parent": "cards",
      "children": [],
      "doc_link": "https://developer.atlassian.com/cloud/trello/rest/api-group-cards/#api-cards-id-customfielditems-get"
    },
    {
      "name": "members",
      "key_properties": ["id"],
      "replication_method": "FULL_TABLE",
      "replication_keys": null,
      "data_key": null,
      "path": "/members/{id}",
      "parent": "boards",
      "children": [],
      "doc_link": "https://developer.atlassian.com/cloud/trello/rest/api-group-members/#api-members-id-get"
    },
    {
      "name": "organizations",
      "key_properties": ["id"],
      "replication_method": "FULL_TABLE",
      "replication_keys": null,
      "data_key": null,
      "path": "/organizations/{id}",
      "parent": null,
      "children": [],
      "doc_link": "https://developer.atlassian.com/cloud/trello/rest/api-group-organizations/#api-organizations-id-get"
    },
    {
      "name": "organization_actions",
      "key_properties": ["id"],
      "replication_method": "INCREMENTAL",
      "replication_keys": [
        "date"
      ],
      "data_key": null,
      "path": "/organizations/{id}/actions",
      "parent": "organizations",
      "children": [],
      "doc_link": "https://developer.atlassian.com/cloud/trello/rest/api-group-organizations/#api-organizations-id-actions-get"
    },
    {
      "name": "organization_members",
      "key_properties": ["id"],
      "replication_method": "FULL_TABLE",
      "replication_keys": null,
      "data_key": null,
      "path": "/organizations/{id}/members",
      "parent": "organizations",
      "children": [],
      "doc_link": "https://developer.atlassian.com/cloud/trello/rest/api-group-organizations/#api-organizations-id-members-get"
    },
    {
      "name": "organization_memberships",
      "key_properties": ["id"],
      "replication_method": "FULL_TABLE",
      "replication_keys": null,
      "data_key": null,
      "path": "/organizations/{id}/memberships",
      "parent": "organizations",
      "children": [],
      "doc_link": "https://developer.atlassian.com/cloud/trello/rest/api-group-organizations/#api-organizations-id-memberships-get"
    }
  ],
  "tap_tester_creds": {
    "consumer_key": "TAP_TRELLO_CONSUMER_KEY",
    "consumer_secret": "TAP_TRELLO_CONSUMER_SECRET",
    "access_token": "TAP_TRELLO_ACCESS_TOKEN",
    "access_token_secret": "TAP_TRELLO_ACCESS_TOKEN_SECRET"
  },
  "author": "Stitch",
  "third_party_dependencies": [
    "singer-python==6.3.0",
    "requests==2.32.5",
    "backoff==2.2.1"
  ]
}
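
Stream paths such as `/boards/{id}/labels` only resolve once an id is supplied, so a small consistency check over the `streams` list can catch definitions that cannot be requested as written. A minimal sketch, assuming (this commit does not say so) that the `{id}` placeholder is filled from records of the `parent` stream; `streams_missing_parent` is a hypothetical helper.

```python
import re
from typing import Dict, List

PLACEHOLDER = re.compile(r"\{[^}]+\}")  # matches templates like {id}


def streams_missing_parent(config: Dict) -> List[str]:
    """Names of streams whose path contains a placeholder but whose parent is null."""
    return [
        stream["name"]
        for stream in config.get("streams", [])
        if PLACEHOLDER.search(stream.get("path") or "") and not stream.get("parent")
    ]
```

Applied to the `streams` list above, this would return `board_custom_fields` and `organizations`, whose paths contain `{id}` while `parent` is `null`.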

0 commit comments
