This document provides guidance for AI coding agents and developers working on this Singer tap.
- Project Type: Singer Tap
- Source: RestCountries
- Stream Type: REST
- Authentication: Custom or N/A
- Framework: Meltano Singer SDK
This tap follows the Singer specification and uses the Meltano Singer SDK to extract data from RestCountries.
- Tap Class (`tap_restcountries/tap.py`): Main entry point, defines streams and configuration
- Client (`tap_restcountries/client.py`): Handles API communication and authentication
- Streams (`tap_restcountries/streams.py`): Define data streams and their schemas
Before making changes, ensure you understand these Singer concepts:
- Streams: Individual data endpoints (e.g., users, orders, transactions)
- State: Tracks incremental sync progress using bookmarks
- Catalog: Metadata about available streams and their schemas
- Records: Individual data items emitted by the tap
- Schemas: JSON Schema definitions for stream data
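To make these concepts concrete, the sketch below builds the three core Singer message types (SCHEMA, RECORD, STATE) and prints them as JSON lines using only the standard library. The stream name `countries`, its fields, and the shape of the STATE value are hypothetical; in this tap the SDK emits these messages for you.

```python
import json

# Hypothetical stream definition, for illustration only.
STREAM = "countries"
SCHEMA = {
    "type": "object",
    "properties": {
        "code": {"type": "string"},
        "name": {"type": ["string", "null"]},
    },
}


def build_messages(rows):
    """Return Singer messages for one stream: SCHEMA, then RECORDs, then STATE."""
    messages = [
        {"type": "SCHEMA", "stream": STREAM, "schema": SCHEMA, "key_properties": ["code"]}
    ]
    messages += [{"type": "RECORD", "stream": STREAM, "record": row} for row in rows]
    # The content of a STATE value is chosen by the tap; this shape is an assumption.
    messages.append({"type": "STATE", "value": {"bookmarks": {STREAM: {}}}})
    return messages


messages = build_messages([{"code": "FR", "name": "France"}])
for message in messages:
    print(json.dumps(message))  # one JSON object per line on stdout
```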
- Define stream class in `tap_restcountries/streams.py`
- Set `name`, `path`, `primary_keys`, and `replication_key` (set this to `None` if not applicable)
- Define schema using `PropertiesList` or JSON Schema
- Register stream in the tap's `discover_streams()` method
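Example:

    from singer_sdk.typing import DateTimeType, PropertiesList, Property, StringType

    # Base stream class; assumed here to be defined in tap_restcountries/client.py
    from tap_restcountries.client import RestCountriesStream


    class MyNewStream(RestCountriesStream):
        name = "my_new_stream"
        path = "/api/v1/my_resource"
        primary_keys = ["id"]
        replication_key = "updated_at"
        schema = PropertiesList(
            Property("id", StringType, required=True),
            Property("name", StringType),
            Property("updated_at", DateTimeType),
        ).to_dict()

The SDK provides built-in pagination classes. Use these instead of overriding `get_next_page_token()` directly.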
Built-in Paginator Classes:

- `SimpleHeaderPaginator`: For APIs that return the next-page token in a response header. Pass the header name to the constructor (`X-Next-Page` here is a placeholder)

      from singer_sdk.pagination import SimpleHeaderPaginator

      class MyStream(RestCountriesStream):
          def get_new_paginator(self):
              return SimpleHeaderPaginator("X-Next-Page")

- `HeaderLinkPaginator`: For APIs with `Link: <url>; rel="next"` headers (RFC 8288, which obsoletes RFC 5988)

      from singer_sdk.pagination import HeaderLinkPaginator

      class MyStream(RestCountriesStream):
          def get_new_paginator(self):
              return HeaderLinkPaginator()

- `JSONPathPaginator`: For cursor/token in response body

      from singer_sdk.pagination import JSONPathPaginator

      class MyStream(RestCountriesStream):
          def get_new_paginator(self):
              return JSONPathPaginator("$.pagination.next_token")

- `SinglePagePaginator`: For non-paginated endpoints

      from singer_sdk.pagination import SinglePagePaginator

      class MyStream(RestCountriesStream):
          def get_new_paginator(self):
              return SinglePagePaginator()
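As background for the `Link: <url>; rel="next"` style of pagination mentioned above, here is a minimal standard-library sketch of how the next URL can be pulled out of such a header. The helper name `next_link` and the example URLs are hypothetical, and the regular expression handles only the simple, common form of the header, not every variant the RFC allows.

```python
import re

# Matches entries of the form: <https://...>; rel="next"
_LINK_PATTERN = re.compile(r'<([^>]+)>\s*;\s*rel="?([^",;]+)"?')


def next_link(link_header):
    """Return the URL tagged rel="next", or None when there is no next page."""
    for url, rel in _LINK_PATTERN.findall(link_header or ""):
        if rel.strip() == "next":
            return url
    return None


header = (
    '<https://api.example.com/items?page=2>; rel="next", '
    '<https://api.example.com/items?page=9>; rel="last"'
)
print(next_link(header))
```

When the header carries no `rel="next"` entry, the helper returns `None`, which is the signal to stop requesting pages.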
Creating Custom Paginators:

For complex pagination logic, create a custom paginator class. The example subclasses `BaseHATEOASPaginator`, whose `get_next_url()` hook supplies the URL of the next page:

    from singer_sdk.pagination import BaseHATEOASPaginator


    class MyCustomPaginator(BaseHATEOASPaginator):
        def has_more(self, response):
            """Check if there are more pages."""
            data = response.json()
            return data.get("has_more", False)

        def get_next_url(self, response):
            """Get the next page URL, or None on the last page."""
            data = response.json()
            if self.has_more(response):
                return data.get("next_url")
            return None


    # Use in stream
    class MyStream(RestCountriesStream):
        def get_new_paginator(self):
            return MyCustomPaginator()

Common Pagination Patterns:
- Offset-based: Extend `BaseOffsetPaginator`
- Page-based: Extend `BasePageNumberPaginator`
- Cursor-based: Extend `BaseAPIPaginator` with custom logic
- HATEOAS/HAL: Use `JSONPathPaginator` with appropriate JSON path
Only override get_next_page_token() as a last resort for very simple cases.
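Whatever paginator is chosen, it serves the same loop: request a page, emit its records, and continue only while the response supplies a next token. The sketch below imitates that loop with in-memory pages and no SDK imports; `FAKE_PAGES`, `fetch_page`, and the `pagination.next_token` layout are hypothetical stand-ins for a real API.

```python
# Hypothetical API responses keyed by page token (None means the first page).
FAKE_PAGES = {
    None: {"items": [1, 2], "pagination": {"next_token": "a"}},
    "a": {"items": [3, 4], "pagination": {"next_token": "b"}},
    "b": {"items": [5], "pagination": {}},
}


def fetch_page(token):
    """Stand-in for an HTTP request; returns the decoded JSON body."""
    return FAKE_PAGES[token]


def iter_items():
    """Yield items page by page, stopping when no next token is returned."""
    token = None
    seen = set()
    while True:
        body = fetch_page(token)
        yield from body["items"]
        next_token = body.get("pagination", {}).get("next_token")
        if not next_token:
            break  # last page reached
        if next_token in seen:
            # Guard against an API that keeps returning the same token.
            raise RuntimeError(f"Pagination loop detected at token {next_token!r}")
        seen.add(next_token)
        token = next_token


print(list(iter_items()))
```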
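- Set `replication_key` to enable incremental sync (e.g., `"updated_at"`)
- Call `get_starting_timestamp(context)` to read the initial sync point (the bookmark from state, or `start_date` on a first run); override it only when that default does not fit
- State is managed automatically by the SDK
- Access current state via `get_context_state(context)`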
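The bookmark mechanics that the SDK manages can be pictured as follows. This is a standard-library illustration rather than SDK code: `sync_incremental`, the sample rows, and the state layout are hypothetical, and the strict "newer than" comparison is a simplification.

```python
from datetime import datetime, timezone


def parse_ts(value):
    """Parse an ISO 8601 timestamp, treating a trailing 'Z' as UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00")).astimezone(timezone.utc)


def sync_incremental(rows, state, replication_key="updated_at"):
    """Return the rows newer than the bookmark, plus the advanced state."""
    bookmark = state.get("replication_key_value")
    start = parse_ts(bookmark) if bookmark else None
    emitted = [r for r in rows if start is None or parse_ts(r[replication_key]) > start]
    if emitted:
        newest = max(emitted, key=lambda r: parse_ts(r[replication_key]))
        state = {**state, "replication_key_value": newest[replication_key]}
    return emitted, state


rows = [
    {"id": "1", "updated_at": "2024-01-01T00:00:00Z"},
    {"id": "2", "updated_at": "2024-02-01T00:00:00Z"},
]
first, state = sync_incremental(rows, {})      # first run: no bookmark yet
second, state = sync_incremental(rows, state)  # second run: nothing is newer
print(len(first), len(second), state)
```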
- Use flexible schemas during development
- Add new properties without breaking changes
- Consider making fields optional when unsure
- Use `th.Property("field", th.StringType)` for basic types
- Nest objects with `th.ObjectType(...)`
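One way to picture a flexible schema is plain JSON Schema in which optional fields also accept null. The sketch below uses only the standard library; the field names and the helpers `missing_required` and `unknown_fields` are hypothetical, and the checks are far simpler than real JSON Schema validation.

```python
# Hypothetical stream schema: only "id" is required, other fields are nullable.
SCHEMA = {
    "type": "object",
    "required": ["id"],
    "properties": {
        "id": {"type": "string"},
        "name": {"type": ["string", "null"]},
        "capital": {  # nested object, also optional
            "type": ["object", "null"],
            "properties": {"name": {"type": ["string", "null"]}},
        },
    },
}


def missing_required(record, schema):
    """Return required property names that are absent or null in the record."""
    return [key for key in schema.get("required", []) if record.get(key) is None]


def unknown_fields(record, schema):
    """Return record keys that the schema does not declare yet."""
    return sorted(set(record) - set(schema["properties"]))


record = {"id": "FR", "name": None, "population": 68000000}
print(missing_required(record, SCHEMA))  # required fields that are missing
print(unknown_fields(record, SCHEMA))    # candidates for new schema properties
```

Listing undeclared keys during development is a cheap way to decide which optional properties to add next.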
Run tests to verify your changes:

    # Install dependencies
    uv sync

    # Run all tests
    uv run pytest

    # Run specific test
    uv run pytest tests/test_core.py -k test_name

Configuration properties are defined in the tap class:
- Required vs optional properties
- Secret properties (passwords, tokens)
- Mark sensitive data with the `secret=True` parameter
- Defaults specified in config schema
Example configuration schema:

    from singer_sdk import typing as th

    config_jsonschema = th.PropertiesList(
        th.Property("api_url", th.StringType, required=True),
        th.Property("api_key", th.StringType, required=True, secret=True),
        th.Property("start_date", th.DateTimeType),
        th.Property("user_agent", th.StringType, default="tap-mysource"),
    ).to_dict()

Example test with config:

    tap-restcountries --config config.json --discover
    tap-restcountries --config config.json --catalog catalog.json

When this tap is used with Meltano, the settings defined in `meltano.yml` must stay in sync with the `config_jsonschema` in the tap class. Configuration drift between these two sources causes confusion and runtime errors.
When to sync:
- Adding new configuration properties to the tap
- Removing or renaming existing properties
- Changing property types, defaults, or descriptions
- Marking properties as required or secret
How to sync:
- Update `config_jsonschema` in `tap_restcountries/tap.py`
- Update the corresponding `settings` block in `meltano.yml`
- Update `.env.example` with the new environment variable
Example - adding a new `batch_size` setting:

    # tap_restcountries/tap.py
    config_jsonschema = th.PropertiesList(
        th.Property("api_url", th.StringType, required=True),
        th.Property("api_key", th.StringType, required=True, secret=True),
        th.Property("batch_size", th.IntegerType, default=100),  # New setting
    ).to_dict()

    # meltano.yml
    plugins:
      extractors:
        - name: tap-restcountries
          settings:
            - name: api_url
              kind: string
            - name: api_key
              kind: string
              sensitive: true
            - name: batch_size  # New setting
              kind: integer
              value: 100

    # .env.example
    TAP_RESTCOUNTRIES_API_URL=https://api.example.com
    TAP_RESTCOUNTRIES_API_KEY=your_api_key_here
    TAP_RESTCOUNTRIES_BATCH_SIZE=100  # New setting

Setting kind mappings:
| Python Type | Meltano Kind |
|---|---|
| `StringType` | `string` |
| `IntegerType` | `integer` |
| `BooleanType` | `boolean` |
| `NumberType` | `number` |
| `DateTimeType` | `date_iso8601` |
| `ArrayType` | `array` |
| `ObjectType` | `object` |
Any properties with secret=True should be marked with sensitive: true in meltano.yml.
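A small consistency check can catch drift before it reaches a commit. The following sketch compares a config schema (as the plain dict produced by `.to_dict()`) with a list of Meltano settings that has already been loaded into Python. The function names, the sample data, and the choice to pass pre-parsed dicts instead of reading `meltano.yml` are assumptions made to keep the example dependency-free; the type-to-kind table simply mirrors the mappings listed above.

```python
# JSON Schema type -> Meltano kind, mirroring the mappings listed above.
KIND_BY_TYPE = {
    "string": "string",
    "integer": "integer",
    "boolean": "boolean",
    "number": "number",
    "array": "array",
    "object": "object",
}


def expected_kind(prop):
    """Derive the Meltano kind from a JSON Schema property definition."""
    types = prop.get("type", [])
    types = [types] if isinstance(types, str) else types
    json_type = next((t for t in types if t != "null"), None)
    if json_type == "string" and prop.get("format") == "date-time":
        return "date_iso8601"
    return KIND_BY_TYPE.get(json_type)


def find_drift(config_schema, meltano_settings):
    """Return human-readable differences between the two definitions."""
    props = config_schema.get("properties", {})
    settings = {s["name"]: s for s in meltano_settings}
    problems = [f"missing in meltano.yml: {n}" for n in sorted(set(props) - set(settings))]
    problems += [f"missing in tap.py: {n}" for n in sorted(set(settings) - set(props))]
    for name in sorted(set(props) & set(settings)):
        # Assumption: a setting without an explicit kind is treated as a string.
        kind = settings[name].get("kind", "string")
        if expected_kind(props[name]) != kind:
            problems.append(f"kind mismatch for {name}: {kind}")
    return problems


schema = {
    "properties": {
        "api_url": {"type": ["string"]},
        "batch_size": {"type": ["integer", "null"], "default": 100},
    }
}
settings = [{"name": "api_url", "kind": "string"}]
print(find_drift(schema, settings))
```

An empty list means the two definitions agree on names and kinds; the same idea could be extended to compare defaults and `secret`/`sensitive` flags.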
Best practices:
- Always update all three files (`tap.py`, `meltano.yml`, `.env.example`) in the same commit
- Use the same default values in all locations
- Keep descriptions consistent between code docstrings and `meltano.yml` `description` fields
Note: This guidance is consistent with target and mapper templates in the Singer SDK. See the SDK documentation for canonical reference.
- Rate Limiting: Implement backoff using `RESTStream` built-in retry logic
- Large Responses: Use pagination, don't load entire dataset into memory
- Schema Mismatches: Validate data matches schema, handle null values
- State Management: Don't modify state directly, use SDK methods
- Timezone Handling: Use UTC, parse ISO 8601 datetime strings
- Error Handling: Let SDK handle retries, log warnings for data issues
- Logging: Use `self.logger` for structured logging
- Validation: Validate API responses before emitting records
- Documentation: Update README with new streams and config options
- Type Hints: Add type hints to improve code clarity
- Testing: Write tests for new streams and edge cases
- Performance: Profile slow streams, optimize API calls
- Error Messages: Provide clear, actionable error messages
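For the timezone-handling point, a minimal standard-library sketch of normalizing ISO 8601 strings to UTC is shown here. The helper name `to_utc` is hypothetical, and treating timestamps without an offset as UTC is an assumption that should be checked against the source API.

```python
from datetime import datetime, timezone


def to_utc(value):
    """Parse an ISO 8601 string and return an aware datetime in UTC."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat().
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are already in UTC.
        return parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


print(to_utc("2024-03-01T12:00:00+02:00").isoformat())
print(to_utc("2024-03-01T10:00:00Z").isoformat())
```

Both calls describe the same instant, so comparing or bookmarking on the normalized values avoids off-by-offset errors.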
tap-restcountries/
├── tap_restcountries/
│ ├── __init__.py
│ ├── tap.py # Main tap class
│ ├── client.py # API client
│ └── streams.py # Stream definitions
├── tests/
│ ├── __init__.py
│ └── test_core.py
├── config.json # Example configuration
├── pyproject.toml # Dependencies and metadata
└── README.md # User documentation
- Project README: See `README.md` for setup and usage
- Singer SDK: https://sdk.meltano.com
- Meltano: https://meltano.com
- Singer Specification: https://hub.meltano.com/singer/spec
When implementing changes:
- Understand the existing code structure
- Follow Singer and SDK patterns
- Test thoroughly with real API credentials
- Update documentation and docstrings
- Ensure backward compatibility when possible
- Run linting and type checking
If you're uncertain about an implementation:
- Check SDK documentation for similar examples
- Review other Singer taps for patterns
- Test incrementally with small changes
- Validate against the Singer specification