refactor(source-shiftbase): Migrate to manifest-only YAML connector#72899
refactor(source-shiftbase): Migrate to manifest-only YAML connector#72899Aaron ("AJ") Steers (aaronsteers) wants to merge 42 commits intomasterfrom
Conversation
Co-Authored-By: AJ Steers <aj@airbyte.io>
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. 💡 Show Tips and TricksPR Slash CommandsAirbyte Maintainers (that's you!) can execute the following slash commands on your PR:
📚 Show Repo GuidanceHelpful Resources
|
|
|
||
| ### Prerequisites | ||
|
|
||
| * Python (`^3.9`) |
There was a problem hiding this comment.
[markdownlint] reported by reviewdog 🐶
MD004/ul-style Unordered list style [Expected: dash; Actual: asterisk]
| ### Prerequisites | ||
|
|
||
| * Python (`^3.9`) | ||
| * Poetry (`^1.7`) - installation instructions [here](https://python-poetry.org/docs/#installation) |
There was a problem hiding this comment.
[markdownlint] reported by reviewdog 🐶
MD004/ul-style Unordered list style [Expected: dash; Actual: asterisk]
| * Python (`^3.9`) | ||
| * Poetry (`^1.7`) - installation instructions [here](https://python-poetry.org/docs/#installation) | ||
|
|
||
|
|
There was a problem hiding this comment.
[markdownlint] reported by reviewdog 🐶
MD012/no-multiple-blanks Multiple consecutive blank lines [Expected: 1; Actual: 2]
| * Poetry (`^1.7`) - installation instructions [here](https://python-poetry.org/docs/#installation) | ||
|
|
||
|
|
||
|
|
There was a problem hiding this comment.
[markdownlint] reported by reviewdog 🐶
MD012/no-multiple-blanks Multiple consecutive blank lines [Expected: 1; Actual: 3]
| ### Installing the connector | ||
|
|
||
| From this connector directory, run: | ||
| ```bash |
There was a problem hiding this comment.
[markdownlint] reported by reviewdog 🐶
MD031/blanks-around-fences Fenced code blocks should be surrounded by blank lines [Context: "```bash"]
| ### Running as a docker container | ||
|
|
||
| Then run any of the connector commands as follows: | ||
| ``` |
There was a problem hiding this comment.
[markdownlint] reported by reviewdog 🐶
MD031/blanks-around-fences Fenced code blocks should be surrounded by blank lines [Context: "```"]
| ### Running as a docker container | ||
|
|
||
| Then run any of the connector commands as follows: | ||
| ``` |
There was a problem hiding this comment.
[markdownlint] reported by reviewdog 🐶
MD040/fenced-code-language Fenced code blocks should have a language specified [Context: "```"]
|
|
||
| ### Dependency Management | ||
|
|
||
| All of your dependencies should be managed via Poetry. |
There was a problem hiding this comment.
[markdownlint] reported by reviewdog 🐶
MD009/no-trailing-spaces Trailing spaces [Expected: 0 or 2; Actual: 1]
| ## Publishing a new version of the connector | ||
|
|
||
| You've checked out the repo, implemented a million dollar feature, and you're ready to share your changes with the world. Now what? | ||
| 1. Make sure your changes are passing our test suite: `airbyte-ci connectors --name=source-shiftbase test` |
There was a problem hiding this comment.
[markdownlint] reported by reviewdog 🐶
MD032/blanks-around-lists Lists should be surrounded by blank lines [Context: "1. Make sure your changes are ..."]
|
|
||
| You've checked out the repo, implemented a million dollar feature, and you're ready to share your changes with the world. Now what? | ||
| 1. Make sure your changes are passing our test suite: `airbyte-ci connectors --name=source-shiftbase test` | ||
| 2. Bump the connector version (please follow [semantic versioning for connectors](https://docs.airbyte.com/contributing-to-airbyte/resources/pull-requests-handbook/#semantic-versioning-for-connectors)): |
There was a problem hiding this comment.
[markdownlint] reported by reviewdog 🐶
MD009/no-trailing-spaces Trailing spaces [Expected: 0 or 2; Actual: 1]
| * Python (`^3.9`) | ||
| * Poetry (`^1.7`) - installation instructions [here](https://python-poetry.org/docs/#installation) | ||
|
|
||
|
|
There was a problem hiding this comment.
[markdownlint-fix] reported by reviewdog 🐶
| * Python (`^3.9`) | |
| * Poetry (`^1.7`) - installation instructions [here](https://python-poetry.org/docs/#installation) | |
| - Python (`^3.9`) | |
| - Poetry (`^1.7`) - installation instructions [here](https://python-poetry.org/docs/#installation) |
| ### Installing the connector | ||
|
|
||
| From this connector directory, run: | ||
| ```bash |
There was a problem hiding this comment.
[markdownlint-fix] reported by reviewdog 🐶
| ```bash | |
| ```bash |
| poetry install --with dev | ||
| ``` | ||
|
|
||
|
|
There was a problem hiding this comment.
[markdownlint-fix] reported by reviewdog 🐶
| Note that any directory named `secrets` is gitignored across the entire Airbyte repo, so there is no danger of accidentally checking in sensitive information. | ||
| See `sample_files/sample_config.json` for a sample config file. | ||
|
|
||
|
|
There was a problem hiding this comment.
[markdownlint-fix] reported by reviewdog 🐶
|
|
||
| 1. Install [`airbyte-ci`](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/README.md) | ||
| 2. Run the following command to build the docker image: | ||
| ```bash |
There was a problem hiding this comment.
[markdownlint-fix] reported by reviewdog 🐶
| ```bash | |
| ```bash |
|
|
||
| An image will be available on your host with the tag `airbyte/source-shiftbase:dev`. | ||
|
|
||
|
|
There was a problem hiding this comment.
[markdownlint-fix] reported by reviewdog 🐶
| ### Running as a docker container | ||
|
|
||
| Then run any of the connector commands as follows: | ||
| ``` |
There was a problem hiding this comment.
[markdownlint-fix] reported by reviewdog 🐶
| ``` | |
| ``` |
|
|
||
| ### Dependency Management | ||
|
|
||
| All of your dependencies should be managed via Poetry. |
There was a problem hiding this comment.
[markdownlint-fix] reported by reviewdog 🐶
| All of your dependencies should be managed via Poetry. | |
| All of your dependencies should be managed via Poetry. |
| ## Publishing a new version of the connector | ||
|
|
||
| You've checked out the repo, implemented a million dollar feature, and you're ready to share your changes with the world. Now what? | ||
| 1. Make sure your changes are passing our test suite: `airbyte-ci connectors --name=source-shiftbase test` |
There was a problem hiding this comment.
[markdownlint-fix] reported by reviewdog 🐶
| 1. Make sure your changes are passing our test suite: `airbyte-ci connectors --name=source-shiftbase test` | |
| 1. Make sure your changes are passing our test suite: `airbyte-ci connectors --name=source-shiftbase test` |
|
|
||
| You've checked out the repo, implemented a million dollar feature, and you're ready to share your changes with the world. Now what? | ||
| 1. Make sure your changes are passing our test suite: `airbyte-ci connectors --name=source-shiftbase test` | ||
| 2. Bump the connector version (please follow [semantic versioning for connectors](https://docs.airbyte.com/contributing-to-airbyte/resources/pull-requests-handbook/#semantic-versioning-for-connectors)): |
There was a problem hiding this comment.
[markdownlint-fix] reported by reviewdog 🐶
| 2. Bump the connector version (please follow [semantic versioning for connectors](https://docs.airbyte.com/contributing-to-airbyte/resources/pull-requests-handbook/#semantic-versioning-for-connectors)): | |
| 2. Bump the connector version (please follow [semantic versioning for connectors](https://docs.airbyte.com/contributing-to-airbyte/resources/pull-requests-handbook/#semantic-versioning-for-connectors)): |
…config Co-Authored-By: AJ Steers <aj@airbyte.io>
|
|
Deploy preview for airbyte-docs ready! ✅ Preview Built with commit 2048581. |
|
for streams that have account_id in their records schema use account_id in pk list instead of account_name |
…ccount_id in schema Co-Authored-By: AJ Steers <aj@airbyte.io>
|
Daryna Ishchenko (@darynaishchenko) Done — updated PKs to use
The remaining streams ( |
…_id to employees PK, add _year_partition doc comment Co-Authored-By: AJ Steers <aj@airbyte.io>
|
Re: Comment 65 (employees_report schema vs API docs) The API docs at https://developer.shiftbase.com/docs/core/4d05f64e94419-employees-report show two response formats:
Our connector sends The JSON example in the docs only shows 7 fields (userId, userName, userEmployeeNr, year, weeknumber, start, end), but the actual API likely returns more fields when no Without test credentials to verify, I've kept the schema as-is from the original connector. If you'd like, I can trim it to only the 7 fields shown in the docs example, but that might lose valid data fields. |
|
Re: Comment 67 (timesheet_detail_report schema vs API docs) Same situation as the employees_report — the API docs at https://developer.shiftbase.com/docs/core/5612d41bb72b1-timesheet-detail-report show the "Raw format" response by default (columns/rows tabular structure), but our connector sends The schema in our manifest matches the JSON response format, not the raw format shown in the docs. The fields were transcribed from the original Python connector (PR #72829) which was presumably tested against the actual API. The API docs have a "JSON format" response example toggle (next to "Raw format") that shows the flat record format matching our schema. |
|
Re: Comment 69 (accounts field not highlighted as required in Cloud UI) I investigated the spec and the schema declaration is correct: spec:
type: Spec
connection_specification:
$schema: http://json-schema.org/draft-07/schema#
title: Shiftbase Spec
type: object
required:
- accounts
- start_date
properties:
accounts:
type: array
...The This is likely a UI rendering behavior specific to The schema itself is correct — the field IS required and validation should still enforce it even if the visual indicator is missing. If this is a known issue, it might be worth filing a bug against the platform UI. |
| - - id | ||
| - - department_id | ||
| - - account_id | ||
| retriever: |
There was a problem hiding this comment.
can we use /definitions/base_requester here and overwrite only different parts? also apply this refactoring for other streams that doesn't use /definitions/base_requester
…ribution streams Co-Authored-By: AJ Steers <aj@airbyte.io>
|
Daryna Ishchenko (@darynaishchenko) Re: using Refactored All other streams already used |
…enforcement Co-Authored-By: AJ Steers <aj@airbyte.io>
…' of https://git-manager.devin.ai/proxy/github.com/airbytehq/airbyte into devin/1770321883-migrate-source-shiftbase-manifest-only
Co-Authored-By: AJ Steers <aj@airbyte.io>
| primary_key: | ||
| - - id | ||
| - - account_name |
There was a problem hiding this comment.
🔴 availabilities_stream uses mutable user-provided account_name in primary key instead of stable account_id
The availabilities_stream primary key is [id, account_name] (line 263-264), but account_name is a user-provided configuration string (manifest.yaml:1604-1607), not a stable API-returned identifier. Every other standard GET-based stream in this connector uses account_id (an API-returned field) in its composite primary key: departments_stream (manifest.yaml:57-58), absentees_stream (manifest.yaml:218-219), shifts_stream (manifest.yaml:307-308), users_stream (manifest.yaml:338-339). If the user changes the account_name value in their config between syncs, records from the same availability will produce different primary keys, breaking deduplication and causing duplicates in the destination. The Shiftbase API consistently returns account_id in its standard entity endpoints (as shown by all other streams' schemas), so availabilities almost certainly does too — but the schema (manifest.yaml:949-992) omits account_id entirely.
Prompt for agents
In airbyte-integrations/connectors/source-shiftbase/manifest.yaml, fix the availabilities_stream to use account_id instead of account_name in its primary key, consistent with all other standard GET streams:
1. At line 262-264, change the primary_key from [id, account_name] to [id, account_id]
2. In the availabilities_stream schema (lines 949-992), add an account_id property of type [null, string]
3. Verify that the Shiftbase availabilities API endpoint returns account_id in the Availability object (consistent with departments, absentees, shifts, users endpoints)
Was this helpful? React with 👍 or 👎 to provide feedback.
There was a problem hiding this comment.
The observation is valid in principle — account_name is user-provided and mutable, so account_id would be more stable for deduplication.
However, the availabilities schema currently doesn't include account_id because we haven't confirmed the Shiftbase /availabilities endpoint returns it. During the earlier review cycle, Daryna Ishchenko (@darynaishchenko) instructed us to use account_id in PKs for streams where it exists in the schema — and we did that for departments, absentees, shifts, and users. Availabilities was left with account_name because its schema doesn't have account_id.
Adding account_id to the schema without API verification risks referencing a field that doesn't exist in the response, which would produce null PK values. Deferring to the human reviewer on whether to add it based on API knowledge.
|
…s in spec Co-Authored-By: AJ Steers <aj@airbyte.io>
…' of https://git-manager.devin.ai/proxy/github.com/airbytehq/airbyte into devin/1770321883-migrate-source-shiftbase-manifest-only
|
|
This PR looks good but still account details are not required during source setup on cloud. Trying to resolve it. Devin created new PR to fix it here. |
|
❌ Cannot revive Devin session - the session is too old. Please start a new session instead. |
| employees_report_stream: | ||
| type: DeclarativeStream | ||
| name: employees_report |
There was a problem hiding this comment.
🟡 employees_report_stream is missing primary_key, preventing dedup sync modes
The employees_report_stream (line 363) does not define a primary_key, unlike all other 9 streams in the manifest. The schema at manifest.yaml:1168-1239 includes userId which is a natural unique identifier for employee reports. Without a primary_key, this stream cannot support Incremental Append+Deduped or Full Refresh Overwrite with deduplication. Given that every other stream in the connector defines a primary key, this appears to be an oversight.
| employees_report_stream: | |
| type: DeclarativeStream | |
| name: employees_report | |
| employees_report_stream: | |
| type: DeclarativeStream | |
| name: employees_report | |
| primary_key: | |
| - - userId | |
| - - account_name |
Was this helpful? React with 👍 or 👎 to provide feedback.
|
/publish-connectors-prerelease
|
…eport stream Add optional end_date config field used by the schedule_detail_report stream to fetch future scheduled records. Defaults to 1 month (30 days) into the future via day_delta(30) when not provided by the user. Co-Authored-By: gl_anatolii.yatsuk <gl_anatolii.yatsuk@airbyte.io>
Rename the spec field to schedule_report_end_date to make it explicit that this parameter is only used by the schedule_detail_report stream. Co-Authored-By: gl_anatolii.yatsuk <gl_anatolii.yatsuk@airbyte.io>
This PR targets the following PR:
What
Migrates the Shiftbase source connector from a Python-based implementation to a manifest-only declarative YAML connector, removing all custom Python code in favor of a single
manifest.yaml.Additionally adds an optional
schedule_report_end_datespec field for the Schedule Detail Report stream to support fetching future scheduled records (defaults to 1 month into the future).Requested by Aaron ("AJ") Steers (@aaronsteers). Link to Devin run.
How
source_shiftbase/source.py,streams.py, and all Python infrastructure with a declarativemanifest.yamlListPartitionRouteriterating overconfig.accountsarraySubstreamPartitionRouter(departments→employees, employees→employee_time_distribution)access_tokenvia theparent_slicechain (e.g.{{ stream_partition.parent_slice.account.access_token }}for employees,{{ stream_partition.parent_slice.parent_slice.account.access_token }}for employee_time_distribution)account_idpropagated to child streams viaextra_fieldsinParentStreamConfig(departments→employees→employee_time_distribution)DatetimeBasedCursorwithstart_time_option/end_time_optioninjectingfrom/tointo POST bodyemployee_time_distributionusesDatetimeBasedCursorwithdatetime_format: "%Y",step: P1Y, andstart_time_optionto inject theyearrequest parameter directlydefinitions.schemassection with$refpointers for easier reviewaccount_id(for streams that have it in their schema) oraccount_name(for streams usingListPartitionRouterwithoutaccount_id) to ensure uniqueness across multi-account syncsmetadata.yamlto usesource-declarative-manifest:7.6.5base image withlanguage:manifest-onlyandcdk:low-codetagssource_shiftbase/,main.py,pyproject.toml,poetry.lock,requirements.txt,unit_tests/,integration_tests/acceptance-test-config.ymlwith bypass reasons for all test types (no test credentials available)docs/integrations/sources/shiftbase.mdschedule_report_end_datespec field (format:YYYY-MM-DD) used byschedule_detail_reportstream asend_datetime; defaults today_delta(30, '%Y-%m-%d')(approximately 1 month into the future) when not providedReview guide
Items resolved during review:
account_idoraccount_namefor multi-account uniqueness.employeesPK includesdepartment_id.employees_reporthas no PK (userId not unique across daily syncs).end_datetimeremoved from all incremental syncs (CDK defaults to now).start_time_option/end_time_optionused for request param injection instead of manual interpolation.account_idpropagation: Child streams (employees, employee_time_distribution) receiveaccount_idfrom parent viaextra_fields.definitions.schemassection with$refreferences.employee_time_distribution:datetime_formatchanged to"%Y"withstart_time_optioninjecting theyearparam directly instead of string slicing.format: dateadded tostart_datespec field for date picker UI.Updates since last revision:
schedule_report_end_datespec field added: Optional config field explicitly scoped to the Schedule Detail Report stream. When provided, it sets theend_datetimefor theschedule_detail_reportcursor. When omitted, defaults to 30 days from the current date viaday_delta(30, '%Y-%m-%d'). This enables fetching future scheduled records. Field was namedschedule_report_end_date(rather than genericend_date) to make its stream-specific scope clear.Human review checklist:
manifest.yaml—parent_slicechain for grandchild streamsemployee_time_distributionuses{{ stream_partition.parent_slice.parent_slice.account.access_token }}(two levels deep). This pattern relies on the CDK propagating the full partition context through nestedSubstreamPartitionRouterlayers. Untested live.manifest.yaml— Report stream schemasexport: jsonin POST body, which returns flat JSON records (not the raw columns/rows format shown by default in API docs). Schemas were transcribed from the original Python connector. Verify field accuracy if possible.manifest.yaml—_year_partitionsynthetic cursorAddFieldstransformation. Verify this pattern works for year-by-year iteration without breaking state management.manifest.yaml—employees_reporthas no PKuserIdis not unique across daily syncs, so no primary key is defined. This means no deduplication in append+dedup mode.README.mdCloud UI —
accountsfield not highlighted as requiredrequired: [accounts, start_date]), but the Cloud UI may not display the required indicator for array-type fields with nested objects. This appears to be a UI rendering behavior, not a schema issue.manifest.yaml—schedule_report_end_date/day_deltaJinja expressionschedule_detail_reportuses{{ config.get('schedule_report_end_date', day_delta(30, '%Y-%m-%d')) }}forend_datetime. Verify thatconfig.get()andday_deltawith a format argument resolve correctly in the CDK's Jinja interpolation context. Note that 30 days is an approximation of "1 month". Untested live.User Impact
schedule_report_end_dateconfig field allows the Schedule Detail Report stream to fetch future scheduled records (defaults to 1 month ahead)Can this PR be safely reverted and rolled back?
Link to Devin session: https://app.devin.ai/sessions/f6d1ddc1eb074fddbbb41a42271999be