Initial commit with meltano & v1-sync-helper #1

Merged: emsearcy merged 16 commits into main from ems/meltano on Nov 25, 2025
@emsearcy

No description provided.

Signed-off-by: Eric Searcy <eric@linuxfoundation.org>
Copilot AI review requested due to automatic review settings October 31, 2025 21:18

Copilot AI left a comment

Pull Request Overview

This PR initializes an LFX v1<>v2 data sync project using Meltano for data integration. It adds a complete Meltano project structure with a custom target plugin for the NATS Key-Value store, along with extractors for PostgreSQL and DynamoDB.

Key changes:

  • Sets up Meltano project with extractors (tap-postgres, tap-dynamodb) and custom target (target-nats-kv)
  • Implements a Singer.io target plugin for streaming data to NATS JetStream key/value buckets
  • Configures project tooling (linting, licensing, Docker support)

Reviewed Changes

Copilot reviewed 21 out of 34 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| pyproject.toml | Defines root project dependencies (meltano 4.0.3) with Python 3.11 constraint |
| uv.lock | Lock file for Python dependencies |
| meltano/meltano.yml | Meltano configuration defining extractors and custom loader |
| meltano/Dockerfile | Container image configuration using Meltano v4.0.2 with Python 3.12 |
| meltano/load/target-nats-kv/ | Custom Singer.io target plugin implementation for NATS KV store |
| .github/workflows/ | CI/CD workflows for MegaLinter and license header checking |
| LICENSE, LICENSE-docs | MIT and CC-BY-4.0 licenses for code and documentation |

Signed-off-by: Eric Searcy <eric@linuxfoundation.org>
@emsearcy emsearcy changed the title Initial commit with meltano project Initial commit with meltano & v1-sync-helper Nov 21, 2025
Culmination of 2 days of AI-assisted coding. Confirmed that project
creations and updates are triggered and flow through, with slugs
and parent_uid references working correctly.

*Some* (not exhaustive) of the prompts used to generate this code:

- work on the lfx-v1-sync-helper/v1-sync-helper go service. See the "data ingest sequence diagram" from [@README.md](arch/lfx-architecture-scratch/2025-10-LFX-One-Data-Sync/README.md). we are not doing wal-listener yet. we want to ensure we capture the projects, committees, and committee members coming into the v1 data KV Bucket from Meltano, and make authenticated calls into the Project Service. The v1-sync-helper previously assumed we'd be consuming NATS messages from Meltano itself, but this is not the case any more. The Meltano agent writes data into a NATS KV DB, instead of emitting CDC messages. Any "meltano" subjects in v1-sync-helper therefore should be removed, and replaced with NATS KV watchers. And wal-listener handlers can be kept, but they should be commented out because they won't work as previously stubbed out (they should write to the same NATS KV bucket as Meltano). Also, the v1-sync-helper previously assumed it would be sending v1 data directly to the indexer and fga-sync systems, which will not happen. Any egress code related to this can be removed. Create 2 kubernetes chart resources for the v1 objects (projects and more) KV bucket, and another for the mappings KV bucket we will use to store mappings. It should be 10Gi big. Use [@lfx-v2-meeting-service](lfx2/lfx-v2-meeting-service/charts/lfx-v2-meeting-service) for reference on the helm configurations. The lfx-v1-sync-helper needs to be configured with a Heimdall secret, and generate JWT tokens for the Authorization and X-On-Behalf-Of headers (see [@__init__.py](lfx2/lfx-v2-mockdata/src/lfx_v2_mockdata/__init__.py) 's !jwt macro and it's usage in [@1_tlf.yaml](lfx2/lfx-v2-mockdata/playbooks/projects/base_projects/1_tlf.yaml) and [@board_meeting.yaml](lfx2/lfx-v2-mockdata/playbooks/v1_meetings/umbrella_board_meeting/board_meeting.yaml)  for similar usage). 
(Writing to the projects/committee API services replaces the previous implementation, which published to the indexer & fga-sync NATS subjects.)

- please compare [@mock-heimdall-jwt.sh](lfx2/lfx-v2-mockdata/scripts/mock-heimdall-jwt.sh) and class JWTGenerator to the logic in v1-sync-helper. like the former 2, we need to either pass in the key ID (kid), or extract it from the JWKS endpoint of Heimdall. The issuer (iss) of the JWT is just "heimdall". There is no need for v1-sync-helper to have a jwtClientID. (It *does* need an Auth0 Client ID for talking to LFX v1 APIs, but that is a different matter)

- if we use the KeyValue Watch(), then multiple instances of v1-sync-helper will all get the same updates. I suspect that in order to scale out v1-sync-helper, we'll need to create our own consumer over the v1KV bucket. This consumer should be defined using NACK (helm chart resources)

- change all references in v1-sync-helper of m2m_helper to v1_sync_helper. also, this should only be a fallback subject. if any given record being processed contains a v1 principal, we will impersonate that principal, including an email: {"principal":"$username","sub":"$username","email":"$email"}. If no principal is found, then it will be the fallback "v1_sync_helper" client ID (including the "clients@" prefix on `principal`).

- I need you to do 3 more things. FIRST, replace references to the "clients@" prefix and make it a "@Clients" suffix. SECOND, please understand that lfx_client.go is really lfx_v2_client.go, which handles downstream LFX v2 service calls with impersonation. we also will need a lfx_v1_client.go, which uses Auth0 private key auth (see [@README.md](arch/lfx-architecture-scratch/2024-08%20Auth0%20Private%20Key%20JWT%20examples/README.md) for Go example), and will need v1-sync-helper to have an AUTH0_TENANT, AUTH0_CLIENT_ID, AUTH0_PRIVATE_KEY, and LFX_API_GW (audience) env. LFX_API_GW audience can default to "https://api-gw.dev.platform.linuxfoundation.org/". THIRD, when extracting the principal from project or committee KV item updates, the field lastmodifiedbyid must be handled as follows: if it contains the suffix "@Clients", it is treated as a machine user and passed through (no `email`, and "@Clients" only is on the `principal`, not the `sub`). Otherwise, it is a v1 "platform ID", and we must attempt to convert it to a username and email via the v1 User Service, e.g.: GET https://api-gw.dev.platform.linuxfoundation.org/v1/users/{platformID}. On errors, log a warning and fall back to the v1_sync_helper@clients principal. If there is a 200 response, it will be JSON. If there is no "Username" attribute (missing or ""), log a warning and fall back to the v1_sync_helper@clients principal, otherwise use the "Username" verbatim. If there is a nonempty "Email" attribute, use it as well. Cache all User Service responses (including errors/empties) by storing the v1 User entry in the mappings KV bucket with a "_last_fetched" attribute. If the _last_fetched attribute is more than an hour old, use the stale copy and attempt to refresh the record in the background. 
Create a "lock" entry in the mapping KV with the current timestamp before refreshing any given platform ID; if the lock already exists and is not more than 10 seconds old, wait 1 second then check the cached item for updates before trying again.

- I'm working on lfx-v1-sync-helper/v1-sync-helper. v2 projects created via the project service API always must have a parent_uid. projects that have NO parent in v1, have a parent of slug=ROOT in v2. Therefore, if the source data has *no* parent ID, we should call the slug_to_uid NATS request (see [@0_root.yaml](lfx2/lfx-v2-mockdata/playbooks/projects/base_projects/0_root.yaml) ) from the project handler, and set the parent_uid to the UUID returned from that request. no need to cache this.

- change the mapping from using the slug to using the sfid. this is the "primary key" for v1, not slug, so it's needed for foreign-key lookups ... we can implement the logic for parent_uid by checking our sfid mapping to find the UUID to populate parent_uid. we'll keep the slug mapping in case we need it for any future objects that use slug as foreign key. but only use sfid mapping, not slug, for determining if the incoming v1 data is a create or update.

- despite the extremely misleading name, project_name__c in the committee object (platform-collaboration__c) REALLY is the `sfid` ONLY. Don't try to make up new fields, there is never a "project_sfid__c" column in that data.
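
As an illustration of the principal rules described in the prompts above, here is a minimal sketch. It is written in Python only for brevity (the v1-sync-helper itself is a Go service), and it is not the code in this PR. The names `resolve_principal`, `lookup_user`, and `FALLBACK_CLAIMS` are hypothetical, the fallback `sub` value is an assumption, and caching and locking are left out.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("principal_sketch")

CLIENTS_SUFFIX = "@Clients"
# Fallback identity; the principal string is copied verbatim from the prompt,
# the "sub" without the suffix is an assumption.
FALLBACK_CLAIMS = {"principal": "v1_sync_helper@clients", "sub": "v1_sync_helper"}


def resolve_principal(
    last_modified_by_id: str,
    lookup_user: Callable[[str], Optional[dict]],
) -> dict:
    """Map a v1 lastmodifiedbyid value to impersonation claims.

    lookup_user is a hypothetical stand-in for the v1 User Service call
    (GET .../v1/users/{platformID}); it returns the parsed JSON body or raises.
    """
    if not last_modified_by_id:
        return dict(FALLBACK_CLAIMS)

    # Machine users pass through: the suffix stays on "principal" only.
    if last_modified_by_id.endswith(CLIENTS_SUFFIX):
        return {
            "principal": last_modified_by_id,
            "sub": last_modified_by_id[: -len(CLIENTS_SUFFIX)],
        }

    # Otherwise treat the value as a v1 platform ID and look the user up.
    try:
        user = lookup_user(last_modified_by_id) or {}
    except Exception as exc:  # any lookup failure falls back
        logger.warning("user lookup failed for %s: %s", last_modified_by_id, exc)
        return dict(FALLBACK_CLAIMS)

    username = user.get("Username") or ""
    if not username:
        logger.warning("no Username for platform ID %s", last_modified_by_id)
        return dict(FALLBACK_CLAIMS)

    claims = {"principal": username, "sub": username}
    email = user.get("Email") or ""
    if email:
        claims["email"] = email
    return claims
```

Passing the lookup in as a function keeps the decision logic testable without a network call; a real implementation would wrap that lookup with the cache and lock entries described in the prompt.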

🤖 Generated with [GitHub Copilot](https://github.com/features/copilot) (via Zed)

Signed-off-by: Eric Searcy <eric@linuxfoundation.org>
Projects backfill is nearly complete now.

- Switch to Goa clients for committees and projects endpoints

- Comprehensive field mapping review of project objects, including data
  massaging to meet v2 tighter controls.

- Update Meltano NATS-loader to point to LFX cluster endpoint by default

- Add an HTTP debug environment variable to assist with debugging HTTP
  requests (Goa errors were failing to parse response bodies and
  providing no relevant error messages, despite the error response
  _coming from Goa_; hopefully upgrading Goa in our services will
  resolve this in the future).

🤖 Generated with [GitHub Copilot](https://github.com/features/copilot) (via Zed)

Signed-off-by: Eric Searcy <eric@linuxfoundation.org>
🤖 Assisted with [GitHub Copilot](https://github.com/features/copilot) (via Zed)

Signed-off-by: Eric Searcy <eric@linuxfoundation.org>
- Add committee member sync handlers for platform-community__c records
- Implement committee member create/update logic in handlers_committees.go
- Add committee member client functions in client_committees.go
- Add KV handler routing for committee member records
- Add shared extractDateOnly helper function in handlers.go
- Adopt extractDateOnly for date-only fields:
  - Committee member role start/end dates
  - Committee member voting start/end dates
  - Project formation, dissolution, and announcement dates

Adds complete committee member syncing from V1 to V2 with proper date handling.
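
A rough sketch of what a date-only helper like the one listed above might do. It is written in Python for brevity (the real `extractDateOnly` lives in the Go service, and its exact behavior is not shown in this PR page). The assumptions here are that inputs are ISO-style timestamp strings and that anything unparseable becomes an empty string.

```python
from datetime import date
from typing import Any


def extract_date_only(value: Any) -> str:
    """Return the YYYY-MM-DD part of an ISO-style timestamp, or "" if invalid."""
    if not isinstance(value, str):
        return ""
    # The date portion of an ISO 8601 timestamp is its first 10 characters.
    candidate = value.strip()[:10]
    try:
        # fromisoformat rejects impossible dates such as 2024-02-30.
        return date.fromisoformat(candidate).isoformat()
    except ValueError:
        return ""
```

Returning an empty string rather than raising lets callers decide whether to omit the field from the outgoing payload.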

🤖 Generated with [GitHub Copilot](https://github.com/features/copilot) (via Zed)

Signed-off-by: Eric Searcy <eric@linuxfoundation.org>
The Committee Service API requires a version='1' query parameter for
committee member operations. This resolves 400 Bad Request responses
that reported missing_field errors.

🤖 Generated with [GitHub Copilot](https://github.com/features/copilot) (via Zed)

Signed-off-by: Eric Searcy <eric@linuxfoundation.org>
- Only create Role struct when role__c field exists in V1 data
- Only create Voting struct when voting_status__c field exists in V1 data
- Within those conditionals, only add date fields if they have valid values
- This prevents API validation errors for empty date strings and URIs

Resolves parsing errors:
- body.role.start_date must be formatted as a date
- body.role.end_date must be formatted as a date
- body.voting.start_date must be formatted as a date
- body.voting.end_date must be formatted as a date
- body.organization.website must be formatted as a uri
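
The conditional construction described in this commit could look roughly like the following. This is a Python sketch for brevity, not the Go handler code. `role__c` and `voting_status__c` come from the commit message; the v1 date keys (`role_start`, `role_end`, `voting_start`, `voting_end`) and the `status` output key are hypothetical, and an empty string is treated the same as a missing field.

```python
from datetime import date
from typing import Any, Optional


def _valid_date(value: Any) -> Optional[str]:
    """Return YYYY-MM-DD if value starts with a valid ISO date, else None."""
    if not isinstance(value, str):
        return None
    try:
        return date.fromisoformat(value.strip()[:10]).isoformat()
    except ValueError:
        return None


def _with_dates(base: dict, record: dict, start_key: str, end_key: str) -> dict:
    """Copy start/end dates into base only when they parse as dates."""
    for src, dst in ((start_key, "start_date"), (end_key, "end_date")):
        parsed = _valid_date(record.get(src))
        if parsed:
            base[dst] = parsed
    return base


def build_member_payload(record: dict) -> dict:
    """Build role/voting sections only when their v1 source field is present."""
    payload: dict = {}
    if record.get("role__c"):
        role = {"name": record["role__c"]}
        payload["role"] = _with_dates(role, record, "role_start", "role_end")
    if record.get("voting_status__c"):
        voting = {"status": record["voting_status__c"]}
        payload["voting"] = _with_dates(voting, record, "voting_start", "voting_end")
    return payload
```

Omitting a key entirely, instead of sending an empty string, is what avoids the "must be formatted as a date" validation errors listed above.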

🤖 Generated with [GitHub Copilot](https://github.com/features/copilot) (via Zed)

Signed-off-by: Eric Searcy <eric@linuxfoundation.org>
- Add organization lookup from organization-service/v1/orgs/{sfid}
- Implement caching/locking pattern similar to user lookups with
  stale-while-revalidate semantics (30min fresh, 6hr stale window)
- Parse organization website URL from Link attribute first, then
  fall back to Domain attribute with proper URL scheme handling
- Update committee member handlers to populate Organization name
  and website fields from fetched organization data
- Add comprehensive error handling and invalid state caching
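
One way to read the "30min fresh, 6hr stale window" is sketched below, in Python for brevity and not taken from the Go service. It assumes both durations are measured from the time of the last fetch, and that entries older than the stale window must be refetched before use. `classify_cache_entry` and its return labels are hypothetical names.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

FRESH_FOR = timedelta(minutes=30)  # serve from cache without refreshing
STALE_FOR = timedelta(hours=6)     # serve from cache, refresh in background


def classify_cache_entry(last_fetched: datetime, now: Optional[datetime] = None) -> str:
    """Return "fresh", "stale", or "expired" for a cached organization entry."""
    now = now or datetime.now(timezone.utc)
    age = now - last_fetched
    if age <= FRESH_FOR:
        return "fresh"
    if age <= STALE_FOR:
        return "stale"
    return "expired"
```

A caller would use a "stale" entry immediately while scheduling a background refresh, which is the core of stale-while-revalidate.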

🤖 Generated with GitHub Copilot (via Zed)

Signed-off-by: Eric Searcy <eric@linuxfoundation.org>
- Remove Link field from V1Organization and V1OrganizationResponse structs
- Update parseWebsiteURL function signature to take single 'website' parameter
- Rename variables to be consistent with new parameter naming
- Simplify website URL parsing to only use Domain attribute from org data
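
A sketch of single-parameter website parsing in the spirit of this change, in Python for brevity rather than the Go `parseWebsiteURL` itself. Defaulting to `https://` for bare domains and rejecting non-HTTP schemes are assumptions, not behavior confirmed by the commit.

```python
from urllib.parse import urlparse


def parse_website_url(website: str) -> str:
    """Turn a v1 organization Domain value into an http(s) URL, or "" if unusable."""
    website = (website or "").strip()
    if not website:
        return ""
    # Bare domains such as "example.org" get a scheme so they validate as URIs.
    if "://" not in website:
        website = "https://" + website
    parsed = urlparse(website)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return ""
    return website
```

Returning an empty string lets the caller leave `organization.website` out of the payload instead of sending a value the API would reject.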

🤖 Generated with GitHub Copilot (via Zed)

Signed-off-by: Eric Searcy <eric@linuxfoundation.org>
Not exhaustively passing, but in the right direction.

Signed-off-by: Eric Searcy <eric@linuxfoundation.org>
This commit adds a "refresh_mode" configuration option to the NATS KV
Meltano loader. The option controls how the loader treats keys that
already exist in the NATS KV store, based on the timestamps of the
incoming records. The default (and previous) behavior
was to only write new keys or update existing keys if the incoming record
had a more recent timestamp than the existing key. With the addition of the
"refresh_mode" option, users can now choose between 3 modes:

- "newer": only write incoming records to the NATS KV store if their
  timestamps are more recent than existing keys (this is the default behavior).
- "same": in addition to the "newer" behavior, also update existing keys if the
  incoming record has the same timestamp as the existing key.
- "all": ignore timestamps and write all incoming records to the NATS KV store,
  overwriting existing keys regardless of their timestamps.

*The reason for this* is that during initial data loads or certain data
refresh scenarios, we may wish to re-trigger v1-sync-helper to
re-attempt to write the data into V2. This is especially true on the
initial load, where project relations (parent, legal parent) *may not
exist* when the record is attempted to be written, and will result in a
failure since we cannot write the record without translating those IDs
to their V2 equivalents. Combining a Meltano `--full-refresh` with
`TARGET_NATS_KV_REFRESH_MODE=same` allows us to keep re-running the sync
until all records are successfully resolved and written.
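
The three modes reduce to a small comparison, sketched here under the assumption that timestamps are directly comparable values and that a missing key is represented by `None`. The function name `should_write` is hypothetical and not taken from target-nats-kv.

```python
from typing import Any, Optional


def should_write(refresh_mode: str, incoming_ts: Any, existing_ts: Optional[Any]) -> bool:
    """Decide whether an incoming record should overwrite the stored key."""
    if existing_ts is None:
        # New keys are written in every mode.
        return True
    if refresh_mode == "all":
        return True
    if refresh_mode == "same":
        return incoming_ts >= existing_ts
    if refresh_mode == "newer":
        return incoming_ts > existing_ts
    raise ValueError(f"unknown refresh_mode: {refresh_mode!r}")
```

With "same", re-running a full refresh rewrites unchanged records, which gives v1-sync-helper another chance to process records whose parent did not exist on the earlier pass.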

Signed-off-by: Eric Searcy <eric@linuxfoundation.org>
…fields

- Add allowedAppointedByValues map with all valid enum values from API spec
- Add allowedRoleNames map with all valid role name values from API spec
- Implement mapAppointedByToValidValue() function to validate appointed_by field
- Implement mapRoleNameToValidValue() function to validate role name field
- Update both create and update payload functions to use validation
- Change default appointed_by from 'Unknown' to 'None' (valid enum value)
- Invalid values now fall back to 'None' with warning logs
- Add early email validation to skip committee members with blank emails
- Log warning with sfid when skipping members with blank emails
- Remove redundant email validation from mapping functions

Fixes issues where 'Unknown' and 'Technical Lead' were rejected by API
as invalid enum values for appointed_by and role.name fields respectively.
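
The allow-list fallback can be sketched as below, in Python for brevity; the actual `mapAppointedByToValidValue` is Go code in the service. Only 'None' is known from this commit to be a valid value, so the other entry in the set is a placeholder rather than a real enum member from the API spec.

```python
import logging
from typing import Optional

logger = logging.getLogger("enum_sketch")

# "None" is confirmed valid by the commit message; "Placeholder Value" stands in
# for the remaining enum members, which are not listed on this page.
ALLOWED_APPOINTED_BY = frozenset({"None", "Placeholder Value"})
DEFAULT_APPOINTED_BY = "None"


def map_appointed_by_to_valid_value(raw: Optional[str], sfid: str = "") -> str:
    """Return raw if it is an allowed enum value, otherwise the default."""
    value = (raw or "").strip()
    if value in ALLOWED_APPOINTED_BY:
        return value
    if value:
        # Only non-empty invalid values are worth a warning.
        logger.warning("invalid appointed_by %r for member %s; using %r",
                       value, sfid, DEFAULT_APPOINTED_BY)
    return DEFAULT_APPOINTED_BY
```

Normalizing before the request is sent trades some fidelity to the v1 data for records that pass API validation.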

🤖 Generated with [GitHub Copilot](https://github.com/features/copilot) (via Zed)

Signed-off-by: Eric Searcy <eric@linuxfoundation.org>
- Update root README with complete LFX One Data Sync architecture, links
  to subcomponents
- Add high-level flowchart diagram illustrating data sync process
- Add all sequence diagrams from the private technical architecture write-ups
- Update Meltano README with usage instructions and command modifiers
- Add new Helm chart README with configuration details
- Add license section and proper linking between component READMEs
- Standardize on Python 3.12 for Meltano and target-nats-kv

🤖 Generated with [GitHub Copilot](https://github.com/features/copilot) (via Zed)

Signed-off-by: Eric Searcy <eric@linuxfoundation.org>
Signed-off-by: Eric Searcy <eric@linuxfoundation.org>
Signed-off-by: Eric Searcy <eric@linuxfoundation.org>
@emsearcy emsearcy merged commit bb648e4 into main Nov 25, 2025
3 checks passed
@emsearcy emsearcy deleted the ems/meltano branch November 25, 2025 16:56