Skip to content

Conversation

@lukaszgryglicki
Copy link
Member

cc @mlehotskylf @ahmedomosanya - fix the issue with actual AWS update that wasn't detected in dry-run mode. No logic updates.

Signed-off-by: Lukasz Gryglicki [email protected]

Assisted by OpenAI

Assisted by GitHub Copilot

Signed-off-by: Lukasz Gryglicki <[email protected]>

Assisted by [OpenAI](https://platform.openai.com/)

Assisted by [GitHub Copilot](https://github.com/features/copilot)
Signed-off-by: Lukasz Gryglicki <[email protected]>

Assisted by [OpenAI](https://platform.openai.com/)

Assisted by [GitHub Copilot](https://github.com/features/copilot)
@coderabbitai
Copy link

coderabbitai bot commented Nov 25, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Refactors the signatures timestamp backfill tool to build DynamoDB condition expressions from only defined attribute names, pass the expression into CLI and UpdateItem calls, and conditionally handle approx_date_created/approx_date_modified; adds a signature date stats shell script.

Changes

Cohort / File(s) Summary
Backfill tool (Go)
cla-backend-go/cmd/signatures_timestamp_backfill/main.go
Added buildConditionExpression(names) to create a condition expression from present attribute names; changed buildAwsCliUpdate(..., condExpr string) signature; replaced static ExpressionAttributeNames with conditional population; compute and pass condExpr to CLI builder and DynamoDB UpdateItem in firstPassScanAndUpdate and snowflakeFix; handle approx_date_created/approx_date_modified conditionally for Fivetran-synced data.
Utilities (shell)
utils/signature_dates_stats.sh
New shell script performing four AWS DynamoDB scans on cla-prod-signatures to count missing/null date_created/date_modified and existence of approx_date_created/approx_date_modified.

Sequence Diagram(s)

sequenceDiagram
    actor Scanner as Scan Operation
    participant Main as Backfill Main
    participant CondBuilder as buildConditionExpression()
    participant CLIBuilder as buildAwsCliUpdate()
    participant DynamoDB as DynamoDB UpdateItem

    Scanner->>Main: identify attributes to update
    Main->>CondBuilder: pass `names` map (only defined attrs)
    CondBuilder-->>Main: return `condExpr` (OR of attribute_not_exists/empty or default)
    Main->>CLIBuilder: pass region, table, sigID, updateExpr, names, values, condExpr
    CLIBuilder-->>Main: return AWS CLI update command
    Main->>DynamoDB: call UpdateItem with UpdateExpression, Names, Values, ConditionExpression=condExpr
    alt Condition satisfied
        DynamoDB-->>Main: update success
    else Condition not met
        DynamoDB-->>Main: ConditionalCheckFailedException
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25–30 minutes

  • Verify correctness of OR composition and default fallback in buildConditionExpression.
  • Confirm all UpdateItem and CLI call sites compute and pass condExpr.
  • Check conditional population of ExpressionAttributeNames for approx_date_* branches (Fivetran vs regular).
  • Review formatting and correctness of the new utils/signature_dates_stats.sh scans (attribute name placeholders and filters).

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Title check ❓ Inconclusive The title 'Unicron fix backfill script' is vague and generic, using non-descriptive terms without conveying what the actual problem or fix is. Use a more specific title that describes the actual issue being fixed, such as 'Fix condition expressions in signature timestamp backfill to detect AWS updates in dry-run mode'.
✅ Passed checks (1 passed)
Check name Status Explanation
Description check ✅ Passed The description mentions fixing an issue where AWS updates were not detected in dry-run mode, which relates to the changeset's modifications to condition expressions.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch unicron-fix-backfill-script

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a critical bug in the DynamoDB backfill script where the condition expression referenced attribute names that weren't always defined in the ExpressionAttributeNames map. The issue only manifested during actual AWS updates (not in dry-run mode), causing validation errors when only one of the date fields needed updating.

Key changes:

  • Introduced dynamic condition expression building based on actually defined attribute names
  • Changed from static to dynamic population of the names map to only include attributes being updated
  • Applied the fix consistently across both firstPassScanAndUpdate and snowflakeFix functions

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
cla-backend-go/cmd/signatures_timestamp_backfill/main.go (1)

605-639: Snowflake path condition building is sound; consider avoiding duplicate recomputation

Wiring names for #date_* vs #approx_date_* based on Fivetran vs non-Fivetran sources and then using condExpr := buildConditionExpression(names) for both the CLI and SDK update keeps conditions consistent and prevents mismatches when only approx fields are touched. The second call condExpr = buildConditionExpression(names) just before UpdateItem is redundant because names is unchanged, so you could reuse the earlier condExpr to simplify the flow.

Also applies to: 661-663, 676-678, 685-685

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Jira integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 2ef46a5 and f2bdbf0.

📒 Files selected for processing (1)
  • cla-backend-go/cmd/signatures_timestamp_backfill/main.go (10 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: CodeQL analysis (go)
  • GitHub Check: Agent
  • GitHub Check: cypress-functional
  • GitHub Check: build-test-lint
🔇 Additional comments (3)
cla-backend-go/cmd/signatures_timestamp_backfill/main.go (3)

409-472: Dynamic per-field condition expression in first pass looks correct

Building names from the actually-updated attributes and deriving condExpr := buildConditionExpression(names) before both the CLI and UpdateItemWithContext calls keeps the condition aligned with the update expression and avoids DynamoDB validation errors from referencing unmapped #-placeholders. The semantics (“only update when the specific target date fields are missing/empty”) remain consistent with the previous condAnyMissing intent, just restricted to the fields you’re writing.

Please run a small integration check against a test table to confirm that:

  • Updates succeed when exactly one of date_created / date_modified is missing or empty.
  • Updates are rejected (ConditionalCheckFailed) if another process fills the same field between scan and update.

1065-1096: buildConditionExpression matches prior semantics and safely extends them to approx fields

The helper correctly:

  • Adds per-field clauses only for keys present in names, so DDB never sees undefined #... placeholders.
  • Keeps the “any of the relevant date fields is missing/empty” behavior via OR-ing the per-field conditions.
  • Falls back to attribute_exists(signature_id) when no date fields are referenced, which is a safe always-on guard that doesn’t require extra ExpressionAttributeNames or Values.

It would be good to verify against DynamoDB docs or a quick live test that:

  • Extra entries in ExpressionAttributeNames not referenced by condExpr are ignored.
  • The fallback attribute_exists(signature_id) behaves as expected when the item is missing (i.e., fails the conditional update).

1102-1140: Plumbing condExpr through buildAwsCliUpdate aligns dry-run CLI with live UpdateItem behavior

Extending buildAwsCliUpdate to accept condExpr and injecting it as --condition-expression '%s' ensures the generated AWS CLI commands now use the same condition as the SDK path, fixing the dry-run/real-run mismatch. The JSON flattening of names/values still only includes attributes actually in play, and the new condition string doesn’t introduce shell-quoting issues since it contains no single quotes.

After deploying, please spot-check a few generated CLI lines (for both regular and Fivetran/approx cases) by running them manually to confirm the condition expressions and attribute maps match what DynamoDB accepts.

Signed-off-by: Lukasz Gryglicki <[email protected]>

Assisted by [OpenAI](https://platform.openai.com/)

Assisted by [GitHub Copilot](https://github.com/features/copilot)
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
utils/signature_dates_stats.sh (1)

1-5: Add error handling and inline comments for clarity.

This utility script executes against production credentials and table without any error handling or documentation. Consider the following improvements:

  1. Add set -e to exit on errors
  2. Add inline comments above each scan explaining what's being counted
  3. Validate the AWS profile is available before execution

Here's a refactored version with error handling and documentation:

#!/bin/bash
set -e

PROFILE="lfproduct-prod"
TABLE="cla-prod-signatures"

# Validate AWS profile is configured
if ! aws configure list --profile "$PROFILE" >/dev/null 2>&1; then
  echo "Error: AWS profile '$PROFILE' not found" >&2
  exit 1
fi

# Count signatures with missing or NULL date_created
echo "Counting signatures with missing/NULL date_created..."
aws --profile "$PROFILE" dynamodb scan --table-name "$TABLE" \
  --filter-expression "attribute_not_exists(#a) OR #a = :nullval" \
  --expression-attribute-names '{"#a":"date_created"}' \
  --expression-attribute-values '{":nullval":{"NULL":true}}' \
  --select "COUNT"

# Count signatures with existing approx_date_created
echo "Counting signatures with existing approx_date_created..."
aws --profile "$PROFILE" dynamodb scan --table-name "$TABLE" \
  --filter-expression "attribute_exists(#a)" \
  --expression-attribute-names '{"#a":"approx_date_created"}' \
  --select "COUNT"

# Count signatures with missing or NULL date_modified
echo "Counting signatures with missing/NULL date_modified..."
aws --profile "$PROFILE" dynamodb scan --table-name "$TABLE" \
  --filter-expression "attribute_not_exists(#a) OR #a = :nullval" \
  --expression-attribute-names '{"#a":"date_modified"}' \
  --expression-attribute-values '{":nullval":{"NULL":true}}' \
  --select "COUNT"

# Count signatures with existing approx_date_modified
echo "Counting signatures with existing approx_date_modified..."
aws --profile "$PROFILE" dynamodb scan --table-name "$TABLE" \
  --filter-expression "attribute_exists(#a)" \
  --expression-attribute-names '{"#a":"approx_date_modified"}' \
  --select "COUNT"

This improves reliability by catching errors early, validates credentials before execution, extracts variables for maintainability, breaks long lines for readability, and documents each scan's purpose.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Jira integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between f2bdbf0 and 0638a36.

📒 Files selected for processing (1)
  • utils/signature_dates_stats.sh (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-test-lint
  • GitHub Check: cypress-functional

Signed-off-by: Lukasz Gryglicki <[email protected]>

Assisted by [OpenAI](https://platform.openai.com/)

Assisted by [GitHub Copilot](https://github.com/features/copilot)
@lukaszgryglicki lukaszgryglicki merged commit a418f4e into dev Nov 25, 2025
3 of 6 checks passed
@lukaszgryglicki lukaszgryglicki deleted the unicron-fix-backfill-script branch November 25, 2025 11:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants