Skip to content

Conversation

@digitalghost-dev
Copy link
Owner

@digitalghost-dev digitalghost-dev commented Oct 12, 2025

Summary by CodeRabbit

  • New Features

    • Pokémon CLI now displays Egg Group(s).
    • Added pricing data pipeline with a new pricing table for downstream use.
  • Improvements

    • Data loads are more resilient with retries and stricter quality checks.
    • Database relationship setup is safer with automated constraint drop-and-recreate flow.
    • Transformation mapping improved for more predictable asset resolution.
  • Documentation

    • Expanded guides for AWS RDS, Supabase, and Terraformer.
  • Tests

    • Updated snapshots to include Egg Group(s) and latest version text.
  • Chores

    • Removed legacy standalone pricing script.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 12, 2025

Walkthrough

Adds pricing extraction and loading assets, explicit Dagster asset names and retry policies, a custom dbt translator, new/updated dbt models and macro, removes an old pricing script, extends Go CLI and structs to show egg groups, updates golden tests, and expands infrastructure docs.

Changes

Cohort / File(s) Summary of Changes
Extract pipeline asset edits
card_data/pipelines/defs/extract/extract_data.py
Include series id "me" in filters/URL lists; add explicit public asset names for several extract functions; minor internal comment/log change.
New pricing extraction
card_data/pipelines/defs/extract/extract_pricing_data.py
New module: SET_PRODUCT_MATCHING, Pydantic CardPricing model, helpers (is_card, get_card_number, extract_card_name), pull_product_information, and Dagster asset build_dataframe that fetches, validates, and returns a Polars DataFrame.
Load pipeline edits
card_data/pipelines/defs/load/load_data.py
Added explicit asset names and RetryPolicy (max_retries=3, delay=2, exponential backoff) to load assets; tightened OperationalError logging/re-raise; renamed quality check asset and enforce Soda check return code.
New pricing load
card_data/pipelines/defs/load/load_pricing_data.py
New Dagster asset load_pricing_data (depends on build_dataframe) writes staging.pricing_data to DB (Postgres/Supabase) with retry policy and error handling.
Custom dbt translator & assets
card_data/pipelines/defs/transformation/transform_data.py
Added CustomDbtTranslator (subclasses DagsterDbtTranslator) to remap certain dbt sources to specific AssetKeys; @dbt_assets updated to use the custom translator.
DBT macro change
card_data/pipelines/poke_cli_dbt/macros/create_relationships.sql
Reworked to first drop existing constraints (IF EXISTS) in reverse dependency order, then add PKs and FKs; diagnostic prints added; explicit return added.
DBT models & sources
card_data/pipelines/poke_cli_dbt/models/cards.sql, .../models/pricing_data.sql, .../models/sources.yml
cards.sql adds set_id, set_cardCount_official, and set_name; new pricing_data model (table materialized, RLS post-hook) selects from staging.pricing_data; sources.yml adds pricing_data source and columns.
Soda checks
card_data/pipelines/soda/checks.yml
Increased expected row_count for series from 2 to 3.
Removed script
card_data/tcg_pricing.py
Deleted legacy pricing script that fetched/printer product pricing and executed on import.
Go CLI & structs update (egg groups)
cmd/pokemon/pokemon.go, structs/structs.go
Added EggGroups field to species struct; implemented egg group collection/sorting/display in CLI; imported sort; adjusted output buffers.
Golden test updates
testdata/*.golden
Updated multiple golden files to include Egg Group(s) lines and bumped latest version string to v1.7.0.
Docs additions/edits
docs/Infrastructure_Guide/aws.md, .../supabase.md, .../terraform.md
Expanded RDS setup guidance, added Supabase guide, and extended Terraformer docs with profile/installation/run instructions.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Scheduler as Dagster
  participant Extract as extract_pricing_data.build_dataframe
  participant API as External API (tcg)
  participant Load as load_pricing_data
  participant DB as Postgres/Supabase

  Scheduler->>Extract: materialize pricing asset
  loop per set in SET_PRODUCT_MATCHING
    Extract->>API: GET products/prices
    API-->>Extract: JSON payload
    Extract->>Extract: filter cards, extract number/name, validate (Pydantic), build per-set Polars DF
  end
  Extract-->>Scheduler: concatenated pricing DataFrame

  Scheduler->>Load: run load_pricing_data (RetryPolicy)
  Load->>DB: replace into staging.pricing_data
  alt DB error (OperationalError)
    DB-->>Load: error
    Load->>Scheduler: log and re-raise
  else Success
    DB-->>Load: ack
    Load-->>Scheduler: success
  end
Loading
sequenceDiagram
  autonumber
  participant dbt as dbt CLI
  participant DTrans as CustomDbtTranslator
  participant Dagster as Dagster Asset Layer

  dbt->>DTrans: provide dbt_resource_props
  alt resource_type == "source" and name mapped
    DTrans-->>Dagster: return custom AssetKey([mapped_name])
  else
    DTrans-->>Dagster: delegate to base translator (default AssetKey)
  end
  Dagster-->>dbt: asset mapping applied
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

In a burrow I nibble through code and carrots bright,
Prices hop in by day and retries guard the night.
Dbt maps the paths, constraints fall then mend,
Eggs now show up where Go and structs extend.
I thump my foot — pipelines purr — carrots in sight! 🥕🐇

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Title Check ❓ Inconclusive The title “1.7.1” is purely a version number and does not describe any of the substantive changes introduced in this pull request, making it too vague for teammates to understand the primary updates at a glance. Please update the title to a concise, descriptive statement of the main changes—such as “Add pricing data extraction and loading assets with retry policies, rename extract and load functions to explicit asset names, and support egg group display in the CLI”—so that reviewers can immediately grasp the pull request’s purpose.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch 1.7.1

Comment @coderabbitai help to get the list of available commands and usage tips.

@digitalghost-dev digitalghost-dev linked an issue Oct 12, 2025 that may be closed by this pull request
@codecov
Copy link

codecov bot commented Oct 12, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

Files with missing lines Coverage Δ
cmd/pokemon/pokemon.go 86.70% <100.00%> (+0.90%) ⬆️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
card_data/pipelines/defs/extract/extract_data.py (2)

49-75: Add timeout/raise_for_status and error handling for set fetches

Network calls should be bounded and explicit about failures.

Apply:

-    for url in url_list:
-        data = requests.get(url).json()
+    for url in url_list:
+        try:
+            r = requests.get(url, timeout=10)
+            r.raise_for_status()
+            data = r.json()
+        except requests.RequestException as e:
+            raise RuntimeError(f"Failed to fetch {url}") from e

116-136: Wire asset input instead of calling upstream function directly

Use AssetIn to inject extract_card_url_from_set_data; avoid recomputation and keep lineage accurate.

Apply:

-@dg.asset(deps=[extract_card_url_from_set], kinds={"API"}, name="extract_card_info")
-def extract_card_info() -> list:
-    card_url_list = extract_card_url_from_set()
+@dg.asset(
+    kinds={"API"},
+    name="extract_card_info",
+    ins={"card_url_list": dg.AssetIn(key=dg.AssetKey(["extract_card_url_from_set_data"]))},
+)
+def extract_card_info(card_url_list: list) -> list:
@@
-            r = requests.get(url)
+            r = requests.get(url, timeout=10)
🧹 Nitpick comments (8)
card_data/pipelines/poke_cli_dbt/models/cards.sql (1)

6-7: Avoid fragile mixed‑case identifiers; alias to consistent snake_case

Quoted mixed‑case ("localId", "set_cardCount_official") can break if the staging table has lowercase columns. Standardize via aliases.

Apply:

-SELECT id, set_id, image, name, "localId", category, hp, "set_cardCount_official", set_name
+SELECT
+  id,
+  set_id,
+  image,
+  name,
+  "localId" as local_id,
+  category,
+  hp,
+  "set_cardCount_official" as set_cardcount_official,
+  set_name

Please verify staging.cards has these exact-case source columns before relying on quotes.

card_data/pipelines/defs/transformation/transform_data.py (1)

7-25: Harden translator against missing keys; keep mapping flexible

Use .get(...) to avoid KeyError and future‑proof against dbt metadata changes. Optionally consider (source_name, name) if multiple sources share table names.

Apply:

-class CustomDbtTranslator(DagsterDbtTranslator):
-    def get_asset_key(self, dbt_resource_props):
-
-        resource_type = dbt_resource_props["resource_type"]
-        name = dbt_resource_props["name"]
+class CustomDbtTranslator(DagsterDbtTranslator):
+    def get_asset_key(self, dbt_resource_props):
+        resource_type = dbt_resource_props.get("resource_type")
+        name = dbt_resource_props.get("name")
+        source_name = dbt_resource_props.get("source_name")
@@
-            if name in source_mapping:
+            if name and name in source_mapping:
                 return dg.AssetKey([source_mapping[name]])
 
         # For models, use default behavior
         return super().get_asset_key(dbt_resource_props)
card_data/pipelines/defs/extract/extract_data.py (2)

90-114: Use a timeout for requests; keep 0.1s sleep

Bound the call to avoid hanging runs.

Apply:

-            r = requests.get(url)
+            r = requests.get(url, timeout=10)

135-178: Also inject cards_list into create_card_dataframe

Mirror the pattern so Dagster passes data via inputs; remove direct function call.

Apply:

-@dg.asset(deps=[extract_card_info], kinds={"Polars"}, name="create_card_dataframe")
-def create_card_dataframe() -> pl.DataFrame:
-    cards_list = extract_card_info()
+@dg.asset(
+    kinds={"Polars"},
+    name="create_card_dataframe",
+    ins={"cards_list": dg.AssetIn(key=dg.AssetKey(["extract_card_info"]))},
+)
+def create_card_dataframe(cards_list: list) -> pl.DataFrame:
card_data/pipelines/defs/load/load_data.py (1)

36-41: Raise a specific exception for Soda failures; include stdout/stderr

Prefer CalledProcessError over bare Exception; carries command/output.

Apply:

-    if result.returncode != 0:
-        raise Exception(f"Soda data quality checks failed with return code {result.returncode}")
+    if result.returncode != 0:
+        raise subprocess.CalledProcessError(
+            returncode=result.returncode,
+            cmd=result.args,
+            output=result.stdout,
+            stderr=result.stderr,
+        )

Also applies to: 66-68

card_data/pipelines/soda/checks.yml (1)

3-3: Make row_count check resilient: change row_count = 3 to row_count >= 3. Soda config uses schema: staging, so series targets staging.series.

cmd/pokemon/pokemon.go (1)

101-111: Consider handling empty egg groups.

The eggGroup function correctly collects, capitalizes, sorts, and formats egg group data. However, if pokemonSpeciesStruct.EggGroups is empty, the function will still print a line like "• Egg Group(s): " with no groups listed.

Consider adding an early return if the egg groups slice is empty:

 eggGroup := func(w io.Writer) {
+	if len(pokemonSpeciesStruct.EggGroups) == 0 {
+		return
+	}
 	var eggGroupSlice []string
 
 	for _, entry := range pokemonSpeciesStruct.EggGroups {
card_data/pipelines/defs/extract/extract_pricing_data.py (1)

10-13: Consider externalizing the set-to-product mapping.

The hardcoded SET_PRODUCT_MATCHING dictionary will require code changes whenever new sets are added. Consider moving this to a configuration file or database table for easier maintenance.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 91fc942 and 91c59e2.

📒 Files selected for processing (26)
  • card_data/pipelines/defs/extract/extract_data.py (4 hunks)
  • card_data/pipelines/defs/extract/extract_pricing_data.py (1 hunks)
  • card_data/pipelines/defs/load/load_data.py (6 hunks)
  • card_data/pipelines/defs/load/load_pricing_data.py (1 hunks)
  • card_data/pipelines/defs/transformation/transform_data.py (1 hunks)
  • card_data/pipelines/poke_cli_dbt/macros/create_relationships.sql (1 hunks)
  • card_data/pipelines/poke_cli_dbt/models/cards.sql (1 hunks)
  • card_data/pipelines/poke_cli_dbt/models/pricing_data.sql (1 hunks)
  • card_data/pipelines/poke_cli_dbt/models/sources.yml (1 hunks)
  • card_data/pipelines/soda/checks.yml (1 hunks)
  • card_data/tcg_pricing.py (0 hunks)
  • cmd/pokemon/pokemon.go (3 hunks)
  • docs/Infrastructure_Guide/aws.md (1 hunks)
  • docs/Infrastructure_Guide/supabase.md (1 hunks)
  • docs/Infrastructure_Guide/terraform.md (1 hunks)
  • structs/structs.go (1 hunks)
  • testdata/main_latest_flag.golden (1 hunks)
  • testdata/pokemon_abilities.golden (1 hunks)
  • testdata/pokemon_defense.golden (1 hunks)
  • testdata/pokemon_defense_ability_immunities.golden (1 hunks)
  • testdata/pokemon_image.golden (1 hunks)
  • testdata/pokemon_image_flag_non-valid_size.golden (1 hunks)
  • testdata/pokemon_no_flags_dual_type.golden (1 hunks)
  • testdata/pokemon_regional_form.golden (1 hunks)
  • testdata/pokemon_regional_form_2.golden (1 hunks)
  • testdata/pokemon_stats.golden (1 hunks)
💤 Files with no reviewable changes (1)
  • card_data/tcg_pricing.py
🧰 Additional context used
🧬 Code graph analysis (3)
cmd/pokemon/pokemon.go (1)
styling/styling.go (1)
  • ColoredBullet (18-20)
card_data/pipelines/defs/load/load_pricing_data.py (2)
card_data/pipelines/defs/extract/extract_pricing_data.py (1)
  • build_dataframe (94-110)
card_data/pipelines/utils/secret_retriever.py (1)
  • fetch_secret (8-18)
card_data/pipelines/defs/load/load_data.py (1)
card_data/pipelines/defs/extract/extract_data.py (3)
  • extract_series_data (30-46)
  • extract_set_data (50-87)
  • create_card_dataframe (136-178)
🪛 markdownlint-cli2 (0.18.1)
docs/Infrastructure_Guide/terraform.md

69-69: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)

🪛 Ruff (0.13.3)
card_data/pipelines/defs/load/load_data.py

67-67: Create your own exception

(TRY002)


67-67: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: tests
🔇 Additional comments (22)
card_data/pipelines/defs/load/load_data.py (2)

15-21: Explicit asset name and retry policy look good

Clear naming + exponential backoff improves resilience.


70-75: LGTM on explicit naming and retry for set load

Consistent with series/card assets.

card_data/pipelines/defs/extract/extract_data.py (1)

29-46: Add timeout and status check to extract_series_data’s HTTP call
Wrap the request as:

r = requests.get(url, timeout=10)
r.raise_for_status()
data = r.json()

Confirm the API’s series list includes "me" so the filter returns three entries.

testdata/main_latest_flag.golden (1)

4-4: LGTM! Version update aligns with the release.

The version bump from v1.6.2 to v1.7.0 is appropriate for this feature release that adds egg group display functionality.

testdata/pokemon_image.golden (1)

5-13: LGTM! Egg group data correctly added.

The egg group information is properly formatted and logically placed after the evolution information. The format "• Egg Group(s): Ground" is consistent with other Pokémon attribute displays.

testdata/pokemon_stats.golden (1)

4-12: LGTM! Consistent egg group formatting.

The egg group display follows the same format and placement as other test fixtures, maintaining consistency across the CLI output.

testdata/pokemon_regional_form_2.golden (1)

4-12: LGTM! Multiple egg groups properly formatted.

Multiple egg groups are correctly displayed with comma separation ("Monster, Water1"), following the established format.

testdata/pokemon_image_flag_non-valid_size.golden (1)

4-12: LGTM! Egg group data appropriately included in error scenarios.

The egg group information is correctly displayed even when the image rendering fails, which is appropriate since egg groups are independent of image display functionality.

testdata/pokemon_abilities.golden (1)

4-12: LGTM! Consistent formatting maintained.

The egg group information follows the established format and placement pattern.

testdata/pokemon_no_flags_dual_type.golden (1)

4-12: LGTM! Special egg group case properly handled.

The "No-Eggs" egg group for legendary Pokémon is correctly displayed, and the placement appropriately adapts for Pokémon without evolution information.

testdata/pokemon_defense.golden (1)

4-12: LGTM – egg group formatting is consistent across all testdata golden files.

card_data/pipelines/poke_cli_dbt/models/sources.yml (1)

97-107: LGTM!

The new pricing_data source block is well-structured and consistent with existing source definitions. The column descriptions adequately document the pricing data schema.

testdata/pokemon_regional_form.golden (1)

12-12: LGTM!

The egg group display line is correctly formatted and consistent with the new feature additions across other test golden files.

testdata/pokemon_defense_ability_immunities.golden (1)

12-12: LGTM!

The egg group display correctly shows multiple groups in a comma-separated format, consistent with the expected output format.

structs/structs.go (1)

174-178: LGTM!

The EggGroups field addition properly maps the egg group data from the PokéAPI response, enabling the new egg group display feature in the CLI.

cmd/pokemon/pokemon.go (2)

10-10: LGTM!

The sort import is appropriately added to support sorting egg group names alphabetically.


172-187: LGTM!

The buffer declarations and output assembly correctly integrate the egg group data into the final Pokémon details display.

card_data/pipelines/defs/load/load_pricing_data.py (1)

9-13: LGTM!

The Dagster asset configuration with retry policy (3 retries, exponential backoff) is appropriate for handling transient database connection issues during data loading.

card_data/pipelines/defs/extract/extract_pricing_data.py (3)

16-20: LGTM!

The CardPricing Pydantic model provides appropriate validation for pricing data with the correct field types and optional market_price.


23-41: LGTM!

The helper functions correctly handle card identification, number extraction, and name cleaning. The logic is clear and well-documented.


93-110: LGTM! Strong safeguard against data loss.

The empty DataFrame check (lines 100-104) is an excellent safeguard to prevent accidental data deletion when using if_table_exists="replace" in the load step.

card_data/pipelines/poke_cli_dbt/models/pricing_data.sql (1)

1-7: Approve code changesenable_rls() macro confirmed in card_data/pipelines/poke_cli_dbt/macros/create_rls.sql.

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
card_data/pipelines/defs/load/load_pricing_data.py (1)

24-25: Past issue resolved: error message now correctly references the function name.

The error message on line 25 now correctly references load_pricing_data() instead of load_series_data(), addressing the previous review feedback.

🧹 Nitpick comments (1)
card_data/pipelines/defs/load/load_pricing_data.py (1)

19-26: Consider broader error handling for additional failure modes.

The current error handling only catches OperationalError. Consider catching a broader exception type to handle other potential failures such as network timeouts, authentication errors, or unexpected database states.

Example refactor:

     try:
         df.write_database(
             table_name=table_name, connection=database_url, if_table_exists="replace"
         )
         print(colored(" ✓", "green"), f"Data loaded into {table_name}")
-    except OperationalError as e:
-        print(colored(" ✖", "red"), "Connection error in load_pricing_data():", e)
+    except OperationalError as e:
+        print(colored(" ✖", "red"), "Database connection error in load_pricing_data():", e)
+        raise
+    except Exception as e:
+        print(colored(" ✖", "red"), "Unexpected error in load_pricing_data():", e)
         raise

Note: The asset-level RetryPolicy will still retry on any exception, so this is primarily for improved logging clarity.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 91c59e2 and 2338167.

📒 Files selected for processing (1)
  • card_data/pipelines/defs/load/load_pricing_data.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
card_data/pipelines/defs/load/load_pricing_data.py (2)
card_data/pipelines/defs/extract/extract_pricing_data.py (1)
  • build_dataframe (94-110)
card_data/pipelines/utils/secret_retriever.py (1)
  • fetch_secret (8-18)
🔇 Additional comments (1)
card_data/pipelines/defs/load/load_pricing_data.py (1)

1-6: LGTM!

All imports are necessary and correctly referenced in the code.

@digitalghost-dev digitalghost-dev merged commit 072dc2a into main Oct 12, 2025
7 checks passed
@digitalghost-dev digitalghost-dev deleted the 1.7.1 branch October 12, 2025 21:06
@coderabbitai coderabbitai bot mentioned this pull request Oct 19, 2025
@coderabbitai coderabbitai bot mentioned this pull request Nov 3, 2025
@coderabbitai coderabbitai bot mentioned this pull request Nov 25, 2025
@coderabbitai coderabbitai bot mentioned this pull request Dec 7, 2025
@coderabbitai coderabbitai bot mentioned this pull request Dec 15, 2025
@coderabbitai coderabbitai bot mentioned this pull request Dec 27, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add Pokémon's egg group

2 participants