-
Notifications
You must be signed in to change notification settings - Fork 2
1.7.1 #189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
1.7.1 #189
Conversation
WalkthroughAdds pricing extraction and loading assets, explicit Dagster asset names and retry policies, a custom dbt translator, new/updated dbt models and macro, removes an old pricing script, extends Go CLI and structs to show egg groups, updates golden tests, and expands infrastructure docs. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor Scheduler as Dagster
participant Extract as extract_pricing_data.build_dataframe
participant API as External API (tcg)
participant Load as load_pricing_data
participant DB as Postgres/Supabase
Scheduler->>Extract: materialize pricing asset
loop per set in SET_PRODUCT_MATCHING
Extract->>API: GET products/prices
API-->>Extract: JSON payload
Extract->>Extract: filter cards, extract number/name, validate (Pydantic), build per-set Polars DF
end
Extract-->>Scheduler: concatenated pricing DataFrame
Scheduler->>Load: run load_pricing_data (RetryPolicy)
Load->>DB: replace into staging.pricing_data
alt DB error (OperationalError)
DB-->>Load: error
Load->>Scheduler: log and re-raise
else Success
DB-->>Load: ack
Load-->>Scheduler: success
end
sequenceDiagram
autonumber
participant dbt as dbt CLI
participant DTrans as CustomDbtTranslator
participant Dagster as Dagster Asset Layer
dbt->>DTrans: provide dbt_resource_props
alt resource_type == "source" and name mapped
DTrans-->>Dagster: return custom AssetKey([mapped_name])
else
DTrans-->>Dagster: delegate to base translator (default AssetKey)
end
Dagster-->>dbt: asset mapping applied
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
Codecov Report✅ All modified and coverable lines are covered by tests.
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
card_data/pipelines/defs/extract/extract_data.py (2)
49-75: Add timeout/raise_for_status and error handling for set fetchesNetwork calls should be bounded and explicit about failures.
Apply:
- for url in url_list: - data = requests.get(url).json() + for url in url_list: + try: + r = requests.get(url, timeout=10) + r.raise_for_status() + data = r.json() + except requests.RequestException as e: + raise RuntimeError(f"Failed to fetch {url}") from e
116-136: Wire asset input instead of calling upstream function directlyUse AssetIn to inject extract_card_url_from_set_data; avoid recomputation and keep lineage accurate.
Apply:
-@dg.asset(deps=[extract_card_url_from_set], kinds={"API"}, name="extract_card_info") -def extract_card_info() -> list: - card_url_list = extract_card_url_from_set() +@dg.asset( + kinds={"API"}, + name="extract_card_info", + ins={"card_url_list": dg.AssetIn(key=dg.AssetKey(["extract_card_url_from_set_data"]))}, +) +def extract_card_info(card_url_list: list) -> list: @@ - r = requests.get(url) + r = requests.get(url, timeout=10)
🧹 Nitpick comments (8)
card_data/pipelines/poke_cli_dbt/models/cards.sql (1)
6-7: Avoid fragile mixed‑case identifiers; alias to consistent snake_caseQuoted mixed‑case ("localId", "set_cardCount_official") can break if the staging table has lowercase columns. Standardize via aliases.
Apply:
-SELECT id, set_id, image, name, "localId", category, hp, "set_cardCount_official", set_name +SELECT + id, + set_id, + image, + name, + "localId" as local_id, + category, + hp, + "set_cardCount_official" as set_cardcount_official, + set_namePlease verify staging.cards has these exact-case source columns before relying on quotes.
card_data/pipelines/defs/transformation/transform_data.py (1)
7-25: Harden translator against missing keys; keep mapping flexibleUse .get(...) to avoid KeyError and future‑proof against dbt metadata changes. Optionally consider (source_name, name) if multiple sources share table names.
Apply:
-class CustomDbtTranslator(DagsterDbtTranslator): - def get_asset_key(self, dbt_resource_props): - - resource_type = dbt_resource_props["resource_type"] - name = dbt_resource_props["name"] +class CustomDbtTranslator(DagsterDbtTranslator): + def get_asset_key(self, dbt_resource_props): + resource_type = dbt_resource_props.get("resource_type") + name = dbt_resource_props.get("name") + source_name = dbt_resource_props.get("source_name") @@ - if name in source_mapping: + if name and name in source_mapping: return dg.AssetKey([source_mapping[name]]) # For models, use default behavior return super().get_asset_key(dbt_resource_props)card_data/pipelines/defs/extract/extract_data.py (2)
90-114: Use a timeout for requests; keep 0.1s sleepBound the call to avoid hanging runs.
Apply:
- r = requests.get(url) + r = requests.get(url, timeout=10)
135-178: Also inject cards_list into create_card_dataframeMirror the pattern so Dagster passes data via inputs; remove direct function call.
Apply:
-@dg.asset(deps=[extract_card_info], kinds={"Polars"}, name="create_card_dataframe") -def create_card_dataframe() -> pl.DataFrame: - cards_list = extract_card_info() +@dg.asset( + kinds={"Polars"}, + name="create_card_dataframe", + ins={"cards_list": dg.AssetIn(key=dg.AssetKey(["extract_card_info"]))}, +) +def create_card_dataframe(cards_list: list) -> pl.DataFrame:card_data/pipelines/defs/load/load_data.py (1)
36-41: Raise a specific exception for Soda failures; include stdout/stderrPrefer CalledProcessError over bare Exception; carries command/output.
Apply:
- if result.returncode != 0: - raise Exception(f"Soda data quality checks failed with return code {result.returncode}") + if result.returncode != 0: + raise subprocess.CalledProcessError( + returncode=result.returncode, + cmd=result.args, + output=result.stdout, + stderr=result.stderr, + )Also applies to: 66-68
card_data/pipelines/soda/checks.yml (1)
3-3: Make row_count check resilient: changerow_count = 3torow_count >= 3. Soda config usesschema: staging, soseriestargetsstaging.series.cmd/pokemon/pokemon.go (1)
101-111: Consider handling empty egg groups.The
eggGroupfunction correctly collects, capitalizes, sorts, and formats egg group data. However, ifpokemonSpeciesStruct.EggGroupsis empty, the function will still print a line like "• Egg Group(s): " with no groups listed.Consider adding an early return if the egg groups slice is empty:
eggGroup := func(w io.Writer) { + if len(pokemonSpeciesStruct.EggGroups) == 0 { + return + } var eggGroupSlice []string for _, entry := range pokemonSpeciesStruct.EggGroups {card_data/pipelines/defs/extract/extract_pricing_data.py (1)
10-13: Consider externalizing the set-to-product mapping.The hardcoded
SET_PRODUCT_MATCHINGdictionary will require code changes whenever new sets are added. Consider moving this to a configuration file or database table for easier maintenance.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (26)
card_data/pipelines/defs/extract/extract_data.py(4 hunks)card_data/pipelines/defs/extract/extract_pricing_data.py(1 hunks)card_data/pipelines/defs/load/load_data.py(6 hunks)card_data/pipelines/defs/load/load_pricing_data.py(1 hunks)card_data/pipelines/defs/transformation/transform_data.py(1 hunks)card_data/pipelines/poke_cli_dbt/macros/create_relationships.sql(1 hunks)card_data/pipelines/poke_cli_dbt/models/cards.sql(1 hunks)card_data/pipelines/poke_cli_dbt/models/pricing_data.sql(1 hunks)card_data/pipelines/poke_cli_dbt/models/sources.yml(1 hunks)card_data/pipelines/soda/checks.yml(1 hunks)card_data/tcg_pricing.py(0 hunks)cmd/pokemon/pokemon.go(3 hunks)docs/Infrastructure_Guide/aws.md(1 hunks)docs/Infrastructure_Guide/supabase.md(1 hunks)docs/Infrastructure_Guide/terraform.md(1 hunks)structs/structs.go(1 hunks)testdata/main_latest_flag.golden(1 hunks)testdata/pokemon_abilities.golden(1 hunks)testdata/pokemon_defense.golden(1 hunks)testdata/pokemon_defense_ability_immunities.golden(1 hunks)testdata/pokemon_image.golden(1 hunks)testdata/pokemon_image_flag_non-valid_size.golden(1 hunks)testdata/pokemon_no_flags_dual_type.golden(1 hunks)testdata/pokemon_regional_form.golden(1 hunks)testdata/pokemon_regional_form_2.golden(1 hunks)testdata/pokemon_stats.golden(1 hunks)
💤 Files with no reviewable changes (1)
- card_data/tcg_pricing.py
🧰 Additional context used
🧬 Code graph analysis (3)
cmd/pokemon/pokemon.go (1)
styling/styling.go (1)
ColoredBullet(18-20)
card_data/pipelines/defs/load/load_pricing_data.py (2)
card_data/pipelines/defs/extract/extract_pricing_data.py (1)
build_dataframe(94-110)card_data/pipelines/utils/secret_retriever.py (1)
fetch_secret(8-18)
card_data/pipelines/defs/load/load_data.py (1)
card_data/pipelines/defs/extract/extract_data.py (3)
extract_series_data(30-46)extract_set_data(50-87)create_card_dataframe(136-178)
🪛 markdownlint-cli2 (0.18.1)
docs/Infrastructure_Guide/terraform.md
69-69: Code block style
Expected: indented; Actual: fenced
(MD046, code-block-style)
🪛 Ruff (0.13.3)
card_data/pipelines/defs/load/load_data.py
67-67: Create your own exception
(TRY002)
67-67: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: tests
🔇 Additional comments (22)
card_data/pipelines/defs/load/load_data.py (2)
15-21: Explicit asset name and retry policy look goodClear naming + exponential backoff improves resilience.
70-75: LGTM on explicit naming and retry for set loadConsistent with series/card assets.
card_data/pipelines/defs/extract/extract_data.py (1)
29-46: Add timeout and status check to extract_series_data’s HTTP call
Wrap the request as:r = requests.get(url, timeout=10) r.raise_for_status() data = r.json()Confirm the API’s series list includes
"me"so the filter returns three entries.testdata/main_latest_flag.golden (1)
4-4: LGTM! Version update aligns with the release.The version bump from v1.6.2 to v1.7.0 is appropriate for this feature release that adds egg group display functionality.
testdata/pokemon_image.golden (1)
5-13: LGTM! Egg group data correctly added.The egg group information is properly formatted and logically placed after the evolution information. The format "• Egg Group(s): Ground" is consistent with other Pokémon attribute displays.
testdata/pokemon_stats.golden (1)
4-12: LGTM! Consistent egg group formatting.The egg group display follows the same format and placement as other test fixtures, maintaining consistency across the CLI output.
testdata/pokemon_regional_form_2.golden (1)
4-12: LGTM! Multiple egg groups properly formatted.Multiple egg groups are correctly displayed with comma separation ("Monster, Water1"), following the established format.
testdata/pokemon_image_flag_non-valid_size.golden (1)
4-12: LGTM! Egg group data appropriately included in error scenarios.The egg group information is correctly displayed even when the image rendering fails, which is appropriate since egg groups are independent of image display functionality.
testdata/pokemon_abilities.golden (1)
4-12: LGTM! Consistent formatting maintained.The egg group information follows the established format and placement pattern.
testdata/pokemon_no_flags_dual_type.golden (1)
4-12: LGTM! Special egg group case properly handled.The "No-Eggs" egg group for legendary Pokémon is correctly displayed, and the placement appropriately adapts for Pokémon without evolution information.
testdata/pokemon_defense.golden (1)
4-12: LGTM – egg group formatting is consistent across all testdata golden files.card_data/pipelines/poke_cli_dbt/models/sources.yml (1)
97-107: LGTM!The new
pricing_datasource block is well-structured and consistent with existing source definitions. The column descriptions adequately document the pricing data schema.testdata/pokemon_regional_form.golden (1)
12-12: LGTM!The egg group display line is correctly formatted and consistent with the new feature additions across other test golden files.
testdata/pokemon_defense_ability_immunities.golden (1)
12-12: LGTM!The egg group display correctly shows multiple groups in a comma-separated format, consistent with the expected output format.
structs/structs.go (1)
174-178: LGTM!The
EggGroupsfield addition properly maps the egg group data from the PokéAPI response, enabling the new egg group display feature in the CLI.cmd/pokemon/pokemon.go (2)
10-10: LGTM!The
sortimport is appropriately added to support sorting egg group names alphabetically.
172-187: LGTM!The buffer declarations and output assembly correctly integrate the egg group data into the final Pokémon details display.
card_data/pipelines/defs/load/load_pricing_data.py (1)
9-13: LGTM!The Dagster asset configuration with retry policy (3 retries, exponential backoff) is appropriate for handling transient database connection issues during data loading.
card_data/pipelines/defs/extract/extract_pricing_data.py (3)
16-20: LGTM!The
CardPricingPydantic model provides appropriate validation for pricing data with the correct field types and optional market_price.
23-41: LGTM!The helper functions correctly handle card identification, number extraction, and name cleaning. The logic is clear and well-documented.
93-110: LGTM! Strong safeguard against data loss.The empty DataFrame check (lines 100-104) is an excellent safeguard to prevent accidental data deletion when using
if_table_exists="replace"in the load step.card_data/pipelines/poke_cli_dbt/models/pricing_data.sql (1)
1-7: Approve code changes –enable_rls()macro confirmed in card_data/pipelines/poke_cli_dbt/macros/create_rls.sql.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
card_data/pipelines/defs/load/load_pricing_data.py (1)
24-25: Past issue resolved: error message now correctly references the function name.The error message on line 25 now correctly references
load_pricing_data()instead ofload_series_data(), addressing the previous review feedback.
🧹 Nitpick comments (1)
card_data/pipelines/defs/load/load_pricing_data.py (1)
19-26: Consider broader error handling for additional failure modes.The current error handling only catches
OperationalError. Consider catching a broader exception type to handle other potential failures such as network timeouts, authentication errors, or unexpected database states.Example refactor:
try: df.write_database( table_name=table_name, connection=database_url, if_table_exists="replace" ) print(colored(" ✓", "green"), f"Data loaded into {table_name}") - except OperationalError as e: - print(colored(" ✖", "red"), "Connection error in load_pricing_data():", e) + except OperationalError as e: + print(colored(" ✖", "red"), "Database connection error in load_pricing_data():", e) + raise + except Exception as e: + print(colored(" ✖", "red"), "Unexpected error in load_pricing_data():", e) raiseNote: The asset-level
RetryPolicywill still retry on any exception, so this is primarily for improved logging clarity.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
card_data/pipelines/defs/load/load_pricing_data.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
card_data/pipelines/defs/load/load_pricing_data.py (2)
card_data/pipelines/defs/extract/extract_pricing_data.py (1)
build_dataframe(94-110)card_data/pipelines/utils/secret_retriever.py (1)
fetch_secret(8-18)
🔇 Additional comments (1)
card_data/pipelines/defs/load/load_pricing_data.py (1)
1-6: LGTM!All imports are necessary and correctly referenced in the code.
Summary by CodeRabbit
New Features
Improvements
Documentation
Tests
Chores