Skip to content

switches to using parquet files that have been extracted for the model#297

Merged
tomjemmett merged 10 commits intomainfrom
change_databricks_loading
Feb 11, 2025
Merged

switches to using parquet files that have been extracted for the model#297
tomjemmett merged 10 commits intomainfrom
change_databricks_loading

Conversation

@tomjemmett
Copy link
Member

No description provided.

@tomjemmett tomjemmett added enhancement New feature or request priority: should We should implement this feature labels Feb 6, 2025
@tomjemmett tomjemmett added this to the v3.2.0 milestone Feb 6, 2025
@tomjemmett tomjemmett requested a review from StatsRhian February 6, 2025 11:05
@tomjemmett tomjemmett self-assigned this Feb 6, 2025
@codecov
Copy link

codecov bot commented Feb 6, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (e85a8dc) to head (205ec0b).
Report is 10 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #297   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           18        18           
  Lines          999       999           
=========================================
  Hits           999       999           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Co-authored-by: Rhian Davies <thetrianglegirl@gmail.com>
@StatsRhian
Copy link
Member

Wasn't expecting you to respond on your day off @tomjemmett!
I'm just running QA checks against my previous runs but it should all be good.

@StatsRhian
Copy link
Member

We're not currently able to run this on dev data. We can run it on v3.1.
It's erroring with a reshape error at the IP activity avoided calculations. YiWen & I are investigating.

Other minor changes which are suggested

  • rename Model_version to Data_version
  • Or grab data version from the params file directly? (v3.1 - > v3.1.0)

Once it's fixed, I intend to compare with a run I did on Thursday afternoon to ensure similar numbers before approving

@yiwen-h
Copy link
Member

yiwen-h commented Feb 7, 2025

Have checked why it's not running on dev data - there are duplicates in the activity_avoidance. Example below. RAP should have been mapped to RAL (successor organisation) - so the RAP rows of activity should have been removed?

Can get round this by doing drop_duplicates but this flags an issue in nhp_data perhaps

image

@yiwen-h
Copy link
Member

yiwen-h commented Feb 10, 2025

Ok so in summary: the notebook runs on v3.1 data. I don't think we can approve the PR until it runs on dev / v3.2 data as well. So we're awaiting the data fix. have raised an issue here: The-Strategy-Unit/nhp_data#43

Co-authored-by: Rhian Davies <thetrianglegirl@gmail.com>
@yiwen-h yiwen-h requested a review from StatsRhian February 11, 2025 11:32
Copy link
Member

@StatsRhian StatsRhian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry forgot one data_version

Co-authored-by: Rhian Davies <thetrianglegirl@gmail.com>
@tomjemmett tomjemmett merged commit af73bde into main Feb 11, 2025
3 checks passed
@tomjemmett tomjemmett deleted the change_databricks_loading branch February 11, 2025 15:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request priority: should We should implement this feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants