Commit 394117b

MaxGhenis and claude authored
Fix: upload datasets to public HuggingFace repo (#280)
* fix: upload datasets to public HuggingFace repo

  The push workflow uploads to policyengine/policyengine-uk-data-private but policyengine-uk downloads from policyengine/policyengine-uk-data. This means new columns (like highest_education) never reach downstream consumers. Fix by also uploading to the public repo.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: pass version parameter to public repo upload

  upload_files_to_hf requires a version parameter for tagging. Without it the upload would fail with TypeError at runtime.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
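The failure described in the second fix can be sketched with a stand-in function. This is only an illustration: `upload_files_to_hf_stub` is hypothetical, the real signature of `upload_files_to_hf` is not shown in this commit, and the file name and version string are made up.

```python
# Hypothetical stand-in for upload_files_to_hf. The real signature is not
# shown in this commit; only a required `version` argument is modeled here.
def upload_files_to_hf_stub(
    files, version, hf_repo_name="policyengine/policyengine-uk-data"
):
    """Return a summary dict instead of performing any network upload."""
    return {"repo": hf_repo_name, "tag": version, "count": len(files)}


def call_without_version():
    """Mimic a call that omits `version` and report the exception type."""
    try:
        upload_files_to_hf_stub(files=["example.h5"])  # made-up file name
    except TypeError as exc:
        return type(exc).__name__
    return None


# Omitting the required argument fails only when the call is executed.
result_missing = call_without_version()

# Passing a version (made-up value) lets the call succeed.
result_ok = upload_files_to_hf_stub(files=["example.h5"], version="1.0.0")
```

Because Python checks required arguments at call time, a missing `version` would surface only when the upload step actually runs, which matches the "at runtime" wording in the commit message.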
1 parent 53fad7d commit 394117b


2 files changed: +19 -1 lines changed


changelog_entry.yaml

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+- bump: patch
+  changes:
+    fixed:
+    - Upload datasets to public HuggingFace repo (policyengine/policyengine-uk-data) in addition to private repo, so policyengine-uk gets the latest data.

policyengine_uk_data/storage/upload_completed_datasets.py

Lines changed: 15 additions & 1 deletion
@@ -1,5 +1,10 @@
+from importlib import metadata
+
 from policyengine_uk_data.storage import STORAGE_FOLDER
-from policyengine_uk_data.utils.data_upload import upload_data_files
+from policyengine_uk_data.utils.data_upload import (
+    upload_data_files,
+    upload_files_to_hf,
+)
 
 
 def upload_datasets():
@@ -14,13 +19,22 @@ def upload_datasets():
         if not file_path.exists():
             raise ValueError(f"File {file_path} does not exist.")
 
+    version = metadata.version("policyengine-uk-data")
+
     upload_data_files(
         files=dataset_files,
         hf_repo_name="policyengine/policyengine-uk-data-private",
         hf_repo_type="model",
         gcs_bucket_name="policyengine-uk-data-private",
     )
 
+    # Also upload to the public repo consumed by policyengine-uk
+    upload_files_to_hf(
+        files=dataset_files,
+        version=version,
+        hf_repo_name="policyengine/policyengine-uk-data",
+    )
+
 
 if __name__ == "__main__":
     upload_datasets()
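The diff reads the package version with `metadata.version("policyengine-uk-data")`, which raises `importlib.metadata.PackageNotFoundError` when the distribution is not installed. A minimal sketch of a guarded lookup follows, assuming a caller would prefer a fallback value over an exception. The helper `installed_version` is hypothetical and not part of this commit, which calls `metadata.version` directly.

```python
from importlib import metadata


def installed_version(package_name, default=None):
    """Return the installed distribution's version, or `default` if absent.

    Hypothetical helper for illustration; the commit itself does not
    catch PackageNotFoundError.
    """
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return default


# Example lookup for the distribution named in the diff; this yields None
# in an environment where the package is not installed.
version = installed_version("policyengine-uk-data")
```

Whether a fallback is desirable here is a design choice: letting the exception propagate, as the commit does, stops the upload before any files are tagged with a wrong version.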
