
Commit f78f538

Michael Smit committed
Always check for a new data file version even if one has been downloaded.
Related to PolicyEngine/issues#350. The existing code is pretty inconsistent about how and when it decides to try to download a data file, and we haven't clearly defined the intended behavior. We are prioritizing the simulation API use case, in which we always want to use the most recent version of the data file for a simulation. With this change, if the code specifies a remote data file (either by explicitly giving a URL or by defaulting to a country dataset), we will always check for a new version when creating a Simulation object, even if we have a local copy.
1 parent af565e4 commit f78f538
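The behavior described in the commit message can be sketched as a small resolver. This is a minimal sketch, not the PolicyEngine implementation: `resolve_data_file`, the `fetch` callable, and the `hf://`/`gs://` prefix check are hypothetical stand-ins for what `Simulation._set_data` and `download` do. The point it illustrates is that a remote source is always handed to the downloader, with no "file already exists" short-circuit.

```python
from pathlib import Path
from typing import Callable, Optional

# Hypothetical: a source starting with one of these schemes counts as remote.
REMOTE_PREFIXES = ("hf://", "gs://")


def resolve_data_file(
    data: Optional[str],
    default_dataset: str,
    fetch: Callable[[str], str],
) -> str:
    """Return the local path of the data file a simulation should use.

    A remote source (an explicit URL, or the country default when no data
    is given) is always passed to `fetch`, even if a copy is already on
    disk, so that `fetch` can pick up a newer version.
    """
    source = data if data is not None else default_dataset
    if source.startswith(REMOTE_PREFIXES):
        # No Path(...).exists() check here: the downloader decides
        # whether a newer version is available.
        return str(Path(fetch(source)))
    # A plain local path is used as-is; there is nothing to update.
    return source
```

Keeping the version check inside `fetch` means callers never have to reason about what is cached locally; the trade-off is a network round trip every time a Simulation is created.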


3 files changed (+11, -12 lines)


changelog_entry.yaml

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+- bump: patch
+  changes:
+    fixed:
+    - Always look for new data file versions even if we have a local copy of one.

policyengine/simulation.py

Lines changed: 7 additions & 8 deletions
@@ -135,14 +135,13 @@ def _set_data(self):
             -1
         ].split("/", 2)
 
-        if not Path(filename).exists():
-            file_path = download(
-                filepath=filename,
-                huggingface_org=hf_org,
-                huggingface_repo=hf_repo,
-                gcs_bucket=bucket,
-            )
-            filename = str(Path(file_path))
+        file_path = download(
+            filepath=filename,
+            huggingface_org=hf_org,
+            huggingface_repo=hf_repo,
+            gcs_bucket=bucket,
+        )
+        filename = str(Path(file_path))
         if "cps_2023" in filename:
             time_period = 2023
         else:
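The context lines of this hunk end in `].split("/", 2)`, which suggests that `hf_org`, `hf_repo`, and `filename` are parsed from a URL of the form `hf://<org>/<repo>/<path>`. That URL shape and the helper name `parse_hf_url` are assumptions, not taken from the diff. The sketch shows why `maxsplit=2` matters: the file path itself may contain slashes.

```python
from typing import Tuple


def parse_hf_url(url: str) -> Tuple[str, str, str]:
    """Split an assumed ``hf://<org>/<repo>/<path>`` URL into its parts."""
    # Drop the scheme, then split at most twice so that the file path
    # keeps any subdirectories it contains.
    org, repo, filename = url.split("://")[-1].split("/", 2)
    return org, repo, filename


print(parse_hf_url("hf://example-org/example-repo/data/cps_2023.h5"))
# ('example-org', 'example-repo', 'data/cps_2023.h5')
```

A URL with fewer than three path segments makes the tuple unpacking raise `ValueError`.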

policyengine/utils/data_download.py

Lines changed: 0 additions & 4 deletions
@@ -40,10 +40,6 @@ def download(
     except:
         logging.info("Failed to download from Hugging Face.")
 
-    if Path(filepath).exists():
-        logging.info(f"File {filepath} already exists. Skipping download.")
-        return filepath
-
     if data_file.gcs_bucket is not None:
         logging.info("Using Google Cloud Storage for download.")
         download_file_from_gcs(
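With the early return removed, `download` no longer treats an existing local file as final. It tries Hugging Face first and, if that fails, falls through to Google Cloud Storage when a bucket is configured. The sketch below models that control flow with injected downloader callables. `download_sketch`, `from_hf`, and `from_gcs` are hypothetical names. The final `RuntimeError` is also an assumption, since what the real function does when no source is available isn't visible in this diff.

```python
import logging
from typing import Callable, Optional


def download_sketch(
    filepath: str,
    from_hf: Optional[Callable[[str], str]] = None,
    from_gcs: Optional[Callable[[str], str]] = None,
) -> str:
    """Fetch `filepath`, preferring Hugging Face and falling back to GCS.

    As in the post-commit code, an existing local copy does not
    short-circuit this function; a remote source is always consulted.
    """
    if from_hf is not None:
        try:
            return from_hf(filepath)
        except Exception:
            # Catching Exception (rather than a bare `except:`) lets
            # KeyboardInterrupt and SystemExit propagate.
            logging.info("Failed to download from Hugging Face.")

    if from_gcs is not None:
        logging.info("Using Google Cloud Storage for download.")
        return from_gcs(filepath)

    # Assumption: report failure explicitly when no source could be used.
    raise RuntimeError(f"No remote source could provide {filepath}.")
```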
