Description
In the scenario where multiple scripts are listed in datapackage.yml there are two options for accessing objects created via scripts earlier in the list:
- datapackager_object_read, which is for accessing objects that were created in the same build, i.e. both Rmd files are toggled as enabled: yes
- project_data_path, which allows loading an .rda file created in a previous iteration of package_build()
This creates a relationship between the two scripts that requires manual updates when rebuilding the package. Take the case of two processing scripts, preprocess_A and preprocess_B, which generate A.rda and B.rda, respectively, where preprocess_B uses the output from preprocess_A.
In the following build scenario, we would use datapackager_object_read:
# Case 1
files:
  preprocess_A.Rmd:
    enabled: yes
  preprocess_B.Rmd:
    enabled: yes

In a subsequent build of the second type (Case 2), we have to update preprocess_B to use project_data_path:
# Case 2
files:
  preprocess_A.Rmd:
    enabled: no
  preprocess_B.Rmd:
    enabled: yes

There is a certain logic to this update, because it is a change of state in preprocess_B, to no longer be coupled with preprocess_A.
However, if preprocess_A needs to be rerun for some reason, we have to take the following actions:
- update datapackage.yml to enable both files
- switch preprocess_B.Rmd to use datapackager_object_read again (not especially intuitive)
Wondering if it's possible for data objects to always be read from the /data/ location, but only after any previous scripts have written to that folder? This would ensure that the latest data is always used, while maximizing code portability.
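
A rough sketch of the idea, in base R only. The helper read_latest_object and the temporary directory are hypothetical stand-ins, not part of DataPackageR, and the sketch assumes each object is saved as <name>.rda in the data folder before any later script asks for it. The point is that the code in preprocess_B stays the same whether or not preprocess_A ran in the current build.

```r
# Hypothetical helper (not a DataPackageR function): load one object from
# <data_dir>/<name>.rda and return it, failing loudly if it is missing.
read_latest_object <- function(name, data_dir) {
  rda_file <- file.path(data_dir, paste0(name, ".rda"))
  if (!file.exists(rda_file)) {
    stop("No saved object found at ", rda_file, call. = FALSE)
  }
  # Load into a private environment so nothing leaks into the caller.
  env <- new.env(parent = emptyenv())
  load(rda_file, envir = env)
  if (!exists(name, envir = env, inherits = FALSE)) {
    stop(rda_file, " does not contain an object named ", name, call. = FALSE)
  }
  get(name, envir = env, inherits = FALSE)
}

# A temporary directory stands in for the package's /data/ folder.
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)

# Stand-in for preprocess_A writing A.rda (in this build or an earlier one).
A <- data.frame(id = 1:3, value = c(10, 20, 30))
save(A, file = file.path(data_dir, "A.rda"))
rm(A)

# Stand-in for preprocess_B: the same call in both Case 1 and Case 2.
A_from_disk <- read_latest_object("A", data_dir)
B <- transform(A_from_disk, doubled = value * 2)
```

With something like this, toggling preprocess_A between enabled: yes and enabled: no would not require touching preprocess_B.Rmd, as long as the build writes A.rda to the data folder before preprocess_B runs.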