Data Storage Protocol
John Brandt edited this page Sep 5, 2020 · 4 revisions
- The projects are stored in a comma-separated file with `lat`, `long`, `unique_path`, and `name`.
- This is loaded into 4-predict and 4-download and indexed by the name or unique path.
- Everything in `raw/*` is stored as int16, via `np.trunc(array * 65535).astype(np.int16)`, because the original reflectance values are int16 and minimal calculations have occurred.
- Everything in `interim/*` is float32, via `np.float32(array)`, because there are still calculations to be done.
- Everything in `processed/*` is int32, via `np.trunc(array * 65535).astype(np.int32)`.
- All calculations are float32 and all tensors are float32, meaning that on loading any array, call `np.float32(array)` and assert that the array is between -10 and 10.
- `unique_path` is created as `country/admin1/name-uniqueid/`
- Local and cloud are separated with a `local_prefix` and `cloud_prefix`
- Files are currently saved with `hickle`
- All data in `raw/` is persistent
- All other data is processed on demand and should be deleted from the respective folders before closing the Docker containers
- Data in `processed/*` is int16-sized but saved as int32, because it is signed
- The `hickle` protocol does not seem to allow for streaming to / from s3, so it may be returned to `pickle` in the future
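
The conventions above can be sketched in a few helper functions. This is a minimal illustration, not the project's actual code: the function names, the presence of a header row in the project file, the example values, and the division by 65535 when loading are all assumptions. The `raw/*` int16 cast is left out, since int16 holds at most 32767 and so only fits scaled values below roughly 0.5.

```python
import csv
import io

import numpy as np

SCALE = 65535  # scaling factor named in the protocol


def index_projects(csv_text, key="name"):
    """Index project rows by `name` or `unique_path`.

    Assumes a header row of lat,long,unique_path,name (hypothetical layout).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key]: row for row in reader}


def make_unique_path(country, admin1, name, unique_id):
    """Build country/admin1/name-uniqueid/ as described above."""
    return f"{country}/{admin1}/{name}-{unique_id}/"


def to_interim(array):
    """interim/* convention: plain float32."""
    return np.float32(array)


def to_processed(array):
    """processed/* convention: scale, truncate, save as int32."""
    return np.trunc(np.asarray(array) * SCALE).astype(np.int32)


def load_as_float32(array, scale=None):
    """Cast to float32 on load and check the expected value range.

    Dividing by `scale` to undo the integer encoding is an assumption;
    the page only specifies the cast and the range check.
    """
    out = np.float32(array)
    if scale is not None:
        out = out / np.float32(scale)
    assert out.min() >= -10 and out.max() <= 10, "array outside [-10, 10]"
    return out


# Hypothetical example values, for illustration only.
values = np.array([0.0, 0.25, 0.5, 1.0])
stored = to_processed(values)                      # int32: [0, 16383, 32767, 65535]
restored = load_as_float32(stored, scale=SCALE)    # float32, within [-10, 10]
```

Truncation loses at most 1/65535 per value, so the restored array matches the input only to about five decimal places.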