diff --git a/GEE_upload_scripts/README.md b/GEE_upload_scripts/README.md
new file mode 100644
index 0000000..5d8aba8
--- /dev/null
+++ b/GEE_upload_scripts/README.md
@@ -0,0 +1,192 @@
+
+
For uploading data products from large-scale test runs for visualization and inspection in Google Earth Engine (GEE)
+
+## Features
+
+* Generates a GEE-compatible GeoTIFF of the data products for RTC-S1 and CSLC-S1 (DSWx-HLS products are already GEE-compatible)
+* Uploads each GeoTIFF to a Google Cloud bucket.
+
+## Contents
+
+* [Quick Start](#quick-start)
+* [Changelog](#changelog)
+* [FAQ](#frequently-asked-questions-faq)
+* [Contributing Guide](#contributing)
+* [License](#license)
+* [Support](#support)
+
+## Quick Start
+
+This guide provides a quick way to get started with our project. Please see our [docs]([INSERT LINK TO DOCS SITE / WIKI HERE]) for a more comprehensive overview.
+
+### ${\color{lightblue}Initial \space instruction \space from \space PST}$
+(this section will likely be removed eventually)
+
+Some things to note about the scripts:
+* The main script first finds all subdirectories within a specified s3 bucket+prefix. It then builds a list of s3 keys that point to the target layer GeoTIFFs (e.g. the VH backscatter) and searches the Google Cloud Storage (GCS) bucket+prefix for COGs that have already been transferred. It compares the two lists and creates a list of s3-GCS key pairs that still need to be transferred. Once we have this big list, we iterate through it using multiprocessing: for each pair, we download the GeoTIFF (or HDF5 for CSLC), build a gdal_translate command to convert it into a COG in the right format for GEE, and upload the result to GCS.
+* When I ingest the COGs into GEE (after the attached scripts have run completely; a separate set of scripts performs the ingestion), I need to specify metadata properties for each image. My current workflow reads these properties from the filename. Most of them are already part of the file name (e.g. polarization, burst ID), but pass direction isn't, so in the upload-to-GCS step I read the direction from the file and append A or D to the filename. I do this to save time in the ingestion step, where I then don't need to read anything from the COGs themselves.
+
+The crux of this conversion is accomplished with gdal_translate CLI commands (passed to the terminal using the subprocess module, in the `processRTC` and `processCSLC` functions respectively). These commands are:
+* RTC:
+  * `gdal_cmd1 = f'gdal_translate -of GTiff -co NBITS=32 {filepath} {tempfile}'`
+  * `gdal_cmd2 = f'gdal_translate -of COG -co COMPRESS=DEFLATE -co RESAMPLING=AVERAGE {tempfile} {outfile}'`
+* CSLC:
+  * `cslc_h5_amp = f'DERIVED_SUBDATASET:AMPLITUDE:NETCDF:"{filepath}":/science/SENTINEL1/CSLC/grids/VV'`
+  * `gdal_cmd = f'gdal_translate -of COG -co COMPRESS=DEFLATE -co RESAMPLING=AVERAGE {cslc_h5_amp} {outfile}'`
+
+If SDS can run these commands and include the output with the rest of the product layers, that would be very useful for us. Let me know if you have any questions.
+
+Further Questions & Answers:
+
+1. ${\color{red}Question}$: Looking at the script code (under `if __name__ == '__main__':`), it looks like we will need to update the `s3_prefix` and `gcs_prefix` strings. Do we need to create a new folder on GCS, or will it autogenerate one based on our string? It also looks like we should separate the RTC and CSLC products into different directories, or the loops generating the `keyList` entries won't do the right thing (e.g. pick up the RTC HDF metadata file as a potential CSLC product). Does that sound right?
+   - ${\color{green}Answer}$: Yes, the `s3_prefix` and `gcs_prefix` need to be updated when pointing these scripts at new runs.
+   - ${\color{green}Answer}$: I believe the folder will be auto-generated from the GCS prefix you put in.
+   - ${\color{green}Answer}$: The script assumes every subdirectory within the `s3_prefix` contains the files associated with a single product of the same type (RTC or CSLC). It will error if it can't find the file it is looking for (e.g. if the subdirectory is missing the VH polarization GeoTIFF), but it has error handling and will continue on to the next key pair. I can't remember whether it only prints a message when there is an error.
+
+2. ${\color{red}Question}$: Do we need any special credentials for uploading the COGs to Google Cloud? I didn't see anything like that in the script, but want to double-check.
+   - ${\color{green}Answer}$: I believe you will need to authenticate the Google Cloud command-line tools and also have the right privileges on our GCS bucket to write to it. If this is too much of an issue, you can always write the COGs to an AWS s3 bucket instead, and we can handle the transfer (s3-to-GCS transfer is very quick and doesn't require downloading/uploading).
+
+### Requirements
+
+* RTC-S1 or CSLC-S1 products in an S3 bucket.
+
+### Setup Instructions
+
+1. Ensure you know how to authenticate the Google Cloud command-line tools so the script can upload to the Google Cloud bucket.
+2. Identify the correct `s3_prefix` and `gcs_prefix`, and set them to the correct values in the script (they're currently hard-coded).
+3. Review the Initial Instruction from PST section above to help understand the script and what to expect when it runs.
+4. Try running the script. I expect there will be some troubleshooting involved. When running the script, keep notes about what else needs to be done to get it working, including changes to the script code itself.
+
+### Run Instructions
+
+1. [INSERT STEP-BY-STEP RUN INSTRUCTIONS HERE, WITH OPTIONAL SCREENSHOTS]
+
+### Usage Examples
+
+* [INSERT LIST OF COMMON USAGE EXAMPLES HERE, WITH OPTIONAL SCREENSHOTS]
+
+### Build Instructions (if applicable)
+
+1. [INSERT STEP-BY-STEP BUILD INSTRUCTIONS HERE, WITH OPTIONAL SCREENSHOTS]
+
+### Test Instructions (if applicable)
+
+1. [INSERT STEP-BY-STEP TEST INSTRUCTIONS HERE, WITH OPTIONAL SCREENSHOTS]
+
+## Changelog
+
+See our [CHANGELOG.md](CHANGELOG.md) for a history of our changes.
+
+See our [releases page]([INSERT LINK TO YOUR RELEASES PAGE]) for our key versioned releases.
+
+## Frequently Asked Questions (FAQ)
+
+## Contributing
+
+[INSERT LINK TO CONTRIBUTING GUIDE OR FILL INLINE HERE]
+
+[INSERT LINK TO YOUR CODE_OF_CONDUCT.md OR SHARE TEXT HERE]
+
+[INSERT LINK TO YOUR GOVERNANCE.md OR SHARE TEXT HERE]
+
+## License
+
+See our [LICENSE](LICENSE).
+
+## Support
+
+[INSERT CONTACT INFORMATION OR PROFILE LINKS TO MAINTAINERS AMONG COMMITTER LIST]
diff --git a/GEE_upload_scripts/run-Convert-CSLCtoCOG.py b/GEE_upload_scripts/run-Convert-CSLCtoCOG.py
index e5c1b93..f1062c1 100644
--- a/GEE_upload_scripts/run-Convert-CSLCtoCOG.py
+++ b/GEE_upload_scripts/run-Convert-CSLCtoCOG.py
@@ -55,7 +55,7 @@ def processCSLC(s3key,gcskey):
     upload_blob(gcsbucket,outfile,gcskey)
     shutil.rmtree(f'./temp_(unknown)/')
 
-def run_rtc_transfer(keydict):
+def run_cslc_transfer(keydict):
     try:
         start_time = time.time()
         s3key = keydict['s3key']
@@ -98,5 +98,5 @@
     print(f'{len(keyPairs)} key pairs identified')
 
     pool = mp.Pool(4)
-    pool.map(run_rtc_transfer,keyPairs)
-    pool.close()
\ No newline at end of file
+    pool.map(run_cslc_transfer,keyPairs)
+    pool.close()
diff --git a/GEE_upload_scripts/run-translate-RTC-multi.py b/GEE_upload_scripts/run-translate-RTC-multi.py
index 9245549..fdf4d07 100644
--- a/GEE_upload_scripts/run-translate-RTC-multi.py
+++ b/GEE_upload_scripts/run-translate-RTC-multi.py
@@ -88,15 +88,15 @@ def run_rtc_transfer(keydict):
         gcsKeys.append(blob.name.split('.tif')[0][:-2]+'.tif')
     print(f'{len(gcsKeys)} existing gcs keys found')
 
-    #keyPairs = []
-    #for key in keyList:
-    #    fname = key.split('/')[-1]
-    #    gcsKey = gcs_prefix+fname
-    #    if gcsKey not in gcsKeys:
-    #        keydict = {'s3key':key,'gcsKey':gcsKey}
-    #        keyPairs.append(keydict)
-    #print(f'{len(keyPairs)} key pairs identified')
-    #pool = mp.Pool(mp.cpu_count())
-    #run_rtc_transfer(keyPairs[0])
-    #pool.map(run_rtc_transfer,keyPairs)
-    #pool.close()
\ No newline at end of file
+    keyPairs = []
+    for key in keyList:
+        fname = key.split('/')[-1]
+        gcsKey = gcs_prefix+fname
+        if gcsKey not in gcsKeys:
+            keydict = {'s3key':key,'gcsKey':gcsKey}
+            keyPairs.append(keydict)
+    print(f'{len(keyPairs)} key pairs identified')
+
+    pool = mp.Pool(mp.cpu_count())
+    pool.map(run_rtc_transfer,keyPairs)
+    pool.close()
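
One step the README above describes only in prose is reading GEE ingestion metadata from the COG filename, with pass direction appended as `A` or `D` during the upload-to-GCS step. A minimal sketch of that kind of filename parsing — the filename pattern and field positions here are illustrative assumptions, not the products' actual naming scheme:

```python
def parse_rtc_metadata(fname: str) -> dict:
    """Read GEE ingestion properties from a COG filename.

    Assumes a hypothetical pattern like
    OPERA_L2_RTC-S1_<burstID>_<datetime>_<POL>_<A|D>.tif;
    the real products may encode these fields differently.
    """
    stem = fname.rsplit('.tif', 1)[0]
    direction = stem[-1]            # 'A' or 'D', appended at upload time
    parts = stem[:-2].split('_')    # drop the trailing '_A'/'_D'
    return {
        'burst_id': parts[3],
        'polarization': parts[5],
        'pass_direction': {'A': 'ASCENDING', 'D': 'DESCENDING'}[direction],
    }
```

Parsing properties this way is what lets the ingestion scripts avoid opening each COG, which is the time saving the PST instructions mention.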