diff --git a/README.md b/README.md index c6babc3a..9e2e6d94 100644 --- a/README.md +++ b/README.md @@ -4,7 +4,7 @@ Good news that you do not have to label all images (draw bounding boxes) from s Please refer to this blog post that describes Active Learning and semi-automated flow: [Active Learning for Object Detection in Partnership with Conservation Metrics](https://www.microsoft.com/developerblog/2018/11/06/active-learning-for-object-detection/) We will use Transfer Learning and Active Learning as core Machine Learning components of the pipeline. - -- Transfer Learning: use powerful pre-trained on big dataset (COCO) model as a startining point for fine-tuning foe needed classes. + -- Transfer Learning: use a powerful model pre-trained on a big dataset (COCO) as a starting point for fine-tuning on the needed classes. -- Active Learning: human annotator labels small set of images (set1), trains Object Detection Model (model1) on this set1 and then uses model1 to predict bounding boxes on images (thus pre-labeling those). Human annotator reviews mode1's predictions where the model was less confident -- and thus comes up with new set of images -- set2. Next phase will be to train more powerful model2 on bigger train set that includes set1 and set2 and use model2 prediction results as draft of labeled set3… The plan is to have 2 versions of pipeline set-up. @@ -17,7 +17,7 @@ This one (ideally) includes minimum setup. The core components here are: It will also be used to save "progress" logs of labeling activities 2) "Tagger" machine(s) This is computer(s) that human annotator(s) is using as environment for labeling portion of images -- for example [VOTT](https://github.com/Microsoft/VoTT). 
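The review-selection step at the heart of the Active Learning cycle described above -- keep only the predictions the model was least confident about and send them back to the human annotator -- can be sketched in shell. The CSV layout and the 0.5 confidence cutoff below are illustrative only, not the pipeline's actual file format:

```shell
# Toy predictions file in an illustrative image,confidence layout
# (not the pipeline's real CSV schema)
cat > preds.csv <<'EOF'
st1026.png,0.95
st1578.png,0.42
st1611.png,0.88
st1840.png,0.17
EOF

# Keep the images the model was least confident about:
# these go back to the human annotator as the next review set
awk -F',' '$2 < 0.5 {print $1}' preds.csv > to_review.txt
cat to_review.txt
```

High-confidence predictions can be accepted as draft labels, while the low-confidence remainder becomes set2 for human review.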
-Here example of labeling flow in VOTT: I've labled wood "knots" (round shapes) and "defect" (pretty much non-round shaped type of defect): +Here is an example of the labeling flow in VOTT: I've labeled wood "knots" (round shapes) and "defect" (pretty much non-round shaped type of defect): ![Labeling](images/VOTT_knot_defect.PNG) @@ -40,12 +40,12 @@ The flow below assumes the following: 1) We use Tensorflow Object Detection API (Faster RCNN with Resnet 50 as default option) to fine tune object detection. 2) Tensorflow Object Detection API is setup on Linux box (Azure DSVM is an option) that you can ssh to. See docs for Tensorflow Object Detection API regarding its general config. 3) Data(images) is in Azure blob storage -4) Human annotators use [VOTT](https://github.com/Microsoft/VoTT) to label\revise images. To support another tagging tool it's output (boudin boxes) need to be converted to csv form -- pull requests are welcomed! +4) Human annotators use [VOTT](https://github.com/Microsoft/VoTT) to label\revise images. To support another tagging tool, its output (bounding boxes) needs to be converted to csv form -- pull requests are welcomed! Here is general flow has 2 steps: 1) Environments setup -2) Active Learnining cycle: labeling data and running scipts to update model and feed back results for human annotator to review. -The whole flow is currenly automated with **4 scrips** user needs to run. +2) Active Learning cycle: labeling data and running scripts to update model and feed back results for human annotator to review. +The whole flow is currently automated with **4 scripts** the user needs to run. ### General prep @@ -54,30 +54,21 @@ The whole flow is currenly automated with **4 scrips** user needs to run. ### On Linux box aka Model (re)training env -1) Setup Tensorflow Object Detection API if you have not already. -This will include cloning of https://github.com/tensorflow/models. (On my machine I have it cloned to `/home/olgali/repos/models`). 
- Run `research/object_detection/object_detection_tutorial.ipynb` to make sure Tensorflow Object Detection API is functioning. -2) Clone this repo to the machine (for example: `/home/olgali/repos/models/research/active-learning-detect/`) -3) Update _config.ini_: - - set values for _AZURE_STORAGE_ACCOUNT_ and _AZURE_STORAGE_KEY_ - - set (update if needed) values for _# Data Information_ section - - set values for _# Training Machine_ and _# Tensorflow_ sections of the config.ini - _"python_file_directory"_ config value should point to the _"train"_ scripts from this project. -Example: -`python_file_directory=/home/olgali/repos/models/research/active-learning-detect/train` -3) pip install azure-blob packages: azure.storage.blob +Run the devops/dsvm/deploy_dsvm.sh script to create a VM for this process. Follow the instructions [here]() ### Tagger machine(s) (could be same as Linux box or separate boxes\vms) -1) Have Python 3.6 up and running. -2) Pip install azure-blob packages: azure.storage.blob +1) Have Python 3.6+ up and running +TODO: add section for installing python +If you do not have Python 3.6+, download [Anaconda Python](https://www.anaconda.com/distribution/#download-section) +``` +python --version + +python -m pip install azure.storage.blob +``` 3) Clone this repo, copy updated config.ini from Model re-training box (as it has Azure Blob Storage and other generic info already). -4) Update _config.ini_ values for _# Tagger Machine_ section: +4) Update _config.ini_ values for _# Tagger Machine_ section: This is a temporary directory, and the process will delete all the + files every time you label, so do not use an existing dir that you care about `tagging_location=D:\temp\NewTag` - -# Label data, run the scripts! -Overview: you will run **4 scripts* in total: -- two scipts on the machine where model (re)training happens and -- two scripts where human annotators label images (or review images pre-labeled by the model). 
### On Linux box aka Model (re)training env Run bash script to Init pipeline @@ -87,28 +78,14 @@ This step will: - Create totag_xyz.csv on the blob storage ( "activelearninglabels" container by default). This is the snapshot of images file names that need tagging (labeling). As human annotators make progress on labeling data the list will get smaller and smaller. -### Tagger machine(s) -1) Make sure that the tagging_location is empty. -2) Start each "phase" with downloading images to label (or to review pre-labeled images). -Sample cmd below requests 40 images for tagging: -`D:\repo\active-learning-detect\tag>python download_vott_json.py 40 ..\config.ini` -This step will create new version of totag_xyz.csv on blob storage that will have 40 images excluded from the list. -File tagging_abc.csv will hold list of 40 images being tagged. -3) Start [VOTT](https://github.com/Microsoft/VoTT) , load the folder for labeling\review (in my case it will be `D:\temp\NewTag\images`) -4) Once done with labeling push results back to central storage: - `D:\repo\active-learning-detect\tag>python upload_vott_json.py ..\config.ini` -This step will push tagged_123.csv to blob storage: this file contains actual bounding boxes coordinates for every image. -Tagging_abc.csv will contain list of files that are "work in progress" -- the ones to be tagged soon. - - -Now model can be trained. +### Label Now model can be trained. ### Model(re)training on Linux box Before your first time running the model, and at any later time if you would like to repartition the test set, run: `~/repos/models/research/active-learning-detect/train$ . ./repartition_test_set_script.sh ../config.ini` -This script will take all the tagged data and split some of it into a test set, which will not be trained/validated on and will then be use by evalution code to return mAP values. 
+This script will take all the tagged data and split some of it into a test set, which will not be trained/validated on and will then be used by evaluation code to return mAP values. Run bash script: `~/repos/models/research/active-learning-detect/train$ . ./active_learning_train.sh ../config.ini` @@ -127,36 +104,35 @@ Human annotator(s) deletes any leftovers from previous predictions (csv files in Training cycle can now be repeated on bigger training set and dataset with higher quality of pre-labeled bounding boxes could be obtained. +# Running prediction on a new batch of data (existing model) -# Using Custom Vision service for training +Send the config to the remote machine (make sure it is updated with the model that you want to use). +``` +scp "config_usgs19_pred20191021.ini" cmi@13.77.159.88:/home/cmi/active-learning-detect +``` -The Custom Vision service can be used instead of Tensorflow in case you do not have access to an Azure Data Science VM or other GPU-enabled machine. The steps for Custom Vision are pretty similar to those for Tensorflow, although the training step is slightly different: +SSH into the machine -### Model (re)training on Custom Vision -If you would like to repartition the test set, run: +``` +ssh cmi@13.77.159.88 +``` -`~/repos/models/research/active-learning-detect/train$ . ./repartition_test_set_script.sh ../config.ini` - -This script will take all the tagged data and split some of it into a test set, which will not be trained/validated on and will then be use by evalution code to return mAP values. +Remove the old training directory to avoid running prediction on the last batch of images you uploaded. -To train the model: -python cv_train.py ../config.ini +``` +rm -rd data +``` -This python script will train a custom vision model based on available labeled data. Change directory to "train" -Model will evaluated on test set and perf numbers will be saved in blob storage (performance.csv). 
- -Latest totag.csv will have predictions for all available images made of the newly trained model -- bounding box locations that could be used by human annotator as a starter. +``` +cd active-learning-detect/train +``` +Run the prediction script with the config that you just uploaded -# Sample dataset -I'm using wood knots dataset mentioned in this [blog](http://blog.revolutionanalytics.com/2017/09/wood-knots.html) -Here is [link](https://olgaliakrepo.blob.core.windows.net/woodknots/board_images_png.zip) to the dataset: zip file with 800+ board png images. - -# Custom Vision HttpOperationError 'Bad Request' +active_learning_predict_no_train.sh uses the info in the config file to download the images from blob storage, download the model file from blob storage, and then run prediction on these images. -The current custom vision SDK is in preview mode, and one of the limitations is that an error while training does not return an error message, just a generic 'Bad Request' response. Common reasons for this error include: -1) Having a tag with less than 15 images. Custom Vision requires a minimum of 15 images per tag and will throw an error if it finds any tag with less than that many. -2) Having a tag out of bounds. If for some reason you attempt to add a tag through the API which is out of bounds, it will accept the request but will throw an error while training. -3) No new images since last training session. If you try to train without adding additional images Custom Vision will return a bad request exception. -The best way to debug these is to go into the Custom Vision website (customvision.ai) and click the train button, which should then tell you what the error was. 
+``` +sh active_learning_predict_no_train.sh ../config_usgs19_pred20191021.ini +``` diff --git a/config.ini b/config.ini index e507e099..d569c5d7 100644 --- a/config.ini +++ b/config.ini @@ -3,6 +3,10 @@ AZURE_STORAGE_ACCOUNT= AZURE_STORAGE_KEY= image_container_name=activelearningimages label_container_name=activelearninglabels +pred_model_name=None +pred_dir=None +min_tile_size=250 +max_tile_size=500 # IMAGE INFORMATION user_folders=True classes=knots,defect diff --git a/config_devtest.ini b/config_devtest.ini new file mode 100644 index 00000000..319a04cd --- /dev/null +++ b/config_devtest.ini @@ -0,0 +1,58 @@ +# AZURE STORAGE ACCOUNT INFORMATION +AZURE_STORAGE_ACCOUNT=usgsaerialimages +AZURE_STORAGE_KEY=NEED-KEY-HERE +image_container_name=devtest-images +label_container_name=devtest-labels +pred_model_name=model_1593140460.pb +pred_dir=None +min_tile_size=600 +max_tile_size=1024 +# IMAGE INFORMATION +user_folders=True +classes=dark_bird,marine_mammal,dark_bird_f,light_bird_f,light_bird,trash +filetype=*.JPG +# TRAINING MACHINE +# Locations +python_file_directory=/home/cmi/active-learning-detect/train +data_dir=~/data/usgs_2019pred +train_dir=~/data/usgs_2019pred/training +inference_output_dir=~/data/usgs_2019pred/training/usgs_2019_inference_graphs +tf_models_location=/home/cmi/repos/models/research +download_location=/home/cmi/downloads +# Training +train_iterations=20000 +eval_iterations=100 +min_confidence=.01 +test_percentage=.15 +model_name=faster_rcnn_inception_v2_coco_2018_01_28 +optional_pipeline_url=https://raw.githubusercontent.com/tensorflow/models/master/research/object_detection/samples/configs/faster_rcnn_inception_v2_coco.config +#Init Predictions +init_model_name=faster_rcnn_resnet101_coco_2018_01_28 +# Config File Details +old_label_path=PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt +old_train_path=PATH_TO_BE_CONFIGURED/mscoco_train.record-?????-of-00100 +old_val_path=PATH_TO_BE_CONFIGURED/mscoco_val.record-?????-of-00010 
+old_checkpoint_path=PATH_TO_BE_CONFIGURED/model.ckpt +num_examples_marker=num_examples: +num_steps_marker=num_steps: +num_classes_marker=num_classes: +# Calculated +num_classes="$(awk -F ',' '{print NF}' <<< ${classes})" +image_dir=${data_dir}/AllImages +untagged_output=${data_dir}/untagged.csv +tagged_output=${data_dir}/tagged.csv +tagged_predictions=${data_dir}/tagged_preds.csv +test_output=${data_dir}/test.csv +validation_output=${data_dir}/val.csv +tf_location=${tf_models_location}/object_detection +tf_location_legacy=${tf_models_location}/object_detection/legacy +PYTHONPATH=$PYTHONPATH:${tf_models_location}:${tf_models_location}/slim/ +label_map_path=${data_dir}/pascal_label_map.pbtxt +tf_record_location=${data_dir}/stamps.record +tf_train_record=${tf_record_location%.*}_train.${tf_record_location##*.} +tf_val_record=${tf_record_location%.*}_val.${tf_record_location##*.} +tf_url=http://download.tensorflow.org/models/object_detection/${model_name}.tar.gz +pipeline_file=${download_location}/${model_name}/pipeline.config +fine_tune_checkpoint=${download_location}/${model_name}/model.ckpt +init_pred_tf_url=http://download.tensorflow.org/models/object_detection/${init_model_name}.tar.gz +init_model_graph=${download_location}/${init_model_name}/frozen_inference_graph.pb diff --git a/config_oikonos_mega.ini b/config_oikonos_mega.ini new file mode 100644 index 00000000..d5709614 --- /dev/null +++ b/config_oikonos_mega.ini @@ -0,0 +1,70 @@ +# AZURE STORAGE ACCOUNT INFORMATION +AZURE_STORAGE_ACCOUNT=oikonos +AZURE_STORAGE_KEY="put key here" +image_container_name=oikonos-mega-images +label_container_name=oikonos-mega-labels +# IMAGE INFORMATION +user_folders=True +classes=cat,dog,rabbit,rat,shearwater,Unknown,zorzal,petrel,kestrel,pigeon,human,goat,mouse,coati,cow,horse,pig,owl +# Provide preferred distribution of images-review ratio. +# Last value corresponds to images where no objects were detected. 
+# In the example below: 60% of images that the user will be reviewing have at least one bbox with object class1 (knot), +# 30% of images have bboxes for class2 (defect), +# and 10% of images get class "NULL" -- where neither knots nor defects were detected by the model +ideal_class_balance=0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 +filetype=*.JPG +# TAGGING MACHINE +tagging_location=C:\Users\ConservationMetrics\Desktop\oikonos_tagging +pick_max=False +max_tags_per_pixel=1 +# +# CUSTOM VISION +# Uncomment lines below if using Azure Custom Vision Service training_key= +# prediction_key= +# project_id= +# +# TRAINING MACHINE +# Locations +python_file_directory=/home/cmi/repos/models/research/active-learning-detect/train +data_dir=/home/cmi/data/oikonosmega +train_dir=/home/cmi/data/oikonosmega/training +inference_output_dir=/home/cmi/data/oikonosmega/training/oikonosmega_inference_graphs +tf_models_location=/home/cmi/repos/models/research +download_location=/home/cmi/downloads +# Training +train_iterations=100 +eval_iterations=100 +min_confidence=.1 +test_percentage=.15 +model_name=megadetector_v3 +optional_pipeline_url=https://lilablobssc.blob.core.windows.net/models/camera_traps/megadetector/megadetector_v3.config +#Init Predictions +init_model_name=faster_rcnn_resnet101_coco_2018_01_28 +# Config File Details +old_label_path=/megadetectorv3/PyCharm/CameraTraps/detection/experiments/megadetector_v3/label_map.pbtxt +old_train_path=/disk/megadetectorv3_tfrecords/???????~train-?????-of-????? +old_val_path=/disk/megadetectorv3_tfrecords/???????~val__-?????-of-????? 
+old_checkpoint_path=PATH_TO_BE_CONFIGURED/model.ckpt +num_examples_marker=num_examples: +num_steps_marker=num_steps: +num_classes_marker=num_classes: +# Calculated +num_classes="$(awk -F ',' '{print NF}' <<< ${classes})" +image_dir=${data_dir}/AllImages +untagged_output=${data_dir}/untagged.csv +tagged_output=${data_dir}/tagged.csv +tagged_predictions=${data_dir}/tagged_preds.csv +test_output=${data_dir}/test.csv +validation_output=${data_dir}/val.csv +tf_location=${tf_models_location}/object_detection +tf_location_legacy=${tf_models_location}/object_detection/legacy +PYTHONPATH=$PYTHONPATH:${tf_models_location}:${tf_models_location}/slim/ +label_map_path=${data_dir}/pascal_label_map.pbtxt +tf_record_location=${data_dir}/stamps.record +tf_train_record=${tf_record_location%.*}_train.${tf_record_location##*.} +tf_val_record=${tf_record_location%.*}_val.${tf_record_location##*.} +tf_url=https://lilablobssc.blob.core.windows.net/models/camera_traps/megadetector/${model_name}_checkpoint.zip +pipeline_file=${download_location}/${model_name}/pipeline.config +fine_tune_checkpoint=${download_location}/${model_name}/${model_name}_checkpoint/model.ckpt +init_pred_tf_url=http://download.tensorflow.org/models/object_detection/${init_model_name}.tar.gz +init_model_graph=${download_location}/${init_model_name}/frozen_inference_graph.pb diff --git a/devops/dsvm/README.md b/devops/dsvm/README.md index 5fc5e74c..26f4eb74 100644 --- a/devops/dsvm/README.md +++ b/devops/dsvm/README.md @@ -1,47 +1,156 @@ -# Setting up an Azure DSVM for Active Learning +# Setting up an Azure DSVM for Training and Prediction -This document will explain how to deploy an Azure DSVM and set up the environment for Active Learning. +This document will explain how to deploy an Azure DSVM and set up the environment for Active Learning. Everything here should be run using GitBash which is an emulator for a unix based system allowing us to run the bash scripts (.sh) that our friends at Microsoft wrote. 
+ +## Check that you have Azure CLI installed +Note that the Azure CLI is required for this workflow. +The Azure CLI is a command line tool to interact with Azure. + +Follow the instructions to install using the MSI installer, not the command line tool. +Install [here](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli) if needed. + +``` +# Check that it is installed: +az.cmd --version + +# Make sure you are logged in to the Azure CLI +az.cmd login + +``` ## Deployment +This will walk you through the steps needed to create a VM. Before we do, we must create an SSH key that we will feed to the VM to be able to access it later. +### Create an SSH key Create an SSH Key on your local machine. The following will create a key in your ~/.ssh/act-learn-key location. +If you already have an SSH key that you want to use, you can skip this step. ```sh $ ssh-keygen -f ~/.ssh/act-learn-key -t rsa -b 2048 ``` -Secondly edit the environment variables in the [dsvm_config.sh](config/dsvm_config.sh) script with your own values. For instance: +Then start an instance of ssh-agent (key management software that runs in the background on your machine). +We require that your SSH key be added to the SSH agent. To add your SSH key to the SSH agent, use the **_ssh-add_** command + +``` +eval "$(ssh-agent)" +ssh-add -k ~/.ssh/act-learn-key +``` + +### Set up the DSVM config file +Edit the environment variables in the [dsvm_config.sh](config/dsvm_config.sh) script with your own values, and save a copy for the project you are working on in Dropbox or in the config folder. +For instance:
 RESOURCE_GROUP=MyAzureResourceGroup
-# VM config
-VM_SKU=Standard_NC6 #Make sure VM SKU is available in your resource group's region 
+LOCATION=westus2
+VM_SKU=Standard_NC6_Promo # Must be a NC series machine for GPU computing. Make sure VM SKU is available in your resource group's region 
 VM_IMAGE=microsoft-ads:linux-data-science-vm-ubuntu:linuxdsvmubuntu:latest
-VM_DNS_NAME=mytestdns
-VM_NAME=myvmname
-VM_ADMIN_USER=johndoe
-VM_SSH_KEY=~/.ssh/act-learn-key.pub
+VM_NAME=myvmname # give it a unique VM name
+VM_DNS_NAME=mytestdns # give it a unique DNS name
+VM_ADMIN_USER=cmi
+VM_SSH_KEY=/c/Users/ConservationMetrics/.ssh/act-learn-key.pub
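Since deploy_dsvm.sh simply sources the config (`. $1`) before using its values, it can pay to sanity-check that no required variable was left empty before deploying. The snippet below is a sketch using a throwaway copy of the config, not part of the repo's scripts:

```shell
# Throwaway config in the same shape as dsvm_config.sh
cat > /tmp/dsvm_config_check.sh <<'EOF'
RESOURCE_GROUP=MyAzureResourceGroup
LOCATION=westus2
VM_SKU=Standard_NC6_Promo
VM_DNS_NAME=mytestdns
VM_NAME=myvmname
VM_ADMIN_USER=cmi
VM_SSH_KEY=/c/Users/ConservationMetrics/.ssh/act-learn-key.pub
EOF

# Source it the same way deploy_dsvm.sh does, then verify nothing is empty
. /tmp/dsvm_config_check.sh
for var in RESOURCE_GROUP LOCATION VM_SKU VM_DNS_NAME VM_NAME VM_ADMIN_USER VM_SSH_KEY; do
  eval "val=\$$var"
  [ -n "$val" ] || { echo "Missing: $var"; exit 1; }
done
echo "config ok"
```

Catching a blank `VM_SSH_KEY` or `RESOURCE_GROUP` here is much faster than waiting for `az vm create` to fail mid-deployment.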
 
-Lastly execute the deploy_dsvm.sh with your edited config file as a parameter. Note that the Azure CLI is required. Install [here](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli) if needed. +### Deploy and launch the VM +Lastly execute the deploy_dsvm.sh with your edited config file as a parameter. -```sh -$ sh deploy_dsvm.sh config/dsvm_config.sh ``` +#change directory to the git repo you cloned +cd /D/CM,Inc/git_repos/active-learning-detect/devops/dsvm -## Environment Setup -We provide a module that will copy over a shell script to your DSVM and execute the shell script to setup an active learning environment. +# run the bash script (.sh) with your dsvm_config +sh deploy_dsvm.sh config/dsvm_config.sh -We require that your SSH key be added to the SSH agent. To add your SSH key to the SSH agent use the **_ssh-add_** command +``` + +### Environment Setup +We will need two GitBash consoles for this section: one to ssh into the new VM and a second to use `scp` to send files to the VM from your local computer. `scp` is a command-line tool that sends files over SSH. +#### For Windows +If on Windows, run the setup-tensorflow.sh script on the VM over ssh. + +Make sure the EOL (end of line) format is unix (an option in the Edit menu of Notepad++) + +Note that in the host argument **_admin_**@127.0.0.1 section is the DSVM Admin name and admin@**_127.0.0.1_** is the IP address of the DSVM. -```sh -$ ssh-add -K ~/.ssh/act-learn-key ``` +# this command connects over ssh and runs `sh` on the script supplied after the "<" +ssh -i "/c/Users/ConservationMetrics/.ssh2/act-learn-key" cmi@52.191.140.253 "sh" < "/D/CM,Inc/git_repos/active-learning-detect/devops/dsvm/setup-tensorflow.sh" + +``` -To copy and execute the shell script on the DSVM use the following command -```sh -$ python setup-tensorflow.py --host admin@127.0.0.1 -k ~/.ssh/act-learn-key -s setup-tensorflow.sh +Check the output for errors. 
There are a few that happen and will not affect things, listed below: + +In the section installing python packages: +> -ERROR: mxnet-model-server 1.0.1 requires model-archiver, which is not installed. + +## Initialize a new project + +### Edit the AL config file +You need to edit the AL config.ini file that you will be using for your new project. There is an example [here](https://github.com/abfleishman/active-learning-detect/blob/master/config.ini) and description of many of the parameters [here](https://github.com/abfleishman/active-learning-detect/blob/master/config_description.md) but there might be some required things missing from both (recent additions). +### Send your AL config to the VM +In your second git bash console, start an ssh agent, add the key, and then send your edited AL config to the VM (not the DSVM_config): ``` +eval "$(ssh-agent)" +ssh-add -k ~/.ssh/act-learn-key-test + +# scp copies a file to a remote computer into a folder (after the ":") on that computer +scp "/d/CM,Inc/git_repos/active-learning-detect/config_devtest.ini" cmi@52.191.140.253:/home/cmi/repos/active-learning-detect +``` + +### Initialize the project +Connect via SSH and initialize your active learning project by changing directory. + +Here we are executing the command `sh ./active-learning-detect/train/active_learning_initialize.sh ...` on the remote machine. +``` +ssh cmi@52.191.140.253 + +cd repos/active-learning-detect/train + +sh ./active_learning_initialize.sh ../config_inception.ini +``` +#### NOT WORKING +The intention is that this would be able to run the script without being in an ssh session, but it is not working. +``` +ssh cmi@52.191.140.253 "cd ./repos/active-learning-detect/train&&sh ./active_learning_initialize.sh ../config_inception.ini" +``` +After you do that you can switch to the R cmiimagetools workflow for labeling. Once you have labeled a few images (100) then we can train a model. + + +# Train model +SSH back into your VM and train a model. 
Remember to set your model training parameters in your config and, if you change them locally, `scp` them back up to the remote machine. +``` +ssh cmi@40.65.119.87 +cd active-learning-detect/train +sh active_learning_train.sh ../config_usgs19_inception.ini + +``` + +# Predict on new images without training +``` +ssh cmi@40.65.119.87 + +cd active-learning-detect/train +sh active_learning_predict_no_train.sh ../config_usgs19_pred20191021.ini +``` + + +# below here may not work so skip + +# set up Remote Desktop +``` +sudo apt-get update +sudo apt-get install xfce4 + +sudo apt-get install xrdp=0.6.1-2 +sudo systemctl enable xrdp + +echo xfce4-session >~/.xsession + +sudo service xrdp restart +sudo passwd cmi +sudo service xrdp restart +``` +### not on VM ### +`az.cmd vm open-port --resource-group oikonos --name gpu --port 3389` -Note that in the host argument **_admin_**@127.0.0.1 section is the DSVM Admin name and admin@**_127.0.0.1_** is the IP address of the DSVM. diff --git a/devops/dsvm/config/dsvm_config.sh b/devops/dsvm/config/dsvm_config.sh index 3b266898..f0be5873 100644 --- a/devops/dsvm/config/dsvm_config.sh +++ b/devops/dsvm/config/dsvm_config.sh @@ -1,12 +1,13 @@ #!/bin/bash # System config -RESOURCE_GROUP=jmsrg1 +RESOURCE_GROUP=cmitestnowgroup +LOCATION=westus2 # VM config -VM_SKU=Standard_NC6 +VM_SKU=Standard_NC6_Promo VM_IMAGE=microsoft-ads:linux-data-science-vm-ubuntu:linuxdsvmubuntu:latest -VM_DNS_NAME=jmsactlrnvm -VM_NAME=jmsactlrnvm -VM_ADMIN_USER=vmadmin -VM_SSH_KEY=~/.ssh/act-learn-key.pub \ No newline at end of file +VM_DNS_NAME=gputestnowdsn +VM_NAME=gputestnow +VM_ADMIN_USER=cmi +VM_SSH_KEY=/c/Users/ConservationMetrics/.ssh/act-learn-key.pub diff --git a/devops/dsvm/deploy_dsvm.sh b/devops/dsvm/deploy_dsvm.sh index e43d8cd1..0a3161bf 100755 --- a/devops/dsvm/deploy_dsvm.sh +++ b/devops/dsvm/deploy_dsvm.sh @@ -21,7 +21,7 @@ fi . $1 # Check and see if Azure CLI is present -az --version > /dev/null +az.cmd --version > /dev/null if [ "$?" 
-ne "0" ]; then echo "Unable to find azure CLI" exit 1 @@ -34,20 +34,24 @@ if [ ! -e "$VM_SSH_KEY" ]; then fi # Does the resource group exist -RESOURCE_GROUP_PRESENT=`az group exists --name $RESOURCE_GROUP` -if [ "$RESROUCE_GROUP_PRESENT" == "false" ]; then - echo "Resource group does not exist -- $RESOURCE_GROUP" - exit 1 +RESOURCE_GROUP_PRESENT=`az.cmd group exists --name $RESOURCE_GROUP` +echo "$RESOURCE_GROUP exists? $RESOURCE_GROUP_PRESENT" +if [ "$RESOURCE_GROUP_PRESENT" == "false" ]; then + echo "Resource group does not exist -- $RESOURCE_GROUP creating..." + az.cmd group create \ + --name $RESOURCE_GROUP \ + --location $LOCATION fi - -az vm create \ +echo "Creating VM $VM_NAME in $RESOURCE_GROUP in the region $LOCATION" +az.cmd vm create \ --resource-group $RESOURCE_GROUP \ --name $VM_NAME \ --admin-username $VM_ADMIN_USER \ --public-ip-address-dns-name $VM_DNS_NAME \ --image $VM_IMAGE \ --size $VM_SKU \ - --ssh-key-value $VM_SSH_KEY + --ssh-key-value $VM_SSH_KEY \ + --location $LOCATION if [ "$?" -ne "0" ]; then echo "Unable to provision DSVM" exit 1 diff --git a/devops/dsvm/setup-tensorflow.sh b/devops/dsvm/setup-tensorflow.sh index e1425f6d..c4360c0d 100644 --- a/devops/dsvm/setup-tensorflow.sh +++ b/devops/dsvm/setup-tensorflow.sh @@ -3,7 +3,7 @@ #This script automates the instructions from here: #https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/installation.md # - +# #Fail on first error set -e #Suppress expanding variables before printing. 
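The `set -e` near the top of setup-tensorflow.sh ("Fail on first error") is what makes the long install sequence abort at the first failing step instead of carrying on with a broken environment. A minimal, self-contained demonstration -- the failing command here is a deliberate `false`, nothing from the real script:

```shell
# Under -e the child shell exits at the first failing command,
# so "after" is never printed
sh -c 'set -e; echo before; false; echo after' > out.txt 2>&1 || true
cat out.txt
```

Without `set -e`, both lines would print and the failure would be masked, which is exactly what you do not want in a multi-step environment setup.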
@@ -13,11 +13,16 @@ set +v #When executing on a DSVM over SSH some paths for pip, cp, make, etc may not be in the path, export PATH=/anaconda/envs/py35/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin:/opt/caffe/build/install/bin/:/usr/local/cuda/bin:/dsvm/tools/cntk/cntk/bin:/usr/local/cuda/bin:/dsvm/tools/cntk/cntk/bin:/dsvm/tools/spark/current/bin:/opt/mssql-tools/bin:/bin +echo -e '\n*******\tClone ALD \t*******\n' + +git clone https://github.com/abfleishman/active-learning-detect repos/active-learning-detect + echo -e '\n*******\tClone Tensorflow Models\t*******\n' git clone https://github.com/tensorflow/models.git repos/models +cd repos/models/ && git checkout fe748d4a4a1576b57c279014ac0ceb47344399c4 . && cd ../.. echo -e '\n*******\tInstall Tensorflow package\t*******\n' -cd repos/models/ && pip install tensorflow-gpu +cd repos/models/ && pip install tensorflow-gpu==1.13.1 echo -e '\n*******\tInstall COCO API\t*******\n' cd ~/ @@ -39,12 +44,17 @@ export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim echo -e '\n*******\tRunning Object Detection Tests\t******\n' python object_detection/builders/model_builder_test.py -echo -e '\n*******\tClone Active Learning\t*******\n' -git clone https://github.com/CatalystCode/active-learning-detect +# echo -e '\n*******\tClone Active Learning\t*******\n' +# pwd +# cd ~/ +# cd repos/ +# pwd +# git clone https://github.com/abfleishman/active-learning-detect echo -e '\n*******\tInstalling Python Packages\t*******\n' -cd repos/models/research/active-learning-detect +cd ~/ +cd repos/active-learning-detect pip install -r requirements.txt -#Update the config.ini file at repos/models/research/active-learning-detect +# Update the config.ini file at repos/models/research/active-learning-detect echo -e 'Objection dectection install validation complete' \ No newline at end of file diff --git a/faster_rcnn_inception_v2_coco.config b/faster_rcnn_inception_v2_coco.config new file mode 100644 index 00000000..87838a54 --- 
/dev/null +++ b/faster_rcnn_inception_v2_coco.config @@ -0,0 +1,143 @@ +# Faster R-CNN with Inception v2, configuration for MSCOCO Dataset. +# Users should configure the fine_tune_checkpoint field in the train config as +# well as the label_map_path and input_path fields in the train_input_reader and +# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that +# should be configured. + + +model { + faster_rcnn { + num_classes: 90 + image_resizer { + keep_aspect_ratio_resizer { + min_dimension: 250 + max_dimension: 500 + } + } + feature_extractor { + type: 'faster_rcnn_inception_v2' + first_stage_features_stride: 16 + } + first_stage_anchor_generator { + grid_anchor_generator { + scales: [0.25, 0.5, 1.0, 2.0] + aspect_ratios: [0.5, 1.0, 2.0] + height_stride: 16 + width_stride: 16 + } + } + first_stage_box_predictor_conv_hyperparams { + op: CONV + regularizer { + l2_regularizer { + weight: 0.0 + } + } + initializer { + truncated_normal_initializer { + stddev: 0.01 + } + } + } + first_stage_nms_score_threshold: 0.0 + first_stage_nms_iou_threshold: 0.7 + first_stage_max_proposals: 300 + first_stage_localization_loss_weight: 2.0 + first_stage_objectness_loss_weight: 1.0 + initial_crop_size: 14 + maxpool_kernel_size: 2 + maxpool_stride: 2 + second_stage_box_predictor { + mask_rcnn_box_predictor { + use_dropout: false + dropout_keep_probability: 1.0 + fc_hyperparams { + op: FC + regularizer { + l2_regularizer { + weight: 0.0 + } + } + initializer { + variance_scaling_initializer { + factor: 1.0 + uniform: true + mode: FAN_AVG + } + } + } + } + } + second_stage_post_processing { + batch_non_max_suppression { + score_threshold: 0.0 + iou_threshold: 0.6 + max_detections_per_class: 100 + max_total_detections: 300 + } + score_converter: SOFTMAX + } + second_stage_localization_loss_weight: 2.0 + second_stage_classification_loss_weight: 1.0 + } +} + +train_config: { + batch_size: 1 + optimizer { + momentum_optimizer: { + learning_rate: { + 
manual_step_learning_rate { + initial_learning_rate: 0.0002 + schedule { + step: 500 + learning_rate: .0001 + } + schedule { + step: 5000 + learning_rate: .00001 + } + } + } + momentum_optimizer_value: 0.9 + } + use_moving_average: false + } + gradient_clipping_by_norm: 10.0 + fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt" + from_detection_checkpoint: true + # Note: The below line limits the training process to 10K steps, which is + # typically sufficient for this fine-tuning setup; the learning rate + # schedule above still applies within those steps. Remove the below line + # to train indefinitely. + num_steps: 10000 + data_augmentation_options { + random_horizontal_flip { + } + random_vertical_flip { + } + } +} + +train_input_reader: { + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/mscoco_train.record-?????-of-00100" + } + label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt" +} + +eval_config: { + num_examples: 8000 + # Note: The below line limits the evaluation process to 10 evaluations. + # Remove the below line to evaluate indefinitely.
+ max_evals: 10 +} + +eval_input_reader: { + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/mscoco_val.record-?????-of-00010" + } + label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt" + shuffle: false + num_readers: 1 +} diff --git a/test/Images_source.json b/test/Images_source.json index be3b3bd1..e5029dfc 100644 --- a/test/Images_source.json +++ b/test/Images_source.json @@ -1 +1,2 @@ {"framerate": "1", "frames": {"st1026.png": [{"height": 512.0, "id": 1, "name": 1, "tags": ["knot"], "type": "Rectangle", "width": 488.0, "x1": 144, "x2": 174, "y1": 205, "y2": 254}, {"height": 512.0, "id": 2, "name": 2, "tags": ["knot"], "type": "Rectangle", "width": 488.0, "x1": 142, "x2": 183, "y1": 213, "y2": 248}, {"height": 512.0, "id": 3, "name": 3, "tags": ["knot"], "type": "Rectangle", "width": 488.0, "x1": 337, "x2": 361, "y1": 172, "y2": 202}], "st1578.png": [{"height": 512.0, "id": 1, "name": 1, "tags": ["knot"], "type": "Rectangle", "width": 488.0, "x1": 293, "x2": 330, "y1": 188, "y2": 223}, {"height": 512.0, "id": 2, "name": 2, "tags": ["knot"], "type": "Rectangle", "width": 488.0, "x1": 265, "x2": 293, "y1": 401, "y2": 438}], "st1611.png": [{"height": 512.0, "id": 1, "name": 1, "tags": ["knot"], "type": "Rectangle", "width": 488.0, "x1": 317, "x2": 348, "y1": 440, "y2": 494}, {"height": 512.0, "id": 2, "name": 2, "tags": ["knot"], "type": "Rectangle", "width": 488.0, "x1": 37, "x2": 55, "y1": 170, "y2": 189}], "st1840.png": [{"height": 512.0, "id": 1, "name": 1, "tags": ["knot"], "type": "Rectangle", "width": 488.0, "x1": 292, "x2": 313, "y1": 134, "y2": 164}, {"height": 512.0, "id": 2, "name": 2, "tags": ["knot"], "type": "Rectangle", "width": 488.0, "x1": 354, "x2": 377, "y1": 319, "y2": 342}, {"height": 512.0, "id": 3, "name": 3, "tags": ["knot"], "type": "Rectangle", "width": 488.0, "x1": 60, "x2": 92, "y1": 392, "y2": 423}]}, "inputTags": "knot,defect", "scd": false, "suggestiontype": "track", "tag_colors": ["#e9f1fe", "#f3e9ff"]} + 
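The `manual_step_learning_rate` block in the config above starts training at 2e-4, then drops to 1e-4 at step 500 and 1e-5 at step 5000. As a rough illustration of how that schedule resolves (plain Python sketch, not part of the repo — the function name is invented):

```python
def manual_step_lr(step, initial_rate=2e-4,
                   schedule=((500, 1e-4), (5000, 1e-5))):
    """Mirror the manual_step_learning_rate semantics: use initial_rate
    until the first boundary, then the rate of the last boundary whose
    step has been reached."""
    rate = initial_rate
    for boundary, boundary_rate in schedule:
        if step >= boundary:
            rate = boundary_rate
    return rate
```

With the `num_steps: 10000` set in this config, both decay boundaries are reached during training.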
diff --git a/test_images/USGS_AerialImages_2019_R1_sum19_tiled___20190517_02_S_Cam3___20190517_CAM31221_2836_1137.JPG b/test_images/USGS_AerialImages_2019_R1_sum19_tiled___20190517_02_S_Cam3___20190517_CAM31221_2836_1137.JPG new file mode 100644 index 00000000..0588930a Binary files /dev/null and b/test_images/USGS_AerialImages_2019_R1_sum19_tiled___20190517_02_S_Cam3___20190517_CAM31221_2836_1137.JPG differ diff --git a/test_images/USGS_AerialImages_2019_R4_fa_mig19_tiled___20190823_04_FM_Cam1___20190823_CAM11682_0_558.JPG b/test_images/USGS_AerialImages_2019_R4_fa_mig19_tiled___20190823_04_FM_Cam1___20190823_CAM11682_0_558.JPG new file mode 100644 index 00000000..3c5ab43b Binary files /dev/null and b/test_images/USGS_AerialImages_2019_R4_fa_mig19_tiled___20190823_04_FM_Cam1___20190823_CAM11682_0_558.JPG differ diff --git a/test_images/USGS_AerialImages_2019_R4_fa_mig19_tiled___20190824_05_FM_Cam1___20190824_CAM11681_1871_558.JPG b/test_images/USGS_AerialImages_2019_R4_fa_mig19_tiled___20190824_05_FM_Cam1___20190824_CAM11681_1871_558.JPG new file mode 100644 index 00000000..a5279eb6 Binary files /dev/null and b/test_images/USGS_AerialImages_2019_R4_fa_mig19_tiled___20190824_05_FM_Cam1___20190824_CAM11681_1871_558.JPG differ diff --git a/test_images/USGS_AerialImages_2019_R4_fa_mig19_tiled___20190824_05_FM_Cam2___20190824_CAM21640_906_4611.JPG b/test_images/USGS_AerialImages_2019_R4_fa_mig19_tiled___20190824_05_FM_Cam2___20190824_CAM21640_906_4611.JPG new file mode 100644 index 00000000..488d35a0 Binary files /dev/null and b/test_images/USGS_AerialImages_2019_R4_fa_mig19_tiled___20190824_05_FM_Cam2___20190824_CAM21640_906_4611.JPG differ diff --git a/test_images/USGS_AerialImages_2019_R4_fa_mig19_tiled___20190824_05_FM_Cam3___20190824_CAM31683_4766_5190.JPG b/test_images/USGS_AerialImages_2019_R4_fa_mig19_tiled___20190824_05_FM_Cam3___20190824_CAM31683_4766_5190.JPG new file mode 100644 index 00000000..33e24cbb Binary files /dev/null and 
b/test_images/USGS_AerialImages_2019_R4_fa_mig19_tiled___20190824_05_FM_Cam3___20190824_CAM31683_4766_5190.JPG differ diff --git a/test_images/USGS_AerialImages_2019_R6_fa19_tiled___20190925_02_F_Cam3___20190925_CAM30460_906_2295.JPG b/test_images/USGS_AerialImages_2019_R6_fa19_tiled___20190925_02_F_Cam3___20190925_CAM30460_906_2295.JPG new file mode 100644 index 00000000..4f0f71a6 Binary files /dev/null and b/test_images/USGS_AerialImages_2019_R6_fa19_tiled___20190925_02_F_Cam3___20190925_CAM30460_906_2295.JPG differ diff --git a/test_images/USGS_AerialImages_2019_R6_fa19_tiled___20191002_04_F_Cam3___20191002_CAM33299_0_0.JPG b/test_images/USGS_AerialImages_2019_R6_fa19_tiled___20191002_04_F_Cam3___20191002_CAM33299_0_0.JPG new file mode 100644 index 00000000..e8225029 Binary files /dev/null and b/test_images/USGS_AerialImages_2019_R6_fa19_tiled___20191002_04_F_Cam3___20191002_CAM33299_0_0.JPG differ diff --git a/test_images/USGS_AerialImages_2019_R6_fa19_tiled___20191014_13_F_Cam2___20191014_CAM21097_7661_4611.JPG b/test_images/USGS_AerialImages_2019_R6_fa19_tiled___20191014_13_F_Cam2___20191014_CAM21097_7661_4611.JPG new file mode 100644 index 00000000..8c32a4a6 Binary files /dev/null and b/test_images/USGS_AerialImages_2019_R6_fa19_tiled___20191014_13_F_Cam2___20191014_CAM21097_7661_4611.JPG differ diff --git a/test_images/USGS_AerialImages_2019_R7_w20_tiled___20200104_02_W_Cam3___20200104_CAM34079_5731_2295.JPG b/test_images/USGS_AerialImages_2019_R7_w20_tiled___20200104_02_W_Cam3___20200104_CAM34079_5731_2295.JPG new file mode 100644 index 00000000..e205435b Binary files /dev/null and b/test_images/USGS_AerialImages_2019_R7_w20_tiled___20200104_02_W_Cam3___20200104_CAM34079_5731_2295.JPG differ diff --git a/test_images/USGS_AerialImages_2019_R8_sp_mig20_tiled___20200320_03_SM_Cam3___20200320_CAM31593_906_4032.JPG b/test_images/USGS_AerialImages_2019_R8_sp_mig20_tiled___20200320_03_SM_Cam3___20200320_CAM31593_906_4032.JPG new file mode 100644 index 
00000000..8524cd3d Binary files /dev/null and b/test_images/USGS_AerialImages_2019_R8_sp_mig20_tiled___20200320_03_SM_Cam3___20200320_CAM31593_906_4032.JPG differ diff --git a/train/active_learning_eval.sh b/train/active_learning_eval.sh new file mode 100644 index 00000000..be685945 --- /dev/null +++ b/train/active_learning_eval.sh @@ -0,0 +1,57 @@ +#!/bin/bash +# Source environmental variables +set -a +sed -i 's/\r//g' $1 +. $1 +set +a +# Updating vars in config file +envsubst < $1 > cur_config.ini +# Update images from blob storage +# echo "Updating Blob Folder" +# python ${python_file_directory}/update_blob_folder.py cur_config.ini +# # Create TFRecord from images + csv file on blob storage +# echo "Creating TF Record" +# python ${python_file_directory}/convert_tf_record.py cur_config.ini +# # Download tf model if it doesn't exist +# if [ ! -d "$download_location/${model_name}" ]; then +# mkdir -p $download_location +# curl $tf_url --create-dirs -o ${download_location}/${model_name}.tar.gz +# tar -xzf ${download_location}/${model_name}.tar.gz -C $download_location +# fi +# if [ ! -z "$optional_pipeline_url" ]; then +# curl $optional_pipeline_url -o $pipeline_file +# elif [ ! 
-f $pipeline_file ]; then +# cat "there you go" +# cp ${download_location}/${model_name}/pipeline.config $pipeline_file +# fi +echo "Making pipeline file from env vars" +temp_pipeline=${pipeline_file%.*}_temp.${pipeline_file##*.} +# sed "s/${old_label_path//\//\\/}/${label_map_path//\//\\/}/g" $pipeline_file > $temp_pipeline +# sed -i "s/${old_train_path//\//\\/}/${tf_train_record//\//\\/}/g" $temp_pipeline +# sed -i "s/${old_val_path//\//\\/}/${tf_val_record//\//\\/}/g" $temp_pipeline +# sed -i "s/keep_checkpoint_every_n_hours: 1.0/keep_checkpoint_every_n_hours: 1/" $temp_pipeline +# sed -i "s/${old_checkpoint_path//\//\\/}/${fine_tune_checkpoint//\//\\/}/g" $temp_pipeline +# sed -i "s/keep_checkpoint_every_n_hours: 1.0/keep_checkpoint_every_n_hours: 1/" $temp_pipeline +# sed -i "s/$num_steps_marker[[:space:]]*[[:digit:]]*/$num_steps_marker $train_iterations/g" $temp_pipeline +# sed -i "s/$num_examples_marker[[:space:]]*[[:digit:]]*/$num_examples_marker $eval_iterations/g" $temp_pipeline +# sed -i "s/$num_classes_marker[[:space:]]*[[:digit:]]*/$num_classes_marker $num_classes/g" $temp_pipeline +# Train model on TFRecord +echo "Eval model" +# rm -rf $train_dir +echo $temp_pipeline +python ${tf_location_legacy}/eval.py --eval_dir=$train_dir --pipeline_config_path=$temp_pipeline --logtostderr +# Export inference graph of model +# echo "Exporting inference graph" +# rm -rf $inference_output_dir +# python ${tf_location}/export_inference_graph.py --input_type "image_tensor" --pipeline_config_path "$temp_pipeline" --trained_checkpoint_prefix "${train_dir}/model.ckpt-$train_iterations" --output_directory "$inference_output_dir" +# TODO: Validation on Model, keep track of MAP etc. 
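The preamble shared by these scripts (`set -a; . $1; set +a; envsubst < $1 > cur_config.ini`) strips CR line endings, exports every variable defined in the config file, and then expands any `${var}` references the config makes against those values. A minimal sketch of that expansion step (illustrative Python, not part of the repo):

```python
from string import Template

def expand_config(text, env):
    # Expand ${name} placeholders against env, leaving unknown names
    # untouched -- roughly what `envsubst < config.ini > cur_config.ini`
    # does with the exported variables.
    return Template(text).safe_substitute(env)
```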
+# Use inference graph to create predictions on untagged images +# echo "Creating new predictions" +# python ${python_file_directory}/create_predictions.py cur_config.ini +# echo "Calculating performance" +# python ${python_file_directory}/map_validation.py cur_config.ini +# # Rename predictions and inference graph based on timestamp and upload +# echo "Uploading new data" +# az storage blob upload --container-name $label_container_name --file ${inference_output_dir}/frozen_inference_graph.pb --name model_$(date +%s).pb --account-name $AZURE_STORAGE_ACCOUNT --account-key $AZURE_STORAGE_KEY +# az storage blob upload --container-name $label_container_name --file $untagged_output --name totag_$(date +%s).csv --account-name $AZURE_STORAGE_ACCOUNT --account-key $AZURE_STORAGE_KEY +# az storage blob upload --container-name $label_container_name --file $validation_output --name performance_$(date +%s).csv --account-name $AZURE_STORAGE_ACCOUNT --account-key $AZURE_STORAGE_KEY diff --git a/train/active_learning_predict_no_train.sh b/train/active_learning_predict_no_train.sh new file mode 100644 index 00000000..b80ea3a8 --- /dev/null +++ b/train/active_learning_predict_no_train.sh @@ -0,0 +1,15 @@ +#!/bin/bash +# Source environmental variables +set -a +sed -i 's/\r//g' $1 +. 
$1 +set +a +# Updating vars in config file +envsubst < $1 > cur_config.ini +# Use inference graph to create predictions on untagged images +echo "Creating new predictions" +python ${python_file_directory}/create_predictions.py cur_config.ini +echo "Uploading new data" +# # az storage blob upload --container-name $label_container_name --file ${inference_output_dir}/frozen_inference_graph.pb --name model_$(date +%s).pb --account-name $AZURE_STORAGE_ACCOUNT --account-key $AZURE_STORAGE_KEY +az storage blob upload --container-name $label_container_name --file $untagged_output --name totag_$(date +%s).csv --account-name $AZURE_STORAGE_ACCOUNT --account-key $AZURE_STORAGE_KEY +# az storage blob upload --container-name $label_container_name --file $validation_output --name performance_$(date +%s).csv --account-name $AZURE_STORAGE_ACCOUNT --account-key $AZURE_STORAGE_KEY diff --git a/train/active_learning_train.sh b/train/active_learning_train.sh index 24b276ba..de3a8d07 100755 --- a/train/active_learning_train.sh +++ b/train/active_learning_train.sh @@ -1,6 +1,4 @@ #!/bin/bash -# Fail on first error -set -e # Source environmental variables set -a sed -i 's/\r//g' $1 @@ -8,6 +6,7 @@ sed -i 's/\r//g' $1 set +a # Updating vars in config file envsubst < $1 > cur_config.ini +echo $add_noise # Update images from blob storage echo "Updating Blob Folder" python ${python_file_directory}/update_blob_folder.py cur_config.ini @@ -23,6 +22,7 @@ fi if [ ! -z "$optional_pipeline_url" ]; then curl $optional_pipeline_url -o $pipeline_file elif [ ! 
-f $pipeline_file ]; then + echo "there you go" cp ${download_location}/${model_name}/pipeline.config $pipeline_file fi echo "Making pipeline file from env vars" @@ -30,26 +30,41 @@ temp_pipeline=${pipeline_file%.*}_temp.${pipeline_file##*.} sed "s/${old_label_path//\//\\/}/${label_map_path//\//\\/}/g" $pipeline_file > $temp_pipeline sed -i "s/${old_train_path//\//\\/}/${tf_train_record//\//\\/}/g" $temp_pipeline sed -i "s/${old_val_path//\//\\/}/${tf_val_record//\//\\/}/g" $temp_pipeline +sed -i "s/keep_checkpoint_every_n_hours: 1.0/keep_checkpoint_every_n_hours: 1/" $temp_pipeline sed -i "s/${old_checkpoint_path//\//\\/}/${fine_tune_checkpoint//\//\\/}/g" $temp_pipeline sed -i "s/$num_steps_marker[[:space:]]*[[:digit:]]*/$num_steps_marker $train_iterations/g" $temp_pipeline sed -i "s/$num_examples_marker[[:space:]]*[[:digit:]]*/$num_examples_marker $eval_iterations/g" $temp_pipeline sed -i "s/$num_classes_marker[[:space:]]*[[:digit:]]*/$num_classes_marker $num_classes/g" $temp_pipeline +sed -i "s/min_dimension:[[:space:]]*[[:digit:]]*/min_dimension: $min_tile_size/g" $temp_pipeline +sed -i "s/max_dimension:[[:space:]]*[[:digit:]]*/max_dimension: $max_tile_size/g" $temp_pipeline +# add data augmentation +test "$add_vertical_flip" == True && sed -i "s/ data_augmentation_options {/ data_augmentation_options {\n random_vertical_flip{\n }/g" $temp_pipeline +test "$add_horizontal_flip" == True && sed -i "s/ data_augmentation_options {/ data_augmentation_options {\n random_horizontal_flip{\n }/g" $temp_pipeline +test "$add_crop_pad_image" == True && sed -i "s/ data_augmentation_options {/ data_augmentation_options {\n random_crop_pad_image {\n }/g" $temp_pipeline +test "$add_noise" == True && sed -i "s/ data_augmentation_options {/ data_augmentation_options {\n random_pixel_value_scale {\n minval:0.9\n maxval: 1.1\n }/g" $temp_pipeline +echo $temp_pipeline + # Train model on TFRecord echo "Training model" rm -rf $train_dir +echo $temp_pipeline python
${tf_location_legacy}/train.py --train_dir=$train_dir --pipeline_config_path=$temp_pipeline --logtostderr # Export inference graph of model echo "Exporting inference graph" rm -rf $inference_output_dir python ${tf_location}/export_inference_graph.py --input_type "image_tensor" --pipeline_config_path "$temp_pipeline" --trained_checkpoint_prefix "${train_dir}/model.ckpt-$train_iterations" --output_directory "$inference_output_dir" -# TODO: Validation on Model, keep track of MAP etc. +## TODO: Validation on Model, keep track of MAP etc. # Use inference graph to create predictions on untagged images echo "Creating new predictions" python ${python_file_directory}/create_predictions.py cur_config.ini echo "Calculating performance" python ${python_file_directory}/map_validation.py cur_config.ini -# Rename predictions and inference graph based on timestamp and upload +## Rename predictions and inference graph based on timestamp and upload echo "Uploading new data" -az storage blob upload --container-name $label_container_name --file ${inference_output_dir}/frozen_inference_graph.pb --name model_$(date +%s).pb --account-name $AZURE_STORAGE_ACCOUNT --account-key $AZURE_STORAGE_KEY -az storage blob upload --container-name $label_container_name --file $untagged_output --name totag_$(date +%s).csv --account-name $AZURE_STORAGE_ACCOUNT --account-key $AZURE_STORAGE_KEY -az storage blob upload --container-name $label_container_name --file $validation_output --name performance_$(date +%s).csv --account-name $AZURE_STORAGE_ACCOUNT --account-key $AZURE_STORAGE_KEY +timestamp=$(date +%s) +az storage blob upload --container-name $label_container_name --file ${inference_output_dir}/frozen_inference_graph.pb --name model_$timestamp.pb --account-name $AZURE_STORAGE_ACCOUNT --account-key $AZURE_STORAGE_KEY +az storage blob upload --container-name $label_container_name --file $temp_pipeline --name pipeline_$timestamp.config --account-name $AZURE_STORAGE_ACCOUNT --account-key 
$AZURE_STORAGE_KEY +az storage blob upload --container-name $label_container_name --file $label_map_path --name label_map_$timestamp.pbtxt --account-name $AZURE_STORAGE_ACCOUNT --account-key $AZURE_STORAGE_KEY +az storage blob upload --container-name $label_container_name --file $cur_config --name config_$timestamp.ini --account-name $AZURE_STORAGE_ACCOUNT --account-key $AZURE_STORAGE_KEY +az storage blob upload --container-name $label_container_name --file $untagged_output --name totag_$timestamp.csv --account-name $AZURE_STORAGE_ACCOUNT --account-key $AZURE_STORAGE_KEY +az storage blob upload --container-name $label_container_name --file $validation_output --name performance_$timestamp.csv --account-name $AZURE_STORAGE_ACCOUNT --account-key $AZURE_STORAGE_KEY diff --git a/train/active_learning_train_mega.sh b/train/active_learning_train_mega.sh new file mode 100644 index 00000000..6aaffc87 --- /dev/null +++ b/train/active_learning_train_mega.sh @@ -0,0 +1,61 @@ +#!/bin/bash +# Source environmental variables +set -a +sed -i 's/\r//g' $1 +. $1 +set +a +# Updating vars in config file +envsubst < $1 > cur_config.ini +# Update images from blob storage +# echo "Updating Blob Folder" +# python ${python_file_directory}/update_blob_folder.py cur_config.ini +# # Create TFRecord from images + csv file on blob storage +# echo "Creating TF Record" +# python ${python_file_directory}/convert_tf_record.py cur_config.ini +# # Download tf model if it doesn't exist +if [ ! -d "$download_location/${model_name}" ]; then + mkdir -p $download_location/${model_name} + curl $tf_url --create-dirs -o ${download_location}/${model_name}/${model_name}_checkpoint.zip + echo $download_location/${model_name} + unzip -o ${download_location}/${model_name}/${model_name}_checkpoint.zip -d $download_location/${model_name} + curl https://lilablobssc.blob.core.windows.net/models/camera_traps/megadetector/megadetector_v3.pb --create-dirs -o ${download_location}/${model_name}/${model_name}.pb +fi +if [ ! 
-z "$optional_pipeline_url" ]; then + curl $optional_pipeline_url -o $pipeline_file +elif [ ! -f $pipeline_file ]; then + cp ${download_location}/${model_name}/pipeline.config $pipeline_file +fi +echo "Making pipeline file from env vars" +temp_pipeline=${pipeline_file%.*}_temp.${pipeline_file##*.} +sed "s/${old_label_path//\//\\/}/${label_map_path//\//\\/}/g" $pipeline_file > $temp_pipeline +sed -i '\ \ gradient_clipping_by_norm:\ 10.0 a\ \ fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"\n \ from_detection_checkpoint: true\n \ load_all_detection_checkpoint_vars: true\n \ num_steps: 15000' $temp_pipeline +sed -i "s/${old_train_path//\//\\/}/${tf_train_record//\//\\/}/g" $temp_pipeline +sed -i "s/${old_val_path//\//\\/}/${tf_val_record//\//\\/}/g" $temp_pipeline +sed -i "s/${old_checkpoint_path//\//\\/}/${fine_tune_checkpoint//\//\\/}/g" $temp_pipeline +sed -i "s/keep_checkpoint_every_n_hours:\ 1.0/keep_checkpoint_every_n_hours:\ 1/g" $temp_pipeline +sed -i "s/$num_steps_marker[[:space:]]*[[:digit:]]*/$num_steps_marker $train_iterations/g" $temp_pipeline +sed -i "s/$num_examples_marker[[:space:]]*[[:digit:]]*/$num_examples_marker $eval_iterations/g" $temp_pipeline +sed -i "s/$num_classes_marker[[:space:]]*[[:digit:]]*/$num_classes_marker $num_classes/g" $temp_pipeline + +# # Train model on TFRecord +echo "Training model" +rm -rf $train_dir +python ${tf_location_legacy}/train.py --train_dir=$train_dir --pipeline_config_path=$temp_pipeline --logtostderr +# Export inference graph of model +echo "Exporting inference graph" +rm -rf $inference_output_dir +echo $temp_pipeline +echo ${train_dir}/model.ckpt-$train_iterations +echo $inference_output_dir +python ${tf_location}/export_inference_graph.py --input_type "image_tensor" --pipeline_config_path "$temp_pipeline" --trained_checkpoint_prefix "${train_dir}/model.ckpt-$train_iterations" --output_directory "$inference_output_dir" +# TODO: Validation on Model, keep track of MAP etc. 
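The `sed` calls in these training scripts rewrite the pipeline template by matching a marker (e.g. the value of `$num_steps_marker`, presumably a string like `num_steps:`) followed by digits and substituting a new value. A hedged Python equivalent of one such substitution (function and marker names here are illustrative, not part of the repo):

```python
import re

def set_marker(pipeline_text, marker, value):
    # Replace "<marker> <digits>" with "<marker> <value>", mirroring:
    #   sed -i "s/$marker[[:space:]]*[[:digit:]]*/$marker $value/g"
    pattern = re.escape(marker) + r"\s*\d+"
    return re.sub(pattern, f"{marker} {value}", pipeline_text)
```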
+# Use inference graph to create predictions on untagged images +echo "Creating new predictions" +python ${python_file_directory}/create_predictions.py cur_config.ini +echo "Calculating performance" +python ${python_file_directory}/map_validation.py cur_config.ini +# Rename predictions and inference graph based on timestamp and upload +echo "Uploading new data" +az storage blob upload --container-name $label_container_name --file ${inference_output_dir}/frozen_inference_graph.pb --name model_$(date +%s).pb --account-name $AZURE_STORAGE_ACCOUNT --account-key $AZURE_STORAGE_KEY +az storage blob upload --container-name $label_container_name --file $untagged_output --name totag_$(date +%s).csv --account-name $AZURE_STORAGE_ACCOUNT --account-key $AZURE_STORAGE_KEY +az storage blob upload --container-name $label_container_name --file $validation_output --name performance_$(date +%s).csv --account-name $AZURE_STORAGE_ACCOUNT --account-key $AZURE_STORAGE_KEY diff --git a/train/active_learning_update_label_map.sh b/train/active_learning_update_label_map.sh new file mode 100644 index 00000000..79f1ba47 --- /dev/null +++ b/train/active_learning_update_label_map.sh @@ -0,0 +1,14 @@ +#!/bin/bash +# Source environmental variables +set -a +sed -i 's/\r//g' $1 +. $1 +set +a +# Make all necessary directories +# mkdir -p $image_dir +# Download all images +# az storage blob download-batch --source $image_container_name --destination $image_dir +# Create TFRecord from images + csv file on blob storage +# TODO: Try to import create_predictions into this +envsubst < $1 > cur_config.ini +python ${python_file_directory}/update_label_map.py cur_config.ini diff --git a/train/create_cur_config.sh b/train/create_cur_config.sh new file mode 100644 index 00000000..c0900bb5 --- /dev/null +++ b/train/create_cur_config.sh @@ -0,0 +1,20 @@ +#!/bin/bash +# Source environmental variables +set -a +sed -i 's/\r//g' $1 +. 
$1 +set +a +# Updating vars in config file +envsubst < $1 > cur_config.ini +# Update images from blob storage +echo "Updating Blob Folder" +python ${python_file_directory}/update_blob_folder.py cur_config.ini +# Use inference graph to create predictions on untagged images +echo "Creating new predictions" +python ${python_file_directory}/create_predictions.py cur_config.ini +echo "Calculating performance" +python ${python_file_directory}/map_validation.py cur_config.ini +# Rename predictions and inference graph based on timestamp and upload +echo "Uploading new data" +az storage blob upload --container-name $label_container_name --file $untagged_output --name totag_$(date +%s).csv --account-name $AZURE_STORAGE_ACCOUNT --account-key $AZURE_STORAGE_KEY +az storage blob upload --container-name $label_container_name --file $validation_output --name performance_$(date +%s).csv --account-name $AZURE_STORAGE_ACCOUNT --account-key $AZURE_STORAGE_KEY diff --git a/train/create_predictions.py b/train/create_predictions.py index c5089595..72e731e3 100644 --- a/train/create_predictions.py +++ b/train/create_predictions.py @@ -6,6 +6,7 @@ import csv from collections import defaultdict import numpy as np +import datetime NUM_CHANNELS=3 FOLDER_LOCATION=8 @@ -23,11 +24,13 @@ YMIN_IDX = 2 YMAX_IDX = 4 +def chunker(seq,size): + return(seq[pos:pos + size] for pos in range(0,len(seq), size)) def calculate_confidence(predictions): return min([float(prediction[0]) for prediction in predictions]) -def make_csv_output(all_predictions: List[List[List[int]]], all_names: List[str], all_sizes: List[Tuple[int]], +def make_csv_output(all_predictions: List[List[List[int]]], all_names: List[str], all_sizes: List[Tuple[int]], untagged_output: str, tagged_output: str, file_set: AbstractSet, user_folders: bool = True): ''' Convert list of Detector class predictions as well as list of image sizes @@ -66,20 +69,22 @@ def make_csv_output(all_predictions: List[List[List[int]]], all_names: List[str] 
prediction[YMIN_IDX], prediction[YMAX_IDX], height, width, prediction[BOX_CONFID_IDX], confidence]) -def get_suggestions(detector, basedir: str, untagged_output: str, +def get_suggestions(detector, basedir: str, untagged_output: str, tagged_output: str, cur_tagged: str, cur_tagging: str, min_confidence: float =.2, - image_size: Tuple=(1000,750), filetype: str="*.jpg", minibatchsize: int=50, - user_folders: bool=True): + image_size: Tuple=(1024,600), filetype: str="*.jpg", minibatchsize: int=50, + user_folders: bool=True, batch_size=50): '''Gets suggestions from a given detector and uses them to generate VOTT tags - + Function inputs an instance of the Detector class along with a directory, - and optionally a confidence interval, image size, and tag information (name and color). - It returns a list of subfolders in that directory sorted by how confident the + and optionally a confidence interval, image size, and tag information (name and color). + It returns a list of subfolders in that directory sorted by how confident the given Detector was in predicting bounding boxes on files within that subfolder. It also generates VOTT JSON tags corresponding to the predicted bounding boxes.
The optional confidence interval and image size correspond to the matching optional arguments to the Detector class ''' + start= datetime.datetime.now() + print("prediction started: "+start.strftime("%Y-%m-%d %H:%M:%S")) basedir = Path(basedir) CV2_COLOR_LOAD_FLAG = 1 all_predictions = [] @@ -103,21 +108,35 @@ def get_suggestions(detector, basedir: str, untagged_output: str, subdirs = [subfile for subfile in basedir.iterdir() if subfile.is_dir()] print("subdirs: ", subdirs) all_names = [] - all_image_files = [] + all_image_files = [] all_sizes = [] + all_predictions = [] + all_names_temp = [] + all_image_files_temp = [] + all_sizes_temp = [] + all_predictions_temp = [] for subdir in subdirs: - cur_image_names = list(subdir.rglob(filetype)) - print("Total image names: ", len(cur_image_names)) - all_image_files += [str(image_name) for image_name in cur_image_names] - foldername = subdir.stem - all_names += [(foldername, filename.name) for filename in cur_image_names] - # Reversed because numpy is row-major - all_sizes = [cv2.imread(image, CV2_COLOR_LOAD_FLAG).shape[:2] for image in all_image_files] - all_images = np.zeros((len(all_image_files), *reversed(image_size), NUM_CHANNELS), dtype=np.uint8) - for curindex, image in enumerate(all_image_files): - all_images[curindex] = cv2.resize(cv2.imread(image, CV2_COLOR_LOAD_FLAG), image_size) - print("Shape of all_images: ", all_images.shape) - all_predictions = detector.predict(all_images, min_confidence=min_confidence) + all_cur_image_names = list(subdir.rglob(filetype)) + + print(str(subdir)+": Total image names: ", len(all_cur_image_names)) + + for i in range(0, len(all_cur_image_names), batch_size): + cur_image_names = all_cur_image_names[i:i+batch_size] + all_image_files_temp = [str(image_name) for image_name in cur_image_names] + foldername = subdir.stem + all_names_temp = [(foldername, filename.name) for filename in cur_image_names] + all_names += all_names_temp + # Reversed because numpy is row-major + 
all_sizes_temp = [cv2.imread(image, CV2_COLOR_LOAD_FLAG).shape[:2] for image in all_image_files_temp] + all_sizes += all_sizes_temp + all_images_temp = np.zeros((len(all_image_files_temp), *reversed(image_size), NUM_CHANNELS), dtype=np.uint8) + for curindex, image in enumerate(all_image_files_temp): + all_images_temp[curindex] = cv2.resize(cv2.imread(image, CV2_COLOR_LOAD_FLAG), image_size) + print("Shape of all_images: ", all_images_temp.shape) + # TODO: could put this in a loop + all_predictions_temp = detector.predict(all_images_temp, batch_size=2, min_confidence=min_confidence) + all_predictions += all_predictions_temp + else: with open(cur_tagged, 'r') as file: reader = csv.reader(file) @@ -132,9 +151,17 @@ def get_suggestions(detector, basedir: str, untagged_output: str, all_sizes = [cv2.imread(str(image), CV2_COLOR_LOAD_FLAG).shape[:2] for image in all_image_files] all_images = np.zeros((len(all_image_files), *reversed(image_size), NUM_CHANNELS), dtype=np.uint8) for curindex, image in enumerate(all_image_files): + print("file",curindex,image) all_images[curindex] = cv2.resize(cv2.imread(str(image), CV2_COLOR_LOAD_FLAG), image_size) all_predictions = detector.predict(all_images, batch_size=2, min_confidence=min_confidence) make_csv_output(all_predictions, all_names, all_sizes, untagged_output, tagged_output, already_tagged, user_folders) + end = datetime.datetime.now() + print("prediction end: "+end.strftime("%Y-%m-%d %H:%M:%S")) + print("prediction duration: "+str(end-start)+" | Time per image: "+str((end-start)/len(all_sizes))) + + + + if __name__ == "__main__": from azure.storage.blob import BlockBlobService @@ -142,6 +169,8 @@ def get_suggestions(detector, basedir: str, untagged_output: str, import re import sys import os + from pathlib import Path + # Allow us to import utils config_dir = str(Path.cwd().parent / "utils") if config_dir not in sys.path: @@ -150,9 +179,20 @@ def get_suggestions(detector, basedir: str, untagged_output: str, if
len(sys.argv) < 2:
        raise ValueError("Need to specify config file")
    config_file = Config.parse_file(sys.argv[1])
-    image_dir = config_file["image_dir"]
-    untagged_output = config_file["untagged_output"]
-    tagged_output = config_file["tagged_predictions"]
+    # config_file = Config.parse_file(r"D:\CM,Inc\Dropbox (CMI)\CMI_Team\Analysis\2019\PointBlue_Penguins_2019\configs\config_pb2019data500x250_incept10k_gpushark.ini")
+
+    if config_file["pred_dir"] == 'None':
+        image_dir = re.sub(r"\$\{data_dir\}", config_file["data_dir"], config_file["image_dir"])
+        untagged_output = re.sub(r"\$\{data_dir\}", config_file["data_dir"], config_file["untagged_output"])
+        tagged_output = re.sub(r"\$\{data_dir\}", config_file["data_dir"], config_file["tagged_predictions"])
+
+    else:
+        image_dir = re.sub(r"\$\{data_dir\}", config_file["data_dir"], config_file["image_dir"]) + '/' + config_file["pred_dir"]
+        untagged_output = re.sub(r"\$\{data_dir\}", config_file["data_dir"], re.sub("untagged.csv", config_file["untagged_output"], 'untagged_' + config_file["pred_dir"] + ".csv"))
+        tagged_output = re.sub(r"\$\{data_dir\}", config_file["data_dir"], re.sub("tagged_preds.csv", config_file["tagged_predictions"], 'tagged_preds_' + config_file["pred_dir"] + ".csv"))
+        config_file["user_folders"] = "False"
+
+    pred_model_name = config_file["pred_model_name"]
     block_blob_service = BlockBlobService(account_name=config_file["AZURE_STORAGE_ACCOUNT"], account_key=config_file["AZURE_STORAGE_KEY"])
     container_name = config_file["label_container_name"]
     file_date = [(blob.name, blob.properties.last_modified) for blob in block_blob_service.list_blobs(container_name) if re.match(r'tagged_(.*).csv', blob.name)]
@@ -160,6 +200,14 @@ def get_suggestions(detector, basedir: str, untagged_output: str,
     cur_tagging = None
     classes = []
     model = None
+    if pred_model_name == "None":
+        pred_model_name = None
+
+    if config_file["min_tile_size"] == 'None':
+        tile_size = (1024, 600)
+    else:
+        tile_size = tuple(map(int, (config_file["max_tile_size"], config_file["min_tile_size"])))
+
     if len(sys.argv) > 3 and (sys.argv[2].lower() == 'init_pred'):
         print("Using MS COCO pretrained model to detect known 90 classes. For class id <-> name mapping check this file: https://github.com/tensorflow/models/blob/master/research/object_detection/data/mscoco_label_map.pbtxt")
         model = sys.argv[3]
@@ -169,6 +217,12 @@ def get_suggestions(detector, basedir: str, untagged_output: str,
     else:
         classes = config_file["classes"].split(",")
         model = str(Path(config_file["inference_output_dir"])/"frozen_inference_graph.pb")
+
+        if pred_model_name is not None:
+            print("downloading model from " + pred_model_name)
+
+            Path.mkdir(Path(config_file["inference_output_dir"]), parents=True, exist_ok=True)
+            block_blob_service.get_blob_to_path(container_name, str(pred_model_name), str(Path(config_file["inference_output_dir"])/"frozen_inference_graph.pb"))
     if file_date:
         block_blob_service.get_blob_to_path(container_name, max(file_date, key=lambda x: x[1])[0], "tagged.csv")
         cur_tagged = "tagged.csv"
@@ -178,4 +232,4 @@ def get_suggestions(detector, basedir: str, untagged_output: str,
         cur_tagging = "tagging.csv"
     cur_detector = TFDetector(classes, model)
-    get_suggestions(cur_detector, image_dir, untagged_output, tagged_output, cur_tagged, cur_tagging, filetype=config_file["filetype"], min_confidence=float(config_file["min_confidence"]), user_folders=config_file["user_folders"]=="True")
+    get_suggestions(cur_detector, image_dir, untagged_output, tagged_output, cur_tagged, cur_tagging, filetype=config_file["filetype"], min_confidence=float(config_file["min_confidence"]), user_folders=config_file["user_folders"]=="True", image_size=tile_size, batch_size=1000)
diff --git a/train/initialize_vott_pull.py b/train/initialize_vott_pull.py
index 0fcd9ad9..ea17259b 100644
--- a/train/initialize_vott_pull.py
+++ b/train/initialize_vott_pull.py
@@ -19,10 +19,11 @@ def select_jsons(image_directory, user_folders, classes, csv_filename, map_filename):
     with open(csv_filename, 'w', newline='') as csv_file:
         csv_writer = csv.writer(csv_file)
+
         if user_folders:
             csv_writer.writerow(["filename","class","xmin","xmax","ymin","ymax","height","width","folder","box_confidence", "image_confidence"])
             for (filename,true_height,true_width),folder in all_images:
-                csv_writer.writerow([filename,"NULL",0,0,0,0,true_height,true_width,folder,0,0])
+                csv_writer.writerow([filename,"NULL",0,0,0,0,true_height,true_width,str(folder).replace(str(image_directory)+"/","",1),0,0])
         else:
             csv_writer.writerow(["filename","class","xmin","xmax","ymin","ymax","height","width","box_confidence", "image_confidence"])
             for filename,true_height,true_width in all_images:
diff --git a/train/update_label_map.py b/train/update_label_map.py
new file mode 100644
index 00000000..acf9bea4
--- /dev/null
+++ b/train/update_label_map.py
@@ -0,0 +1,53 @@
+import csv
+import cv2
+from pathlib import Path
+import time
+# def extract_data(filename):
+    # height, width, _ = cv2.imread(str(filename),1).shape
+    # return filename.name, height, width
+
+def update_label_map(map_filename, classes):
+    with open(map_filename, "w") as map_file:
+        for index, name in enumerate(classes, 1):
+            map_file.write("item {{\n id: {}\n name: '{}'\n}}".format(index, name))
+
+# def select_jsons(image_directory, user_folders, classes, csv_filename, map_filename):
+    # with open(map_filename, "w") as map_file:
+        # for index, name in enumerate(classes, 1):
+            # map_file.write("item {{\n id: {}\n name: '{}'\n}}".format(index, name))
+
+    # image_directory = Path(image_directory)
+    # if user_folders:
+        # all_images = [(extract_data(filename),filename.parent) for filename in image_directory.glob('**/*') if filename.is_file()]
+    # else:
+        # all_images = [extract_data(filename) for filename in image_directory.iterdir()]
+
+    # with open(csv_filename, 'w', newline='') as csv_file:
+        # csv_writer = csv.writer(csv_file)
+
+        # if user_folders:
+            # csv_writer.writerow(["filename","class","xmin","xmax","ymin","ymax","height","width","folder","box_confidence", "image_confidence"])
+            # for (filename,true_height,true_width),folder in all_images:
+                # csv_writer.writerow([filename,"NULL",0,0,0,0,true_height,true_width,str(folder).replace(str(image_directory)+"/","",1),0,0])
+        # else:
+            # csv_writer.writerow(["filename","class","xmin","xmax","ymin","ymax","height","width","box_confidence", "image_confidence"])
+            # for filename,true_height,true_width in all_images:
+                # csv_writer.writerow([filename,"NULL",0,0,0,0,true_height,true_width,0,0])
+
+if __name__ == "__main__":
+    from azure.storage.blob import BlockBlobService
+    import sys
+    import os
+    # Allow us to import utils
+    config_dir = str(Path.cwd().parent / "utils")
+    if config_dir not in sys.path:
+        sys.path.append(config_dir)
+    from config import Config
+    if len(sys.argv) < 2:
+        raise ValueError("Need to specify config file")
+    config_file = Config.parse_file(sys.argv[1])
+    block_blob_service = BlockBlobService(account_name=config_file["AZURE_STORAGE_ACCOUNT"], account_key=config_file["AZURE_STORAGE_KEY"])
+    update_label_map(config_file["label_map_path"], config_file["classes"].split(","))
+    # container_name = config_file["label_container_name"]
+    # select_jsons(config_file["image_dir"],config_file["user_folders"]=="True", config_file["classes"].split(","), "totag.csv", config_file["label_map_path"])
+    # block_blob_service.create_blob_from_path(container_name, "{}_{}.{}".format("totag",int(time.time() * 1000),"csv"), "totag.csv")
diff --git a/utils/blob_utils.py b/utils/blob_utils.py
index 3eabf131..e7fb1695 100644
--- a/utils/blob_utils.py
+++ b/utils/blob_utils.py
@@ -14,4 +14,23 @@ def get_azure_storage_client(config):
         account_key=config.get("storage_key")
     )
 
-    return BlobStorage.azure_storage_client
\ No newline at end of file
+    return BlobStorage.azure_storage_client
+
+
+def attempt_get_blob(blob_credentials, blob_name, blob_dest):
+    if blob_credentials is None:
+        print("blob_credentials is None, cannot get blob")
+        return False
+    blob_service, container_name = blob_credentials
+    is_successful = False
+    print("Dest: {0}".format(blob_dest))
+    try:
+        blob_service.get_blob_to_path(container_name, blob_name, blob_dest)
+        is_successful = True
+    except Exception:
+        print("Error when getting blob")
+        print("Src: {0} {1}".format(container_name, blob_name))
+
+    return is_successful
+
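A note on the `${data_dir}` expansion in the first hunk: it is done with `re.sub`, so `$`, `{`, and `}` must be escaped in the pattern. Since the replacement is a plain literal, `str.replace` gives the same result with no escaping. A minimal sketch with made-up config values (the real ones come from `Config.parse_file`):

```python
import re

# Hypothetical config values for illustration only.
config_file = {"data_dir": "/data/penguins", "image_dir": "${data_dir}/images"}

# Regex form, as used in the script (raw string keeps the escapes readable).
image_dir = re.sub(r"\$\{data_dir\}", config_file["data_dir"], config_file["image_dir"])

# Literal form: same result, no escaping needed.
assert image_dir == config_file["image_dir"].replace("${data_dir}", config_file["data_dir"])
print(image_dir)  # /data/penguins/images
```

Either form works; the literal form also avoids surprises if a replacement value ever contains regex-special characters such as `\`.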
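For reference, `update_label_map` in the new `train/update_label_map.py` writes a TensorFlow Object Detection API label map (`.pbtxt`), where ids start at 1. A standalone sketch of the same idea (class names are just examples, and a trailing newline is added after each `item` block so consecutive entries land on separate lines):

```python
def update_label_map(map_filename, classes):
    # TF Object Detection API label maps are 1-indexed; id 0 is reserved for background.
    with open(map_filename, "w") as map_file:
        for index, name in enumerate(classes, 1):
            map_file.write("item {{\n id: {}\n name: '{}'\n}}\n".format(index, name))

update_label_map("label_map.pbtxt", ["knot", "defect"])
with open("label_map.pbtxt") as f:
    print(f.read())
```

The double braces `{{`/`}}` are how `str.format` emits literal braces for the pbtxt `item { ... }` blocks.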