We're going to use a **platform image** from Oracle called **OCI DSVM**. This image contains several tools for data exploration, analysis, modeling, and development. It also includes a Jupyter Notebook, a conda environment ready to use, and several more things (like Christmas for a Data practitioner).
We can find the platform image by selecting the *Marketplace* button:
Network settings for the Virtual Machine are standard. Just make sure to create a new VCN and a new subnet, so that we avoid any networking conflicts with other OCI projects you may have.
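If you prefer the command line over the console, the OCI CLI can create both resources. This is only a sketch under assumptions: the compartment OCID, display names, and CIDR ranges below are placeholders, not values from this guide.

```bash
# Create a dedicated VCN for this workshop (placeholder OCID and CIDR)
oci network vcn create \
    --compartment-id <compartment-ocid> \
    --display-name mask-detection-vcn \
    --cidr-block 10.0.0.0/16

# Create a subnet inside it (use the VCN OCID returned by the command above)
oci network subnet create \
    --compartment-id <compartment-ocid> \
    --vcn-id <vcn-ocid> \
    --display-name mask-detection-subnet \
    --cidr-block 10.0.0.0/24
```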
Once we have the IP address, and with our public-private key pair saved (which is what we will use to authenticate to the machine), let's connect through SSH.
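From a Linux or macOS terminal, the connection looks something like this; a minimal sketch, where the key path and IP are placeholders and `opc` is the default user on Oracle Linux images:

```bash
# Connect using the private key saved when creating the instance
ssh -i ~/.ssh/my_oci_key opc@<instance-public-ip>
```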
Now, just click on "Quick Connect" and connect:
> **Note**: we will connect to our VM and start training / augmenting our data with open-source repositories.
## Task 3: Clone Open-Source Repositories
Once we have connected to our instance, let's download two repositories: YOLOv5 and YOLOv8. You're free to choose either one of them to train and augment our computer vision models, but this guide will show you how to proceed with YOLOv5.
> **Note**: `git` is another tool that's already installed in the custom image we used to spin up our instance. YOLOv8 can also be installed directly from pip. More information [at this link](https://github.com/ultralytics/ultralytics#documentation).
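For reference, cloning both repositories is standard `git` usage (the ultralytics repository is where YOLOv8 lives):

```bash
# YOLOv5 -- the repository this guide follows
git clone https://github.com/ultralytics/yolov5.git

# YOLOv8 -- can be cloned, or installed directly from pip
git clone https://github.com/ultralytics/ultralytics.git
# alternatively: pip install ultralytics
```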
## Task 4: Transfer Dataset
Now that we're connected to the machine, let's move the files from our computer to our OCI Compute Instance.
### For Linux & macOS Users
We can use the _`scp`_ tool to help us transfer files through SSH:
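A minimal sketch, using the example values from the note below; your key file, dataset path, and IP will differ:

```bash
# -r copies the dataset directory recursively into the instance's home directory
scp -i <private-key-file> -r ./dataset opc@192.168.0.1:/home/opc/
```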
> **Note**: in this case, my OCI Compute Instance IP is 192.168.0.1. `opc` is the default username for Oracle Linux distributions, like the one we are using here. The private key is the same one we used to connect through SSH in the previous task.
### For Windows Users
Use the integrated MobaXterm FTP explorer to transfer files, dragging them from our computer and dropping them into MobaXterm's explorer panel.
## Task 5: Install Python Dependencies
Once we have the repositories ready, we need to install dependencies that will allow us to run YOLO code:
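For YOLOv5, a minimal sketch, assuming the clone from Task 3 (the repository ships its own `requirements.txt`):

```bash
cd yolov5
# Installs PyTorch, OpenCV, and the other packages YOLOv5 needs
pip install -r requirements.txt
```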
Now that we have cloned our repositories, uploaded our dataset, and have our machine and conda environment ready, we're virtually ready to start training. You may now [proceed to the next lab](#next).
## Acknowledgements
* **Author** - Nacho Martinez, Data Science Advocate @ Oracle DevRel
> **Note**: as you can see, the little girl in the second row and third column is wearing the mask with her nose showing, which is *incorrect*. We want our custom model to detect cases like these, which are also the hardest to represent: there are a lot of pictures of people with and without masks on the Internet, but not as many of people wearing masks incorrectly, which causes our dataset to be imbalanced.
## Task 1: Final Result
You may now [proceed to the next lab](#next).
## Acknowledgements
* **Author** - Nacho Martinez, Data Science Advocate @ Oracle DevRel
Estimated Time: 40 minutes
## Overview
We are going to use **RoboFlow** as our data platform for this workshop. The good thing about RoboFlow is that it eases the process of labeling and extracting data - in some ways, it works as a data store of hundreds, even thousands of projects like mine, with thousands of images each; and all of these projects are **public** (meaning, we can use someone else's data to help us solve the problem).
Oh, and RoboFlow is free to use for public projects - I've never run into storage-space limits for a project. Shoutout to the RoboFlow team for their continued support.
If we've done everything correctly, we should have downloaded all images from the dataset.
## Task 3: Manipulating Datasets
First, a quick intro on why we have all images from each dataset in one of these three aforementioned directories, and what each of these directories represents:
- **Train**: The first type of dataset in machine learning is called the *training* dataset. This dataset is extremely important because it is used to train the model by adjusting the weights and biases of neural networks to produce accurate answers based on the inputs provided. If the training dataset is flawed or incomplete, it is very difficult to develop a good working model.
Let's open one of the datasets.
The file that holds all links and values is called `data.yaml`, with a structure like this:
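The original screenshot isn't reproduced here; as a rough sketch, a YOLOv5-style `data.yaml` typically looks like the following, where the paths and class names are illustrative (borrowing the `PasMasque` label from the note below):

```yaml
# Illustrative sketch -- your paths, class count, and names will differ
train: /home/opc/dataset/train/images
val: /home/opc/dataset/valid/images
test: /home/opc/dataset/test/images

nc: 3                                     # number of classes
names: ['Masque', 'Incorrect', 'PasMasque']  # one entry per class; order matters
```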
> **Note**: _`nc`_ represents the number of classes, with each class name in the _`names`_ list. It also contains a path to each dataset directory (it's recommended to modify these to be absolute paths, not relative ones). Also, if label names are weird or hard to understand (like numbers), you can check what each label means by visually inspecting the dataset. For example, I looked at some pictures and made sure that the _`PasMasque`_ class actually represented a *lack* of a mask, and that the other classes were also represented by correct, meaningful labels.
We need to modify this YAML file to include the names of the classes that we want, making sure that the order of the labels is also preserved.
Then, I click on my Mask Detection Placement model.
Finally, let's upload some new images to include in the model. I will upload my images by importing a YouTube video of myself, but you can use any pictures you have on your phone or computer; just make sure that you get a healthy ratio of images with different mask-wearing states (correct, incorrect, no mask at all).
We need to be mindful of which **sampling rate** to choose: if we select a sampling rate that's too high, it will cause the dataset to have very similar images, as they will be taken almost one after the other. If the sampling rate is too low, we won't get enough images from the video.
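RoboFlow's importer handles this sampling for us, but if you'd rather extract frames locally before uploading, a tool like `ffmpeg` can sample at a chosen rate. A sketch, where the video filename is a placeholder:

```bash
mkdir -p frames
# fps=1 keeps one frame per second; raising it yields more, but more similar, images
ffmpeg -i mask_video.mp4 -vf fps=1 frames/frame_%04d.jpg
```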
Then, we get the selected frames from the video.
The last thing to do now is to **annotate** these images. We will annotate using bounding boxes (see an explanation below of why this is a good annotation method for our problem).
We go to the Annotate section in the toolbar:
We repeat this process for every image. Then, we'll choose into which dataset these images will go:
> **Note**: I recommend 80%-10%-10% for training-validation-testing for most cases.
We can now proceed to augment our dataset and generate a new version.
### Different Annotation Strategies
Depending on the type of problem, you will need to have a different annotation technique. The three most common ones are:
- Bounding Boxes: they are rectangles that surround an object and specify its position. This method is perfect for our mask placement model.
- Polygons: this method takes more time than bounding boxes, but increases performance (accuracy), as the model will be trained on more tightly constrained data. You can still annotate an image with a traditional bounding box instead of a polygon, which takes less time for annotators but gives up some of that added performance. Thus, if you have the resources and you have decided polygon annotation is helpful, it is worth going the extra mile.
- [Smart Polygons](https://blog.roboflow.com/automated-polygon-labeling-computer-vision/): RoboFlow simplifies this process with its own Smart Annotation, which will detect an object and try to draw its edges interactively.
Here are three examples, one for each type of annotation:
Since I had some free training credits, I decided to spend one of them to see how the model would perform.
I recommend starting training from a public checkpoint, like one trained on the COCO dataset with a 46.7% mAP. That model has previously been trained to detect objects from the real world, and even though it doesn't recognize mask placement, it will serve as a starting point for the neural network: despite not knowing what a 'COVID-19 mask' is, it has learned to detect other things, like **edges** of objects, shapes, and forms. This means that the model _knows_ about the real world, even if its knowledge is limited. So, let us try this model first.
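RoboFlow applies this checkpoint choice in its UI; if you later train YOLOv5 yourself on the VM, the same transfer-learning idea is expressed by passing COCO-pretrained weights. A minimal sketch, where the flag values are common defaults rather than settings prescribed by this guide:

```bash
# From inside the yolov5 repository; yolov5s.pt is a COCO-pretrained checkpoint
python train.py --img 640 --batch 16 --epochs 100 \
    --data /home/opc/dataset/data.yaml --weights yolov5s.pt
```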
## Task 7: Conclusions
After the training process is complete (which can take up to 24 hours), we can see the following:
In more detail, we get average precision broken down by validation and testing sets.
> **Note**: since the validation set had fewer pictures than the test set and also shows lower precision, this leads me to believe that the lower precision is caused by having too few pictures, not by the model being inaccurate on detections. We will fix this in the next article, where we will make a more balanced split for our dataset.
Also, note that -- across the validation and test sets -- the "incorrect" label has a constant precision of **49%**. This makes sense, as it's the hardest class of the three to predict: it's very easy to see the difference between someone with or without a mask, but incorrectly-placed masks are harder to detect in some pictures, even for humans. As great, new professionals in Computer Vision, we will take note of this and find a way to improve that precision in the future, taking special care with our underperforming class.
## Acknowledgements
* **Author** - Nacho Martinez, Data Science Advocate @ Oracle DevRel