
Commit 2f71115

Merge pull request #50577 from sherzyang/main
Add Introduction to computer vision module with fixes.
2 parents 4dc91b3 + d39333d commit 2f71115

23 files changed: +400 −0 lines changed
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
```yaml
### YamlMime:ModuleUnit
uid: learn.wwl.introduction-computer-vision.introduction
title: Introduction
metadata:
  title: Introduction
  description: Introduction
  author: wwlpublish
  ms.author: sheryang
  ms.date: 5/20/2025
  ms.topic: unit
  ms.collection:
    - wwl-ai-copilot
durationInMinutes: 1
content: |
  [!include[](includes/1-introduction.md)]
```
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
```yaml
### YamlMime:ModuleUnit
uid: learn.wwl.introduction-computer-vision.overview
title: Overview
metadata:
  title: Overview
  description: Overview
  author: wwlpublish
  ms.author: sheryang
  ms.date: 5/20/2025
  ms.topic: unit
  ms.collection:
    - wwl-ai-copilot
durationInMinutes: 4
content: |
  [!include[](includes/2-overview.md)]
```
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
```yaml
### YamlMime:ModuleUnit
uid: learn.wwl.introduction-computer-vision.understand-image-processing
title: Understand image processing
metadata:
  title: Understand image processing
  description: Understand how computers process images.
  author: wwlpublish
  ms.author: sheryang
  ms.date: 5/20/2025
  ms.topic: unit
  ms.collection:
    - wwl-ai-copilot
durationInMinutes: 4
content: |
  [!include[](includes/3-understand-image-processing.md)]
```
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
```yaml
### YamlMime:ModuleUnit
uid: learn.wwl.introduction-computer-vision.computer-vision-models
title: Machine learning for computer vision
metadata:
  title: Machine learning for computer vision
  description: Understand machine learning for computer vision
  author: wwlpublish
  ms.author: sheryang
  ms.date: 5/20/2025
  ms.topic: unit
  ms.collection:
    - wwl-ai-copilot
durationInMinutes: 5
content: |
  [!include[](includes/4-computer-vision-models.md)]
```
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
```yaml
### YamlMime:ModuleUnit
uid: learn.wwl.introduction-computer-vision.modern-vision-models
title: Understand modern vision models
metadata:
  title: Understand modern vision models
  description: Understand transformers and multimodal models
  author: wwlpublish
  ms.author: sheryang
  ms.date: 5/20/2025
  ms.topic: unit
  ms.collection:
    - wwl-ai-copilot
durationInMinutes: 5
content: |
  [!include[](includes/5-modern-vision-models.md)]
```
Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
```yaml
### YamlMime:ModuleUnit
uid: learn.wwl.introduction-computer-vision.knowledge-check
title: Module assessment
metadata:
  title: Module assessment
  description: Knowledge check
  author: wwlpublish
  ms.author: sheryang
  ms.date: 5/20/2025
  ms.topic: unit
  ms.collection:
    - wwl-ai-copilot
durationInMinutes: 3
quiz:
  title: "Check your knowledge"
  questions:
  - content: "Computer vision is based on the manipulation and analysis of what kinds of values in an image?"
    choices:
    - content: "Timestamps in photograph metadata"
      isCorrect: false
      explanation: "Incorrect. Timestamps in the image metadata do not enable computer vision."
    - content: "Pixels"
      isCorrect: true
      explanation: "Correct. Pixels are numeric values that represent shade intensity for points in the image."
    - content: "Image file names"
      isCorrect: false
      explanation: "Incorrect. While file names might offer some clues as to the image subject, they do not inherently enable computer vision."
  - content: "What is the primary role of filters in a convolutional neural network (CNN) used for image classification?"
    choices:
    - content: "To apply visual effects to enhance image appearance."
      isCorrect: false
      explanation: "Incorrect."
    - content: "To extract numeric features from images for use in a neural network."
      isCorrect: true
      explanation: "Correct."
    - content: "To compress image size for faster processing."
      isCorrect: false
      explanation: "Incorrect."
  - content: "What is the primary function of a multi-modal model in computer vision?"
    choices:
    - content: "To generate random captions for unlabeled images."
      isCorrect: false
      explanation: "Incorrect."
    - content: "To replace CNNs entirely in all vision tasks."
      isCorrect: false
      explanation: "Incorrect."
    - content: "To combine image features with natural language embeddings for richer understanding."
      isCorrect: true
      explanation: "Correct."
```
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
```yaml
### YamlMime:ModuleUnit
uid: learn.wwl.introduction-computer-vision.summary
title: Summary
metadata:
  title: Summary
  description: Summary
  author: wwlpublish
  ms.author: sheryang
  ms.date: 5/20/2025
  ms.topic: unit
  ms.collection:
    - wwl-ai-copilot
durationInMinutes: 1
content: |
  [!include[](includes/7-summary.md)]
```
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@

**Computer vision** is one of the core areas of artificial intelligence (AI), and focuses on creating solutions that enable AI applications to "see" the world and make sense of it.

Consider these scenarios:

- A hospital wants to detect and track surgical instruments in real time during operations.
- A retail company needs to classify products in images, such as shoes, shirts, and electronics, into categories.
- A wildlife preservation organization needs to identify the animals that appear in video footage.
- A city's transportation department needs to read and extract text from images of license plates.
- A manufacturing company wants to analyze visual patterns for defects.

Of course, computers don't have biological eyes that work the way ours do, but they're capable of processing images, whether from a live camera feed or from digital photographs and videos. This ability to process images is the key to creating software that can emulate human visual perception. In this module, we'll examine the building blocks that underlie modern computer vision solutions.
Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@

Computer vision capabilities can be categorized into a few main types:

|**Type**|**Description**|
|-|-|
|**Image analysis**|The ability to detect, classify, caption, and generate insights.|
|**Spatial analysis**|The ability to understand people's presence and movements within physical areas in real time.|
|**Facial recognition**|The ability to recognize and verify human identity.|
|**Optical character recognition (OCR)**|The ability to extract printed and handwritten text from images with varied languages and writing styles.|

To understand these computer vision capabilities, it's useful to consider what an image actually *is* in the context of data for a computer program.

## Images as pixel arrays

To a computer, an image is an array of numeric *pixel* values. For example, consider the following array:

```
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 255 255 255 0 0
0 0 255 255 255 0 0
0 0 255 255 255 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
```

The array consists of seven rows and seven columns, representing the pixel values for a 7x7 pixel image (known as the image's *resolution*). Each pixel has a value between 0 (black) and 255 (white), with values between these bounds representing shades of gray. The image represented by this array looks similar to the following (magnified) image:

![Diagram of a grayscale image.](../media/white-square.png)
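To make this concrete, here's a minimal sketch of the same 7x7 grayscale image as an array in code. (This uses NumPy as an illustrative assumption; the module itself doesn't prescribe a library.)

```python
import numpy as np

# A 7x7 grayscale image: 0 is black, 255 is white.
image = np.zeros((7, 7), dtype=np.uint8)
image[2:5, 2:5] = 255  # the white square in the center

print(image.shape)  # (7, 7) -- the image's resolution
print(image[3, 3])  # 255 -- a pixel inside the white square
print(image[0, 0])  # 0 -- a background pixel
```

Indexing the array by row and column retrieves individual pixel values, which is exactly how image-processing code reads an image.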

The array of pixel values for this image is two-dimensional (representing rows and columns, or *x* and *y* coordinates) and defines a single rectangle of pixel values. A single layer of pixel values like this represents a grayscale image. In reality, most digital images are multidimensional and consist of three layers (known as *channels*) that represent red, green, and blue (RGB) color hues. For example, we could represent a color image by defining three channels of pixel values that create the same square shape as the previous grayscale example:

```
Red:
150 150 150 150 150 150 150
150 150 150 150 150 150 150
150 150 255 255 255 150 150
150 150 255 255 255 150 150
150 150 255 255 255 150 150
150 150 150 150 150 150 150
150 150 150 150 150 150 150

Green:
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 255 255 255 0 0
0 0 255 255 255 0 0
0 0 255 255 255 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0

Blue:
255 255 255 255 255 255 255
255 255 255 255 255 255 255
255 255 0 0 0 255 255
255 255 0 0 0 255 255
255 255 0 0 0 255 255
255 255 255 255 255 255 255
255 255 255 255 255 255 255
```

Here's the resulting image:

![Diagram of a color image.](../media/color-square.png)

The purple squares are represented by the combination:

```
Red: 150
Green: 0
Blue: 255
```

The yellow squares in the center are represented by the combination:

```
Red: 255
Green: 255
Blue: 0
```
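The three channels can be sketched as a single stacked array. (Again, NumPy is an illustrative assumption; the key idea is that a color image is a height x width x 3 array.)

```python
import numpy as np

# Build each 7x7 channel: a uniform background with a different center square.
red = np.full((7, 7), 150, dtype=np.uint8)
red[2:5, 2:5] = 255
green = np.zeros((7, 7), dtype=np.uint8)
green[2:5, 2:5] = 255
blue = np.full((7, 7), 255, dtype=np.uint8)
blue[2:5, 2:5] = 0

# Stack the channels into one color image with shape (height, width, 3).
color_image = np.dstack([red, green, blue])

print(color_image.shape)  # (7, 7, 3)
print(color_image[0, 0])  # the purple background: R=150, G=0, B=255
print(color_image[3, 3])  # the yellow center: R=255, G=255, B=0
```

Each pixel is now a triple of channel values, which is why the purple and yellow combinations above fully describe the colors in the image.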
Next, let's explore how images are processed.
Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@

A common way to perform image processing tasks is to apply *filters* that modify the pixel values of the image to create a visual effect. A filter is defined by one or more arrays of pixel values, called filter *kernels*. For example, you could define a filter with a 3x3 kernel as shown in this example:

```
-1 -1 -1
-1  8 -1
-1 -1 -1
```

The kernel is then *convolved* across the image, calculating a weighted sum for each 3x3 patch of pixels and assigning the result to a new image. It's easier to understand how the filtering works by exploring a step-by-step example.

Let's start with the grayscale image we explored previously:

```
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 255 255 255 0 0
0 0 255 255 255 0 0
0 0 255 255 255 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
```

First, we apply the filter kernel to the top-left patch of the image, multiplying each pixel value by the corresponding weight value in the kernel and adding the results:

```
(0 x -1) + (0 x -1) + (0 x -1) +
(0 x -1) + (0 x 8) + (0 x -1) +
(0 x -1) + (0 x -1) + (255 x -1) = -255
```

The result (-255) becomes the first value in a new array. Then we move the filter kernel along one pixel to the right and repeat the operation:

```
(0 x -1) + (0 x -1) + (0 x -1) +
(0 x -1) + (0 x 8) + (0 x -1) +
(0 x -1) + (255 x -1) + (255 x -1) = -510
```

Again, the result is added to the new array, which now contains two values:

```
-255 -510
```

The process is repeated until the filter has been convolved across the entire image, as shown in this animation:

![Diagram of a filter.](../media/filter.gif)

The filter is convolved across the image, calculating a new array of values. Some of the values might be outside of the 0 to 255 pixel value range, so the values are adjusted to fit into that range. Because of the shape of the filter, the outside edge of pixels isn't calculated, so a padding value (usually 0) is applied. The resulting array represents a new image in which the filter has transformed the original image. In this case, the filter has had the effect of highlighting the *edges* of shapes in the image.
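The steps above can be sketched in code. This is a minimal illustration using NumPy (an assumption, not the exact implementation behind the animation): the kernel slides over every 3x3 patch, the weighted sums fill a new array, border pixels are left at the padding value 0, and the results are clipped to the 0 to 255 range.

```python
import numpy as np

def convolve(image, kernel):
    """Convolve a 2D kernel across a grayscale image as described above.

    Border pixels can't be covered by the kernel, so they keep the
    padding value 0; results are clipped to the 0-255 pixel range.
    """
    h, w = image.shape
    kh, kw = kernel.shape
    output = np.zeros((h, w), dtype=np.int32)
    for y in range(h - kh + 1):
        for x in range(w - kw + 1):
            patch = image[y:y + kh, x:x + kw]
            # Weighted sum of the patch, stored at the patch's center.
            output[y + kh // 2, x + kw // 2] = np.sum(patch * kernel)
    return np.clip(output, 0, 255).astype(np.uint8)

# Use a signed dtype so the weighted sums can go negative.
image = np.zeros((7, 7), dtype=np.int32)
image[2:5, 2:5] = 255

kernel = np.array([[-1, -1, -1],
                   [-1,  8, -1],
                   [-1, -1, -1]])

first = np.sum(image[0:3, 0:3] * kernel)  # -255, matching the first step above

filtered = convolve(image, kernel)
print(filtered)
```

In the output, the interior of the white square becomes 0 (the weights cancel out on uniform regions), while pixels along the square's boundary produce large positive sums that clip to 255: the edges are highlighted, as described above.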

To see the effect of the filter more clearly, here's an example of the same filter applied to a real image:

| Original Image | Filtered Image |
|--|--|
|![Diagram of a banana.](../media/banana-grayscale.png)|![Diagram of a filtered banana.](../media/laplace.png)|

Because the filter is convolved across the image, this kind of image manipulation is often referred to as *convolutional filtering*. The filter used in this example is a particular type of filter (called a *Laplace* filter) that highlights the edges of objects in an image. There are many other kinds of filters that you can use to create blurring, sharpening, color inversion, and other effects.
Next, let's connect concepts of convolutional filtering to modern vision models.
