mlflow input and output schema update

vadthyavath · vadthyavath · commit 1f8646527191 · 2022-09-26T15:59:31.000+05:30
diff --git a/articles/machine-learning/reference-automl-images-schema.md b/articles/machine-learning/reference-automl-images-schema.md
@@ -235,170 +235,229 @@ Example of a JSONL file for Instance Segmentation:
 
 ![Image example for instance segmentation.](media/reference-automl-images-schema/instance-segmentation-predictions.jpg)
 
-## Data format for inference
+## Data schema for online scoring
 
-In this section, we document the input data format required to make predictions when using a deployed model. Any aforementioned image format is accepted with content type `application/octet-stream`.
+In this section, we document the input data format required to make predictions using a deployed model.
 
 ### Input format
 
-The following is the input format needed to generate predictions on any task using task-specific model endpoint. After we [deploy the model](how-to-auto-train-image-models.md#register-and-deploy-model), we can use the following code snippet to get predictions for all tasks.
+The following is the input format needed to generate predictions on any task using a task-specific model endpoint.
+
+```json
+{
+   "input_data": {
+      "columns": [
+         "image"
+      ],
+      "data": [
+         "image_in_base64_string_format"
+      ]
+   }
+}
+```
+
+This JSON is a dictionary with the outer key `input_data` and the inner keys `columns` and `data`, as described in the following table. The endpoint accepts a JSON string in the above format and decodes it to create the dataframe of samples required by the scoring script. Each input image in `request_json["input_data"]["data"]` is a [base64 encoded string](https://docs.python.org/3/library/base64.html#base64.encodebytes).
+
+
+| Key       | Description  |
+| -------- |----------|
+| `input_data`<br> (outer key) | Outer key of the JSON request. `input_data` is a dictionary that holds the input image samples. <br>`Required, Dictionary` |
+| `columns`<br> (inner key) | Column names used to create the dataframe in the scoring script. Only one column, named `image`, is accepted.<br>`Required, List` |
+| `data`<br> (inner key) | List of base64 encoded images. <br>`Required, List`|
+
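As a minimal sketch of the schema in the table above, the following builds a request dictionary from raw image bytes and decodes it back. The helper names `build_request` and `decode_images` and the placeholder bytes are hypothetical and only illustrate the format; the actual scoring script may process the request differently.

```python
import base64
import json


def build_request(image_bytes_list):
    """Build the request dictionary from a list of raw image bytes (hypothetical helper)."""
    encoded = [base64.encodebytes(b).decode("utf-8") for b in image_bytes_list]
    # The schema accepts exactly one column, named "image".
    return {"input_data": {"columns": ["image"], "data": encoded}}


def decode_images(request_json_str):
    """Decode a request JSON string back into raw image bytes (hypothetical helper)."""
    request = json.loads(request_json_str)
    input_data = request["input_data"]
    if input_data["columns"] != ["image"]:
        raise ValueError("expected exactly one column named 'image'")
    return [base64.decodebytes(s.encode("utf-8")) for s in input_data["data"]]


# Placeholder bytes standing in for the contents of a real image file.
fake_image = b"\x89PNG placeholder bytes"
payload = json.dumps(build_request([fake_image]))
round_tripped = decode_images(payload)
```

The round trip returns the original bytes, which is a quick way to check that a payload is well formed before sending it to an endpoint.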
+
+After we [deploy the MLflow model](how-to-auto-train-image-models.md#register-and-deploy-model), we can use the following code snippet to get predictions for all tasks.
+
 
 ```python
-# input image for inference
-sample_image = './test_image.jpg'
-# load image data
-data = open(sample_image, 'rb').read()
-# set the content type
-headers = {'Content-Type': 'application/octet-stream'}
-# if authentication is enabled, set the authorization header
-headers['Authorization'] = f'Bearer {key}'
-# make the request and display the response
-response = requests.post(scoring_uri, data, headers=headers)
+# Get the details for the online endpoint (`ml_client`, `online_endpoint_name`, and `deployment` come from the deployment step)
+endpoint = ml_client.online_endpoints.get(name=online_endpoint_name)
+
+# Create the request JSON
+import base64
+import json
+
+sample_image = "./test_image.jpg"
+
+def read_image(image_path):
+    with open(image_path, "rb") as f:
+        return f.read()
+
+request_json = {
+    "input_data": {
+        "columns": ["image"],
+        "data": [base64.encodebytes(read_image(sample_image)).decode("utf-8")],
+    }
+}
+
+request_file_name = "sample_request_data.json"
+
+with open(request_file_name, "w") as request_file:
+    json.dump(request_json, request_file)
+
+resp = ml_client.online_endpoints.invoke(
+    endpoint_name=online_endpoint_name,
+    deployment_name=deployment.name,
+    request_file=request_file_name,
+)
+predictions = json.loads(resp)
 ```
+
+
 ### Output format
 
-Predictions made on model endpoints follow different structure depending on the task type. This section explores the output data formats for multi-class, multi-label image classification, object detection, and instance segmentation tasks.  
+Predictions made on model endpoints follow a different structure depending on the task type. This section explores the output data formats for multi-class and multi-label image classification, object detection, and instance segmentation tasks.
+
+The following schemas are defined for the case of one input image.
 
 #### Image classification
 
 Endpoint for image classification returns all the labels in the dataset and their probability scores for the input image in the following format.
 
 ```json
-{
-   "filename":"/tmp/tmppjr4et28",
-   "probs":[
-      2.098e-06,
-      4.783e-08,
-      0.999,
-      8.637e-06
-   ],
-   "labels":[
-      "can",
-      "carton",
-      "milk_bottle",
-      "water_bottle"
-   ]
-}
+[
+   {
+      "filename": "/tmp/tmppjr4et28",
+      "probs": [
+         2.098e-06,
+         4.783e-08,
+         0.999,
+         8.637e-06
+      ],
+      "labels": [
+         "can",
+         "carton",
+         "milk_bottle",
+         "water_bottle"
+      ]
+   }
+]
 ```
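
A minimal sketch of reading this response, assuming it has already been parsed into a Python list (for example with `json.loads`). The helper name `top_label` is hypothetical; it picks the label with the highest probability for one image.

```python
# Parsed response for one input image, copied from the example above.
predictions = [
    {
        "filename": "/tmp/tmppjr4et28",
        "probs": [2.098e-06, 4.783e-08, 0.999, 8.637e-06],
        "labels": ["can", "carton", "milk_bottle", "water_bottle"],
    }
]


def top_label(prediction):
    """Return the (label, probability) pair with the highest probability."""
    probs = prediction["probs"]
    best = max(range(len(probs)), key=probs.__getitem__)
    return prediction["labels"][best], probs[best]


label, prob = top_label(predictions[0])
print(label, prob)  # milk_bottle 0.999
```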
 
 #### Image classification multi-label
 
 For image classification multi-label, model endpoint returns labels and their probabilities.
 
 ```json
-{
-   "filename":"/tmp/tmpsdzxlmlm",
-   "probs":[
-      0.997,
-      0.960,
-      0.982,
-      0.025
-   ],
-   "labels":[
-      "can",
-      "carton",
-      "milk_bottle",
-      "water_bottle"
-   ]
-}
+[
+   {
+      "filename": "/tmp/tmpsdzxlmlm",
+      "probs": [
+         0.997,
+         0.960,
+         0.982,
+         0.025
+      ],
+      "labels": [
+         "can",
+         "carton",
+         "milk_bottle",
+         "water_bottle"
+      ]
+   }
+]
 ```
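
Because several labels can apply to the same image, a common way to consume this output is to keep every label whose probability exceeds a threshold. The sketch below assumes a threshold of 0.5, which is an illustrative choice and not a value prescribed by the endpoint; the helper name `labels_above` is hypothetical.

```python
# Parsed response for one input image, copied from the example above.
predictions = [
    {
        "filename": "/tmp/tmpsdzxlmlm",
        "probs": [0.997, 0.960, 0.982, 0.025],
        "labels": ["can", "carton", "milk_bottle", "water_bottle"],
    }
]


def labels_above(prediction, threshold=0.5):
    """Return the labels whose probability is at least `threshold` (assumed value)."""
    return [
        label
        for label, prob in zip(prediction["labels"], prediction["probs"])
        if prob >= threshold
    ]


selected = labels_above(predictions[0])
print(selected)  # ['can', 'carton', 'milk_bottle']
```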
 
 #### Object detection
 
 Object detection model returns multiple boxes with their scaled top-left and bottom-right coordinates along with box label and confidence score.
 
 ```json
-{
-   "filename":"/tmp/tmpdkg2wkdy",
-   "boxes":[
-      {
-         "box":{
-            "topX":0.224,
-            "topY":0.285,
-            "bottomX":0.399,
-            "bottomY":0.620
+[
+   {
+      "filename": "/tmp/tmpdkg2wkdy",
+      "boxes": [
+         {
+            "box": {
+               "topX": 0.224,
+               "topY": 0.285,
+               "bottomX": 0.399,
+               "bottomY": 0.620
+            },
+            "label": "milk_bottle",
+            "score": 0.937
          },
-         "label":"milk_bottle",
-         "score":0.937
-      },
-      {
-         "box":{
-            "topX":0.664,
-            "topY":0.484,
-            "bottomX":0.959,
-            "bottomY":0.812
+         {
+            "box": {
+               "topX": 0.664,
+               "topY": 0.484,
+               "bottomX": 0.959,
+               "bottomY": 0.812
+            },
+            "label": "can",
+            "score": 0.891
          },
-         "label":"can",
-         "score":0.891
-      },
-      {
-         "box":{
-            "topX":0.423,
-            "topY":0.253,
-            "bottomX":0.632,
-            "bottomY":0.725
-         },
-         "label":"water_bottle",
-         "score":0.876
-      }
-   ]
-}
+         {
+            "box": {
+               "topX": 0.423,
+               "topY": 0.253,
+               "bottomX": 0.632,
+               "bottomY": 0.725
+            },
+            "label": "water_bottle",
+            "score": 0.876
+         }
+      ]
+   }
+]
 ```
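
Since the box coordinates are scaled, drawing a box on the image requires multiplying them by the image size. The following is a minimal sketch, assuming the x values are scaled by the image width and the y values by the image height; the helper name `box_to_pixels` and the 640x480 image size are hypothetical.

```python
def box_to_pixels(box, image_width, image_height):
    """Convert a scaled box to integer pixel coordinates (left, top, right, bottom)."""
    return (
        round(box["topX"] * image_width),
        round(box["topY"] * image_height),
        round(box["bottomX"] * image_width),
        round(box["bottomY"] * image_height),
    )


# First box from the example response above.
box = {"topX": 0.224, "topY": 0.285, "bottomX": 0.399, "bottomY": 0.620}
pixels = box_to_pixels(box, image_width=640, image_height=480)
print(pixels)  # (143, 137, 255, 298)
```
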
 #### Instance segmentation
 
 In instance segmentation, output consists of multiple boxes with their scaled top-left and bottom-right coordinates, labels, confidence scores, and polygons (not masks). Here, the polygon values are in the same format that we discussed in the schema section.
 
 ```json
-{
-   "filename":"/tmp/tmpi8604s0h",
-   "boxes":[
-      {
-         "box":{
-            "topX":0.679,
-            "topY":0.491,
-            "bottomX":0.926,
-            "bottomY":0.810
-         },
-         "label":"can",
-         "score":0.992,
-         "polygon":[
-            [
-               0.82, 0.811, 0.771, 0.810, 0.758, 0.805, 0.741, 0.797, 0.735, 0.791, 0.718, 0.785, 0.715, 0.778, 0.706, 0.775, 0.696, 0.758, 0.695, 0.717, 0.698, 0.567, 0.705, 0.552, 0.706, 0.540, 0.725, 0.520, 0.735, 0.505, 0.745, 0.502, 0.755, 0.493
-            ]
-         ]
-      },
-      {
-         "box":{
-            "topX":0.220,
-            "topY":0.298,
-            "bottomX":0.397,
-            "bottomY":0.601
-         },
-         "label":"milk_bottle",
-         "score":0.989,
-         "polygon":[
-            [
-               0.365, 0.602, 0.273, 0.602, 0.26, 0.595, 0.263, 0.588, 0.251, 0.546, 0.248, 0.501, 0.25, 0.485, 0.246, 0.478, 0.245, 0.463, 0.233, 0.442, 0.231, 0.43, 0.226, 0.423, 0.226, 0.408, 0.234, 0.385, 0.241, 0.371, 0.238, 0.345, 0.234, 0.335, 0.233, 0.325, 0.24, 0.305, 0.586, 0.38, 0.592, 0.375, 0.598, 0.365
-            ]
-         ]
-      },
-      {
-         "box":{
-            "topX":0.433,
-            "topY":0.280,
-            "bottomX":0.621,
-            "bottomY":0.679
-         },
-         "label":"water_bottle",
-         "score":0.988,
-         "polygon":[
-            [
-               0.576, 0.680, 0.501, 0.680, 0.475, 0.675, 0.460, 0.625, 0.445, 0.630, 0.443, 0.572, 0.440, 0.560, 0.435, 0.515, 0.431, 0.501, 0.431, 0.433, 0.433, 0.426, 0.445, 0.417, 0.456, 0.407, 0.465, 0.381, 0.468, 0.327, 0.471, 0.318
-            ]
-         ]
-      }
-   ]
-}
+[
+    {
+       "filename": "/tmp/tmpi8604s0h",
+       "boxes": [
+          {
+             "box": {
+                "topX": 0.679,
+                "topY": 0.491,
+                "bottomX": 0.926,
+                "bottomY": 0.810
+             },
+             "label": "can",
+             "score": 0.992,
+             "polygon": [
+                [
+                   0.82, 0.811, 0.771, 0.810, 0.758, 0.805, 0.741, 0.797, 0.735, 0.791, 0.718, 0.785, 0.715, 0.778, 0.706, 0.775, 0.696, 0.758, 0.695, 0.717, 0.698, 0.567, 0.705, 0.552, 0.706, 0.540, 0.725, 0.520, 0.735, 0.505, 0.745, 0.502, 0.755, 0.493
+                ]
+             ]
+          },
+          {
+             "box": {
+                "topX": 0.220,
+                "topY": 0.298,
+                "bottomX": 0.397,
+                "bottomY": 0.601
+             },
+             "label": "milk_bottle",
+             "score": 0.989,
+             "polygon": [
+                [
+                   0.365, 0.602, 0.273, 0.602, 0.26, 0.595, 0.263, 0.588, 0.251, 0.546, 0.248, 0.501, 0.25, 0.485, 0.246, 0.478, 0.245, 0.463, 0.233, 0.442, 0.231, 0.43, 0.226, 0.423, 0.226, 0.408, 0.234, 0.385, 0.241, 0.371, 0.238, 0.345, 0.234, 0.335, 0.233, 0.325, 0.24, 0.305, 0.586, 0.38, 0.592, 0.375, 0.598, 0.365
+                ]
+             ]
+          },
+          {
+             "box": {
+                "topX": 0.433,
+                "topY": 0.280,
+                "bottomX": 0.621,
+                "bottomY": 0.679
+             },
+             "label": "water_bottle",
+             "score": 0.988,
+             "polygon": [
+                [
+                   0.576, 0.680, 0.501, 0.680, 0.475, 0.675, 0.460, 0.625, 0.445, 0.630, 0.443, 0.572, 0.440, 0.560, 0.435, 0.515, 0.431, 0.501, 0.431, 0.433, 0.433, 0.426, 0.445, 0.417, 0.456, 0.407, 0.465, 0.381, 0.468, 0.327, 0.471, 0.318
+                ]
+             ]
+          }
+       ]
+    }
+]
 ```
 
 > [!NOTE]