|
7 | 7 | "source": [ |
8 | 8 | "# ML cube Platform SDK - Task and Model creation\n", |
9 | 9 | "\n", |
10 | | - "In this notebook, you will see how to create a Task and add models to it in order to start to monitor them.\n", |
| 10 | + "In this notebook, you will see how to create a Task and how to add a model to start monitoring its performance.\n", |
11 | 11 | "\n", |
12 | 12 | "**Requirements**:\n", |
13 | 13 | "\n", |
14 | 14 | "1. API Key of a User with roles `COMPANY_ADMIN` or `PROJECT_ADMIN`\n", |
15 | | - "2. Id of the project\n", |
| 15 | + "2. ID of the project\n", |
16 | 16 | "\n", |
17 | 17 | "**User Input**\n", |
18 | 18 | "\n", |
19 | | - "In the notebook you will need to complete variables and names to correctly run it.\n", |
20 | | - "Whenever you see the comment `# TO COMPLETE` you need to fill the empty string." |
| 19 | + "You will need to provide some values for variables and names to ensure the notebook runs correctly. Whenever you see the comment `# TO COMPLETE`, make sure to fill the empty string accordingly." |
21 | 20 | ] |
22 | 21 | }, |
23 | 22 | { |
|
28 | 27 | "**Imports**" |
29 | 28 | ] |
30 | 29 | }, |
| 30 | + { |
| 31 | + "metadata": {}, |
| 32 | + "cell_type": "code", |
| 33 | + "outputs": [], |
| 34 | + "execution_count": null, |
| 35 | + "source": [ |
| 36 | + "import logging\n", |
| 37 | + "logging.basicConfig(level=logging.INFO)  # ensure logger.info messages appear in the notebook output\n", |
| 38 | + "logger = logging.getLogger(\"platform_tutorial\")" |
| 38 | + ], |
| 39 | + "id": "ac1dbec9a9146410" |
| 40 | + }, |
| 41 | + { |
| 42 | + "metadata": {}, |
| 43 | + "cell_type": "code", |
| 44 | + "outputs": [], |
| 45 | + "execution_count": null, |
| 46 | + "source": [ |
| 47 | + "from ml3_platform_sdk.client import ML3PlatformClient\n", |
| 48 | + "from ml3_platform_sdk import enums as ml3_enums\n", |
| 49 | + "from ml3_platform_sdk import models as ml3_models" |
| 50 | + ], |
| 51 | + "id": "c8bf440019a4ca24" |
| 52 | + }, |
31 | 53 | { |
32 | 54 | "cell_type": "markdown", |
33 | 55 | "id": "26f67a46-cacf-4ca8-b67e-43388566b3c7", |
|
54 | 76 | "id": "c6588f7a-488c-4d3b-bc57-ec5ddbb0f955", |
55 | 77 | "metadata": {}, |
56 | 78 | "source": [ |
57 | | - "If you don't remember the id of the projcet you can get the list of projects:\n", |
| 79 | + "If you don't remember the ID of the project, you can get the list of your projects:\n", |
58 | 80 | "```py\n", |
59 | 81 | "projects: List[Project] = client.get_projects()\n", |
60 | 82 | "logger.info(f'Projects inside the company are: {projects}')\n", |
61 | 83 | "```" |
62 | 84 | ] |
63 | 85 | }, |
64 | | - { |
65 | | - "cell_type": "code", |
66 | | - "execution_count": null, |
67 | | - "id": "cd94e6fe", |
68 | | - "metadata": {}, |
69 | | - "outputs": [], |
70 | | - "source": [ |
71 | | - "import logging\n", |
72 | | - "logger = logging.getLogger(\"platform_tutorial\")" |
73 | | - ] |
74 | | - }, |
75 | | - { |
76 | | - "cell_type": "code", |
77 | | - "execution_count": null, |
78 | | - "id": "b808c779", |
79 | | - "metadata": {}, |
80 | | - "outputs": [], |
81 | | - "source": [ |
82 | | - "from ml3_platform_sdk.client import ML3PlatformClient\n", |
83 | | - "from ml3_platform_sdk import enums as ml3_enums\n", |
84 | | - "from ml3_platform_sdk import models as ml3_models" |
85 | | - ] |
86 | | - }, |
87 | 86 | { |
88 | 87 | "cell_type": "markdown", |
89 | 88 | "id": "0c085046-a664-4085-b011-e4ff37d27ca5", |
90 | 89 | "metadata": {}, |
91 | 90 | "source": [ |
92 | | - "**Instantiace the Client**\n", |
| 91 | + "**Instantiate the Client**\n", |
93 | 92 | "\n", |
94 | | - "To interact with ML cube Platform you need to instantiate the client only the first time.\n", |
95 | | - "Then you will use its methods to perform requests.\n", |
| 93 | + "To interact with the ML cube Platform, you need to instantiate the client. You will then use its methods to perform requests.\n", |
96 | 94 | "Please insert the API key we provided you to instantiate the client." |
97 | 95 | ] |
98 | 96 | }, |
|
115 | 113 | "source": [ |
116 | 114 | "**Create Task**\n", |
117 | 115 | "\n", |
118 | | - "To monitor your models you need to add them in a `Task`, it represents a AI problem like regression or classification over a dataset." |
| 116 | + "To monitor your models, you need to add them to a `Task`.\n", |
| 117 | + "\n", |
| 118 | + "A `Task` represents an AI problem, such as regression or binary classification, over a dataset." |
119 | 119 | ] |
120 | 120 | }, |
121 | 121 | { |
|
149 | 149 | "source": [ |
150 | 150 | "**Data schema**\n", |
151 | 151 | "\n", |
152 | | - "The data schema describes the data used in this task by your models.\n", |
153 | | - "It contains features, targets and a set of mandatory metadata required by ML cube Platform for a correct function.\n", |
154 | | - "Each sample is required to have associated a `timestamp` and an `identifier`: the timestamp is used to sort your data and the identifier to share information about data without transferring them.\n", |
| 152 | + "The data schema defines the structure of the data used in this task.\n", |
| 153 | + "It contains the features, the target and a set of mandatory metadata required by the ML cube Platform.\n", |
| 154 | + "Each sample must have a `timestamp` and an `identifier`: the timestamp is used to sort your data chronologically and the identifier is used to share information about data without transferring them.\n", |
155 | 155 | "\n", |
156 | | - "The data schema is specified by the class `DataSchema` that you find in the `models` package of our sdk.\n", |
157 | | - "In the following cell you can see an example of a DataSchema, as you can notice, the model's predictions are not mentioned. That's why they will be automatically added when you create a model for the task." |
| 156 | + "The data schema is specified by the class `DataSchema`, which can be found in the `models` package of our SDK.\n", |
| 157 | + "In the following cell you can see an example of a `DataSchema`. As you can notice, the model predictions are not included: they will be automatically added when a model is created within the task." |
158 | 158 | ] |
159 | 159 | }, |
160 | 160 | { |
|
209 | 209 | "source": [ |
210 | 210 | "**Historical data**\n", |
211 | 211 | "\n", |
212 | | - "Ok, now that you inserted the data schema for your Task you are able to upload data.\n", |
| 212 | + "Now that you have inserted the data schema for your Task, you can upload data.\n", |
213 | 213 | "There are two classes of data: *historical* and *production*.\n", |
214 | | - "Historical data represents data you had before the model was in production while, production data are data that comes from the production environment.\n", |
215 | | - "Model reference data are selected from historical one by specifying the time range.\n", |
| 214 | + "Historical data represent data you had before the model was in production, while production data are those coming from the production environment.\n", |
| 215 | + "The reference data for the model are selected from the historical data by specifying a time range.\n", |
216 | 216 | "\n", |
217 | 217 | "This is the first time you send data to ML cube Platform, therefore, we have some things to explain:\n", |
218 | 218 | "\n", |
219 | 219 | "- data are composed of features, targets and predictions. You send each category separately since data can come from multiple sources;\n", |
220 | | - "- the operations of sending data belong to the category of operations that runs a pipeline inside ML cube Platform. In this case the pipeline is composed only by the data step that reads the data, validate them and then if the storing policy is MLCUBE it stores inside the ML cube Platform's Secure Storage;\n", |
221 | | - "- the pipeline is identified by a `job_id` and you can follow the execution status by asking to the client its information.\n", |
| 220 | + "- the operation of sending data belongs to the category of operations that run a pipeline inside the ML cube Platform. In this case the pipeline is composed only of the data step, which reads the data, validates them and, if the storing policy is MLCUBE, stores them inside the ML cube Platform's Secure Storage;\n", |
| 221 | + "- the pipeline is identified by a `job_id`, and you can follow its execution status by querying the client. Additionally, you can wait for the completion of the job by calling the method `wait_job_completion(job_id)`.\n", |
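| | + "\n", |
| | + "As a minimal sketch, once a data upload call has returned a `job_id`, waiting for the pipeline looks like this (assuming the client instance created earlier is named `ml3_client`, as in the model creation cell):\n", |
| | + "```py\n", |
| | + "# blocks until the data pipeline identified by job_id has completed\n", |
| | + "ml3_client.wait_job_completion(job_id=job_id)\n", |
| | + "```\n", |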
222 | 222 | "\n", |
223 | | - "In the cell below, we sends features using `LocalDataSource` beceause we have the file locally, and we use a `GCSDataSource` for the target because we have it on the cloud.\n", |
224 | | - "In order to use remote data sources you need to add credentials on ML cube Platform and then you specify them in the `DataSource` object." |
| 223 | + "In the cell below, we send features using a `LocalDataSource`, since we have the file locally, and we use a `GCSDataSource` for the target because we have it on the cloud.\n", |
| 224 | + "In order to use remote data sources, you need to add credentials to the ML cube Platform and then specify them in the `DataSource` object." |
225 | 225 | ] |
226 | 226 | }, |
227 | 227 | { |
|
233 | 233 | "source": [ |
234 | 234 | "# TO COMPLETE\n", |
235 | 235 | "inputs_data_source = ml3_models.LocalDataSource(\n", |
236 | | - " data_structure=ml3_enums.DataStructure.TABULAR,\n", |
237 | 236 | " file_path=\"path/to/file.csv\",\n", |
238 | 237 | " file_type=ml3_enums.FileType.CSV,\n", |
239 | 238 | " is_folder=False,\n", |
240 | 239 | " folder_type=None\n", |
241 | 240 | ")\n", |
242 | 241 | "target_data_source = ml3_models.GCSDataSource(\n", |
243 | | - " dataset_type=ml3_enums.DatasetType.TABULAR,\n", |
244 | 242 | " object_path=\"gs://path/to/file.csv\",\n", |
245 | 243 | " credentials_id='gcp_credentials_id',\n", |
246 | 244 | " file_type=ml3_enums.FileType.CSV,\n", |
|
271 | 269 | "**Create Model**\n", |
272 | 270 | "\n", |
273 | 271 | "After the task is created, you can add AI models inside it.\n", |
274 | | - "A model is univoquely identified by the pair `name` and `version`.\n", |
275 | | - "The version identifies a specific trained instance of the model, whenever, you retrain your model, you will update its version on ML cube Platform.\n", |
276 | | - "The field `metric_name` represents the error or performance metric used inside ML cube Platform to show to you the statistics of the model or in the retraining report." |
| 272 | + "A model is uniquely identified by the pair `name` and `version`.\n", |
| 273 | + "The version identifies a specific trained instance of the model. Whenever you retrain your model, you will update its version on the ML cube Platform.\n", |
| 274 | + "The `metric_name` field indicates the error or performance metric used in the ML cube Platform to display the model's performance; it is also included in the retraining report." |
277 | 275 | ] |
278 | 276 | }, |
279 | 277 | { |
|
287 | 285 | "model_id = ml3_client.create_model(\n", |
288 | 286 | " task_id=task_id,\n", |
289 | 287 | " name=\"model-name\",\n", |
290 | | - " version=\"v0.2.1\",\n", |
| 288 | + " version=\"v0.0.1\",\n", |
291 | 289 | " metric_name=ml3_enums.ModelMetricName.RMSE,\n", |
292 | 290 | " preferred_suggestion_type=ml3_enums.SuggestionType.SAMPLE_WEIGHTS,\n", |
293 | 291 | " with_probabilistic_output=False,\n", |
|
302 | 300 | "source": [ |
303 | 301 | "**Model reference**\n", |
304 | 302 | "\n", |
305 | | - "In the previous cell you created the model but it is not complete because it misses the training dataset that in ML cube Platform is called *reference*.\n", |
306 | | - "Here you add the reference data of the model by specifying the time range, ML cube Platform automatically select from all the previously uploaded data the reference data." |
| 303 | + "In the previous cell you created the model, but it still misses the training dataset. In the ML cube Platform, the training dataset is called *reference*.\n", |
| 304 | + "You can add the reference data of the model by specifying a time range. The ML cube Platform will automatically select the corresponding samples from the previously uploaded historical data." |
307 | 305 | ] |
308 | 306 | }, |
309 | 307 | { |
|
336 | 334 | "source": [ |
337 | 335 | "**Congratulations!**\n", |
338 | 336 | "\n", |
339 | | - "In this notebook, you learned how to create a task, add a model to this task and uploading to ML cube Platform both historical and reference data." |
| 337 | + "In this notebook, you learned how to create a task, add a model to the task, and upload both historical and reference data to the ML cube Platform." |
340 | 338 | ] |
341 | 339 | } |
342 | 340 | ], |
|