Commit 202a443

Merge pull request #4 from ml-cube/dev-update-notebooks
Notebook refactor and update
2 parents dbd74f3 + ca38667 commit 202a443

6 files changed

+294
-250
lines changed

notebooks/0_company_project.ipynb

Lines changed: 27 additions & 22 deletions
@@ -7,18 +7,18 @@
77
"source": [
88
"# ML cube Platform SDK - First Setup\n",
99
"\n",
10-
"In this notebook, you will see how to setup your account on ML cube Platform.\n",
11-
"The first thing to do when your account is created is to create your `Company`, then, usually, you create new `User` for your team or for *service accounts* that will be integrated in your MLOps pipeline.\n",
12-
"After that, you can create a `Project` and assign specific roles to other team member in order to define the right level of security.\n",
10+
"In this notebook, we'll guide you through setting up your account on the ML cube Platform.\n",
11+
"Once your account is created, the first step is to set up your `Company`. Next, you can add new `Users`, either for your team members or for service accounts that will integrate into your MLOps pipeline.\n",
12+
"\n",
13+
"Afterward, you can create a `Project` and assign specific roles to your team members in order to define the appropriate level of security and access.\n",
1314
"\n",
1415
"**Requirements**:\n",
1516
"\n",
16-
"1. Valid API Key provided by ML cube\n",
17+
"A valid API Key provided by ML cube\n",
1718
"\n",
1819
"**User Input**\n",
1920
"\n",
20-
"In the notebook you will need to complete variables and names to correctly run it.\n",
21-
"Whenever you see the comment `# TO COMPLETE` you need to fill the empty string."
21+
"You will need to provide some values for variables and names to ensure the notebook runs correctly. Whenever you see the comment `# TO COMPLETE`, make sure to fill the empty string accordingly.\n"
2222
]
2323
},
2424
{
@@ -73,9 +73,10 @@
7373
"source": [
7474
"**Create Company**\n",
7575
"\n",
76-
"The first user inside ML cube Platform you do not belong to any Company. \n",
77-
"Thus, the first operation to do is creating it.\n",
78-
"The user that creates the company becomes automatically the *owner* and has administration permissions, he can create new users and projects."
76+
"The first user inside ML cube Platform does not belong to any Company. \n",
77+
"Therefore, the first operation is to create the company.\n",
78+
"\n",
79+
"The user who creates the company becomes automatically the *owner* and has administration permissions. This allows them to manage users, create projects, and set permissions."
7980
]
8081
},
8182
{
@@ -100,9 +101,10 @@
100101
"id": "517e7298-fd82-4212-92f0-928a849911db",
101102
"metadata": {},
102103
"source": [
103-
"As you can see the method `create_company` returned the id of the entity you created.\n",
104-
"ML cube Platform clients does this for every entity you create so that, you always know what you created and you can use this identifier for future interactions with ML cube Platform.\n",
105-
"Differently from other entities inside ML cube Platform, the company is unique and it cannot be changed for the User, therefore, any operations you perform at company level does not need its identifier because we retrieve it directly from you user information."
104+
"As you can see, the method `create_company` returns the id of the company just created.\n",
105+
"The ML cube Platform Client follows this approach for every entity, ensuring that you always have the identifier of what you've created. The ID can then be used for any future interaction with the ML cube Platform.\n",
106+
"\n",
107+
"Differently from other entities inside ML cube Platform, the company is unique and it cannot be changed for the User. Therefore, any operation you perform at company level does not need its identifier, as we are able to retrieve it directly from the user information."
106108
]
107109
},
108110
{
@@ -112,10 +114,10 @@
112114
"source": [
113115
"**Create User**\n",
114116
"\n",
115-
"As owner of the company, you can create new users for you team. Each user has a `CompanyRole` that specifies its permissions.\n",
116-
"The user can have associated a role also for each project inside the Company, in the following notebook cells you will se how to do it.\n",
117+
"As the owner of the company, you can create new users for your team. Each user has a `CompanyRole` that specifies its permissions.\n",
118+
"The user can also have a specific role for each project inside the Company. In the following cells you will see how to do it.\n",
117119
"\n",
118-
"Please, complete the fields with the right information and choose the right user role."
120+
"Please, complete the fields with the right information and choose the appropriate user role."
119121
]
120122
},
121123
{
@@ -144,8 +146,9 @@
144146
"source": [
145147
"**User API Key**\n",
146148
"\n",
147-
"After a User is created he can login to the web application and start working on it.\n",
148-
"In order to use the SDK he needs to create his own api key on the web app, since you are the owner you can create an api key for him directly here.\n",
149+
"After a User is created, he is able to log in to the web application and start working on it.\n",
150+
"\n",
151+
"In order to use the SDK, he needs to create his own api key on the web app. Since you are the owner, you can create an api key for him directly here.\n",
149152
"This is particularly useful when you need to create service accounts that are used in your pipelines."
150153
]
151154
},
@@ -191,9 +194,11 @@
191194
"source": [
192195
"**Create Project**\n",
193196
"\n",
194-
"For now, you created a company, a new user for you colleague with his api key. It's time to create a project and start working for your models.\n",
197+
"So far, you have created a company and a new user for your colleague with his api key. \n",
198+
"\n",
199+
"It's time to create a project and start working on your models.\n",
195200
"A project needs a name, a description and the default storage policy.\n",
196-
"You can read more details about what is a StoragePolicy on ML cube Platform documentation, in few words, it specifies if data can be stored inside ML cube Secure Storage or not."
201+
"You can read more details about what is a StoragePolicy on the ML cube Platform documentation. In a few words, it specifies whether data can be stored inside ML cube Secure Storage or not."
197202
]
198203
},
199204
{
@@ -219,8 +224,8 @@
219224
"source": [
220225
"**User Project Role**\n",
221226
"\n",
222-
"Since, the user we created has role `COMPANY_USER`, he does not have permissions on this project.\n",
223-
"Therefore, you need to assign a role to him."
227+
"Since the user we created before has role `COMPANY_USER`, he does not have permissions on this project.\n",
228+
"Therefore, you need to explicitly assign a role to him."
224229
]
225230
},
226231
{
@@ -244,7 +249,7 @@
244249
"source": [
245250
"**Congratulations!**\n",
246251
"\n",
247-
"In this notebook, you learned how to create your company, then you created a new user and a project. The created user has not admin role, thus, you assigned a project role to allow him to work on the project.o create your company, then you created a new user and a project. The created user has not admin role, thus, you assigned a project role to allow him to work on the project."
252+
"In this notebook, you learned how to create your company, a new user and a project. Since the created user does not have an admin role, you assigned him a project role to allow him to work on the project."
248253
]
249254
}
250255
],

notebooks/1_task_and_model.ipynb

Lines changed: 51 additions & 53 deletions
Original file line numberDiff line numberDiff line change
@@ -7,17 +7,16 @@
77
"source": [
88
"# ML cube Platform SDK - Task and Model creation\n",
99
"\n",
10-
"In this notebook, you will see how to create a Task and add models to it in order to start to monitor them.\n",
10+
"In this notebook, you will see how to create a Task and how to add a model to start monitoring its performance.\n",
1111
"\n",
1212
"**Requirements**:\n",
1313
"\n",
1414
"1. API Key of a User with roles `COMPANY_ADMIN` or `PROJECT_ADMIN`\n",
15-
"2. Id of the project\n",
15+
"2. ID of the project\n",
1616
"\n",
1717
"**User Input**\n",
1818
"\n",
19-
"In the notebook you will need to complete variables and names to correctly run it.\n",
20-
"Whenever you see the comment `# TO COMPLETE` you need to fill the empty string."
19+
"You will need to provide some values for variables and names to ensure the notebook runs correctly. Whenever you see the comment `# TO COMPLETE`, make sure to fill the empty string accordingly."
2120
]
2221
},
2322
{
@@ -28,6 +27,29 @@
2827
"**Imports**"
2928
]
3029
},
30+
{
31+
"metadata": {},
32+
"cell_type": "code",
33+
"outputs": [],
34+
"execution_count": null,
35+
"source": [
36+
"import logging\n",
37+
"logger = logging.getLogger(\"platform_tutorial\")"
38+
],
39+
"id": "ac1dbec9a9146410"
40+
},
41+
{
42+
"metadata": {},
43+
"cell_type": "code",
44+
"outputs": [],
45+
"execution_count": null,
46+
"source": [
47+
"from ml3_platform_sdk.client import ML3PlatformClient\n",
48+
"from ml3_platform_sdk import enums as ml3_enums\n",
49+
"from ml3_platform_sdk import models as ml3_models"
50+
],
51+
"id": "c8bf440019a4ca24"
52+
},
3153
{
3254
"cell_type": "markdown",
3355
"id": "26f67a46-cacf-4ca8-b67e-43388566b3c7",
@@ -54,45 +76,21 @@
5476
"id": "c6588f7a-488c-4d3b-bc57-ec5ddbb0f955",
5577
"metadata": {},
5678
"source": [
57-
"If you don't remember the id of the projcet you can get the list of projects:\n",
79+
"If you don't remember the id of the project you can get the list of your projects:\n",
5880
"```py\n",
5981
"projects: List[Project] = client.get_projects()\n",
6082
"logger.info(f'Projects inside the company are: {projects}')\n",
6183
"```"
6284
]
6385
},
64-
{
65-
"cell_type": "code",
66-
"execution_count": null,
67-
"id": "cd94e6fe",
68-
"metadata": {},
69-
"outputs": [],
70-
"source": [
71-
"import logging\n",
72-
"logger = logging.getLogger(\"platform_tutorial\")"
73-
]
74-
},
75-
{
76-
"cell_type": "code",
77-
"execution_count": null,
78-
"id": "b808c779",
79-
"metadata": {},
80-
"outputs": [],
81-
"source": [
82-
"from ml3_platform_sdk.client import ML3PlatformClient\n",
83-
"from ml3_platform_sdk import enums as ml3_enums\n",
84-
"from ml3_platform_sdk import models as ml3_models"
85-
]
86-
},
8786
{
8887
"cell_type": "markdown",
8988
"id": "0c085046-a664-4085-b011-e4ff37d27ca5",
9089
"metadata": {},
9190
"source": [
92-
"**Instantiace the Client**\n",
91+
"**Instantiate the Client**\n",
9392
"\n",
94-
"To interact with ML cube Platform you need to instantiate the client only the first time.\n",
95-
"Then you will use its methods to perform requests.\n",
93+
"To interact with ML cube Platform, you need to instantiate the client. You will then use its methods to perform requests.\n",
9694
"Please, insert the api key we provided you to instantiate the client."
9795
]
9896
},
@@ -115,7 +113,9 @@
115113
"source": [
116114
"**Create Task**\n",
117115
"\n",
118-
"To monitor your models you need to add them in a `Task`, it represents a AI problem like regression or classification over a dataset."
116+
"To monitor your models you need to add them in a `Task`. \n",
117+
"\n",
118+
"A `Task` represents an AI problem, such as regression or binary classification, over a dataset."
119119
]
120120
},
121121
{
@@ -149,12 +149,12 @@
149149
"source": [
150150
"**Data schema**\n",
151151
"\n",
152-
"The data schema describes the data used in this task by your models.\n",
153-
"It contains features, targets and a set of mandatory metadata required by ML cube Platform for a correct function.\n",
154-
"Each sample is required to have associated a `timestamp` and an `identifier`: the timestamp is used to sort your data and the identifier to share information about data without transferring them.\n",
152+
"The data schema defines the structure of the data used in this task.\n",
153+
"It contains features, target and a set of mandatory metadata required by the ML cube Platform.\n",
154+
"Each sample must have a `timestamp` and an `identifier`: the timestamp is used to sort your data chronologically and the identifier is used to share information about data without transferring them.\n",
155155
"\n",
156-
"The data schema is specified by the class `DataSchema` that you find in the `models` package of our sdk.\n",
157-
"In the following cell you can see an example of a DataSchema, as you can notice, the model's predictions are not mentioned. That's why they will be automatically added when you create a model for the task."
156+
"The data schema is specified by the class `DataSchema` which can be found in the `models` package of our sdk.\n",
157+
"In the following cell you can see an example of a DataSchema. As you can notice, the model predictions are not included. The reason is that they will be automatically added when a model is created within the task."
158158
]
159159
},
160160
{
@@ -209,19 +209,19 @@
209209
"source": [
210210
"**Historical data**\n",
211211
"\n",
212-
"Ok, now that you inserted the data schema for your Task you are able to upload data.\n",
212+
"Now that you have inserted the data schema for your Task you are able to upload data.\n",
213213
"There are two classes of data: *historical* and *production*.\n",
214-
"Historical data represents data you had before the model was in production while, production data are data that comes from the production environment.\n",
215-
"Model reference data are selected from historical one by specifying the time range.\n",
214+
"Historical data represent data you had before the model was in production, while production data are those coming from the production environment.\n",
215+
"The reference data for the model are selected from the historical data by specifying a time range.\n",
216216
"\n",
217217
"This is the first time you send data to ML cube Platform, therefore, we have some things to explain:\n",
218218
"\n",
219219
"- data are composed of features, targets, predictions. You send each category separately since data can come from multiple sources;\n",
220-
"- the operations of sending data belong to the category of operations that runs a pipeline inside ML cube Platform. In this case the pipeline is composed only by the data step that reads the data, validate them and then if the storing policy is MLCUBE it stores inside the ML cube Platform's Secure Storage;\n",
221-
"- the pipeline is identified by a `job_id` and you can follow the execution status by asking to the client its information.\n",
220+
"- the operation of sending data belongs to the category of operations that run a pipeline inside the ML cube Platform. In this case the pipeline is composed only of the data step, which reads the data, validates them and, if the storing policy is MLCUBE, stores them inside the ML cube Platform's Secure Storage;\n",
221+
"- the pipeline is identified by a `job_id` and you can follow the execution status by asking the client its information. Additionally, you can wait for the completion of the job by calling the method `wait_job_completion(job_id)`.\n",
222222
"\n",
223-
"In the cell below, we sends features using `LocalDataSource` beceause we have the file locally, and we use a `GCSDataSource` for the target because we have it on the cloud.\n",
224-
"In order to use remote data sources you need to add credentials on ML cube Platform and then you specify them in the `DataSource` object."
223+
"In the cell below, we send features using a `LocalDataSource`, since we have the file locally, and we use a `GCSDataSource` for the target because we have it on the cloud.\n",
224+
"In order to use remote data sources you need to add credentials on the ML cube Platform. You can specify them in the `DataSource` object."
225225
]
226226
},
227227
{
@@ -233,14 +233,12 @@
233233
"source": [
234234
"# TO COMPLETE\n",
235235
"inputs_data_source = ml3_models.LocalDataSource(\n",
236-
" data_structure=ml3_enums.DataStructure.TABULAR,\n",
237236
" file_path=\"path/to/file.csv\",\n",
238237
" file_type=ml3_enums.FileType.CSV,\n",
239238
" is_folder=False,\n",
240239
" folder_type=None\n",
241240
")\n",
242241
"target_data_source = ml3_models.GCSDataSource(\n",
243-
" dataset_type=ml3_enums.DatasetType.TABULAR,\n",
244242
" object_path=\"gs://path/to/file.csv\",\n",
245243
" credentials_id='gcp_credentials_id',\n",
246244
" file_type=ml3_enums.FileType.CSV,\n",
@@ -271,9 +269,9 @@
271269
"**Create Model**\n",
272270
"\n",
273271
"After the task is created, you can add AI models inside it.\n",
274-
"A model is univoquely identified by the pair `name` and `version`.\n",
275-
"The version identifies a specific trained instance of the model, whenever, you retrain your model, you will update its version on ML cube Platform.\n",
276-
"The field `metric_name` represents the error or performance metric used inside ML cube Platform to show to you the statistics of the model or in the retraining report."
272+
"A model is uniquely identified by the pair `name` and `version`.\n",
273+
"The version identifies a specific trained instance of the model. Whenever you retrain your model, you will update its version on the ML cube Platform.\n",
274+
"The `metric_name` field indicates the error or performance metric used in the ML cube Platform to display the model's performance; it is also included in the retraining report."
277275
]
278276
},
279277
{
@@ -287,7 +285,7 @@
287285
"model_id = ml3_client.create_model(\n",
288286
" task_id=task_id,\n",
289287
" name=\"model-name\",\n",
290-
" version=\"v0.2.1\",\n",
288+
" version=\"v0.0.1\",\n",
291289
" metric_name=ml3_enums.ModelMetricName.RMSE,\n",
292290
" preferred_suggestion_type=ml3_enums.SuggestionType.SAMPLE_WEIGHTS,\n",
293291
" with_probabilistic_output=False,\n",
@@ -302,8 +300,8 @@
302300
"source": [
303301
"**Model reference**\n",
304302
"\n",
305-
"In the previous cell you created the model but it is not complete because it misses the training dataset that in ML cube Platform is called *reference*.\n",
306-
"Here you add the reference data of the model by specifying the time range, ML cube Platform automatically select from all the previously uploaded data the reference data."
303+
"In the previous cell you created the model, but it still misses the training dataset. In the ML cube Platform, the training dataset is called *reference*.\n",
304+
"You can add the reference data of the model by specifying the time range. The ML cube Platform will automatically select from the previously uploaded data the reference data specified."
307305
]
308306
},
309307
{
@@ -336,7 +334,7 @@
336334
"source": [
337335
"**Congratulations!**\n",
338336
"\n",
339-
"In this notebook, you learned how to create a task, add a model to this task and uploading to ML cube Platform both historical and reference data."
337+
"In this notebook, you learned how to create a task, add a model to the task, and upload both historical and reference data to the ML cube Platform."
340338
]
341339
}
342340
],
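
Note on the relocated logging cell: the notebook creates its logger with `logging.getLogger("platform_tutorial")` and later calls `logger.info(...)`, but nothing in the hunks shown here sets a level or attaches a handler. With Python's defaults (the root logger is at `WARNING`), those `INFO` messages would not be displayed. The following is a minimal sketch of one way to make them visible. It is not part of this commit, and the `projects` list and the in-memory stream are placeholders introduced for illustration only.

```python
import io
import logging

# Same logger name the notebook uses.
logger = logging.getLogger("platform_tutorial")

# Illustrative sink: an in-memory buffer so the output can be inspected.
# In a notebook you would more likely pass sys.stdout to StreamHandler.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))

# Attach the handler and lower the threshold so INFO records are emitted.
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Placeholder standing in for the result of client.get_projects().
projects = ["project-a", "project-b"]
logger.info("Projects inside the company are: %s", projects)

captured = stream.getvalue()
```

Passing the value as a logging argument (`%s`) instead of an f-string defers formatting until the record is actually emitted. This is a small design choice and does not change the notebook's behavior.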
