Commit b2bf9a7: Change titles details
1 parent d550671 commit b2bf9a7

File tree

1 file changed: +10 -10 lines


src/pages/MainPage/index.tsx

Lines changed: 10 additions & 10 deletions
@@ -61,23 +61,23 @@ export const MainPage: React.FC<Props> = ({data}) => {
 <br/>
 <Typography variant="h5">Complementary tables of practices</Typography>
 <br/>
-<Typography variant="body1" align="justify">The following tables are complement to the taxonomy presented in the previous chart. These tables are organized in the ML pipeline stages proposed by <a href="https://www.microsoft.com/en-us/research/uploads/prod/2019/03/amershi-icse-2019_Software_Engineering_for_Machine_Learning.pdf">Amershi et al. (2019)</a> (<b><em>Model requirement</em></b>, <b><em> Data collection</em></b>, <b><em> Data cleaning</em></b>, <b><em> Feature engineering</em></b>, <b><em> Data labeling</em></b>, <b><em> Model training</em></b>, <b><em> Model evaluation</em></b>, <b><em> Model deployment</em></b> and <b><em> Model monitoring</em></b>) and an extra stage called <b><em> implementation</em></b>. For each stage, a brief explanation of it is given and a table with the respective practices is presented. In the Table, an indicator per practice is given (this ID match wirh the ID used in the article). In addition to the ID, the taxonomy's categories are presented with the description of the practices. Furthermore, we present extra resources, the post(s) that is related to the practices, external URL(s) related to the post, and extra urls that help to understand the practices and the ML terminology/concepts associated to them. Kindly note that below each table, you will find an explanation abou the acronyms used in each table.</Typography>
+<Typography variant="body1" align="justify">The following tables complement the taxonomy presented in the previous chart. These tables are organized into the ML pipeline stages proposed by <a href="https://www.microsoft.com/en-us/research/uploads/prod/2019/03/amershi-icse-2019_Software_Engineering_for_Machine_Learning.pdf">Amershi et al. (2019)</a> (<b><em>Model Requirement</em></b>, <b><em> Data Collection</em></b>, <b><em> Data Cleaning</em></b>, <b><em> Feature Engineering</em></b>, <b><em> Data Labeling</em></b>, <b><em> Model Training</em></b>, <b><em> Model Evaluation</em></b>, <b><em> Model Deployment</em></b> and <b><em> Model Monitoring</em></b>) and an extra stage called <b><em> Cross-cutting</em></b>. For each stage, a brief explanation is given and a table with the respective practices is presented. In each table, an ID is given per practice (this ID matches the ID used in the article). In addition to the ID, the taxonomy's categories are presented along with the descriptions of the practices. Furthermore, we present extra resources: the post(s) related to the practices, external URL(s) related to the post, and extra URLs that help to understand the practices and the ML terminology/concepts associated with them. Kindly note that below each table, you will find an explanation about the acronyms used in each table.</Typography>
 <br/>
-<Typography variant="h6" align="left"> Model requirement (MR) </Typography>
+<Typography variant="h6" align="left"> Model Requirement (MR) </Typography>
 <Typography variant="body1" align="justify"> In this stage, designers decide the functionalities that should be included in an ML system, their usefulness for new or existing products, and the most appropriate type of ML model for the expected system features <a href="https://www.microsoft.com/en-us/research/uploads/prod/2019/03/amershi-icse-2019_Software_Engineering_for_Machine_Learning.pdf">(Amershi et al. (2019))</a>. Four ML best practices were identified for this stage.</Typography>
 <br/>
 <TemplateTable data={TABLE_1} columns={TABLE_1_COLUMNS} tableHeight={540}/>
 <Typography variant="caption" align="justify">CRV: https://stats.stackexchange.com/q<br/>DTSC: https://datascience.stackexchange.com/q<br/>STO: https://stackoverflow.com/q</Typography>
 <br/>
 <br/>
-<Typography variant="h6" align="left"> Data collection (DC)</Typography>
+<Typography variant="h6" align="left"> Data Collection (DC)</Typography>
 <Typography variant="body1" align="justify"> This second stage encompasses looking for, collecting, and integrating available datasets <a href="https://www.microsoft.com/en-us/research/uploads/prod/2019/03/amershi-icse-2019_Software_Engineering_for_Machine_Learning.pdf">(Amershi et al. (2019))</a>. Datasets can be created from scratch, or existing datasets can be used to train models in a transfer learning fashion. Both scenarios are widely used when creating ML systems. In this stage, seven validated practices were identified. Bear in mind that the identified practices relate to characteristics that the collected data has to meet during/after this process, not to the collection process itself.</Typography>
 <br/>
 <TemplateTable data={TABLE_2} columns={TABLE_2_COLUMNS} tableHeight={570}/>
 <Typography variant="caption" align="justify">AI: https://ai.stackexchange.com/q<br/>CRV: https://stats.stackexchange.com/q<br/>DTSC: https://datascience.stackexchange.com/q<br/>STO: https://stackoverflow.com/q</Typography>
 <br/>
 <br/>
-<Typography variant="h6" align="left"> Data cleaning (DCL)</Typography>
+<Typography variant="h6" align="left"> Data Cleaning (DCL)</Typography>
 <Typography variant="body1" align="justify"> This is the stage with the second-largest number of identified practices (i.e., 33 practices). In general, this stage involves removing inaccurate or noisy records from a dataset <a href="https://www.microsoft.com/en-us/research/uploads/prod/2019/03/amershi-icse-2019_Software_Engineering_for_Machine_Learning.pdf">(Amershi et al. (2019))</a>. In this case, we present the practices aggregated into three subcategories: <b><em>Exploratory data analysis (EDA)</em></b>, <b><em>Wrangling</em></b>, and <b><em>Data</em></b>.</Typography>
 <br/>
 <Typography variant="subtitle1" align="left" mb={1}> <em>Exploratory data analysis (EDA)</em></Typography>
@@ -101,21 +101,21 @@ export const MainPage: React.FC<Props> = ({data}) => {
 <Typography variant="caption" align="justify">CS: https://cs.stackexchange.com/q<br/>CRV: https://stats.stackexchange.com/q<br/>DTSC: https://datascience.stackexchange.com/q<br/>OD: https://opendata.stackexchange.com/q<br/>STO: https://stackoverflow.com/q</Typography>
 <br/>
 <br/>
-<Typography variant="h6" align="left"> Data labeling (DL)</Typography>
+<Typography variant="h6" align="left"> Data Labeling (DL)</Typography>
 <Typography variant="body1" align="justify"> This phase, in which a ground truth label is assigned to each sample/record of the dataset <a href="https://www.microsoft.com/en-us/research/uploads/prod/2019/03/amershi-icse-2019_Software_Engineering_for_Machine_Learning.pdf">(Amershi et al. (2019))</a>, is not always required since some ML approaches do not need it. In particular, ground truth is required for projects that use supervised or semi-supervised learning, but not for those that use unsupervised learning. For instance, if snippets of code are to be classified as vulnerable or not, each snippet should be assigned a label indicating whether it is vulnerable. Two practices were identified in this stage: the first (<em>DL1</em>) was validated by all the experts, while the second (<em>DL2</em>) was validated by three of them. </Typography>
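The ground-truth labeling described above (e.g., marking code snippets as vulnerable or not) can be sketched minimally; the snippets and the labeling rule below are invented for illustration, with the rule standing in for a human annotator's judgment:

```python
# Illustrative sketch (not from the article): attaching a ground-truth label
# to each sample, as supervised learning requires.

def label_snippets(snippets, is_vulnerable):
    """Pair each code snippet with a ground-truth label (1 = vulnerable, 0 = not)."""
    return [(s, 1 if is_vulnerable(s) else 0) for s in snippets]

# Hypothetical rule standing in for an annotator; real labels come from humans.
def naive_rule(snippet):
    return "eval(" in snippet or "exec(" in snippet

snippets = ["eval(user_input)", "print('hello')", "exec(payload)"]
labeled = label_snippets(snippets, naive_rule)
# labeled -> [("eval(user_input)", 1), ("print('hello')", 0), ("exec(payload)", 1)]
```

The point is only the data shape: after this stage, every record carries a label the later training stage can learn from.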
 <br/>
 <TemplateTable data={TABLE_6} columns={TABLE_6_COLUMNS} tableHeight={300}/>
 <Typography variant="caption" align="justify">DTSC: https://datascience.stackexchange.com/q<br/>STO: https://stackoverflow.com/q</Typography>
 <br/>
 <br/>
-<Typography variant="h6" align="left"> Feature engineering (FE)</Typography>
+<Typography variant="h6" align="left"> Feature Engineering (FE)</Typography>
 <Typography variant="body1" align="justify"> This stage of an ML pipeline involves all the activities that are performed to extract and select informative features (i.e., characteristics/attributes that are useful or relevant) for machine learning models <a href="https://www.microsoft.com/en-us/research/uploads/prod/2019/03/amershi-icse-2019_Software_Engineering_for_Machine_Learning.pdf">(Amershi et al. (2019))</a>. In this stage, 11 validated practices were identified: four of them (<em>FE1</em> - <em>FE4</em>) were validated by the four experts, and the remaining seven (<em>FE5</em> - <em>FE11</em>) were validated by three experts.</Typography>
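"Selecting informative features" can be illustrated with one of its simplest forms, dropping constant (zero-variance) columns, which carry no information for a model. The sample records below are made up for demonstration:

```python
# Illustrative sketch (not from the article): a minimal feature-selection step.

def drop_constant_features(rows):
    """rows: list of dicts mapping feature name -> value.
    Returns names of features that take more than one distinct value."""
    names = rows[0].keys()
    return [n for n in names if len({r[n] for r in rows}) > 1]

rows = [
    {"loc": 120, "lang": "py", "stars": 3},
    {"loc": 80,  "lang": "py", "stars": 5},
    {"loc": 200, "lang": "py", "stars": 3},
]
kept = drop_constant_features(rows)
# "lang" is identical in every sample, so it is dropped; "loc" and "stars" remain.
```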
 <br/>
 <TemplateTable data={TABLE_7} columns={TABLE_7_COLUMNS} tableHeight={600}/>
 <Typography variant="caption" align="justify">CRV: https://stats.stackexchange.com/q<br/>DTSC: https://datascience.stackexchange.com/q<br/>STO: https://stackoverflow.com/q</Typography>
 <br/>
 <br/>
-<Typography variant="h6" align="left"> Model training (MT)</Typography>
+<Typography variant="h6" align="left"> Model Training (MT)</Typography>
 <Typography variant="body1" align="justify"> This is the ML pipeline stage with the largest number of validated practices, 47 in total. In this stage, machine learning models are trained and tuned using the features selected in the FE stage and, if applicable, the labels created/selected during the DL stage <a href="https://www.microsoft.com/en-us/research/uploads/prod/2019/03/amershi-icse-2019_Software_Engineering_for_Machine_Learning.pdf">(Amershi et al. (2019))</a>. To facilitate the reading of this subsection, the practices are grouped into two subcategories: a Learning phase and a Validation phase. In each subcategory, we first present all the practices that were validated by all four experts, followed by those validated by three experts. Note that validation refers to the usage of a validation set to optimize hyper-parameters; validation, in this case, is not related to testing an already trained and tuned model.</Typography>
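The distinction drawn here, a validation set used only to choose hyper-parameters, separate from any final test set, can be sketched with a toy one-feature threshold classifier (all data and candidate values below are invented):

```python
# Illustrative sketch (not from the article): hyper-parameter selection
# on a held-out validation split, kept apart from the final test set.

def accuracy(threshold, data):
    """Fraction of (feature, label) pairs the threshold rule classifies correctly."""
    return sum((x > threshold) == y for x, y in data) / len(data)

# Labeled points: (feature value, boolean label).
train = [(0.1, False), (0.4, False), (0.6, True), (0.9, True)]
val   = [(0.2, False), (0.7, True)]

# The candidate thresholds play the role of hyper-parameter settings;
# the validation set, not the training set, decides which one to keep.
candidates = [0.3, 0.5, 0.8]
best = max(candidates, key=lambda t: accuracy(t, val))
```

Only after `best` is fixed would a separate test set be touched, which is exactly the separation the paragraph insists on.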
 <br/>
 <Typography variant="subtitle1" align="left" mb={1}> <em>Learning Phase</em></Typography>
@@ -132,19 +132,19 @@ export const MainPage: React.FC<Props> = ({data}) => {
 <Typography variant="caption" align="justify">CRV: https://stats.stackexchange.com/q<br/>DTSC: https://datascience.stackexchange.com/q<br/>STO: https://stackoverflow.com/q</Typography>
 <br/>
 <br/>
-<Typography variant="h6" align="left"> Model evaluation (ME)</Typography>
+<Typography variant="h6" align="left"> Model Evaluation (ME)</Typography>
 <Typography variant="body1" align="justify"> In the model evaluation stage, trained and tuned models are tested. For instance, engineers evaluate the output models on tested or safeguard datasets by using pre-defined metrics. In particular cases, for critical domains (e.g., safety-critical applications from the medical domain), this stage involves human evaluation <a href="https://www.microsoft.com/en-us/research/uploads/prod/2019/03/amershi-icse-2019_Software_Engineering_for_Machine_Learning.pdf">(Amershi et al. (2019))</a>. For this stage, we identified eight practices related to model evaluation. However, some other practices that involve or are associated with model evaluation/testing were mentioned before as part of other stages. All the experts validated two practices (<em>ME1</em> - <em>ME2</em>), and six (<em>ME3</em> - <em>ME8</em>) were validated by three experts. </Typography>
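Evaluating "by using pre-defined metrics" typically means computing scores such as accuracy, precision, and recall on the held-out labels. A minimal sketch (predictions and labels below are made up; real pipelines would use a metrics library):

```python
# Illustrative sketch (not from the article): pre-defined metrics on a test set.

def metrics(y_true, y_pred):
    """Binary-classification accuracy, precision, and recall (labels are 0/1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

y_true = [1, 0, 1, 1, 0]   # held-out test labels
y_pred = [1, 0, 0, 1, 1]   # model predictions on the test set
m = metrics(y_true, y_pred)
```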
 <TemplateTable data={TABLE_10} columns={TABLE_10_COLUMNS} tableHeight={400}/>
 <Typography variant="caption" align="justify">CS: https://cs.stackexchange.com/q<br/>CRV: https://stats.stackexchange.com/q<br/>DTSC: https://datascience.stackexchange.com/q<br/>STO: https://stackoverflow.com/q</Typography>
 <br/>
 <br/>
-<Typography variant="h6" align="left"> Model deployment (MD)</Typography>
+<Typography variant="h6" align="left"> Model Deployment (MD)</Typography>
 <Typography variant="body1" align="justify"> Note that, in this stage, the inference code of the previously trained, tuned, and tested model (i.e., the code used to run the model and obtain predictions) is deployed on a production setup <a href="https://www.microsoft.com/en-us/research/uploads/prod/2019/03/amershi-icse-2019_Software_Engineering_for_Machine_Learning.pdf">(Amershi et al. (2019))</a>. Two practices were identified in this stage, both validated by the four experts. </Typography>
 <TemplateTable data={TABLE_11} columns={TABLE_11_COLUMNS} tableHeight={280}/>
 <Typography variant="caption" align="justify">CRV: https://stats.stackexchange.com/q</Typography>
 <br/>
 <br/>
-<Typography variant="h6" align="left"> Model monitoring (MM)</Typography>
+<Typography variant="h6" align="left"> Model Monitoring (MM)</Typography>
 <Typography variant="body1" align="justify"> In the final stage of the ML pipeline, models are continuously monitored for possible errors while being executed in the real world <a href="https://www.microsoft.com/en-us/research/uploads/prod/2019/03/amershi-icse-2019_Software_Engineering_for_Machine_Learning.pdf">(Amershi et al. (2019))</a>. For this stage, two practices related to data deviations were validated by all the experts. </Typography>
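A "data deviation" check of the kind this stage describes can be as simple as comparing a feature's live statistics against its training-time statistics and alerting past a tolerance. The feature, numbers, and tolerance below are invented; production systems use proper drift tests (e.g., population-stability or KS statistics):

```python
# Illustrative sketch (not from the article): a minimal mean-shift drift alarm.

def drifted(train_values, live_values, tolerance):
    """True if the live mean deviates from the training mean by more than tolerance."""
    train_mean = sum(train_values) / len(train_values)
    live_mean = sum(live_values) / len(live_values)
    return abs(live_mean - train_mean) > tolerance

train_ages = [30, 35, 40, 45]   # feature distribution seen during training
live_ages  = [52, 58, 61, 55]   # feature distribution observed in production
alert = drifted(train_ages, live_ages, tolerance=5.0)  # raises the alarm here
```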
 <TemplateTable data={TABLE_12} columns={TABLE_12_COLUMNS} tableHeight={320}/>
 <Typography variant="caption" align="justify">DTSC: https://datascience.stackexchange.com/q</Typography>
