# articles/machine-learning/service/how-to-ui-sample-classification-predict-churn.md
Learn how to build a complex machine learning experiment without writing a single line of code using the visual interface.
This experiment trains three **two-class boosted decision tree** classifiers to predict common tasks for customer relationship management (CRM) systems: churn, appetency, and up-selling. The data values and labels are split across multiple data sources and scrambled to anonymize customer information. However, you can still use the visual interface to combine the datasets and train a model using the scrambled values.
Because you're trying to answer the question "Which one?" this is called a classification problem. However, you can apply the same logic in this project to tackle any type of machine learning problem, whether it's regression, classification, clustering, and so on.
4. Select the **Open** button for the Sample 5 experiment.

## Data
The data for this experiment is from KDD Cup 2009. It has 50,000 rows and 230 feature columns. The task is to predict churn, appetency, and up-selling for customers who use these features. For more information about the data and the task, see the [KDD website](https://www.kdd.org/kdd-cup/view/kdd-cup-2009).
## Experiment summary
This visual interface sample experiment shows binary classifier prediction of churn, appetency, and up-selling, a common task for customer relationship management (CRM).
First, do some simple data processing.
- The raw dataset contains lots of missing values. Use the **Clean Missing Data** module to replace the missing values with 0.

- The features and the corresponding churn, appetency, and up-selling labels are in different datasets. Use the **Add Columns** module to append the label columns to the feature columns. The first column, **Col1**, is the label column. The rest of the columns, **Var1**, **Var2**, and so on, are the feature columns.

Use the **Split Data** module to split the dataset into train and test sets.
Then use the Boosted Decision Tree binary classifier with the default parameters to build the prediction models. Build one model per task, that is, one model each to predict up-selling, appetency, and churn.
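The data-processing and training steps above have no code in the visual interface, but a rough code analogue can help clarify what each module does. The following is a hedged sketch using pandas and scikit-learn on synthetic stand-in data; the file contents, column names, and `GradientBoostingClassifier` stand-in for the **Two-Class Boosted Decision Tree** module are assumptions for illustration, not the sample's actual implementation:

```python
# Illustrative code analogue of the visual pipeline (synthetic stand-in
# data; column names and model choice are assumptions, not the sample's).
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-ins for the feature and label datasets, which arrive separately.
features = pd.DataFrame(rng.normal(size=(500, 10)),
                        columns=[f"Var{i}" for i in range(1, 11)])
features.iloc[::7, 3] = np.nan          # simulate missing values
labels = pd.DataFrame({"Col1": rng.choice([-1, 1], size=500)})

# "Clean Missing Data": replace missing values with 0.
features = features.fillna(0)

# "Add Columns": append the label column to the feature columns.
data = pd.concat([labels, features], axis=1)

# "Split Data": split the dataset into train and test sets.
train, test = train_test_split(data, test_size=0.3, random_state=0)

# Boosted decision tree, one model per task (churn, appetency,
# up-selling); shown here for a single label column.
model = GradientBoostingClassifier()
model.fit(train[features.columns], train["Col1"])
print(model.score(test[features.columns], test["Col1"]))
```

In the real experiment this whole flow is built by dragging modules onto the canvas; the code is only a mental model of the data flow.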
## Results
Visualize the output of the **Evaluate Model** module to see the performance of the model on the test set. For the up-selling task, the ROC curve shows that the model does better than a random model. The area under the curve (AUC) is 0.857. At threshold 0.5, the precision is 0.7, the recall is 0.463, and the F1 score is 0.545.

You can move the **Threshold** slider and see the metrics change for the binary classification task.
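The threshold slider trades precision against recall: raising the threshold makes the classifier more conservative about predicting the positive class. A minimal sketch of the effect, assuming scikit-learn and using synthetic labels and scores (not the sample's data):

```python
# Illustrative only: how precision, recall, and F1 change with the
# classification threshold (synthetic labels and scores).
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1])

for threshold in (0.3, 0.5, 0.7):
    # A prediction is positive when its score clears the threshold.
    y_pred = (scores >= threshold).astype(int)
    print(f"threshold={threshold}: "
          f"precision={precision_score(y_true, y_pred):.3f} "
          f"recall={recall_score(y_true, y_pred):.3f} "
          f"F1={f1_score(y_true, y_pred):.3f}")
```

The slider in the **Evaluate Model** visualization does the same recomputation interactively.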
Explore the other samples available for the visual interface:
- [Sample 1 - Regression: Predict an automobile's price](ui-sample-regression-predict-automobile-price-basic.md)
- [Sample 2 - Regression: Compare algorithms for automobile price prediction](ui-sample-regression-predict-automobile-price-compare-algorithms.md)
# articles/machine-learning/service/how-to-ui-sample-classification-predict-credit-risk-basic.md

ms.date: 05/10/2019
Learn how to build a machine learning classifier without writing a single line of code using the visual interface. This sample trains a **two-class boosted decision tree** to predict credit risk (high or low) based on credit application information such as credit history, age, and number of credit cards.
Because you're trying to answer the question "Which one?" this is called a classification problem. However, you can apply the same fundamental process to tackle any type of machine learning problem, whether it's regression, classification, clustering, and so on.
Here's the final experiment graph for this sample:

provides an advanced experiment that solves the same problem as this experiment. It shows how to perform *cost sensitive* classification by using an **Execute Python Script** module and compare the performance of two binary classification algorithms. Refer to it if you want to learn more about how to build classification pipelines.
## Data
The sample uses the German Credit Card dataset from the UC Irvine repository.
The dataset contains 1,000 samples with 20 features and 1 label. Each sample represents a person. The features include numerical and categorical features. See the [UCI website](https://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29) for the meaning of the categorical features. The last column is the label, which denotes the credit risk and has only two possible values: high credit risk = 2, and low credit risk = 1.
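The label convention (high credit risk = 2, low credit risk = 1) differs from the 0/1 labels many tools expect, so it's worth seeing the remap spelled out. A minimal sketch using pandas; the toy rows and column names are illustrative, not the real UCI data:

```python
# Sketch of this dataset's label convention (high credit risk = 2,
# low credit risk = 1); the toy rows and column names are assumptions.
import pandas as pd

df = pd.DataFrame({
    "feature_1":   [6, 48, 12],   # e.g. a numerical feature (assumed)
    "credit_risk": [1, 2, 1],     # 1 = low risk, 2 = high risk
})

# Map the dataset's 1/2 labels to a conventional 0/1 target:
# 2 (high risk) -> 1, 1 (low risk) -> 0.
df["label"] = (df["credit_risk"] == 2).astype(int)
print(df["label"].tolist())   # [0, 1, 0]
```

In the visual interface this kind of relabeling is handled with modules rather than code; the snippet only makes the encoding explicit.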
## Experiment summary
Follow these steps to create the experiment:
1. Drag the German Credit Card UCI Data dataset module into the experiment's canvas.
1. Add an **Edit Metadata** module so you can add meaningful names for each column.
1. Add a **Score Model** module and connect the **Train Model** module to it. Then add the test set (the right port of the **Split Data**) to the **Score Model**. The **Score Model** will make the predictions. You can select its output port to see the predictions and the positive class probabilities.
1. Add an **Evaluate Model** module and connect the scored dataset to its left input port. To see the evaluation results, select the output port of the **Evaluate Model** module and select **Visualize**.
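The split/train/score/evaluate steps above can be sketched in code as a mental model. This is a hedged scikit-learn analogue on synthetic data, not the visual interface's implementation; `GradientBoostingClassifier` stands in for the **Two-Class Boosted Decision Tree** module:

```python
# Rough code analogue of the drag-and-drop steps (synthetic data;
# the visual interface itself requires no code).
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))     # 20 features, like the credit dataset
y = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)

# "Split Data": train set (left port) and test set (right port).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# "Train Model" with a boosted decision tree.
model = GradientBoostingClassifier().fit(X_train, y_train)

# "Score Model": predictions and positive-class probabilities.
proba = model.predict_proba(X_test)[:, 1]

# "Evaluate Model": for example, the area under the ROC curve.
print(roc_auc_score(y_test, proba))
```

Selecting **Visualize** on the **Evaluate Model** output shows the same kind of metrics computed over the scored test set.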
## Results

In the evaluation results, you can see that the AUC of the model is 0.776. At threshold 0.5, the precision is 0.621, the recall is 0.456, and the F1 score is 0.526.
Explore the other samples available for the visual interface:
- [Sample 1 - Regression: Predict an automobile's price](ui-sample-regression-predict-automobile-price-basic.md)
- [Sample 2 - Regression: Compare algorithms for automobile price prediction](ui-sample-regression-predict-automobile-price-compare-algorithms.md)