|
| 1 | +--- |
| 2 | +title: Forecast bike sharing demand with automated ML experiment |
| 3 | +titleSuffix: Azure Machine Learning |
| 4 | +description: Learn how to train and deploy a demand forecasting model with automated machine learning in Azure Machine Learning studio. |
| 5 | +services: machine-learning |
| 6 | +ms.service: machine-learning |
| 7 | +ms.subservice: core |
| 8 | +ms.topic: tutorial |
| 9 | +ms.author: sacartac |
| 10 | +ms.reviewer: nibaccam |
| 11 | +author: cartacioS |
| 12 | +ms.date: 01/27/2020 |
| 13 | + |
| 14 | +# Customer intent: As a non-coding data scientist, I want to use automated machine learning to build a demand forecasting model with built-in holiday featurization. |
| 15 | +--- |
| 16 | + |
| 17 | +# Tutorial: Forecast bike sharing demand with automated machine learning |
| 18 | +[!INCLUDE [applies-to-skus](../../includes/aml-applies-to-enterprise-sku.md)] |
| 19 | + |
| 20 | +In this tutorial, you use automated machine learning, or automated ML, in the Azure Machine Learning studio to create a demand forecasting model for a bike sharing service. |
| 21 | + |
| 22 | +In this tutorial, you learn how to do the following tasks: |
| 23 | + |
| 24 | +> [!div class="checklist"] |
| 25 | +> * Create an experiment in an Azure Machine Learning workspace. |
| 26 | +> * Configure a remote run of automated ML for a time-series model with lag and holiday features. |
| 27 | +> * View the engineered names for featurized data and featurization summary for all raw features. |
| 28 | +> * Evaluate the fitted model using a rolling test. |
| 29 | +
|
| 30 | +## Prerequisites |
| 31 | + |
| 32 | +* [Create an Enterprise edition workspace](how-to-manage-workspace.md) if you don't already have an Azure Machine Learning workspace. |
| 33 | + * Automated machine learning in the Azure Machine Learning studio is only available for Enterprise edition workspaces. |
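| 34 | +* Download the [bike-no.csv](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/forecasting-bike-share/bike-no.csv) data file. |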
| 35 | + |
| 36 | +## Set up experiment |
| 37 | + |
| 38 | +Complete the following experiment set-up and run steps in Azure Machine Learning studio, a consolidated interface that provides machine learning tools for data science practitioners of all skill levels. The studio is not supported on Internet Explorer browsers. |
| 39 | + |
| 40 | +1. Sign in to [Azure Machine Learning studio](https://ml.azure.com). |
| 41 | + |
| 42 | +1. Select your subscription and the workspace you created. |
| 43 | + |
| 44 | +1. Select **Get started**. |
| 45 | + |
| 46 | +1. In the left pane, select **Automated ML** under the **Author** section. |
| 47 | + |
| 48 | +1. Select **+New automated ML run**. |
| 49 | + |
| 50 | +### Create and load dataset |
| 51 | + |
| 52 | +1. On the **Select dataset** form, select **From local files** from the **+Create dataset** drop-down. |
| 53 | + |
| 54 | + 1. On the **Basic info** form, give your dataset a name and provide an optional description. The dataset type should default to **Tabular**, since automated ML in Azure Machine Learning studio currently only supports TabularDatasets. |
| 55 | + |
| 56 | + 1. Select **Next** on the bottom left. |
| 57 | + |
| 58 | + 1. On the **Datastore and file selection** form, select the default datastore that was automatically set up during your workspace creation, **workspaceblobstore (Azure Blob Storage)**. This is the storage location for your soon to be uploaded data file. |
| 59 | + |
| 60 | + 1. Select **Browse**. |
| 61 | + |
| 62 | + 1. Choose the **bike-no.csv** file on your local computer. This is the file you downloaded as a [prerequisite](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/forecasting-bike-share/bike-no.csv). |
| 63 | + |
| 64 | + 1. Select **Next**. |
| 65 | + |
| 66 | + When the upload is complete, the **Settings and preview** form is pre-populated based on the file type. |
| 67 | + |
| 68 | + 1. Verify that the **Settings and preview** form is populated as follows and select **Next**. |
| 69 | + |
| 70 | + Field|Description| Value for tutorial |
| 71 | + ---|---|--- |
| 72 | + File format|Defines the layout and type of data stored in a file.| Delimited |
| 73 | + Delimiter|One or more characters for specifying the boundary between separate, independent regions in plain text or other data streams. |Comma |
| 74 | + Encoding|Identifies what bit to character schema table to use to read your dataset.| UTF-8 |
| 75 | + Column headers| Indicates how the headers of the dataset, if any, will be treated.| Use headers from the first file |
| 76 | + Skip rows | Indicates how many, if any, rows are skipped in the dataset.| None |
| 77 | + |
| 78 | + 1. The **Schema** form allows for further configuration of your data for this experiment. |
| 79 | + |
| 80 | + 1. For this example, choose to ignore the **Casual** and **Registered** columns. These columns are a breakdown of the **cnt** column, so they are unnecessary. |
| 81 | + |
| 82 | + 1. Select **Next**. |
| 83 | + |
| 84 | + 1. On the **Confirm details** form, verify the information matches what was previously populated on the **Basic info** and **Settings and preview** forms. |
| 85 | + 1. Select **Create** to complete the creation of your dataset. |
| 86 | + |
| 87 | + 1. Select your dataset once it appears in the list. |
| 88 | + |
| 89 | + 1. Select **Next**. |
| 90 | + |
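The dataset settings above (comma delimiter, UTF-8 encoding, headers taken from the first row, and the two ignored columns) can be sketched outside the studio as follows. This is only an illustration of what those settings mean, not how the studio parses the file. The sample rows and the lowercase column names `casual` and `registered` are assumptions made for the example; the real file has more columns and rows.

```python
import csv
import io

# Hypothetical sample rows standing in for bike-no.csv.
SAMPLE_CSV = """date,temp,casual,registered,cnt
2011-01-01,0.34,331,654,985
2011-01-02,0.36,131,670,801
"""

# Columns ignored in the Schema step because they are a breakdown of cnt.
IGNORED_COLUMNS = {"casual", "registered"}


def load_rows(text, ignored=IGNORED_COLUMNS):
    """Parse comma-delimited text with a header row and drop ignored columns."""
    reader = csv.DictReader(io.StringIO(text), delimiter=",")
    return [
        {name: value for name, value in row.items() if name.lower() not in ignored}
        for row in reader
    ]


rows = load_rows(SAMPLE_CSV)
print(rows[0])
```

To read a file on disk instead of the in-memory sample, pass the contents of `open(path, encoding="utf-8", newline="").read()` to `load_rows`.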
| 91 | +### Configure the experiment run |
| 92 | + |
| 93 | +1. Populate the **Configure Run** form as follows: |
| 94 | + 1. Enter an experiment name: `automl-bikeshare` |
| 95 | + |
| 96 | + 1. Select **cnt** as the target column, what you want to predict. This column indicates the number of total rentals. |
| 97 | + |
| 98 | + 1. Select **Create a new compute** and configure your compute target. Automated ML only supports Azure Machine Learning compute. |
| 99 | + |
| 100 | + Field | Description | Value for tutorial |
| 101 | + ----|---|--- |
| 102 | + Compute name |A unique name that identifies your compute context.|bike-compute |
| 103 | + Virtual machine size| Select the virtual machine size for your compute.|Standard_DS12_V2 |
| 104 | + Min / Max nodes (in Advanced Settings)| To profile data, you must specify 1 or more nodes.|Min nodes: 1<br>Max nodes: 6 |
| 105 | + |
| 106 | + 1. Select **Create** to get the compute target. |
| 107 | + |
| 108 | + **This takes a couple of minutes to complete.** |
| 109 | + |
| 110 | + 1. After creation, select your new compute target from the drop-down list. |
| 111 | + |
| 112 | + 1. Select **Next**. |
| 113 | + |
| 114 | +1. On the **Task type and settings** form, select **Forecasting** as the machine learning task type. |
| 115 | + |
| 116 | + 1. Select **View additional configuration settings** and populate the fields as follows. These settings are to better control the training job. Otherwise, defaults are applied based on experiment selection and data. |
| 117 | + |
| 118 | + |
| 119 | + Additional configurations|Description|Value for tutorial |
| 120 | + ------|---------|--- |
| 121 | + Primary metric| Evaluation metric that the machine learning algorithm will be measured by.|Normalized root mean squared error |
| 122 | + Automatic featurization| Enables preprocessing. This includes automatic data cleansing, preparing, and transformation to generate synthetic features.| Enable |
| 123 | + Explain best model (preview)| Automatically shows explainability on the best model created by automated ML.| Enable |
| 124 | + Blocked algorithms | Algorithms you want to exclude from the training job| Extreme Random Trees |
| 125 | + Additional forecasting settings| Settings |Forecast horizon: 14 <br> Forecast target lags: None <br> Target rolling window size: None |
| 126 | + Exit criterion| If a criterion is met, the training job is stopped. |Training job time (hours): 3 <br> Metric score threshold: None |
| 127 | + Validation | Choose a cross-validation type and number of tests.|Validation type:<br> k-fold cross-validation <br> <br> Number of validations: 3 |
| 128 | + Concurrency| The maximum number of parallel iterations executed and cores used per iteration| Max concurrent iterations: 4 |
| 129 | + |
| 130 | + Select **OK**. |
| 131 | + |
| 132 | +1. Select **Create** to run the experiment. The **Run Details** screen opens with the **Run status** at the top next to the run number. This status updates as the experiment progresses. |
| 133 | + |
| 134 | +>[!IMPORTANT] |
| 135 | +> It takes **10-15 minutes** to prepare the experiment run. |
| 136 | +> Once running, it takes **2-3 minutes more for each iteration**. <br> |
| 137 | +> In production, you'd likely walk away for a bit. But for this tutorial, we suggest you start exploring the tested algorithms on the **Models** tab as they complete while the others are still running. |
| 138 | +
|
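The forecast horizon of 14 in the additional forecasting settings is the number of periods past the end of the training data that the model predicts. As a minimal sketch, assuming the data has one row per day, the dates covered by the forecast can be listed like this. The last training date used here is a hypothetical value chosen for illustration.

```python
from datetime import date, timedelta


def forecast_dates(last_training_date, horizon):
    """Return the dates covered by a forecast of `horizon` daily periods."""
    return [
        last_training_date + timedelta(days=step)
        for step in range(1, horizon + 1)
    ]


# Hypothetical last day of training data; the horizon matches the tutorial.
dates = forecast_dates(date(2012, 8, 31), horizon=14)
print(dates[0], dates[-1], len(dates))
```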
| 139 | +## Explore models |
| 140 | + |
| 141 | +Navigate to the **Models** tab to see the algorithms (models) tested. By default, the models are ordered by metric score as they complete. For this tutorial, the model with the best score for the chosen **Normalized root mean squared error** metric is at the top of the list. Because this metric measures error, a lower score is better. |
| 142 | + |
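To make the ranking concrete, the following sketch computes a normalized root mean squared error and orders two models by it. It assumes the error is normalized by the range of the actual values; the model names and numbers are made up for illustration and do not come from the experiment.

```python
import math


def normalized_rmse(actual, predicted):
    """Root mean squared error divided by the range of the actual values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal in length")
    value_range = max(actual) - min(actual)
    if value_range == 0:
        raise ValueError("actual values must not all be identical")
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse) / value_range


# Hypothetical rental counts and predictions from two hypothetical models.
actual = [100, 200, 300, 400]
scores = {
    "model_a": normalized_rmse(actual, [110, 190, 310, 390]),
    "model_b": normalized_rmse(actual, [130, 170, 330, 370]),
}

# Lower error is better, so sort in ascending order.
ranked = sorted(scores, key=scores.get)
print(ranked)
```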
| 143 | +While you wait for all of the experiment models to finish, select the **Algorithm name** of a completed model to explore its performance details. |
| 144 | + |
| 145 | +The following navigates through the **Model details** and the **Visualizations** tabs to view the selected model's properties, metrics and performance charts. |
| 146 | + |
| 147 | + |
| 148 | + |
| 149 | + |
| 150 | +## Deploy the model |
| 151 | + |
| 152 | +Automated machine learning in Azure Machine Learning studio allows you to deploy the best model as a web service in a few steps. Deployment is the integration of the model so it can predict on new data and identify potential areas of opportunity. |
| 153 | + |
| 154 | +For this experiment, deployment to a web service means that the bike share company now has an iterative and scalable web solution for forecasting bike share rental demand. |
| 155 | + |
| 156 | +Once the run is complete, navigate back to the **Run Details** page and select the **Models** tab. Select **Refresh**. |
| 157 | + |
| 158 | +In this experiment context, **StackEnsemble** is considered the best model, based on the **Normalized root mean squared error** metric. We deploy this model. Be advised that deployment takes about 20 minutes to complete. The deployment process entails several steps, including registering the model, generating resources, and configuring them for the web service. |
| 159 | + |
| 160 | +1. Select the **Deploy Best Model** button in the bottom-left corner. |
| 161 | + |
| 162 | +1. Populate the **Deploy a model** pane as follows: |
| 163 | + |
| 164 | + Field| Value |
| 165 | + ----|---- |
| 166 | + Deployment name| my-automl-deploy |
| 167 | + Deployment description| My first automated machine learning experiment deployment |
| 168 | + Compute type | Select Azure Container Instance (ACI) |
| 169 | + Enable authentication| Disable. |
| 170 | + Use custom deployments| Disable. Allows for the default driver file (scoring script) and environment file to be autogenerated. |
| 171 | + |
| 172 | + For this example, we use the defaults provided in the *Advanced* menu. |
| 173 | + |
| 174 | +1. Select **Deploy**. |
| 175 | + |
| 176 | + A green success message appears at the top of the **Run** screen, and |
| 177 | + in the **Recommended model** pane, a status message appears under **Deploy status**. Select **Refresh** periodically to check the deployment status. |
| 178 | + |
| 179 | +Now you have an operational web service to generate predictions. |
| 180 | + |
| 181 | +Proceed to the [**Next Steps**](#next-steps) to learn more about how to consume your new web service, and test your predictions using Power BI's built-in Azure Machine Learning support. |
| 182 | + |
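As a rough sketch of what calling the deployed service could look like, the code below builds an HTTP POST request with a JSON body. Everything specific in it is an assumption: the scoring URI is a placeholder, the `{"data": [...]}` payload shape and the feature names are hypothetical, and the real values depend on your deployment. No authentication header is set, because authentication was disabled in the deployment pane.

```python
import json
import urllib.request

# Placeholder only; use the scoring URI shown for your own deployment.
SCORING_URI = "http://example.invalid/score"


def build_scoring_request(rows, scoring_uri=SCORING_URI):
    """Build a JSON POST request; the payload shape is an assumption."""
    body = json.dumps({"data": rows}).encode("utf-8")
    return urllib.request.Request(
        scoring_uri,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical feature row for one future day.
request = build_scoring_request([{"date": "2012-09-01", "temp": 0.5}])
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it; that call is left out here because the URI is a placeholder.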
| 183 | +## Clean up resources |
| 184 | + |
| 185 | +Deployment files are larger than data and experiment files, so they cost more to store. To minimize costs to your account while keeping your workspace and experiment files, delete only the deployment files. If you don't plan to use any of the files, delete the entire resource group. |
| 186 | + |
| 187 | +### Delete the deployment instance |
| 188 | + |
| 189 | +Delete just the deployment instance from the Azure Machine Learning studio, if you want to keep the resource group and workspace for other tutorials and exploration. |
| 190 | + |
| 191 | +1. Go to the [Azure Machine Learning studio](https://ml.azure.com/). Navigate to your workspace and on the left under the **Assets** pane, select **Endpoints**. |
| 192 | + |
| 193 | +1. Select the deployment you want to delete and select **Delete**. |
| 194 | + |
| 195 | +1. Select **Proceed**. |
| 196 | + |
| 197 | +### Delete the resource group |
| 198 | + |
| 199 | +[!INCLUDE [aml-delete-resource-group](../../includes/aml-delete-resource-group.md)] |
| 200 | + |
| 201 | +## Next steps |
| 202 | + |
| 203 | +In this automated machine learning tutorial, you used Azure Machine Learning studio to create and deploy a demand forecasting model. See these articles for more information and next steps: |
| 204 | + |
| 205 | +> [!div class="nextstepaction"] |
| 206 | +> [Consume a web service](how-to-consume-web-service.md#consume-the-service-from-power-bi) |
| 207 | +
|
| 208 | + |
| 209 | +>[!NOTE] |
| 210 | +> This bike share dataset has been modified for this tutorial. This dataset was made available as part of a [Kaggle competition](https://www.kaggle.com/c/bike-sharing-demand/data) and was originally available via [Capital Bikeshare](https://www.capitalbikeshare.com/system-data). It can also be found within the [UCI Machine Learning Database](http://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset).<br><br> |
| 211 | +> Source: Fanaee-T, Hadi, and Gama, Joao, Event labeling combining ensemble detectors and background knowledge, Progress in Artificial Intelligence (2013): pp. 1-15, Springer Berlin Heidelberg. |