Commit a9ab32b

Merge pull request #71 from LittleLittleCloud/u/xiaoyun/tuner
add AutoML - HPO and tuner notebook
2 parents 8866cc9 + 67260b0 commit a9ab32b

1 file changed

+325
-0
lines changed

@@ -0,0 +1,325 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"metadata": {
6+
"dotnet_interactive": {
7+
"language": "csharp"
8+
}
9+
},
10+
"source": [
11+
"Hyper-parameter optimization (HPO) is the task of finding well-performing hyper-parameters in a given search space. The best-known HPO algorithm is grid search, but it only performs well on tiny search spaces, because the number of grid points grows exponentially with the number of hyper-parameters. Many algorithms have been proposed to handle HPO on large search spaces. For example, Bayesian optimization is designed for optimizing expensive black-box functions, which makes it well suited to HPO. Cost-frugal optimization, on the other hand, takes the training cost into consideration and aims to find a better solution within a limited budget.\n",
12+
"\n",
13+
"Note that even though HPO is a very active research field and many algorithms have been invented in the last few years, there is still no general, all-in-one HPO algorithm that performs well on all datasets. So the best way to find the right HPO algorithm is to try different ones on your dataset.\n",
14+
"\n",
15+
"AutoML.Net provides several HPO algorithms for you to try out, and you can easily swap them in `AutoMLExperiment` by setting a different tuner. In this notebook, we'll go through the following topics.\n",
16+
"- Available tuners in AutoML.Net, and how to use them.\n",
17+
"- Comparing the performance of those tuners."
18+
]
19+
},
20+
{
21+
"cell_type": "code",
22+
"execution_count": null,
23+
"metadata": {
24+
"dotnet_interactive": {
25+
"language": "csharp"
26+
},
27+
"vscode": {
28+
"languageId": "dotnet-interactive.csharp"
29+
}
30+
},
31+
"outputs": [],
32+
"source": [
33+
"// install dependencies and import using statement\n",
34+
"#i \"nuget:https://pkgs.dev.azure.com/dnceng/public/_packaging/MachineLearning/nuget/v3/index.json\"\n",
35+
"#r \"nuget: Plotly.NET.Interactive, 3.0.2\"\n",
36+
"#r \"nuget: Plotly.NET.CSharp, 0.0.1\"\n",
37+
"\n",
38+
"// make sure you are using Microsoft.ML.AutoML later than 0.20.0.\n",
39+
"#r \"nuget: Microsoft.ML.AutoML, 0.20.0-preview.22514.1\"\n",
40+
"#r \"nuget: Microsoft.Data.Analysis, 0.20.0-preview.22514.1\"\n",
41+
"// Import usings.\n",
42+
"using System;\n",
43+
"using System.IO;\n",
44+
"using System.Net;\n",
45+
"using Microsoft.ML;\n",
46+
"using Microsoft.ML.AutoML;\n",
47+
"using Microsoft.ML.Data;\n",
48+
"using Microsoft.ML.SearchSpace;\n",
49+
"using Newtonsoft.Json;\n",
50+
"using Microsoft.ML.AutoML.CodeGen;\n",
51+
"using Microsoft.Data.Analysis;\n",
52+
"using Microsoft.ML.SearchSpace.Option;"
53+
]
54+
},
55+
{
56+
"cell_type": "markdown",
57+
"metadata": {},
58+
"source": [
59+
"### Available Tuners in AutoML.Net\n",
60+
"For now, the following tuners are available in AutoML.Net:\n",
61+
"- CostFrugalTuner: a low-cost HPO algorithm. This is an implementation of [Frugal Optimization for Cost-related Hyperparameters](https://arxiv.org/abs/2005.01571).\n",
62+
"- SMAC: Bayesian optimization using a random forest as the regression model.\n",
63+
"- EciCostFrugalTuner: CostFrugalTuner for hierarchical search spaces. This is used as the default tuner if `AutoMLExperiment.SetPipeline` is called.\n",
64+
"- GridSearch\n",
65+
"- RandomSearch\n",
66+
"\n",
67+
"The following section shows how to use different tuners in `AutoMLExperiment`."
68+
]
69+
},
70+
{
71+
"cell_type": "code",
72+
"execution_count": null,
73+
"metadata": {
74+
"dotnet_interactive": {
75+
"language": "csharp"
76+
},
77+
"vscode": {
78+
"languageId": "dotnet-interactive.csharp"
79+
}
80+
},
81+
"outputs": [],
82+
"source": [
83+
"var context = new MLContext(1);\n",
84+
"var experiment = context.Auto().CreateExperiment();\n",
85+
"\n",
86+
"// use EciCostFrugalTuner\n",
87+
"// Note: EciCostFrugalTuner will be set as default tuner if you call \n",
88+
"// experiment.SetPipeline()\n",
89+
"experiment.SetEciCostFrugalTuner();\n",
90+
"\n",
91+
"// use CostFrugalTuner\n",
92+
"experiment.SetCostFrugalTuner();\n",
93+
"\n",
94+
"// use SMAC\n",
95+
"experiment.SetSmacTuner();\n",
96+
"\n",
97+
"// use GridSearch\n",
98+
"experiment.SetGridSearchTuner(step: 10);\n",
99+
"\n",
100+
"// use RandomSearch\n",
101+
"experiment.SetRandomSearchTuner(seed: 1);"
102+
]
103+
},
104+
{
105+
"cell_type": "markdown",
106+
"metadata": {},
107+
"source": [
108+
"### Compare GridSearch and EciCostFrugal on the Titanic dataset\n",
109+
"\n",
110+
"The following section shows how different HPO algorithms affect AutoML performance by comparing the metric trends of GridSearch and EciCostFrugal on the Titanic dataset."
111+
]
112+
},
113+
{
114+
"cell_type": "markdown",
115+
"metadata": {
116+
"dotnet_interactive": {
117+
"language": "csharp"
118+
}
119+
},
120+
"source": [
121+
"### Download the Titanic dataset if necessary"
122+
]
123+
},
124+
{
125+
"cell_type": "code",
126+
"execution_count": null,
127+
"metadata": {
128+
"dotnet_interactive": {
129+
"language": "csharp"
130+
},
131+
"vscode": {
132+
"languageId": "dotnet-interactive.csharp"
133+
}
134+
},
135+
"outputs": [],
136+
"source": [
137+
"string EnsureDataSetDownloaded(string fileName)\n",
138+
"{\n",
139+
"\n",
140+
"\t// This is the path if the repo has been checked out.\n",
141+
"\tvar filePath = Path.Combine(Directory.GetCurrentDirectory(),\"data\", fileName);\n",
142+
"\n",
143+
"\tif (!File.Exists(filePath))\n",
144+
"\t{\n",
145+
"\t\t// This is the path if the file has already been downloaded.\n",
146+
"\t\tfilePath = Path.Combine(Directory.GetCurrentDirectory(), fileName);\n",
147+
"\t}\n",
148+
"\n",
149+
"\tif (!File.Exists(filePath))\n",
150+
"\t{\n",
151+
"\t\tusing (var client = new WebClient())\n",
152+
"\t\t{\n",
153+
"\t\t\tclient.DownloadFile($\"https://raw.githubusercontent.com/dotnet/csharp-notebooks/main/machine-learning/data/{fileName}\", filePath);\n",
154+
"\t\t}\n",
155+
"\t\tConsole.WriteLine($\"Downloaded {fileName} to : {filePath}\");\n",
156+
"\t}\n",
157+
"\telse\n",
158+
"\t{\n",
159+
"\t\tConsole.WriteLine($\"{fileName} found here: {filePath}\");\n",
160+
"\t}\n",
161+
"\n",
162+
"\treturn filePath;\n",
163+
"}"
164+
]
165+
},
166+
{
167+
"cell_type": "markdown",
168+
"metadata": {},
169+
"source": [
170+
"### Load Dataset"
171+
]
172+
},
173+
{
174+
"cell_type": "code",
175+
"execution_count": null,
176+
"metadata": {
177+
"dotnet_interactive": {
178+
"language": "csharp"
179+
},
180+
"vscode": {
181+
"languageId": "dotnet-interactive.csharp"
182+
}
183+
},
184+
"outputs": [],
185+
"source": [
186+
"var trainDataPath = EnsureDataSetDownloaded(\"titanic-train.csv\");\n",
187+
"var df = DataFrame.LoadCsv(trainDataPath);\n",
188+
"\n",
189+
"var trainTestSplit = context.Data.TrainTestSplit(df, 0.1);\n",
190+
"df.Head(10)"
191+
]
192+
},
193+
{
194+
"cell_type": "markdown",
195+
"metadata": {},
196+
"source": [
197+
"### Construct pipeline and AutoMLExperiment"
198+
]
199+
},
200+
{
201+
"cell_type": "code",
202+
"execution_count": null,
203+
"metadata": {
204+
"dotnet_interactive": {
205+
"language": "csharp"
206+
},
207+
"vscode": {
208+
"languageId": "dotnet-interactive.csharp"
209+
}
210+
},
211+
"outputs": [],
212+
"source": [
213+
"var pipeline = context.Auto().Featurizer(df, excludeColumns: new[]{\"Survived\"})\n",
214+
" .Append(context.Transforms.Conversion.ConvertType(\"Survived\", \"Survived\", DataKind.Boolean))\n",
215+
"                    .Append(context.Auto().BinaryClassification(labelColumnName: \"Survived\"));\n",
216+
"// Configure AutoML\n",
217+
"var monitor = new NotebookMonitor(pipeline);\n",
218+
"\n",
219+
"var experiment = context.Auto().CreateExperiment()\n",
220+
" .SetPipeline(pipeline)\n",
221+
" .SetTrainingTimeInSeconds(10)\n",
222+
" .SetDataset(trainTestSplit.TrainSet, trainTestSplit.TestSet)\n",
223+
" .SetBinaryClassificationMetric(BinaryClassificationMetric.Accuracy, \"Survived\", \"PredictedLabel\")\n",
224+
" .SetMonitor(monitor);\n"
225+
]
226+
},
227+
{
228+
"cell_type": "markdown",
229+
"metadata": {},
230+
"source": [
231+
"### Run HPO using GridSearch"
232+
]
233+
},
234+
{
235+
"cell_type": "code",
236+
"execution_count": null,
237+
"metadata": {
238+
"dotnet_interactive": {
239+
"language": "csharp"
240+
},
241+
"vscode": {
242+
"languageId": "dotnet-interactive.csharp"
243+
}
244+
},
245+
"outputs": [],
246+
"source": [
247+
"experiment.SetGridSearchTuner(step: 10);\n",
248+
"await experiment.RunAsync();\n",
249+
"var gridSearchTrial = monitor.CompletedTrials.ToArray();\n",
250+
"monitor.CompletedTrials.Clear();"
251+
]
252+
},
253+
{
254+
"cell_type": "markdown",
255+
"metadata": {},
256+
"source": [
257+
"### Run HPO using EciCostFrugal"
258+
]
259+
},
260+
{
261+
"cell_type": "code",
262+
"execution_count": null,
263+
"metadata": {
264+
"dotnet_interactive": {
265+
"language": "csharp"
266+
},
267+
"vscode": {
268+
"languageId": "dotnet-interactive.csharp"
269+
}
270+
},
271+
"outputs": [],
272+
"source": [
273+
"experiment.SetEciCostFrugalTuner();\n",
274+
"await experiment.RunAsync();\n",
275+
"var eciSearchTrials = monitor.CompletedTrials.ToArray();\n",
276+
"monitor.CompletedTrials.Clear();"
277+
]
278+
},
279+
{
280+
"cell_type": "markdown",
281+
"metadata": {},
282+
"source": [
283+
"### Compare HPO performance between GridSearch and EciCostFrugal"
284+
]
285+
},
286+
{
287+
"cell_type": "code",
288+
"execution_count": null,
289+
"metadata": {
290+
"dotnet_interactive": {
291+
"language": "csharp"
292+
},
293+
"vscode": {
294+
"languageId": "dotnet-interactive.csharp"
295+
}
296+
},
297+
"outputs": [],
298+
"source": [
299+
"using Plotly.NET;\n",
300+
"\n",
301+
"var gridSearchChart = Chart2D.Chart.Line<int, float, string>(gridSearchTrial.Select(t => t.TrialSettings.TrialId), gridSearchTrial.Select(t => (float)t.Metric), Name: \"grid_search\");\n",
302+
"var eciCfoSearchChart = Chart2D.Chart.Line<int, float, string>(eciSearchTrials.Select(t => t.TrialSettings.TrialId), eciSearchTrials.Select(t => (float)t.Metric), Name: \"eci_cfo\");\n",
303+
"var combineChart = Chart.Combine(new[]{ gridSearchChart, eciCfoSearchChart});\n",
304+
"combineChart.Display()"
305+
]
306+
}
307+
],
308+
"metadata": {
309+
"kernelspec": {
310+
"display_name": ".NET (C#)",
311+
"language": "C#",
312+
"name": ".net-csharp"
313+
},
314+
"language_info": {
315+
"file_extension": ".cs",
316+
"mimetype": "text/x-csharp",
317+
"name": "C#",
318+
"pygments_lexer": "csharp",
319+
"version": "9.0"
320+
},
321+
"orig_nbformat": 4
322+
},
323+
"nbformat": 4,
324+
"nbformat_minor": 2
325+
}
