|
| 1 | +{ |
| 2 | + "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "id": "current-technology", |
| 6 | + "metadata": {}, |
| 7 | + "source": [ |
| 8 | + "# Leveraging MLflow with SASCTL and Model Manager for SKLearn\n", |
| 9 | + "[MLflow](https://mlflow.org/) is an open-source platform used to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry. \n", |
| 10 | + "\n", |
| 11 | + "While MLflow and Model Manager overlap in functionality, there are places where MLflow can strengthen Model Manager. For example, by leveraging MLflow, Model Manager can better support various complex model architectures. We will continue to add to our SASCTL integrations with MLflow; currently we support models developed in sklearn, statsmodels, scipy, and numpy.\n", |
| 12 | + "\n", |
| 13 | + "In this notebook, we will push a model generated in MLflow into the Model Manager registry.\n", |
| 14 | + "***\n", |
| 15 | + "## Getting Started\n", |
| 16 | + "To import MLflow models into SAS Model Manager, a few additions are needed in the MLflow script. First, import the `infer_signature` function. Then call it after any parameter logging is defined, and pass the resulting signature to the model-logging call through its `signature` argument.\n" |
| 17 | + ] |
| 18 | + }, |
| 19 | + { |
| 20 | + "cell_type": "code", |
| 21 | + "execution_count": 31, |
| 22 | + "id": "analyzed-wesley", |
| 23 | + "metadata": {}, |
| 24 | + "outputs": [], |
| 25 | + "source": [ |
| 26 | + "from mlflow.models.signature import infer_signature" |
| 27 | + ] |
| 28 | + }, |
| 29 | + { |
| 30 | + "cell_type": "markdown", |
| 31 | + "id": "clean-reservation", |
| 32 | + "metadata": {}, |
| 33 | + "source": [ |
| 34 | + "Next, adjust any data column names that are not valid Python variable names." |
| 35 | + ] |
| 36 | + }, |
| 37 | + { |
| 38 | + "cell_type": "code", |
| 39 | + "execution_count": 32, |
| 40 | + "id": "unnecessary-library", |
| 41 | + "metadata": {}, |
| 42 | + "outputs": [], |
| 43 | + "source": [ |
| 44 | + "import pandas as pd\n", |
| 45 | + "data = pd.read_csv('./data/hmeq.csv')\n", |
| 46 | + "data.columns = data.columns.str.replace(r'\\W|^(?=\\d)', '_', regex=True)" |
| 47 | + ] |
| 48 | + }, |
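| 49 | + { |
| 50 | + "cell_type": "markdown", |
| 51 | + "id": "hedged-rename-note", |
| 52 | + "metadata": {}, |
| 53 | + "source": [ |
| 54 | + "As a quick sanity check, here is the same replacement applied to a toy frame with hypothetical column names (not from hmeq.csv) that exercise both halves of the pattern: `\\W` rewrites any non-word character to `_`, and `^(?=\\d)` prefixes an `_` when a name starts with a digit." |
| 55 | + ] |
| 56 | + }, |
| 57 | + { |
| 58 | + "cell_type": "code", |
| 59 | + "execution_count": null, |
| 60 | + "id": "hedged-rename-demo", |
| 61 | + "metadata": {}, |
| 62 | + "outputs": [], |
| 63 | + "source": [ |
| 64 | + "# Hypothetical column names chosen to trigger both regex branches\n", |
| 65 | + "demo = pd.DataFrame(columns=['Debt-Income', '2nd Mortgage'])\n", |
| 66 | + "demo.columns = demo.columns.str.replace(r'\\W|^(?=\\d)', '_', regex=True)\n", |
| 67 | + "list(demo.columns)  # ['Debt_Income', '_2nd_Mortgage']" |
| 68 | + ] |
| 69 | + }, |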
| 49 | + { |
| 50 | + "cell_type": "markdown", |
| 51 | + "id": "increasing-scottish", |
| 52 | + "metadata": {}, |
| 53 | + "source": [ |
| 54 | + "***\n", |
| 55 | + "## Building a Model\n", |
| 56 | + "Next, let's build a logistic regression. First, we will prepare our data. " |
| 57 | + ] |
| 58 | + }, |
| 59 | + { |
| 60 | + "cell_type": "code", |
| 61 | + "execution_count": 33, |
| 62 | + "id": "greenhouse-chase", |
| 63 | + "metadata": {}, |
| 64 | + "outputs": [], |
| 65 | + "source": [ |
| 66 | + "# Impute missing values \n", |
| 67 | + "data = data.fillna(value={'MORTDUE': 65019, 'VALUE': 89235, 'YOJ': 7, 'DEROG': 0, 'DELINQ': 0, 'CLAGE': 173, 'NINQ': 1, 'CLNO': 20, 'DEBTINC': 35})\n", |
| 68 | + "\n", |
| 69 | + "# One-hot-encode job\n", |
| 70 | + "one_hot_job = pd.get_dummies(data[\"JOB\"], prefix = \"JOB\", drop_first=True)\n", |
| 71 | + "data = data.join(one_hot_job)\n", |
| 72 | + "data = data.drop('JOB', axis = 1)\n", |
| 73 | + "\n", |
| 74 | + "# One-hot-encode reason\n", |
| 75 | + "one_hot_reason = pd.get_dummies(data[\"REASON\"], prefix = \"REASON\", drop_first=True)\n", |
| 76 | + "data = data.join(one_hot_reason)\n", |
| 77 | + "data = data.drop('REASON', axis = 1)\n", |
| 78 | + "\n", |
| 79 | + "# Separate target \n", |
| 80 | + "y = data.pop('BAD').values" |
| 81 | + ] |
| 82 | + }, |
| 83 | + { |
| 84 | + "cell_type": "markdown", |
| 85 | + "id": "expressed-window", |
| 86 | + "metadata": {}, |
| 87 | + "source": [ |
| 88 | + "Next, we will build our SKLearn model. " |
| 89 | + ] |
| 90 | + }, |
| 91 | + { |
| 92 | + "cell_type": "code", |
| 93 | + "execution_count": 34, |
| 94 | + "id": "mighty-positive", |
| 95 | + "metadata": {}, |
| 96 | + "outputs": [], |
| 97 | + "source": [ |
| 98 | + "# The model class must be imported before use\n", |
| 99 | + "from sklearn.linear_model import LogisticRegression\n", |
| 100 | + "\n", |
| 101 | + "model = LogisticRegression().fit(data, y)" |
| 99 | + ] |
| 100 | + }, |
| 101 | + { |
| 102 | + "cell_type": "markdown", |
| 103 | + "id": "native-seller", |
| 104 | + "metadata": {}, |
| 105 | + "source": [ |
| 106 | + "Now, let’s generate our signature. For this simple example, I’m assuming that this model will not encounter missing values, so I am ignoring MLflow’s warning about missing values. " |
| 107 | + ] |
| 108 | + }, |
| 109 | + { |
| 110 | + "cell_type": "code", |
| 111 | + "execution_count": 35, |
| 112 | + "id": "prescription-gabriel", |
| 113 | + "metadata": {}, |
| 114 | + "outputs": [], |
| 115 | + "source": [ |
| 116 | + "import warnings\n", |
| 117 | + "warnings.filterwarnings(\"ignore\")\n", |
| 118 | + "\n", |
| 119 | + "signature = infer_signature(data, model.predict(data))" |
| 120 | + ] |
| 121 | + }, |
| 122 | + { |
| 123 | + "cell_type": "markdown", |
| 124 | + "id": "cardiac-entrance", |
| 125 | + "metadata": {}, |
| 126 | + "source": [ |
| 127 | + "Finally, let’s log our MLflow model and include our signature. " |
| 128 | + ] |
| 129 | + }, |
| 130 | + { |
| 131 | + "cell_type": "code", |
| 132 | + "execution_count": 36, |
| 133 | + "id": "legislative-quality", |
| 134 | + "metadata": {}, |
| 135 | + "outputs": [ |
| 136 | + { |
| 137 | + "name": "stdout", |
| 138 | + "output_type": "stream", |
| 139 | + "text": [ |
| 140 | + "Score: 0.803020134228188\n", |
| 141 | + "Model saved in run 60f04adfcf274928bf24769f90f97741\n" |
| 142 | + ] |
| 143 | + } |
| 144 | + ], |
| 145 | + "source": [ |
| 146 | + "import mlflow\n", |
| 147 | + "\n", |
| 148 | + "score = model.score(data, y)\n", |
| 149 | + "\n", |
| 150 | + "print(\"Score: %s\" % score)\n", |
| 151 | + "mlflow.log_metric(\"score\", score)\n", |
| 152 | + "\n", |
| 153 | + "mlflow.sklearn.log_model(model, \"model\", signature=signature)\n", |
| 154 | + "print(\"Model saved in run %s\" % mlflow.active_run().info.run_id)\n" |
| 155 | + ] |
| 156 | + }, |
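| 157 | + { |
| 158 | + "cell_type": "markdown", |
| 159 | + "id": "hedged-run-note", |
| 160 | + "metadata": {}, |
| 161 | + "source": [ |
| 162 | + "Note that the cell above relies on MLflow starting a run implicitly on the first `log_metric` call. An equivalent sketch with explicit run management, which makes the run id easier to capture, would be:" |
| 163 | + ] |
| 164 | + }, |
| 165 | + { |
| 166 | + "cell_type": "code", |
| 167 | + "execution_count": null, |
| 168 | + "id": "hedged-run-sketch", |
| 169 | + "metadata": {}, |
| 170 | + "outputs": [], |
| 171 | + "source": [ |
| 172 | + "# Equivalent logging with an explicit run context\n", |
| 173 | + "with mlflow.start_run() as run:\n", |
| 174 | + "    mlflow.log_metric(\"score\", score)\n", |
| 175 | + "    mlflow.sklearn.log_model(model, \"model\", signature=signature)\n", |
| 176 | + "    print(\"Model saved in run %s\" % run.info.run_id)" |
| 177 | + ] |
| 178 | + }, |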
| 157 | + { |
| 158 | + "cell_type": "markdown", |
| 159 | + "id": "statewide-momentum", |
| 160 | + "metadata": {}, |
| 161 | + "source": [ |
| 162 | + "## Register Model\n", |
| 163 | + "Now, let’s use SASCTL to register our MLflow SKLearn model. First, let’s import the necessary packages. " |
| 164 | + ] |
| 165 | + }, |
| 166 | + { |
| 167 | + "cell_type": "code", |
| 168 | + "execution_count": 37, |
| 169 | + "id": "declared-beach", |
| 170 | + "metadata": {}, |
| 171 | + "outputs": [], |
| 172 | + "source": [ |
| 173 | + "# Pathing support\n", |
| 174 | + "from pathlib import Path\n", |
| 175 | + "\n", |
| 176 | + "# sasctl interface for importing models\n", |
| 177 | + "import sasctl.pzmm as pzmm \n", |
| 178 | + "from sasctl import Session" |
| 179 | + ] |
| 180 | + }, |
| 181 | + { |
| 182 | + "cell_type": "markdown", |
| 183 | + "id": "burning-ideal", |
| 184 | + "metadata": {}, |
| 185 | + "source": [ |
| 186 | + "And point SASCTL to the MLflow model files. " |
| 187 | + ] |
| 188 | + }, |
| 189 | + { |
| 190 | + "cell_type": "code", |
| 191 | + "execution_count": 38, |
| 192 | + "id": "yellow-trade", |
| 193 | + "metadata": {}, |
| 194 | + "outputs": [], |
| 195 | + "source": [ |
| 196 | + "mlPath = Path('./mlruns/0/60f04adfcf274928bf24769f90f97741/artifacts/model')\n", |
| 197 | + "varDict, inputsDict, outputsDict = pzmm.MLFlowModel().readMLmodelFile(mlPath)" |
| 198 | + ] |
| 199 | + }, |
| 200 | + { |
| 201 | + "cell_type": "markdown", |
| 202 | + "id": "ordered-gentleman", |
| 203 | + "metadata": {}, |
| 204 | + "source": [ |
| 205 | + "Next, let’s create a folder for our SASCTL assets and pickle our model. " |
| 206 | + ] |
| 207 | + }, |
| 208 | + { |
| 209 | + "cell_type": "code", |
| 210 | + "execution_count": 39, |
| 211 | + "id": "funded-killer", |
| 212 | + "metadata": {}, |
| 213 | + "outputs": [], |
| 214 | + "source": [ |
| 215 | + "modelPrefix = 'MLFlowDemo'\n", |
| 216 | + "zipFolder = Path.cwd() / 'outputs/mlflow_logreg'\n", |
| 217 | + "# Create the output folder if it does not exist\n", |
| 218 | + "zipFolder.mkdir(parents=True, exist_ok=True)\n", |
| 219 | + "pzmm.PickleModel().pickleTrainedModel(None, modelPrefix, zipFolder, mlFlowDetails=varDict)" |
| 218 | + ] |
| 219 | + }, |
| 220 | + { |
| 221 | + "cell_type": "markdown", |
| 222 | + "id": "dedicated-latex", |
| 223 | + "metadata": {}, |
| 224 | + "source": [ |
| 225 | + "We can leverage the information from MLflow to generate metadata files for SASCTL. " |
| 226 | + ] |
| 227 | + }, |
| 228 | + { |
| 229 | + "cell_type": "code", |
| 230 | + "execution_count": 40, |
| 231 | + "id": "humanitarian-constant", |
| 232 | + "metadata": {}, |
| 233 | + "outputs": [ |
| 234 | + { |
| 235 | + "name": "stdout", |
| 236 | + "output_type": "stream", |
| 237 | + "text": [ |
| 238 | + "inputVar.json was successfully written and saved to /opt/python/Sophia/Demos/outputs/mlflow_logreg/inputVar.json\n", |
| 239 | + "outputVar.json was successfully written and saved to /opt/python/Sophia/Demos/outputs/mlflow_logreg/outputVar.json\n" |
| 240 | + ] |
| 241 | + } |
| 242 | + ], |
| 243 | + "source": [ |
| 244 | + "J = pzmm.JSONFiles()\n", |
| 245 | + "J.writeVarJSON(inputsDict, isInput=True, jPath=zipFolder)\n", |
| 246 | + "J.writeVarJSON(outputsDict, isInput=False, jPath=zipFolder)" |
| 247 | + ] |
| 248 | + }, |
| 249 | + { |
| 250 | + "cell_type": "code", |
| 251 | + "execution_count": 41, |
| 252 | + "id": "mental-allergy", |
| 253 | + "metadata": {}, |
| 254 | + "outputs": [ |
| 255 | + { |
| 256 | + "name": "stdout", |
| 257 | + "output_type": "stream", |
| 258 | + "text": [ |
| 259 | + "ModelProperties.json was successfully written and saved to /opt/python/Sophia/Demos/outputs/mlflow_logreg/ModelProperties.json\n", |
| 260 | + "fileMetaData.json was successfully written and saved to /opt/python/Sophia/Demos/outputs/mlflow_logreg/fileMetaData.json\n" |
| 261 | + ] |
| 262 | + } |
| 263 | + ], |
| 264 | + "source": [ |
| 265 | + "# Write model properties to a json file\n", |
| 266 | + "J.writeModelPropertiesJSON(modelName=modelPrefix,\n", |
| 267 | + " modelDesc='MLFlow Model ',\n", |
| 268 | + " targetVariable='BAD',\n", |
| 269 | + " modelType='Logistic Regression',\n", |
| 270 | + " modelPredictors='',\n", |
| 271 | + " targetEvent=1,\n", |
| 272 | + " numTargetCategories=1,\n", |
| 273 | + " eventProbVar='tensor',\n", |
| 274 | + " jPath=zipFolder,\n", |
| 275 | + " modeler='sasdemo')\n", |
| 276 | + "\n", |
| 277 | + "# Write model metadata to a json file\n", |
| 278 | + "J.writeFileMetadataJSON(modelPrefix, jPath=zipFolder)" |
| 279 | + ] |
| 280 | + }, |
| 281 | + { |
| 282 | + "cell_type": "markdown", |
| 283 | + "id": "incorporated-pacific", |
| 284 | + "metadata": {}, |
| 285 | + "source": [ |
| 286 | + "We have generated our metadata and modeling assets. Next, we will need our SAS Viya host, username, and password to create a session within SASCTL." |
| 287 | + ] |
| 288 | + }, |
| 289 | + { |
| 290 | + "cell_type": "code", |
| 291 | + "execution_count": 42, |
| 292 | + "id": "painful-bracelet", |
| 293 | + "metadata": {}, |
| 294 | + "outputs": [ |
| 295 | + { |
| 296 | + "name": "stdout", |
| 297 | + "output_type": "stream", |
| 298 | + "text": [ |
| 299 | + "Username: ········\n", |
| 300 | + "Password: ········\n", |
| 301 | + "Hostname: ········\n" |
| 302 | + ] |
| 303 | + } |
| 304 | + ], |
| 305 | + "source": [ |
| 306 | + "import getpass\n", |
| 307 | + "username = getpass.getpass(\"Username: \")\n", |
| 308 | + "password = getpass.getpass(\"Password: \")\n", |
| 309 | + "host = getpass.getpass(\"Hostname: \")\n", |
| 310 | + "sess = Session(host, username, password, protocol='http')" |
| 311 | + ] |
| 312 | + }, |
| 313 | + { |
| 314 | + "cell_type": "markdown", |
| 315 | + "id": "charming-excess", |
| 316 | + "metadata": {}, |
| 317 | + "source": [ |
| 318 | + "We can use our session to push our modeling assets into Model Manager. " |
| 319 | + ] |
| 320 | + }, |
| 321 | + { |
| 322 | + "cell_type": "code", |
| 323 | + "execution_count": 43, |
| 324 | + "id": "yellow-playing", |
| 325 | + "metadata": {}, |
| 326 | + "outputs": [ |
| 327 | + { |
| 328 | + "name": "stdout", |
| 329 | + "output_type": "stream", |
| 330 | + "text": [ |
| 331 | + "Model score code was written successfully to /opt/python/Sophia/Demos/outputs/mlflow_logreg/MLFlowDemoScore.py.\n", |
| 332 | + "All model files were zipped to /opt/python/Sophia/Demos/outputs/mlflow_logreg.\n", |
| 333 | + "Model was successfully imported into SAS Model Manager as MLFlowDemo with UUID: f8df205f-c97b-4c54-8310-c14606b6a4c8.\n" |
| 334 | + ] |
| 335 | + } |
| 336 | + ], |
| 337 | + "source": [ |
| 338 | + "I = pzmm.ImportModel()\n", |
| 339 | + "I.pzmmImportModel(zipFolder, modelPrefix, 'MLFlowTest', inputsDict, None, '{}.predict({})', metrics=['tensor'], force=True)" |
| 340 | + ] |
| 341 | + }, |
| 342 | + { |
| 343 | + "cell_type": "markdown", |
| 344 | + "id": "sealed-bryan", |
| 345 | + "metadata": {}, |
| 346 | + "source": [ |
| 347 | + "Success! Now we can view our model score code, pickle file, and metadata within Model Manager. \n", |
| 348 | + "***" |
| 349 | + ] |
| 350 | + } |
| 351 | + ], |
| 352 | + "metadata": { |
| 353 | + "kernelspec": { |
| 354 | + "display_name": "Python 3", |
| 355 | + "language": "python", |
| 356 | + "name": "python3" |
| 357 | + }, |
| 358 | + "language_info": { |
| 359 | + "codemirror_mode": { |
| 360 | + "name": "ipython", |
| 361 | + "version": 3 |
| 362 | + }, |
| 363 | + "file_extension": ".py", |
| 364 | + "mimetype": "text/x-python", |
| 365 | + "name": "python", |
| 366 | + "nbconvert_exporter": "python", |
| 367 | + "pygments_lexer": "ipython3", |
| 368 | + "version": "3.9.0" |
| 369 | + } |
| 370 | + }, |
| 371 | + "nbformat": 4, |
| 372 | + "nbformat_minor": 5 |
| 373 | +} |