Merge pull request #8 from ruivieira/shap-example

ruivieira · web-flow · commit 68e403a289c9 · 2022-04-11T21:23:27.000+01:00
Add SHAP notebook
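
The notebook explains a toy sum-threshold model with SHAP. The idea can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not TrustyAI's implementation: the names `sum_threshold_model`, `coalition_value` and `exact_shapley` are hypothetical, the background rows are made up for illustration, and the code enumerates every coalition to compute exact Shapley values, whereas `SHAPExplainer` may use a different, approximate algorithm.

```python
from itertools import combinations
from math import factorial


def sum_threshold_model(x, center=10.0, epsilon=2.0):
    """Return True when sum(x) lies strictly inside (center - epsilon, center + epsilon)."""
    return center - epsilon < sum(x) < center + epsilon


def coalition_value(model, x, background, coalition):
    """Mean model output when features in `coalition` are taken from x
    and all other features are taken from each background row in turn."""
    total = 0.0
    for row in background:
        mixed = [x[i] if i in coalition else row[i] for i in range(len(x))]
        total += float(model(mixed))
    return total / len(background)


def exact_shapley(model, x, background):
    """Exact Shapley values by enumerating every coalition.
    Only feasible for a handful of features (cost grows as 2^m)."""
    m = len(x)
    values = []
    for i in range(m):
        others = [j for j in range(m) if j != i]
        phi = 0.0
        for size in range(m):
            # Shapley weight for a coalition of this size: |S|! (m - |S| - 1)! / m!
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                s = set(subset)
                with_i = coalition_value(model, x, background, s | {i})
                without_i = coalition_value(model, x, background, s)
                phi += weight * (with_i - without_i)
        values.append(phi)
    return values


# Feature values taken from the notebook's recorded run
x = [2.1516473114599046, 0.8137674993709809, 5.637541112355343]
# Hypothetical background rows, chosen only for illustration
background = [[1.0, 2.0, 3.0], [4.0, 4.0, 4.0], [0.5, 9.0, 0.2], [3.0, 3.0, 3.5]]
print(exact_shapley(sum_threshold_model, x, background))
```

A useful sanity check on any such attribution is that the values sum to the model output for `x` minus the mean model output over the background.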
diff --git a/examples/SHAP.ipynb b/examples/SHAP.ipynb
@@ -0,0 +1,331 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "d2a08ef6",
+   "metadata": {},
+   "source": [
+    "# SHAP explanations"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "767b003e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import trustyai\n",
+    "\n",
+    "trustyai.init()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e194eb56",
+   "metadata": {},
+   "source": [
+    "## Simple example\n",
+    "\n",
+    "We start by defining our black-box model, typically represented by\n",
+    "\n",
+    "$$\n",
+    "f(\\mathbf{x}) = \\mathbf{y}\n",
+    "$$\n",
+    "\n",
+    "Where $\\mathbf{x}=\\{x_1, x_2, \\dots,x_m\\}$ and $\\mathbf{y}=\\{y_1, y_2, \\dots,y_n\\}$.\n",
+    "\n",
+    "Our toy model takes an all-numerical input $\\mathbf{x}$ and returns a $\\mathbf{y}$ of `true` if the sum of the $\\mathbf{x}$ components is within a threshold $\\epsilon$ of a point $\\mathbf{C}$, and `false` otherwise, that is:\n",
+    "\n",
+    "$$\n",
+    "f(\\mathbf{x}, \\epsilon, \\mathbf{C})=\\begin{cases}\n",
+    "\\text{true},\\qquad \\text{if}\\ \\mathbf{C}-\\epsilon<\\sum_{i=1}^m x_i <\\mathbf{C}+\\epsilon \\\\\n",
+    "\\text{false},\\qquad \\text{otherwise}\n",
+    "\\end{cases}\n",
+    "$$\n",
+    "\n",
+    "This model is provided in the `TestUtils` module. We instantiate it with $\\mathbf{C}=10$ and $\\epsilon=2.0$."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "fd02e320",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from trustyai.utils import TestUtils\n",
+    "\n",
+    "center = 10.0\n",
+    "epsilon = 2.0\n",
+    "\n",
+    "model = TestUtils.getSumThresholdModel(center, epsilon)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e4a15f8b",
+   "metadata": {},
+   "source": [
+    "Next we need to define a **goal**.\n",
+    "If our model is $f(\\mathbf{x'})=\\mathbf{y'}$ we are then defining our $\\mathbf{y'}$ and the counterfactual result will be the $\\mathbf{x'}$ which satisfies $f(\\mathbf{x'})=\\mathbf{y'}$.\n",
+    "\n",
+    "We will define our goal as `true`, that is, the sum is within the vicinity of the point $\\mathbf{C}$. The goal is a list of `Output` objects, each of which takes the following parameters:\n",
+    "\n",
+    "- The feature name\n",
+    "- The feature type\n",
+    "- The feature value (wrapped in `Value`)\n",
+    "- A confidence threshold, which we will leave at zero (no threshold)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "bf3f4232",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from trustyai.model import output\n",
+    "\n",
+    "decision = \"inside\"\n",
+    "goal = [output(name=decision, dtype=\"bool\", value=True, score=0.0)]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "64349c3e",
+   "metadata": {},
+   "source": [
+    "We will now define our initial features, $\\mathbf{x}$. Each feature can be created with the `feature` helper from `trustyai.model`; in this case we want numerical features, so we pass `dtype=\"number\"`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "d688a7c8",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import random\n",
+    "from trustyai.model import feature\n",
+    "\n",
+    "features = [feature(name=f\"x{i+1}\", dtype=\"number\", value=random.random()*10.0) for i in range(3)]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a562ef68",
+   "metadata": {},
+   "source": [
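+    "Because the feature values are random, the sum of the features may or may not fall within $\\epsilon$ (2.0) of $\\mathbf{C}$ (10.0). In the run recorded below the sum is about 8.60, which lies inside the interval $(8, 12)$, so the model prediction is `true`:"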
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "id": "48212d3f",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Feature x1 has value 2.1516473114599046\n",
+      "Feature x2 has value 0.8137674993709809\n",
+      "Feature x3 has value 5.637541112355343\n",
+      "\n",
+      "Features sum is 8.60295592318623\n"
+     ]
+    }
+   ],
+   "source": [
+    "feature_sum = 0.0\n",
+    "for f in features:\n",
+    "    value = f.value.as_number()\n",
+    "    print(f\"Feature {f.name} has value {value}\")\n",
+    "    feature_sum += value\n",
+    "print(f\"\\nFeatures sum is {feature_sum}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "13001554",
+   "metadata": {},
+   "source": [
+    "We execute the model on the generated input and collect the output."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "id": "0a45c0e0",
+   "metadata": {
+    "pycharm": {
+     "name": "#%%\n"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "from org.kie.kogito.explainability.model import PredictionInput, PredictionOutput\n",
+    "\n",
+    "goals = model.predictAsync([PredictionInput(features)]).get()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "id": "4483bf24",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "background = []\n",
+    "for _ in range(10):\n",
+    "    _features = [feature(name=f\"x{i+1}\", dtype=\"number\", value=random.random()*10.0) for i in range(3)]\n",
+    "    background.append(PredictionInput(_features))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "324cefdf",
+   "metadata": {
+    "pycharm": {
+     "name": "#%% md\n"
+    }
+   },
+   "source": [
+    "We wrap these quantities in a `SimplePrediction`:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "id": "8bb2aac1",
+   "metadata": {
+    "pycharm": {
+     "name": "#%%\n"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "from trustyai.model import simple_prediction\n",
+    "\n",
+    "prediction = simple_prediction(input_features=features, outputs=goals[0].outputs)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9bb631f9",
+   "metadata": {
+    "pycharm": {
+     "name": "#%% md\n"
+    }
+   },
+   "source": [
+    "We can now instantiate the **explainer** itself.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "id": "115fa89c",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "SLF4J: Failed to load class \"org.slf4j.impl.StaticLoggerBinder\".\n",
+      "SLF4J: Defaulting to no-operation (NOP) logger implementation\n",
+      "SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.\n"
+     ]
+    }
+   ],
+   "source": [
+    "from trustyai.explainers import SHAPExplainer\n",
+    "\n",
+    "explainer = SHAPExplainer(background=background)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7cd8b2b4",
+   "metadata": {
+    "pycharm": {
+     "name": "#%% md\n"
+    }
+   },
+   "source": [
+    "We generate the **explanation** as a _dict : decision --> saliency_.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "id": "b34e26d7",
+   "metadata": {
+    "pycharm": {
+     "name": "#%%\n"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "explanation = explainer.explain(prediction, model)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d32e4272",
+   "metadata": {
+    "pycharm": {
+     "name": "#%% md\n"
+    }
+   },
+   "source": [
+    "We inspect the saliency scores assigned by SHAP to each feature."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 16,
+   "id": "2f0721fe",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Saliency{output=Output{value=true, type=boolean, score=-0.39704407681377063, name='inside'}, perFeatureImportance=[FeatureImportance{feature=Feature{name='x1', type=number, value=2.1516473114599046}, score=0.4, confidence= +/-0.39264863227014996}, FeatureImportance{feature=Feature{name='x2', type=number, value=0.8137674993709809}, score=0.35, confidence= +/-0.39264863227014996}, FeatureImportance{feature=Feature{name='x3', type=number, value=5.637541112355343}, score=0.15000000000000002, confidence= +/-0.5552890210036922}]}\n"
+     ]
+    }
+   ],
+   "source": [
+    "for saliency in explanation.getSaliencies():\n",
+    "    print(saliency)"
+   ]
+  }
+ ],
+ "metadata": {
+  "interpreter": {
+   "hash": "a0b19a0e0769482a3dd54d9b1f74632fb70b79784820162adf8976b9cad4acbb"
+  },
+  "kernelspec": {
+   "display_name": "trustyai-python",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.10"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/requirements.txt b/requirements.txt
@@ -1 +1 @@
-trustyai==0.1.0
+trustyai==0.1.1