Update notebook

ruivieira · ruivieira · commit b0ddc5d73bf2 · 2021-07-05T14:37:41.000+01:00
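
The markdown cells added in this commit describe a model that predicts `True` only when the sum of the features lies within $\epsilon$ of $\mathbf{C}$. As a rough illustration, the check can be sketched in plain Python using the values printed in the notebook outputs. `inside_threshold` is a hypothetical stand-in written for this note, not the TrustyAI API, and it assumes the sum-threshold model tests `|sum(x) - C| <= epsilon` with the values `C = 500.0` and `epsilon = 1.0` quoted in the notebook text.

```python
# Hypothetical stand-in for the sum-threshold model (an assumption, not the TrustyAI implementation):
# the prediction is True when the feature sum lies within epsilon of center.
def inside_threshold(values, center=500.0, epsilon=1.0):
    return abs(sum(values) - center) <= epsilon


# Original feature values x, copied from the notebook's printed output.
original = [
    9.344140417436046,
    2.101222990524685,
    5.759573701749472,
    0.8173260627331469,
]

# Counterfactual values x', copied from the notebook's printed entities.
counterfactual = [
    490.4373902874999,
    2.420079314517709,
    5.759573701749472,
    0.8173260627331469,
]

original_inside = inside_threshold(original)
counterfactual_inside = inside_threshold(counterfactual)

print(f"Original sum {sum(original)} inside: {original_inside}")
print(f"Counterfactual sum {sum(counterfactual)} inside: {counterfactual_inside}")
```

Under these assumptions the original point falls far outside the band around 500, while the counterfactual moves mostly `f-num1` to bring the sum within 1.0 of the center.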
diff --git a/notebooks/Counterfactuals.ipynb b/notebooks/Counterfactuals.ipynb
@@ -32,7 +32,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "512462ee",
+   "id": "12645d02",
    "metadata": {},
    "source": [
     "## Simple example\n",
@@ -60,7 +60,7 @@
   {
    "cell_type": "code",
    "execution_count": 4,
-   "id": "e4f89877",
+   "id": "22ba9951",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -74,7 +74,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "f0bb1cc2",
+   "id": "b80d0d68",
    "metadata": {},
    "source": [
     "Next we need to define a **goal**.\n",
@@ -100,35 +100,92 @@
     "goal = [Output(\"inside\", Type.BOOLEAN, Value(True), 0.0)]"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "4e7fb934",
+   "metadata": {},
+   "source": [
+    "We will now define our initial features, $\\mathbf{x}$. Features are instantiated through `FeatureFactory`; since we want numerical features here, we use `FeatureFactory.newNumericalFeature`."
+   ]
+  },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 11,
    "id": "6aa524ae",
    "metadata": {},
    "outputs": [],
    "source": [
     "import random\n",
     "from trustyai.model import FeatureFactory\n",
     "\n",
-    "features = [FeatureFactory.newNumericalFeature(f\"f-num{i+1}\", random.random()*10.0) for i in range(4)]\n",
-    "\n",
+    "features = [FeatureFactory.newNumericalFeature(f\"f-num{i+1}\", random.random()*10.0) for i in range(4)]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "db9c90ff",
+   "metadata": {},
+   "source": [
+    "As we can see, the sum of the features will not be within $\\epsilon$ (1.0) of $\\mathbf{C}$ (500.0). As such, the model prediction will be `false`:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "id": "f0f07043",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Feature f-num1 has value 9.344140417436046\n",
+      "Feature f-num2 has value 2.101222990524685\n",
+      "Feature f-num3 has value 5.759573701749472\n",
+      "Feature f-num4 has value 0.8173260627331469\n",
+      "\n",
+      "Features sum is 18.02226317244335\n"
+     ]
+    }
+   ],
+   "source": [
+    "feature_sum = 0.0\n",
     "for f in features:\n",
-    "    print(f\"Feature {f.getName()} has value {f.getValue()}\")"
+    "    value = f.getValue().asNumber()\n",
+    "    print(f\"Feature {f.getName()} has value {value}\")\n",
+    "    feature_sum += value\n",
+    "print(f\"\\nFeatures sum is {feature_sum}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4773e71a",
+   "metadata": {},
+   "source": [
+    "The next step is to specify the **constraints** of the features, i.e. which features can be changed and which should be fixed. Since we want all features to be able to change, we specify `False` for all of them:"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 20,
    "id": "513d2e5a",
    "metadata": {},
    "outputs": [],
    "source": [
     "constraints = [False] * 4"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "1894c1d7",
+   "metadata": {},
+   "source": [
+    "Finally, we specify the **bounds** for the counterfactual search. Typically these are set using domain-specific knowledge or taken from the data. In this case we simply choose an arbitrary (but sensible) range, e.g. all the features can vary between `0` and `1000`."
+   ]
+  },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 13,
    "id": "30dcc15b",
    "metadata": {},
    "outputs": [],
@@ -139,43 +196,38 @@
    ]
   },
   {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "5047e075",
+   "cell_type": "markdown",
+   "id": "be0cdfe3",
    "metadata": {},
-   "outputs": [],
    "source": [
-    "from trustyai.model import DataDomain\n",
-    "\n",
-    "data_domain = DataDomain(feature_boundaries)"
+    "In order to use the boundaries in the explainer we need to wrap them in a `DataDomain` instance:"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": null,
-   "id": "e1b0da83",
+   "execution_count": 14,
+   "id": "9cfe2a9d",
    "metadata": {},
    "outputs": [],
    "source": [
-    "center = 500.0\n",
-    "epsilon = 10.0"
+    "from trustyai.model import DataDomain\n",
+    "\n",
+    "data_domain = DataDomain(feature_boundaries)"
    ]
   },
   {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "510b3b16",
+   "cell_type": "markdown",
+   "id": "e47d348e",
    "metadata": {},
-   "outputs": [],
    "source": [
-    "from trustyai.utils import TestUtils\n",
+    "We can now instantiate the **explainer** itself.\n",
     "\n",
-    "model = TestUtils.getSumThresholdModel(center, epsilon)"
+    "To do so, we will need to configure the termination criteria. For this example we specify that the counterfactual search should execute at most 10,000 iterations before stopping and returning the best result found so far."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 15,
    "id": "bcd25df0",
    "metadata": {},
    "outputs": [],
@@ -193,80 +245,157 @@
     "    )"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "790e868f",
+   "metadata": {},
+   "source": [
+    "We can now instantiate the explainer itself using `CounterfactualExplainer` and our `solver_config` configuration."
+   ]
+  },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 16,
    "id": "c2b76274",
    "metadata": {},
-   "outputs": [],
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "SLF4J: Failed to load class \"org.slf4j.impl.StaticLoggerBinder\".\n",
+      "SLF4J: Defaulting to no-operation (NOP) logger implementation\n",
+      "SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.\n"
+     ]
+    }
+   ],
    "source": [
     "from org.kie.kogito.explainability.local.counterfactual import CounterfactualExplainer\n",
     "\n",
     "explainer = CounterfactualExplainer.builder().withSolverConfig(solver_config).build()"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "292c136c",
+   "metadata": {},
+   "source": [
+    "We will now express the counterfactual problem as defined above.\n",
+    "\n",
+    "- `original` represents our $\\mathbf{x}$, which we know gives a prediction of `False`\n",
+    "- `goals` represents our $\\mathbf{y'}$, that is our desired prediction (`True`)\n",
+    "- `domain` represents the boundaries for the counterfactual search"
+   ]
+  },
   {
    "cell_type": "code",
-   "execution_count": null,
-   "id": "4cff79cd",
+   "execution_count": 17,
+   "id": "92356f76",
    "metadata": {},
    "outputs": [],
    "source": [
     "from trustyai.model import PredictionFeatureDomain, PredictionInput, PredictionOutput\n",
     "\n",
-    "inputs = PredictionInput(features)\n",
-    "outputs = PredictionOutput(goal)\n",
+    "original = PredictionInput(features)\n",
+    "goals = PredictionOutput(goal)\n",
     "domain = PredictionFeatureDomain(data_domain.getFeatureDomains())"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "00c09d95",
+   "metadata": {},
+   "source": [
+    "We wrap these quantities in a `CounterfactualPrediction` (the UUID is simply to label the search instance):"
+   ]
+  },
   {
    "cell_type": "code",
-   "execution_count": null,
-   "id": "98057ebd",
+   "execution_count": 21,
+   "id": "19a001ac",
    "metadata": {},
    "outputs": [],
    "source": [
     "import uuid\n",
     "from trustyai.model import CounterfactualPrediction\n",
     "\n",
-    "prediction = CounterfactualPrediction(inputs, outputs, domain, constraints, None, uuid.uuid4())"
+    "prediction = CounterfactualPrediction(original, goals, domain, constraints, None, uuid.uuid4())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6d593f4f",
+   "metadata": {},
+   "source": [
+    "We now request the counterfactual $\\mathbf{x'}$ which is closest to $\\mathbf{x}$ and which satisfies $f(\\mathbf{x'}, \\epsilon, \\mathbf{C})=\\mathbf{y'}$:"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": null,
-   "id": "910a250f",
+   "execution_count": 22,
+   "id": "e5783b3d",
    "metadata": {},
    "outputs": [],
    "source": [
     "explanation_async = explainer.explainAsync(prediction, model)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "b2af6cb4",
+   "metadata": {},
+   "source": [
+    "The counterfactual explainer API operates in an asynchronous way, so we need to `.get()` the result:"
+   ]
+  },
   {
    "cell_type": "code",
-   "execution_count": null,
-   "id": "38774822",
+   "execution_count": 23,
+   "id": "cc2ad21e",
    "metadata": {},
    "outputs": [],
    "source": [
     "explanation = explanation_async.get()"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "7fcfb591",
+   "metadata": {},
+   "source": [
+    "We can now inspect the counterfactual $\\mathbf{x'}$ and the sum of its features:"
+   ]
+  },
   {
    "cell_type": "code",
-   "execution_count": null,
-   "id": "7cb95b8c",
+   "execution_count": 25,
+   "id": "6f1e04c1",
    "metadata": {},
-   "outputs": [],
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "java.lang.DoubleFeature{value=490.4373902874999, intRangeMinimum=0.0, intRangeMaximum=1000.0, id='f-num1'}\n",
+      "java.lang.DoubleFeature{value=2.420079314517709, intRangeMinimum=0.0, intRangeMaximum=1000.0, id='f-num2'}\n",
+      "java.lang.DoubleFeature{value=5.759573701749472, intRangeMinimum=0.0, intRangeMaximum=1000.0, id='f-num3'}\n",
+      "java.lang.DoubleFeature{value=0.8173260627331469, intRangeMinimum=0.0, intRangeMaximum=1000.0, id='f-num4'}\n"
+     ]
+    }
+   ],
    "source": [
+    "feature_sum = 0.0\n",
     "for entity in explanation.getEntities():\n",
-    "    print(entity)"
+    "    print(entity)\n",
+    "    feature_sum += entity.getValue().asNumber()\n",
+    "    \n",
+    "print(f\"\\nFeature sum is {feature_sum}\")"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "7a8587d1",
+   "id": "b49d9c1c",
    "metadata": {},
    "outputs": [],
    "source": []