|
17 | 17 | "\n",
|
18 | 18 | "> We know that the best way to make causal claims is to run a randomized control trial (sometimes known as an [A/B test](https://en.wikipedia.org/wiki/A/B_testing)). If we have randomly assigned stores across Europe (or picked a country) at random, then perhaps an A/B test would do the job. But we did not pick Denmark at random - so we are worried about confounding variables.\n",
|
19 | 19 | "\n",
|
20 |
| - "> But we heard about synthetic control methods and a thing called GeoLift. After some research, we decide this is exactly what we want to do. But we are particulatly interested in how certain we are in the level of any uplift we detect, so we want to use Bayesian methods and get easy to interpret Bayesian credible intervals. You find a library called `CausalPy` that supports exactly that use case and are delighted.\n", |
| 20 | + "> But we heard about synthetic control methods and a thing called GeoLift. After some research, we decide this is exactly what we want to do. But we are particularly interested in how certain we are in the level of any uplift we detect, so we want to use Bayesian methods and get easy to interpret Bayesian credible intervals. You find a library called `CausalPy` that supports exactly that use case and are delighted.\n", |
21 | 21 | "\n",
|
22 | 22 | "Let's go!"
|
23 | 23 | ]
|
|
248 | 248 | "\n",
|
249 | 249 | "In order to calculate what (if any) causal effect there is from the store refurbishment we need to compare the _actual_ sales in Denmark after the intervention and the _counterfactual_ sales in Denmark if the intervention had not taken place. We can see why this is called the counterfactual - we _did_ refurbish the stores in Denmark, so this is a completely hypothetical scenario that runs _counter to the facts_. But if we could measure (or more realistically estimate) this, that would be our control group. \n",
|
250 | 250 | "\n",
|
251 |
| - "In this case, we generate a synthetic control, which is the name of the technique we will be using to estimate our counterfactual sales data in Denmark if the refurbishment had not taken place. You can read more about the synthetic control method on the [synthetic control wikipedia page](https://en.wikipedia.org/wiki/Synthetic_control_method), but the basic idea is as follows.\n", |
252 |
| - "\n", |
253 |
| - "For those familiar with traditional (non-Bayesian) modelling methods, the basic synthetic control algorithm can be described like this:\n", |
| 251 | + "In this case, we generate a synthetic control, which is the name of the technique we will be using to estimate our counterfactual sales data in Denmark if the refurbishment had not taken place. You can read more about the synthetic control method on the [synthetic control wikipedia page](https://en.wikipedia.org/wiki/Synthetic_control_method), but the basic idea is as follows. For those familiar with traditional (non-Bayesian) modelling methods, the basic synthetic control algorithm can be described like this:\n", |
254 | 252 | "\n",
|
255 | 253 | "```python\n",
|
256 | 254 | "import my_custom_scikit_learn_model as weighted_combination\n",
|
|
563 | 561 | "cell_type": "markdown",
|
564 | 562 | "metadata": {},
|
565 | 563 | "source": [
|
566 |
| - "So at the end of our causal modelling endeavours we can report to our boss something along the lines of: \"We believe that the store refurbishment scheme was causally responsible for driving a total of about 9130 additional sales. But we have uncertainty in the exact number of additional sales - our 94% credible regions span 7,410 to 10,740\".\n", |
| 564 | + "So at the end of our causal modelling endeavours we can report to our boss something along the lines of: \"We believe that the store refurbishment scheme was causally responsible for driving a total of about 9140 additional sales. But we have uncertainty in the exact number of additional sales - our 94% credible regions span 7,570 to 10,870\".\n", |
567 | 565 | "\n",
|
568 | 566 | "There are of course caveats worth bearing in mind. The analysis we've conducted has assumed that the only major event that might selectively influence sales in Denmark was the store renovation project. If this is a reasonable assumption then we may be on relatively stable ground in making causal claims. But if there were other events which selectively effected some units (countries) and not others, then we may need to be much more cautious in our claims and resort to more in-depth modelling approaches.\n",
|
569 | 567 | "\n",
|
570 |
| - "But our estimated cumulative causal impact of $9330^{10740}_{7410}$ is exactly the information needed by others in the company. They can use this figure (and even the uncertainty associated with it) and estimate how long it would take for the cost of renovating other stores to result in a net profit.\n", |
| 568 | + "But our estimated cumulative causal impact of $9140^{10870}_{7570}$ is exactly the information needed by others in the company. They can use this figure (and even the uncertainty associated with it) and estimate how long it would take for the cost of renovating other stores to result in a net profit.\n", |
571 | 569 | "\n",
|
572 |
| - "You boss is very happy. You get a big end-of-year bonus." |
| 570 | + "Your boss is very happy. You get a big end-of-year bonus." |
573 | 571 | ]
|
574 | 572 | },
|
575 | 573 | {
|
|