From 7cc999dc73be8f4e6132aada93dd07e8fb8ffff0 Mon Sep 17 00:00:00 2001 From: "Benjamin T. Vincent" Date: Tue, 4 Nov 2025 13:27:10 +0000 Subject: [PATCH 1/6] initial stab at describing lift test implementation details --- docs/source/notebooks/mmm/mmm_lift_test.ipynb | 157 +++++++++++++++++- 1 file changed, 156 insertions(+), 1 deletion(-) diff --git a/docs/source/notebooks/mmm/mmm_lift_test.ipynb b/docs/source/notebooks/mmm/mmm_lift_test.ipynb index 45bda321f..9792bb31b 100644 --- a/docs/source/notebooks/mmm/mmm_lift_test.ipynb +++ b/docs/source/notebooks/mmm/mmm_lift_test.ipynb @@ -1724,6 +1724,161 @@ "* $\\sigma$ is the standard deviation of the increase in $\\Delta_y$ of lift test. That is, we have uncertainty in the result of the lift test, and $\\sigma$ represents the standard deviation of this uncertainty. " ] }, + { + "cell_type": "markdown", + "id": "ad08b6c0", + "metadata": {}, + "source": [ + ":::{admonition} How Lift Tests Are Implemented in PyMC\n", + ":class: tip\n", + "\n", + "While the `add_lift_test_measurements` method handles all of this automatically in pymc-marketing, understanding the implementation mechanics is valuable. This section provides a step-by-step guide to how lift tests are integrated into the model.\n", + "\n", + "#### Why the Gamma Distribution?\n", + "\n", + "The choice of a Gamma distribution for the lift test likelihood is deliberate and mathematically motivated:\n", + "\n", + "- **Saturation curves are monotonically increasing**: Given a fixed set of parameters, as spend increases, contribution increases (or stays the same, but never decreases). 
This is a fundamental property of saturation functions used in MMM.\n", + "- **Lifts must be non-negative**: The change in contribution from a change in spend must be non-negative (when taking absolute values).\n", + "- **Gamma is natural for positive quantities**: The Gamma distribution is defined only for positive real numbers, making it a natural choice for modeling non-negative lift values.\n", + "- **Handling spend decreases**: By taking absolute values of both the observed and predicted lift ($|\\Delta_y|$ and $|\\tilde{\\text{lift}}|$), we can handle both increases and decreases in spend. A decrease in spend from $x$ to $x - \\delta$ produces the same (absolute) lift magnitude as an increase from $x - \\delta$ to $x$.\n", + "\n", + "While one could alternatively use a Normal distribution with absolute values, the Gamma distribution is more theoretically appropriate for strictly positive quantities and often provides better numerical stability in practice.\n", + "\n", + "#### Implementation Algorithm\n", + "\n", + "The following steps outline how lift test observations are added to a PyMC model:\n", + "\n", + "**Step 1: Scale the Lift Test Data**\n", + "\n", + "MMMs typically work with scaled/normalized data internally (e.g., using max scaling where each variable is divided by its maximum value). Lift tests, however, are specified in original units. 
Therefore, the first step is to transform the lift test measurements to match the model's internal scale:\n", + "\n", + "```python\n", + "# Pseudo-code\n", + "# Scale channel-related measurements (x, delta_x)\n", + "x_scaled = channel_scaler.transform(x)\n", + "delta_x_scaled = channel_scaler.transform(x + delta_x) - x_scaled\n", + "\n", + "# Scale target-related measurements (delta_y, sigma) \n", + "delta_y_scaled = target_scaler.transform(delta_y)\n", + "sigma_scaled = target_scaler.transform(sigma)\n", + "```\n", + "\n", + "This ensures that the lift test observations are compatible with the model's internal representation of channels and target.\n", + "\n", + "**Step 2: Map DataFrame Coordinates to Model Indices**\n", + "\n", + "Each row in the lift test DataFrame corresponds to specific coordinates in the model (e.g., a particular channel, or a channel-geo combination for multi-dimensional models). We need to map these coordinate values to their integer indices in the model:\n", + "\n", + "```python\n", + "# Example: if df has channel values [\"channel_1\", \"channel_2\", \"channel_1\"]\n", + "# and the model coords are {\"channel\": [\"channel_1\", \"channel_2\", \"channel_3\"]}\n", + "# then we map to indices: [0, 1, 0]\n", + "\n", + "indices = {}\n", + "for dim in required_dims: # e.g., [\"channel\"] or [\"channel\", \"geo\"]\n", + " lift_values = df_lift_test[dim].values\n", + " model_coords = model.coords[dim]\n", + " # Find index of each lift test value in model coordinates\n", + " indices[dim] = [model_coords.index(val) for val in lift_values]\n", + "```\n", + "\n", + "This coordinate mapping is essential for extracting the correct parameter values for each lift test.\n", + "\n", + "**Step 3: Extract Parameter Values for Each Lift Test**\n", + "\n", + "For each lift test, we need to extract the specific parameter values that apply to that test's coordinates. 
For example, if testing \"channel_1\", we need `saturation_lam[0]` and `saturation_beta[0]`:\n", + "\n", + "```python\n", + "# Create an indexer function that extracts parameters at specific coordinates\n", + "def get_parameter_at_indices(param_name):\n", + " param = model[param_name] # e.g., saturation_lam with dims=(\"channel\",)\n", + " dims = model.named_vars_to_dims[param_name] # e.g., (\"channel\",)\n", + " \n", + " # Index into parameter using the coordinate indices\n", + " idx = tuple([indices[dim] for dim in dims])\n", + " return param[idx]\n", + "\n", + "# Example: for lift tests on channels [0, 1, 0]\n", + "# get_parameter_at_indices(\"saturation_lam\") returns [lam[0], lam[1], lam[0]]\n", + "```\n", + "\n", + "**Step 4: Evaluate the Saturation Curves**\n", + "\n", + "Now we compute the model's prediction for each lift test by evaluating the saturation function at two points: before and after the spend change:\n", + "\n", + "```python\n", + "# Convert lift test data to PyTensor tensors\n", + "x_before = pt.as_tensor_variable(df_lift_test[\"x_scaled\"])\n", + "x_after = x_before + pt.as_tensor_variable(df_lift_test[\"delta_x_scaled\"])\n", + "\n", + "# Define saturation curve evaluation with extracted parameters\n", + "def saturation_curve(x):\n", + " return saturation_function(\n", + " x,\n", + " lam=get_parameter_at_indices(\"saturation_lam\"),\n", + " beta=get_parameter_at_indices(\"saturation_beta\"),\n", + " )\n", + "\n", + "# Compute model-estimated lift: the key computation\n", + "model_estimated_lift = saturation_curve(x_after) - saturation_curve(x_before)\n", + "```\n", + "\n", + "This `model_estimated_lift` is a PyTensor expression that depends on the model parameters, so it becomes part of the computational graph and will be different for each MCMC sample.\n", + "\n", + "**Step 5: Add the Likelihood Term**\n", + "\n", + "Finally, we add a Gamma observation to the PyMC model that links the model's predicted lift to the empirically observed 
lift:\n", + "\n", + "```python\n", + "with model:\n", + " pm.Gamma(\n", + " name=\"lift_measurements\",\n", + " mu=pt.abs(model_estimated_lift), # model's prediction\n", + " sigma=df_lift_test[\"sigma_scaled\"], # measurement uncertainty\n", + " observed=pt.abs(df_lift_test[\"delta_y_scaled\"]), # observed data\n", + " )\n", + "```\n", + "\n", + "This creates an additional likelihood term in the model. During MCMC sampling, the sampler will try to find parameter values that not only fit the main time series data but also produce saturation curves consistent with the lift test observations.\n", + "\n", + "#### Complete Example Flow\n", + "\n", + "Here's how all the steps work together for a simple example:\n", + "\n", + "```python\n", + "# Given: A model with saturation_lam and saturation_beta parameters (dims=\"channel\")\n", + "# Given: Lift test data for \"channel_1\" with x=0.5, delta_x=0.1, delta_y=0.15, sigma=0.02\n", + "\n", + "# Step 1: Scale data (assuming max scaling with max_spend=100, max_target=1000)\n", + "x_scaled = 0.5 / 100 # = 0.005\n", + "delta_x_scaled = 0.1 / 100 # = 0.001 \n", + "delta_y_scaled = 0.15 / 1000 # = 0.00015\n", + "sigma_scaled = 0.02 / 1000 # = 0.00002\n", + "\n", + "# Step 2: Map coordinates\n", + "# \"channel_1\" -> index 0 in model.coords[\"channel\"]\n", + "\n", + "# Step 3 & 4: Evaluate saturation curves\n", + "# Extract: lam_0 = saturation_lam[0], beta_0 = saturation_beta[0]\n", + "# Compute: lift = beta_0 * logistic(x_scaled + delta_x_scaled, lam_0) \n", + "# - beta_0 * logistic(x_scaled, lam_0)\n", + "\n", + "# Step 5: Add likelihood\n", + "pm.Gamma(mu=|lift|, sigma=sigma_scaled, observed=delta_y_scaled)\n", + "```\n", + "\n", + "Lift tests act as **additional observations** of the saturation curve that constrain the model parameters during inference. 
Instead of only learning from the historical time series (which may have limited variation or correlated channels), the model also learns from controlled experiments that directly probe the saturation curve at specific points.\n", + "\n", + "This is particularly valuable when:\n", + "- Channels are highly correlated in historical data (as in this notebook's example)\n", + "- You want to validate that the model's saturation curves match real-world behavior\n", + "- Historical data has limited variation in spend levels for certain channels\n", + "\n", + "By adding lift test measurements, you're essentially saying: \"I know that at spend level $x$ for this channel, a change of $\\Delta x$ produces a contribution change of approximately $\\Delta y$.\" This directly informs the saturation curve parameters and helps the model distinguish between otherwise confounded effects.\n", + ":::\n" + ] + }, { "cell_type": "markdown", "id": "52945ee4", @@ -3386,7 +3541,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.12.11" + "version": "3.12.12" } }, "nbformat": 4, From 34b2c8c61f2a7f063e6b825919e9b3e79fc68d4f Mon Sep 17 00:00:00 2001 From: "Benjamin T. Vincent" Date: Tue, 4 Nov 2025 13:44:05 +0000 Subject: [PATCH 2/6] text improvements --- docs/source/notebooks/mmm/mmm_lift_test.ipynb | 30 +++---------------- 1 file changed, 4 insertions(+), 26 deletions(-) diff --git a/docs/source/notebooks/mmm/mmm_lift_test.ipynb b/docs/source/notebooks/mmm/mmm_lift_test.ipynb index 9792bb31b..a8afaaa76 100644 --- a/docs/source/notebooks/mmm/mmm_lift_test.ipynb +++ b/docs/source/notebooks/mmm/mmm_lift_test.ipynb @@ -1768,7 +1768,7 @@ "\n", "**Step 2: Map DataFrame Coordinates to Model Indices**\n", "\n", - "Each row in the lift test DataFrame corresponds to specific coordinates in the model (e.g., a particular channel, or a channel-geo combination for multi-dimensional models). 
We need to map these coordinate values to their integer indices in the model:\n", + "Each row in the lift test DataFrame corresponds to specific coordinates in the model. We need to map these coordinate values to their integer indices in the model:\n", "\n", "```python\n", "# Example: if df has channel values [\"channel_1\", \"channel_2\", \"channel_1\"]\n", @@ -1785,6 +1785,8 @@ "\n", "This coordinate mapping is essential for extracting the correct parameter values for each lift test.\n", "\n", + "**Note**: For a simple national-level MMM (with only a \"channel\" dimension), this step is straightforward—you're just mapping channel names to indices. The complexity of this step increases for multi-dimensional models (e.g., channel × geo × product) where each lift test must be mapped across multiple coordinate dimensions simultaneously.\n", + "\n", "**Step 3: Extract Parameter Values for Each Lift Test**\n", "\n", "For each lift test, we need to extract the specific parameter values that apply to that test's coordinates. For example, if testing \"channel_1\", we need `saturation_lam[0]` and `saturation_beta[0]`:\n", @@ -1842,31 +1844,7 @@ "\n", "This creates an additional likelihood term in the model. 
During MCMC sampling, the sampler will try to find parameter values that not only fit the main time series data but also produce saturation curves consistent with the lift test observations.\n", "\n", - "#### Complete Example Flow\n", - "\n", - "Here's how all the steps work together for a simple example:\n", - "\n", - "```python\n", - "# Given: A model with saturation_lam and saturation_beta parameters (dims=\"channel\")\n", - "# Given: Lift test data for \"channel_1\" with x=0.5, delta_x=0.1, delta_y=0.15, sigma=0.02\n", - "\n", - "# Step 1: Scale data (assuming max scaling with max_spend=100, max_target=1000)\n", - "x_scaled = 0.5 / 100 # = 0.005\n", - "delta_x_scaled = 0.1 / 100 # = 0.001 \n", - "delta_y_scaled = 0.15 / 1000 # = 0.00015\n", - "sigma_scaled = 0.02 / 1000 # = 0.00002\n", - "\n", - "# Step 2: Map coordinates\n", - "# \"channel_1\" -> index 0 in model.coords[\"channel\"]\n", - "\n", - "# Step 3 & 4: Evaluate saturation curves\n", - "# Extract: lam_0 = saturation_lam[0], beta_0 = saturation_beta[0]\n", - "# Compute: lift = beta_0 * logistic(x_scaled + delta_x_scaled, lam_0) \n", - "# - beta_0 * logistic(x_scaled, lam_0)\n", - "\n", - "# Step 5: Add likelihood\n", - "pm.Gamma(mu=|lift|, sigma=sigma_scaled, observed=delta_y_scaled)\n", - "```\n", + "#### Summary\n", "\n", "Lift tests act as **additional observations** of the saturation curve that constrain the model parameters during inference. Instead of only learning from the historical time series (which may have limited variation or correlated channels), the model also learns from controlled experiments that directly probe the saturation curve at specific points.\n", "\n", From 38862d0f5f678d719f3ea35d4ddfc96b44969767 Mon Sep 17 00:00:00 2001 From: "Benjamin T. 
Vincent" Date: Tue, 4 Nov 2025 15:16:21 +0000 Subject: [PATCH 3/6] Add in link to lift test notebook in CausalPy --- docs/source/notebooks/mmm/mmm_lift_test.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/notebooks/mmm/mmm_lift_test.ipynb b/docs/source/notebooks/mmm/mmm_lift_test.ipynb index a8afaaa76..d295dbb2b 100644 --- a/docs/source/notebooks/mmm/mmm_lift_test.ipynb +++ b/docs/source/notebooks/mmm/mmm_lift_test.ipynb @@ -31,7 +31,7 @@ "\n", "## Requirements\n", "\n", - "Today, we won't be discussing how to conduct lift tests, but instead, we will focus on their utilization. If you wish to acquire knowledge on how to generate results that are compatible with your MMM models, you can check out [CausalPy](https://causalpy.readthedocs.io/en/latest/) for conducting experiments.\n", + "Today, we won't be discussing how to conduct lift tests, but instead, we will focus on their utilization. If you wish to acquire knowledge on how to generate results that are compatible with your MMM models, you can check out [CausalPy](https://causalpy.readthedocs.io/en/latest/) for conducting experiments, such as using [Interrupted Time Series for lift tests with no control groups](https://causalpy.readthedocs.io/en/latest/notebooks/its_lift_test.html).\n", "\n", "## Goal\n", "\n", From 4531055b959d0ba688cea9ae0ae293b8ff003459 Mon Sep 17 00:00:00 2001 From: "Benjamin T. 
Vincent" Date: Tue, 4 Nov 2025 15:24:06 +0000 Subject: [PATCH 4/6] convert admonition to a dropdown (closed by default, I believe) --- docs/source/notebooks/mmm/mmm_lift_test.ipynb | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/docs/source/notebooks/mmm/mmm_lift_test.ipynb b/docs/source/notebooks/mmm/mmm_lift_test.ipynb index d295dbb2b..7f75a8124 100644 --- a/docs/source/notebooks/mmm/mmm_lift_test.ipynb +++ b/docs/source/notebooks/mmm/mmm_lift_test.ipynb @@ -1729,8 +1729,7 @@ "id": "ad08b6c0", "metadata": {}, "source": [ - ":::{admonition} How Lift Tests Are Implemented in PyMC\n", - ":class: tip\n", + ":::{dropdown} How Lift Tests Are Implemented in PyMC\n", "\n", "While the `add_lift_test_measurements` method handles all of this automatically in pymc-marketing, understanding the implementation mechanics is valuable. This section provides a step-by-step guide to how lift tests are integrated into the model.\n", "\n", From f74794a9bc94b8565ef3ad6bdca5c0be86a51e6d Mon Sep 17 00:00:00 2001 From: "Benjamin T. Vincent" Date: Tue, 4 Nov 2025 15:45:48 +0000 Subject: [PATCH 5/6] hide-output cell tag for something not related to what I've changed but looked crap --- docs/source/notebooks/mmm/mmm_lift_test.ipynb | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/docs/source/notebooks/mmm/mmm_lift_test.ipynb b/docs/source/notebooks/mmm/mmm_lift_test.ipynb index 7f75a8124..acc96392f 100644 --- a/docs/source/notebooks/mmm/mmm_lift_test.ipynb +++ b/docs/source/notebooks/mmm/mmm_lift_test.ipynb @@ -1868,7 +1868,11 @@ "cell_type": "code", "execution_count": 27, "id": "611feff6", - "metadata": {}, + "metadata": { + "tags": [ + "hide-output" + ] + }, "outputs": [ { "data": { From 26cf2df6c658652155294ffa6e0e153f34020610 Mon Sep 17 00:00:00 2001 From: "Benjamin T. 
Vincent" Date: Thu, 13 Nov 2025 14:02:45 +0000 Subject: [PATCH 6/6] add clarification that users do not need to manually scale data --- docs/source/notebooks/mmm/mmm_lift_test.ipynb | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/source/notebooks/mmm/mmm_lift_test.ipynb b/docs/source/notebooks/mmm/mmm_lift_test.ipynb index acc96392f..03b524d7f 100644 --- a/docs/source/notebooks/mmm/mmm_lift_test.ipynb +++ b/docs/source/notebooks/mmm/mmm_lift_test.ipynb @@ -1765,6 +1765,8 @@ "\n", "This ensures that the lift test observations are compatible with the model's internal representation of channels and target.\n", "\n", + "_NOTE: This is done internally, users do not implement this step._\n", + "\n", "**Step 2: Map DataFrame Coordinates to Model Indices**\n", "\n", "Each row in the lift test DataFrame corresponds to specific coordinates in the model. We need to map these coordinate values to their integer indices in the model:\n",
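The name-to-index mapping introduced above, together with the parameter fancy-indexing it enables in Step 3, can be sketched end-to-end with plain NumPy stand-ins (the coordinate dict and `saturation_lam` vector below are illustrative placeholders, not the real PyMC model objects):

```python
import numpy as np

# Hypothetical model coordinates and a per-channel parameter vector
# (stand-ins for model.coords and the saturation_lam random variable)
model_coords = {"channel": ["channel_1", "channel_2", "channel_3"]}
saturation_lam = np.array([1.5, 2.0, 0.7])

# Lift-test rows reference channels by name ...
lift_test_channels = ["channel_1", "channel_2", "channel_1"]

# ... which we map to integer indices in the model coordinates
idx = [model_coords["channel"].index(c) for c in lift_test_channels]
print(idx)  # [0, 1, 0]

# Fancy indexing then pulls one parameter value per lift test,
# repeating values when the same channel is tested more than once
print(saturation_lam[idx])  # [1.5 2.  1.5]
```

In the real implementation this indexing happens on symbolic PyTensor variables rather than NumPy arrays, but the index arithmetic is the same.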