Refactor and enhance SensitivityAnalysis for MMM #1899
Conversation
Expanded SensitivityAnalysis to support arbitrary PyMC models, improved documentation with usage examples, and added flexible sweep and filtering logic. The class now works directly with PyMC models and InferenceData, supports multi-dimensional inputs, and provides Jacobian-based marginal effects with optional xarray output and idata extension. Legacy code using pandas was removed in favor of a more general and robust approach.
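To illustrate the idea of a sensitivity sweep, here is a minimal numpy sketch (toy response curve and hypothetical names, not the actual pymc-marketing implementation): the input is scaled by each sweep value, the response is recorded, and the marginal effect is the derivative of the response with respect to the sweep value.

```python
import numpy as np

def saturation(x, alpha=2.0):
    """Toy saturating response curve (stand-in for a channel transform)."""
    return x / (x + alpha)

# Hypothetical stand-in for the data held in a pm.Data variable
channel_data = np.array([1.0, 3.0, 5.0])

# Multiplicative sweep: scale inputs from 50% to 150% of observed values
sweep_values = np.linspace(0.5, 1.5, 11)

# Total response at each sweep point
responses = np.array([saturation(s * channel_data).sum() for s in sweep_values])

# Marginal effect = d(response)/d(sweep), here via finite differences
marginal_effects = np.gradient(responses, sweep_values)
```

With a saturating curve the responses keep increasing while the marginal effects shrink, which is the diminishing-returns pattern the sweep is meant to surface.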
Codecov Report ❌ Patch coverage is
Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #1899      +/-   ##
==========================================
- Coverage   93.54%   93.14%   -0.41%
==========================================
  Files          67       67
  Lines        8966     9126     +160
==========================================
+ Hits         8387     8500     +113
- Misses        579      626      +47
```
```
List of variable names to intervene on.
varinput : str
    Name of the pm.Data variable (e.g., "channel_data").
    Expected shape: (date, *dims, channel)
```
I'll change this here; it should say `(date, *dims, arbitrary_dim)`, with dims that match `varinput`.
```python
)
if extend_idata:
    self._add_to_idata(xr_result)
return stacked
```
Here we should return either the xr result or `None`. I'll change this.
```python
coords["sweep"] = np.asarray(sweep_values)

# Map remaining dims using model coords and filters
# Compute axis offsets: after (sample, sweep), axes align with dims_order
```
I tend to dislike `combined=True` and a single `sample` dim instead of the usual `('chain', 'draw')`, because at times there are issues down the line if you want to merge xarrays. I need @drbenvincent's help to understand how to reformulate the plots.
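The `('chain', 'draw')` vs. `sample` concern above can be sketched with plain xarray (toy array, hypothetical shape): stacking the two posterior dims into one `sample` multi-index is what `combined=True`-style outputs do, and it can be undone so the result merges cleanly with other posterior objects.

```python
import numpy as np
import xarray as xr

# Posterior-like array with the usual ('chain', 'draw') dims
da = xr.DataArray(
    np.arange(2 * 3).reshape(2, 3),
    dims=("chain", "draw"),
    coords={"chain": [0, 1], "draw": [0, 1, 2]},
)

# Collapse to a single 'sample' dim (what a combined output looks like)
stacked = da.stack(sample=("chain", "draw"))

# The multi-index can be undone, restoring dims that merge cleanly
restored = stacked.unstack("sample")
```

Keeping (or restoring) `('chain', 'draw')` avoids the merge friction mentioned above, since other posterior variables carry those same dims.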
Thanks @ErikRingen for the feedback. @drbenvincent, I just need you to adjust the test and follow a bit more what we want to show, but that should be all.
Just bear in mind that I am very much a beginner when it comes to pytensor.
Here's a question, originally raised by @ErikRingen, but that I'm picking up on as well. As far as I can tell, this goes straight for the marginal effect with the Jacobian, right? The original functionality first computed what the outcome is for different sweep values, then computed the gradient as a function of the sweep values (the marginal effect).
In discussion with an LLM, we need something like this?

```python
# Add this to get actual values
resp_fn = function(inputs=[], outputs=resp_graph)

# Then in the sweep loop:
for s in sweep_values:
    data_shared.set_value(original_X * s)
    actual_values = resp_fn()      # Actual outcome values
    marginal_effects = jac_fn()    # Marginal effects (current implementation)
```
My only other thought (at this time) is that there seem to be some assumptions about dimensions etc. We don't have the useful named dimensions that we do in xarray, so perhaps we need some extra validation here?
Actually, another thought. This seems very general, so in theory it could feasibly be ported to pymc-extras or pymc to enable generic sensitivity / marginal effects for arbitrary models? Not saying we should do that now, but it's perhaps worth considering, and thinking about the feasibility.
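To make the two routes above concrete, here is a toy numpy sketch (hypothetical quadratic response, not the PR's pytensor code) showing that "sweep the outcome, then differentiate" and "apply the Jacobian via the chain rule" agree on the marginal effect of the sweep scale:

```python
import numpy as np

def resp(x):
    """Toy response: quadratic in the input (stand-in for the model graph)."""
    return (x ** 2).sum()

def resp_grad(x):
    """Analytic gradient of resp w.r.t. x, i.e. what a Jacobian route returns."""
    return 2 * x

original_X = np.array([1.0, 2.0])
sweep_values = np.linspace(0.5, 2.0, 16)

# Route 1: compute the swept outcomes, then differentiate numerically
actual_values = np.array([resp(original_X * s) for s in sweep_values])
fd_route = np.gradient(actual_values, sweep_values)

# Route 2: chain rule — d resp(s * X) / ds = grad(s * X) @ X
jac_route = np.array(
    [resp_grad(original_X * s) @ original_X for s in sweep_values]
)
```

For a quadratic response the central finite differences are exact, so the two routes match at the interior sweep points; the point of the Jacobian route is that it gives this derivative without a second numerical pass.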
After talking with @drbenvincent, I found out I'm doing fundamentally different things, so I'll be restructuring all the code 😃
Updated sensitivity analysis logic in pymc_marketing/mmm/sensitivity_analysis.py and related plotting in plot.py. Adjusted notebook outputs and execution counts for reproducibility. Modified tests to align with new sensitivity analysis behavior.
Check out this pull request on ReviewNB to see visual diffs & provide feedback on Jupyter Notebooks.
We are ready for review. The current style does the same as the old one, with some new changes.
@drbenvincent @ErikRingen changes applied. The minimal usage will be:

```python
sweeps = np.linspace(0.1, 2.0, 100)
mmm.sensitivity.run_sweep(
    sweep_values=sweeps,
    varinput="channel_data",
    var_names="channel_contribution",
)
```

You can set a mask if you want, over the channels you want to run:

```python
mmm.sensitivity.run_sweep(
    sweep_values=sweeps,
    varinput="channel_data",
    var_names="channel_contribution",
    response_mask=mask,  # skip dims where the mask is False
)
```

To use a share of the posterior, we now have:

```python
mmm.sensitivity.run_sweep(
    sweep_values=sweeps,
    varinput="channel_data",
    var_names="channel_contribution",
    response_mask=mask,  # skip dims where the mask is False
    posterior_sample_percentage=0.2,  # 20% of the posterior will be used
)
```
I know you're itching to get going with this. So a quick review while I have a moment, focussing on the notebook, not the code yet.
- `mmm.plot.marginal_curve` should have a clear "marginal effects" title. Ideally we go back to having the more descriptive y-axis label as well (marginal effect, with optional equation). I also made those more distinct from each other by having different default colours: `C0` for the effects, `C1` for marginal effects.
- Can we also get the more descriptive x-axis labels on there? Not an absolute must, but I think it's nice to get labels telling you if you are making a multiplicative or additive (or whatever) change.
- The other kwargs have underscores, so maybe add one to `varinput`?
- Given the slight change in API, could you add a paragraph just before or after the first call to `run_sweep` that explains the function call a bit? And ideally point people to the API docs page for more info.
- Also on the changed API... the first example is about doing a sweep on influencer spend. The initial API had `var_names=["influencer_spend"]`, which is pretty nice and obvious. The new API has `varinput="channel_data", var_names="channel_contribution"`. So can you add some description of this? (Same as what I asked above, I think.)
- The code comment "this compute deterministic not using aleatoric uncertainty from likelihood" needs some explanation. Maybe add a callout box (or just a paragraph) to explain what you mean by this.
- I welcome the change of variable name to `posterior_sample_percentage`, but being pedantic, this is a fraction, not a percentage. Can we either change the variable name, or change the code to expect percentages rather than fractions? Also please add an explanation of that argument in the text when it first appears.
- The example under the heading "Sensitivity analysis on the free shipping threshold" has been updated to also show plots of a sweep on `t`. Can you update the heading and text to explain a bit more? For example, what is `t`?
- Remove commented-out code and what I think are redundant code comments.
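On the fraction-vs-percentage point, one hypothetical way to make the argument unambiguous (sketch only; the function name and message are illustrative, not the PR's code) is a guard that rejects percentage-looking values:

```python
def validate_posterior_fraction(value: float) -> float:
    """Hypothetical guard: accept a fraction in (0, 1], reject e.g. 20."""
    if not 0.0 < value <= 1.0:
        raise ValueError(
            f"Expected a fraction in (0, 1], got {value}. "
            "Did you pass a percentage? Use e.g. 0.2 for 20%."
        )
    return value
```

This keeps the current fraction semantics while catching the most likely misuse at the call site.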
@drbenvincent if you are around during the weekend, I'll apply your changes. It would be amazing to get the stamp here in order to get it into the release; it's still a great improvement to the function!
Description
ref code: https://colab.research.google.com/drive/16xn_lGYflxcyKZilSEGr0A6ShJuKSStK#scrollTo=1cbXjzqurd1c
to-do's:
new features:
Related Issue
Checklist
`pre-commit.ci autofix` to auto-fix.