FEAT: placebo testing for paid media channels #1274
toj9 wants to merge 3 commits into facebookexperimental:main
Conversation
Hi @toj9! Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged. If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks! |
Hey @gufengzhou! I have been using Robyn for years and I love it! I have just put together this PR for a potential new feature I have been thinking about. Let me know what you think or if I need to tag somebody else from the community. Thanks, cheers.
Hello @toj9, very cool idea, beautiful viz and thanks for the contribution! I'd like to get your thoughts on a couple of things.
@gufengzhou ah yes, thanks for this feedback!!
I will say though that the idea for me from the start was that by shuffling we break the Hill and adstock effects, meaning the model should no longer easily learn how each week's spend for the channel carries over or follows a diminishing-returns curve. Even if total dollars stay the same, the dose-response and carryover signals "vanish" and NRMSE inflates.
So, for an always-on channel, the sequential combo could actually be a solution, which is what I thought about after writing this lol. Both tests are for something different and would serve different stress-testing capabilities.

What I am also thinking is that we could have a third option, inject a placebo: adding a paid spend variable, which could help sidestep the problem with an always-on channel. This would be essentially the same random noise as the shuffling, but in practice we would generate a truly random spend series (for example, sampled from the same mean/SD as existing channels like search_S or ooh_S so it lives on the same scale) and then inject that column into the Robyn inputs. Because the hypothesis is that this placebo will have a spend_share in Robyn but should have 0% effect_share (it being a placebo), we can check the following:

- Spend-share check: in the one-pager output, the placebo variable will show up in the bar chart of Robyn's spend share (since it's literally inserted as a paid channel), but when we look at its effect share it must be essentially zero. If Robyn ever assigns it a nonzero effect, the optimizer is mistaking random noise for real signal, i.e. overfitting.
- Saturation-curve check: its Hill/adstock curve should be flat, with no significant shape or upward slope. If we see any curvature, shape, or significant slope, it's a red flag that the model is overfitting again.
- NRMSE-impact check: we compare Pareto NRMSE distributions with and without the injected placebo. Because it truly drives nothing, ideally the minimum NRMSE (across all candidate models) should stay the same or even improve slightly (if Robyn accidentally shrank away noise). If adding the placebo ever lowers NRMSE significantly, it means Robyn is "learning" from pure randomness and overfitting again.
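The injected-placebo idea can be sketched outside Robyn. Below is a minimal Python illustration (Robyn itself is R, and `make_placebo_spend` is a hypothetical helper, not part of this PR) of drawing a random spend series matched to an existing channel's mean/SD so it lives on the same scale:

```python
import random
import statistics

def make_placebo_spend(channel_spend, seed=0):
    """Draw a random spend series with the same mean/SD and length as an
    existing paid channel, clipped at zero so it stays a plausible spend
    column that can be injected into the model inputs."""
    rng = random.Random(seed)
    mu = statistics.fmean(channel_spend)
    sd = statistics.stdev(channel_spend)
    return [max(0.0, rng.gauss(mu, sd)) for _ in channel_spend]

# e.g. match the scale of an existing channel such as search_S
search_S = [1200.0, 900.0, 1500.0, 1100.0, 1300.0, 650.0, 800.0]
placebo_S = make_placebo_spend(search_S)
```

The placebo column can then be appended to the input data and declared as a paid channel, so it appears in the spend-share chart while (ideally) earning zero effect share.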
After injecting a placebo as a diagnostic step into what we believed was a strong model, its behavior tells us whether the original fit was genuine or just noise:
If Robyn "learns" from the noise variable, it's demonstrating an overfitting tendency. If it does not, it proves that its Hill/adstock machinery was not flexible enough to mistake random fluctuations for real signal. Here is an example where the effect share actually went up to 2.9% for a placebo: And here is an example where the effect share went up to even 3.7%:




Project Robyn
Summary
Introduces a placebo test feature that, for any chosen paid media channel, shuffles its weekly spend, reruns Robyn, and compares the resulting NRMSE distribution against the original one. This code creates a "placebo test" via `robyn_placebo()` and `plot_placebo()`. Specifically, it:
Motivation
When we include media spend as an independent variable in an MMM, we are implicitly assuming that it has some predictive influence on the response (e.g., sales, conversions). But in reality, we don't know if that's true. Some channels' spend might genuinely help the model predict the outcome. Other channels might not add anything beyond what other predictors already cover (i.e., they are redundant or collinear). This placebo test treats that assumption as the hypothesis to challenge, and helps to stress-test one paid media variable at a time:
Why “original” may be lower on average
Because the optimizer is discovering combinations of hyperparameters to minimize NRMSE, we expect those thousands of original fits to tend toward lower errors (they "learn" real signal from the un-shuffled data). In contrast, if we scramble one channel that genuinely carried predictive information, the optimizer can't recover those patterns as well as before -- instead it's fitting noise -- so its candidate fits should, on average, be worse (higher NRMSE).
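The shuffle-and-compare step can be illustrated with a small Python sketch (hypothetical helper names, not the PR's R API), using one common NRMSE definition, RMSE divided by the range of the actuals:

```python
import math
import random

def nrmse(actual, predicted):
    """One common NRMSE definition: RMSE divided by the range of the
    actual series, so errors are comparable across response scales."""
    rmse = math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )
    return rmse / (max(actual) - min(actual))

def shuffle_channel(spend, seed=0):
    """Permute a channel's weekly spend: total dollars are unchanged, but
    the week-to-week dose-response and carryover patterns are destroyed,
    so a model that relied on them should fit worse (higher NRMSE)."""
    rng = random.Random(seed)
    out = list(spend)
    rng.shuffle(out)
    return out
```

The placebo run then refits the model with the shuffled column in place of the original and records the NRMSE of every candidate fit.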
When we might see the opposite
If we turn a paid channel into a placebo that has very little or no real predictive power on the response (as placebos should), then the optimizer might still use the remaining variables to get a similarly low error. In that case the original and placebo distributions end up roughly the same, or occasionally the placebo distribution even dips slightly below the original just due to random chance in the stochastic search. But whenever a channel truly matters, the placebo treatment should increase the average NRMSE (so we see a higher-centered placebo distribution after re-running the model with the shuffled media spend variable).
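Under these assumptions, the comparison between the two NRMSE distributions boils down to shifts in their centers. A minimal Python sketch (illustrative only; the function name is hypothetical):

```python
import statistics

def placebo_shift(nrmse_original, nrmse_placebo):
    """Summarize how the placebo run's NRMSE distribution moved relative
    to the original: positive shifts mean the shuffled channel made the
    candidate fits worse on average, as expected when it carried real
    signal; shifts near zero (or slightly negative, from stochastic
    search luck) suggest the channel added little beyond the others."""
    return {
        "mean_shift": statistics.fmean(nrmse_placebo) - statistics.fmean(nrmse_original),
        "min_shift": min(nrmse_placebo) - min(nrmse_original),
    }
```

In practice the two lists would be the NRMSE values of all Pareto candidate models from the original run and the placebo run.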
Variance as a supporting metric
In practice, when using Robyn with the placebo implemented, a higher variance of NRMSE values may flag that the optimizer is "flailing" more when it can't lean on a real driver, because it "searches" and "hunts" harder when it no longer has that useful independent variable in the model. A one-tailed F-test then tells us whether that increase in spread is significant. If it is, we have extra supporting evidence that the channel we shuffled may have carried real additional predictive power; if the variance doesn't rise much, it suggests the channel might not have contributed much signal to begin with.
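The one-tailed F-test here is just the ratio of sample variances referred to the upper tail of an F distribution. A minimal Python sketch of the test statistic (the p-value would come from the upper tail of F(df1, df2); the function name is illustrative):

```python
import statistics

def variance_f_stat(nrmse_original, nrmse_placebo):
    """One-tailed F-test statistic for 'placebo variance > original
    variance': the ratio of sample variances. Values well above 1,
    compared against the upper tail of an F(df1, df2) distribution,
    support the claim that the shuffled channel carried real signal."""
    f_stat = statistics.variance(nrmse_placebo) / statistics.variance(nrmse_original)
    df1 = len(nrmse_placebo) - 1   # numerator degrees of freedom
    df2 = len(nrmse_original) - 1  # denominator degrees of freedom
    return f_stat, (df1, df2)
```

Putting the placebo variance in the numerator makes the test one-tailed in the direction of interest: we only care whether the spread grew after shuffling.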
Output Examples
The placebo test has 3 main outputs:
Placebo tests suggest OOH spend might be an important spend variable, as after shuffling its error distribution rises significantly
1st replication:

2nd replication:

3rd replication:

Placebo tests suggest FB media spend might not be important given other predictive variables in the model
These stress-testing charts are automatically saved into the same folder as other visuals when exported:

Code Example
What's next?
For example, the code could be adjusted to support other variables in the future, not just paid media. We could also introduce random synthetic confounders to see if the channel effect disappears, run a "subset refuter" that holds out part of the data to verify stability, or even remove certain variables (or time periods) entirely to confirm robustness.
Type of change
feat: New feature (non-breaking change which adds functionality)
How Has This Been Tested?