Skip to content

Improvements on multidimensional notebook #1876

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 24 commits into from
Aug 18, 2025

Conversation

daniel-saunders-phil
Copy link
Contributor

@daniel-saunders-phil daniel-saunders-phil commented Aug 6, 2025

The current multi-dimensional notebook has a couple of issues:

This PR address those issues and offers a number of other improvements:

  • a new data set specific to the multi-dimensional class. The synthetic data was generated by the same MMM so that ensures at least decent performance.
  • Demos how to build a non-centered positive valued random variable. This is a useful trick for hierarchical MMMs because usually the channels are weakly informative and usually you need to use a non-centered distribution.
  • Removes the changepoint trend stuff. Change points are mixtures and mixtures can be quite hard to estimate. So I think it might be smarter to keep this notebook narrow and focus on hierarchies and dims.
  • Add a parameter recovery section to the notebook
  • Add a script to reproduce the synthetic data whenever we want and save off the true parameter values to use in the recovery section.

It's not finished yet because I'm still getting some pesky divergences. A bit unclear the cause atm.


📚 Documentation preview 📚: https://pymc-marketing--1876.org.readthedocs.build/en/1876/

…book, all the hierarchical parameters were removed in favour of fully pooled parameters over geographies. However the text still discusses hierarchies so I'm restoring them.
…between partial, full and no pooling models. We now have examples of all three so it's a bit more a friendly introduction.
…native prior_predictive method, I think it's better to call pymc's prior predictive method. The advatnage is that y_original_scale is easily recovered so it feels more consistent with the way we fetch y_original_scale later in the notebook. It is also confusing to go through the work of adding y_original_scale a few cells above and then make people fetch the scales all over again.
…example data exactly duplicated channels for each geography which would cause intense posterior correlations. New example data is generated from the multi-dimensional mmm. Also removed the change points - those are mixtures and are inheritly difficult to fit. I think we should keep this notebook more narrowly focused on dimensions and hierarchies.
…on because we'd need a zero sum normal and the prior class isn't design to interact with zero sum normals very well (I would like to be able to add two Prior classes together so we can have a seperate mean.)
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@daniel-saunders-phil daniel-saunders-phil marked this pull request as draft August 6, 2025 22:56
@github-actions github-actions bot added docs Improvements or additions to documentation MMM labels Aug 6, 2025
@daniel-saunders-phil daniel-saunders-phil changed the title Correct multi dimensional nb Improvements on multidimensional notebook Aug 7, 2025
@daniel-s-tccc
Copy link

Alright, I got the model to converge again. The optimizer outputs are bit messed up though. I'll have to figure that out next.

… to be appropriate to the scale of this dataset.
@daniel-saunders-phil daniel-saunders-phil marked this pull request as ready for review August 8, 2025 21:56
@daniel-saunders-phil
Copy link
Contributor Author

daniel-saunders-phil commented Aug 8, 2025

@cetagostini Hey mate, if you have a chance to review, that would be much appreciated.

I got a bit cavalier with ripping out features that were making it unduly difficult to fit the model so open to discussing those choices. But hey, it fits in 70 seconds now 😃

Also, I couldn't find a idata compression small enough nor is it synced with the yaml file anymore. Lemme know if that's a blocker and I'll try again.

@juanitorduz juanitorduz requested a review from cetagostini August 9, 2025 07:10
@juanitorduz juanitorduz added this to the 0.16.0 milestone Aug 9, 2025
Copy link

review-notebook-app bot commented Aug 9, 2025

View / edit / reply to this conversation on ReviewNB

juanitorduz commented on 2025-08-09T07:13:37Z
----------------------------------------------------------------

Line #12.        extend_idata=True,

Shall we use nutpie?


Copy link
Contributor

@cetagostini cetagostini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the notebook a lot, thanks for all the help here mate, amazing contribution to make the notebook look better the small details, we can log them as issues and solve soon.

Respect to Nutpie, I'll say optional, and up to you if you want to change it 🙌🏻

Amazing @daniel-saunders-phil 🔥

@juanitorduz
Copy link
Collaborator

@daniel-saunders-phil @cetagostini I suggest we merge #1832 and then you can rebase and merge this one? @daniel-s-tccc do we need to also update the other notebooks that rely on this example (i.e. budget allocations ?)

@cetagostini
Copy link
Contributor

@juanitorduz you are right, that should be only to run the notebook as budget allocation, and allocation assessment.

Copy link
Contributor

@cetagostini cetagostini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets update the notebooks and continue but work have been amazing!

@daniel-s-tccc
Copy link

View / edit / reply to this conversation on ReviewNB

juanitorduz commented on 2025-08-09T07:13:37Z ----------------------------------------------------------------

Line #12. extend_idata=True,
Shall we use nutpie?

Nutpie (numba) is broken on the MMM class right now. The error is pretty obscure so haven't had time to figure out why. I would prefer to use the default sampler because I don't want to point people toward nutpie until we resolve that.

Copy link

codecov bot commented Aug 11, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 52.47%. Comparing base (bca55e4) to head (9cc75f6).
⚠️ Report is 1 commits behind head on main.

❗ There is a different number of reports uploaded between BASE (bca55e4) and HEAD (9cc75f6). Click for more details.

HEAD has 4 uploads less than BASE
Flag BASE (bca55e4) HEAD (9cc75f6)
23 19
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1876       +/-   ##
===========================================
- Coverage   91.86%   52.47%   -39.39%     
===========================================
  Files          64       64               
  Lines        7546     7546               
===========================================
- Hits         6932     3960     -2972     
- Misses        614     3586     +2972     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@daniel-s-tccc
Copy link

I'm working through the rebase and it's pretty time-consuming. @carlosagostini @juanitorduz

The trouble is that everything is so tightly coupled - the text in the optimizer notebook wants to call out specific behaviours that old might exhibited or relies on features of the old dataset. Plus the diffs on notebooks are challenging to parse so I cannot quite tell what I need to keep track of and what is new in Carlo's work.

What do we think decoupling the multi-dimensional mmm notebook from the optimizer notebooks? It would be easy to do - we just don't save the result of the multi-dimensional notebook on each run. Then we can use old idata, old data, old yaml to run the optimizer notebooks and their results would be preserved. It would help make the notebook suite a bit more modular - you can improve one notebook without having to rewrite all the others.

@cetagostini
Copy link
Contributor

@daniel-s-tccc I definitely see the benefit to separate notebooks, at the same time, I could be scare then if they are two detach we could bring issues. Ideally, our models in production can use all notebook functions.

I'll jump in your branch, and take a look to your issues and try to give you a hand!

Copy link

review-notebook-app bot commented Aug 12, 2025

View / edit / reply to this conversation on ReviewNB

cetagostini commented on 2025-08-12T08:53:24Z
----------------------------------------------------------------

Respect to previous one, channels estimations look quite similar. From storytelling perspective, I'm not sure about the new model, both channels are massively uncertainty, and we don't have clear winners. Previous model definitely was showing different adstocks, and preference to x1.

Could we use set better params for x1? and probably take signals from x2, so values are only in the lower range bringing uncertainty to it?


…act. Update notebook with code for parameter recovery.
@daniel-saunders-phil
Copy link
Contributor Author

daniel-saunders-phil commented Aug 14, 2025

Hey hey, I saw the notebook today and we are in track!

I think, if we add:

  • Parameter recovery
  • Recommendations to decrease the size of .nc file

We are good to go! 🔥

I agree, that's a good plan to finish it. Parameter recovery is hitting some bumps so I'm gonna keep digging to figure out what is happening.

@daniel-saunders-phil
Copy link
Contributor Author

Alright guys, I think we are good here :)

Copy link
Collaborator

@juanitorduz juanitorduz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small comments on the script ;)

Copy link

review-notebook-app bot commented Aug 15, 2025

View / edit / reply to this conversation on ReviewNB

juanitorduz commented on 2025-08-15T21:06:17Z
----------------------------------------------------------------

Can we remove the output ...

python

File "c:\Users\dsaun\miniforge3\envs\pymc-marketing-dev\Lib\importlib\__init__.py", line 128, in reload

raise ModuleNotFoundError(f"spec not found for the module {name!r}", name=name)

ModuleNotFoundError: spec not found for the module 'cutils_ext'


Copy link

review-notebook-app bot commented Aug 15, 2025

View / edit / reply to this conversation on ReviewNB

juanitorduz commented on 2025-08-15T21:06:18Z
----------------------------------------------------------------

are we missing some bar plot on the lower right panel?


daniel-saunders-phil commented on 2025-08-15T21:58:27Z
----------------------------------------------------------------

The optimizer just didn't put any money there because it's the worst channel. I agree it looks weird so I pushed the budget up a bit.

@juanitorduz
Copy link
Collaborator

Thanks @daniel-saunders-phil :) I left some minor comments. I think there are some open comments from @carlosagostini (or are they resolved?), once we close them, let's merge this one! I like the new content a lot! Thanks again for making it much better 🙌

Copy link
Contributor Author

The optimizer just didn't put any money there because it's the worst channel. I agree it looks weird so I pushed the budget up a bit.


View entire conversation on ReviewNB

@daniel-saunders-phil
Copy link
Contributor Author

@juanitorduz tided up a few things. I think we are all clear on Carlos' suggestions - they mostly pertained to the optimizer notebook except the two items he put in a checklist for me.

@cetagostini
Copy link
Contributor

@juanitorduz tided up a few things. I think we are all clear on Carlos' suggestions - they mostly pertained to the optimizer notebook except the two items he put in a checklist for me.

Yes, we are good to go here from my side!

@daniel-saunders-phil daniel-saunders-phil enabled auto-merge (squash) August 18, 2025 19:44
@daniel-saunders-phil
Copy link
Contributor Author

@juanitorduz tided up a few things. I think we are all clear on Carlos' suggestions - they mostly pertained to the optimizer notebook except the two items he put in a checklist for me.

Yes, we are good to go here from my side!

When you have a second, could you approve? I think it's blocked until you say yes because you put in a request for changes last week.

@juanitorduz
Copy link
Collaborator

I approved! I think we need @carlosagostini 's blessing :)

@daniel-saunders-phil daniel-saunders-phil merged commit cfb4171 into main Aug 18, 2025
9 checks passed
@daniel-saunders-phil daniel-saunders-phil deleted the correct_multi_dimensional_nb branch August 18, 2025 19:56
@juanitorduz
Copy link
Collaborator

yay! Thanks @daniel-saunders-phil

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs Improvements or additions to documentation MMM
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants