Astrostatistics and Machine Learning class for the MSc degree in Astrophysics at the University of Milan-Bicocca (Italy)

dgerosa/astrostatistics_bicocca_2026


Astrostatistics and Machine Learning

Davide Gerosa - davide.gerosa@unimib.it
University of Milano-Bicocca, 2026.

Aims

The use of statistics is ubiquitous in astronomy and astrophysics. Modern advances are made possible by the application of increasingly sophisticated tools, often dubbed "data mining", "machine learning", and "artificial intelligence". This class provides an introduction to (some of) these statistical techniques in a very practical fashion, pairing formal derivations with hands-on computational applications. Although examples will be taken almost exclusively from the realm of astronomy, this class is appropriate for all Physics students interested in machine learning.

Lectures

  1. Introduction I. Data mining and machine learning. My research interests. Python setup. Version control with git. *
  2. Probability and Statistics I. Probability. Bayes' theorem. Random variables. *
  3. Probability and Statistics II. Monte Carlo integration. Descriptive statistics. Common distributions. *
  4. Probability and Statistics III. Central limit theorem. Multivariate pdfs. Correlation coefficients. Sampling from arbitrary pdfs. *
  5. Frequentist Statistical Inference: I. Frequentist vs Bayesian inference. Maximum likelihood estimation. Homoscedastic Gaussian data, heteroscedastic Gaussian data, non-Gaussian data. *
  6. Frequentist Statistical Inference: II. Maximum likelihood fit. Role of outliers. Goodness of fit. Model comparison. Gaussian mixtures. Bootstrap and jackknife. *
  7. Frequentist Statistical Inference: III. Hypothesis testing. Comparing distributions, KS test. Histograms. Kernel density estimators. *
  8. Bayesian Statistical Inference: I. The Bayesian approach to statistics. Prior distributions. Credible regions. Parameter estimation examples (coin flip). Marginalization.
  9. Bayesian Statistical Inference: II. Parameter estimation examples (Gaussian data, background). Model comparison: odds ratio. Approximate model comparison.
  10. Bayesian Statistical Inference: III. Monte Carlo methods. Markov chains. Burn-in. Metropolis-Hastings algorithm. *
  11. Bayesian Statistical Inference: IV. MCMC diagnostics. Traceplots. Autocorrelation length. Samplers in practice: emcee and PyMC3. Gibbs sampling. Conjugate priors. *
  12. Bayesian Statistical Inference: V. Evidence evaluation. Model selection. Savage-Dickey density ratio. Nested sampling. Samplers in practice: dynesty. *
  13. Introduction II. Data mining and machine learning. Supervised and unsupervised learning. Overview of scikit-learn. Examples. *
  14. Clustering. K-fold cross-validation. Unsupervised clustering. K-Means Clustering. Mean-shift Clustering. Correlation functions. *
  15. Dimensional Reduction I. Curse of dimensionality. Principal component analysis. Missing data. Non-negative matrix factorization. Independent component analysis. *
  16. Dimensional Reduction II - Density estimation. Non-linear dimensional reduction. Locally linear embedding. Isometric mapping. t-SNE. Recap of density estimation. KDE. Nearest-Neighbor. Gaussian Mixtures. Pills of modern research.
  17. Regression I. What is regression? Linear regression. Polynomial regression. Basis function regression. Kernel regression. Over/underfitting. Cross-validation. Learning curves. *
  18. Regression II. Regularization. Ridge. LASSO. Non-linear regression. Gaussian process regression. Total least squares. *
  19. Classification I. Generative vs discriminative classification. Receiver Operating Characteristic (ROC) curve. Naive Bayes. Gaussian naive Bayes. Linear and quadratic discriminant analysis. GMM Bayes classification. K-nearest neighbor classifier. *
  20. Classification II. Logistic regression. Support vector machines. Decision trees. Bagging. Random forests. Boosting. *
  21. Deep learning I. Loss functions. Gradient descent, learning rate. Adaptive boosting. Neural networks. Backpropagation. Layers, neurons, activation functions, regularization schemes. *
  22. Deep learning II. TensorFlow, keras, and pytorch. Convolutional neural networks. Autoencoders. Generative adversarial networks. *
Additional lectures not covered in class
  1. Time series analysis I. Detecting variability. Fourier analysis. Temporally localized signals. Periodic signals. Lomb-Scargle periodogram. Multiband strategies. *
  2. Time series analysis II. Stochastic processes. Autoregressive models. Moving averages. Power-spectral density. Autocorrelation. White/red/pink noise. Unevenly sampled data.

* = Time to get your hands dirty!

❗ Important

Data mining and machine learning are computational subjects. One does not understand how to treat scientific data by reading equations on the blackboard: you will need to get your hands dirty (and this is the fun part!). Students are required to come to class with a laptop or any other device you can code on (larger than a smartphone, I would say...). Each class will pair theoretical explanations with hands-on exercises and demonstrations. These are the key content of the course, so please engage with them as much as possible.

At various points during the lectures you will find some "Time to get your hands dirty" statements. That means you've got to start coding!
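
To give a flavor of what a hands-on exercise of this kind might look like, here is a minimal sketch of Monte Carlo integration (one of the topics listed for lecture 3). It is an illustration only and is not taken from the course notebooks: the function names `mc_integrate` and `standard_normal_pdf` are hypothetical, and only the Python standard library is assumed. The code estimates the probability that a standard normal variable falls within one sigma of the mean and compares it with the exact value from the error function.

```python
import math
import random


def mc_integrate(f, a, b, n_samples=100_000, seed=0):
    """Estimate the integral of f over [a, b] by uniform Monte Carlo sampling.

    Returns the estimate and its one-sigma statistical uncertainty.
    (Hypothetical helper written for this illustration.)
    """
    rng = random.Random(seed)  # seeded generator, so runs are reproducible
    width = b - a
    total = 0.0
    total_sq = 0.0
    for _ in range(n_samples):
        y = f(rng.uniform(a, b))
        total += y
        total_sq += y * y
    mean = total / n_samples
    # Sample variance of f; clipped at zero to guard against round-off
    variance = max(total_sq / n_samples - mean * mean, 0.0)
    estimate = width * mean
    uncertainty = width * math.sqrt(variance / n_samples)
    return estimate, uncertainty


def standard_normal_pdf(x):
    """Probability density of a Gaussian with zero mean and unit variance."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


if __name__ == "__main__":
    estimate, uncertainty = mc_integrate(standard_normal_pdf, -1.0, 1.0)
    exact = math.erf(1.0 / math.sqrt(2.0))  # exact one-sigma probability
    print(f"Monte Carlo: {estimate:.4f} +/- {uncertainty:.4f}")
    print(f"Exact:       {exact:.4f}")
```

A natural way to play with it: change `n_samples` by factors of 10 and check that the quoted uncertainty shrinks roughly as one over the square root of the number of samples.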

Textbook and Resources

The main textbook we will be using is:

"Statistics, Data Mining, and Machine Learning in Astronomy", Željko Ivezić, Andrew J. Connolly, Jacob T. VanderPlas, and Alexander Gray. Princeton University Press, 2012.

It's a wonderful book that I keep referring to in my research. The library has a few copies; you can also download a digital version from the Bicocca library website. What I really like about the book is that the authors provide the code behind every single figure: astroml.org/book_figures. The best way to approach these topics is to study the introduction in the book, then grab the code and play with it. Make sure you get the updated edition of the book (the one with a black cover, not orange) because all the examples have been updated to Python 3.

There are many other good resources in astrostatistics; here is a partial list. Some of them are free.

We will make heavy use of the Python programming language. If you need to refresh your Python skills, here are some catch-up resources and online tutorials. A strong Python programming background is essential in modern astrophysics!

2026 Class schedule

The class covers 6 credits = 42 hours = 21 lectures of 2 hours each. Our schedule is as follows. Here is a public calendar with the dates below, which you can import into your calendar software.

  1. 2026, Mar 02, 08:30am - 10:30am. Room U7-15.
  2. 2026, Mar 09, 08:30am - 10:30am. Room U7-15.
  3. 2026, Mar 12, 10:30am - 12:30pm. Room U2-05.
  4. 2026, Mar 16, 08:30am - 10:30am. Room U7-15.
  5. 2026, Mar 19, 10:30am - 12:30pm. Room U2-05.
  6. 2026, Mar 23, 08:30am - 10:30am. Room U7-15.
  7. 2026, Mar 26, 10:30am - 12:30pm. Room U2-05.
  8. 2026, Mar 30, 08:30am - 10:30am. Room U7-15.
  9. 2026, Apr 09, 10:30am - 12:30pm. Room U2-05.
  10. 2026, Apr 13, 08:30am - 10:30am. Room U7-15.
  11. 2026, Apr 16, 10:30am - 12:30pm. Room U2-05.
  12. 2026, Apr 20, 08:30am - 10:30am. Room U7-15.
  13. 2026, Apr 23, 10:30am - 12:30pm. Room U2-05.
  14. 2026, Apr 27, 08:30am - 10:30am. Room U7-15.
  15. 2026, Apr 30, 10:30am - 12:30pm. Room U2-05.
  16. 2026, May 04, 08:30am - 10:30am. Room U7-15.
  17. 2026, May 07, 10:30am - 12:30pm. Room U2-05.
  18. 2026, May 11, 08:30am - 10:30am. Room U7-15.
  19. 2026, May 14, 10:30am - 12:30pm. Room U2-05.
  20. 2026, May 18, 08:30am - 10:30am. Room U7-15.
  21. 2026, May 21, 10:30am - 12:30pm. Room U2-05.
  22. 2026, May 25, 08:30am - 10:30am. Room U7-15 (backup slot in case we skip one).
  23. 2026, May 28, 10:30am - 12:30pm. Room U2-05 (backup slot in case we skip one).

Exams

Exam guidelines are available here. Please read them carefully.

Past editions

A huge thanks to:

This class draws heavily from many others that came before me. Credit goes to:

Careful...

Credit: xkcd 2582
