Abstract

The reasonable maximum demand on out-of-distribution generalization is that the learning algorithm should infer the simplest function that remains consistent with the observed data. Kolmogorov complexity intuitively formalizes the notion of simplicity from an information-theoretic viewpoint, as it associates the simplicity of a function with the description length of the shortest program that can compute it. In light of this descriptive complexity, this thesis addresses limitations in our model classes, optimization techniques, and statistical learnability conditions that impede learning simple functions.

Firstly, it demonstrates that models with a non-recursive structure such as finite-precision feed-forward neural networks are incapable of even expressing a class of functions that yet preserve a constantly low Kolmogorov complexity. On the other extreme, models that allowed to express any partial computable function would render learnability from finite datasets practically impossible with standard learning algorithms. This is because present learnability conditions in theory either restrict themselves to exemplary model or hypothesis classes [1],[2], or rely on overly conservative assumptions such as the ubiquitous i.i.d. assumption [3],[4].

In this setting of overarching model expressivity, this work instead substantiates how incorporating Kolmogorov complexity as a simplicity bias into the optimization objective function carves the way to formulate general distribution-free, both necessary and sufficient information-theoretic conditions that a dataset must satisfy to learn any partial computable function, and exemplifies how such an inductive bias can further reduce the sample size that is usually required for classical learnability guarantees. To that end, it proposes to directly draw upon Kolmogorov complexity to quantify the information that finite datasets convey about the functions that could have possibly generated it.

Because Kolmogorov complexity is incomputable in general, compression algorithms are typically proposed as viable approximations [5]. However, for any choice on the encoding of Turing Machines, contemporarily employed compression algorithms such as the Lempel-Ziv-Welch compression can not even yield approximate guarantees about the order between the Kolmogorov complexity of two binary strings, because such compression algorithms do not keep up with the compression power of Turing Machines.

References

[1] K. Ahuja, E. Caballero, D. Zhang, J.-C. Gagnon-Audet, Y. Bengio, I. Mitliagkas, and I. Rish. Invariance principle meets information bottleneck for out-of-distribution generalization. Advances in Neural Information Processing Systems, vol. 34, pp. 3438-3450, 2021.

[2] M. Arjovsky, L. Bottou, I. Gulrajani, and D. Lopez-Paz. Invariant risk minimization. arXiv preprint arXiv:1907.02893, 2019.

[3] D. Paccagnan, M. Campi, and S. Garatti. The pick-to-learn algorithm: Empowering compression for tight generalization bounds and improved post-training performance. Advances in Neural Information Processing Systems, vol. 36, 2024.

[4] M. C. Campi and S. Garatti. Compression, generalization and learning. Journal of Machine Learning Research, vol. 24, no. 339, pp. 1-74, 2023.

[5] R. Cilibrasi and P. M. Vitányi. Clustering by compression. IEEE Transactions on Information theory, vol. 51, no. 4, pp. 1523-1545, 2005.

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
Studymaps		Studymaps
img		img
latex		latex
.gitignore		.gitignore
README.md		README.md
literature-review.md		literature-review.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Abstract

References

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Abstract

References

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages