Back to: Main Page > Yann Ollivier's professional page

Notes on information theory for artificial intelligence and statistical learning

These are notes from talks I gave in 2014 on the information-theoretic and minimum description length viewpoint on statistical learning and artificial intelligence.

The notes have been kindly collected and edited by Jérémy Bensadon.

Kolmogorov complexity; induction, prediction and compression, arithmetic coding, model selection, AIC and BIC criteria...
See also my notes (in French), Aspects de l'entropie en mathématiques.
Universal probability distributions, two-part codes, and their optimal precision, universal coding, Fisher information, confidence intervals, model selection...
Jeffreys' prior. An application: context tree weighting, Krichevsky–Trofimov estimator, Bayes...
Information theory and possible issues with gradient methods, homogeneity etc.
Parametrization-invariant gradient methods, Newton method, outer product metric, natural metric
Fisher metric. A natural gradient ascent: Expectation maximization, KL divergence, natural gradient, Cramér–Rao bound...
Regularization, and application to model selection, the zero-frequency problem, overfitting, cross-validation, BIC, slope heuristics...
Variational Bayesian methods (Gaussians, dropout, model selection)


To leave a comment: contact (domain) yann-ollivier.org