Notes on information theory for artificial intelligence and statistical learning

These are notes from talks I gave in 2014 about the information theory and minimum description length viewpoint on statistical learning and artificial intelligence. The notes have been kindly collected and edited by
- Kolmogorov complexity; induction, prediction and compression, arithmetic coding, model selection, AIC and BIC criteria...
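As a minimal illustration of the AIC and BIC criteria listed above (my own sketch, not code from the notes), both criteria trade off fit against the number of parameters k, but BIC's penalty grows with the sample size n:

```python
import math

def aic_bic(log_likelihood, k, n):
    """AIC = 2k - 2 ln L;  BIC = k ln n - 2 ln L (lower is better)."""
    aic = 2 * k - 2 * log_likelihood
    bic = k * math.log(n) - 2 * log_likelihood
    return aic, bic

# Hypothetical numbers: a 2-parameter model fits slightly better than
# a 1-parameter one. Since ln(100) > 2, BIC penalizes the extra
# parameter more heavily than AIC does.
aic1, bic1 = aic_bic(log_likelihood=-105.0, k=1, n=100)
aic2, bic2 = aic_bic(log_likelihood=-103.5, k=2, n=100)
```

With these (made-up) likelihoods, AIC prefers the 2-parameter model while BIC prefers the 1-parameter one, which is the typical disagreement between the two criteria at moderate n.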
See also my notes (in French), Aspects de l'entropie en mathématiques et en physique.
- Universal probability distributions, two-part codes and their optimal precision, universal coding, Fisher information, confidence intervals, model selection...
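The two-part code with optimal parameter precision can be sketched as follows (an illustration under my own assumptions, not code from the notes): encoding each of the k parameters to precision about 1/sqrt(n) costs roughly (k/2) log2 n bits, on top of the cost of the data given the fitted model.

```python
import math

def two_part_code_length(bits_data_given_model, k, n):
    """Total description length in bits: data cost given the fitted model,
    plus ~(k/2) log2 n bits to encode k parameters at precision 1/sqrt(n)."""
    return bits_data_given_model + 0.5 * k * math.log2(n)

def bernoulli_two_part(n_ones, n_zeros):
    """Bernoulli example: at the MLE p_hat, the data cost is n * H(p_hat)
    bits, where H is the binary entropy."""
    n = n_ones + n_zeros
    p = n_ones / n
    h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return two_part_code_length(n * h, k=1, n=n)
```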
- Jeffreys' prior. An application: context tree weighting, Krichevsky–Trofimov estimator, Bayes...
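The Krichevsky–Trofimov estimator mentioned above admits a very short sequential implementation (my own sketch): after seeing a ones and b zeros, predict the next bit is 1 with probability (a + 1/2)/(a + b + 1), and code each bit at -log2 of its predicted probability.

```python
import math

def kt_prob_next_one(ones, zeros):
    """KT (add-1/2) estimate of P(next bit = 1)."""
    return (ones + 0.5) / (ones + zeros + 1.0)

def kt_code_length(bits):
    """Cumulative code length in bits of a binary sequence
    under the sequential KT estimator."""
    ones = zeros = 0
    total = 0.0
    for b in bits:
        p1 = kt_prob_next_one(ones, zeros)
        p = p1 if b == 1 else 1.0 - p1
        total += -math.log2(p)
        if b == 1:
            ones += 1
        else:
            zeros += 1
    return total
```

The first bit always costs exactly 1 bit (predicted probability 1/2); repeated symbols get progressively cheaper, which is what makes this estimator useful inside context tree weighting.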
- Information theory and possible issues with gradient methods, homogeneity, etc.
- Parametrization-invariant gradient methods, Newton method, outer product metric, natural metric...
- Fisher metric and natural gradient ascent: expectation-maximization, KL divergence, natural gradient...
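To illustrate the natural gradient on a toy case (my own example, assuming a Bernoulli model): preconditioning the vanilla gradient of the log-likelihood by the inverse Fisher information 1/(θ(1−θ)) per sample collapses the update to θ ← θ + η(x̄ − θ), independent of how the model is parametrized, and it reaches the MLE in a single step when η = 1.

```python
def natural_gradient_bernoulli(data, theta=0.5, lr=1.0, steps=1):
    """Natural-gradient ascent on the Bernoulli log-likelihood.

    data: list of 0/1 observations."""
    n = len(data)
    mean = sum(data) / n
    for _ in range(steps):
        # Vanilla gradient of sum_i log p(x_i | theta):
        grad = n * (mean - theta) / (theta * (1 - theta))
        # Fisher information of n samples:
        fisher = n / (theta * (1 - theta))
        # Natural-gradient step; simplifies to theta + lr * (mean - theta):
        theta = theta + lr * grad / fisher
    return theta
```

The simplification in the last step is the point: the Fisher preconditioning exactly cancels the 1/(θ(1−θ)) factor that makes the vanilla gradient blow up near the boundary.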
- Regularization, and application to model selection; the zero-frequency problem, overfitting, cross-validation, BIC, slope heuristics...
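The zero-frequency problem listed above is classically handled by additive smoothing; a minimal sketch (illustrative, not from the notes):

```python
def additive_smoothing(counts, alpha=1.0):
    """Add-alpha estimate over a fixed alphabet given by counts' keys.

    An unseen symbol gets probability alpha / (N + alpha * K) instead of
    zero, so its code length -log p stays finite."""
    total = sum(counts.values()) + alpha * len(counts)
    return {sym: (c + alpha) / total for sym, c in counts.items()}
```

With alpha = 1 this is Laplace's rule of succession; alpha = 1/2 recovers the Krichevsky–Trofimov estimator on a binary alphabet.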
- Variational Bayesian methods (Gaussians, dropout, model selection)
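A basic building block of the variational Bayesian methods listed above is the closed-form KL divergence between a Gaussian variational posterior and a standard normal prior; this is a standard formula, shown here as my own sketch:

```python
import math

def kl_gauss_vs_std_normal(mu, sigma):
    """KL( N(mu, sigma^2) || N(0, 1) )
       = (sigma^2 + mu^2 - 1 - ln sigma^2) / 2."""
    return 0.5 * (sigma ** 2 + mu ** 2 - 1.0 - 2.0 * math.log(sigma))
```

This term appears as the regularizer in the variational objective (the ELBO), penalizing posteriors that stray from the prior; it is zero exactly when mu = 0 and sigma = 1.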
To leave a comment: contact (domain) yann-ollivier.org