[In Progress] Notes: The Principles of Deep Learning Theory
The Principles of Deep Learning Theory - Roberts, Yaida, Hanin
Chapter 0: Initialization
A trained network $f^\ast$ is constructed by sampling initial weights $\theta \sim p(\theta)$ and training them, $\theta \to \theta^\ast$, so that for input data $x$ the network output $f^\ast(x) := f(x; \theta^\ast)$ approximates the truth $f(x)$. The distribution $p(f^\ast)$ of trained networks, induced by the randomness of the initialization, is the object of interest.
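For concreteness, one common choice for the training step $\theta \to \theta^\ast$ (the note above does not fix the algorithm) is gradient descent on a training loss $\mathcal{L}(\theta)$,
$$\theta_{t+1} = \theta_t - \eta\, \nabla_\theta \mathcal{L}(\theta_t),$$
iterated to (approximate) convergence with learning rate $\eta$; because the starting point $\theta \sim p(\theta)$ is random, $\theta^\ast$ and hence $f^\ast$ are random too, which is what makes $p(f^\ast)$ a distribution rather than a single function.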
Chapter 1: Pretraining
Review of the Gaussian distribution.
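For reference (my notation), the zero-mean multivariate Gaussian density with covariance $K_{\mu\nu}$ is
$$p(z) = \frac{1}{\sqrt{\det(2\pi K)}}\, \exp\!\left( -\frac{1}{2} \sum_{\mu,\nu} z_\mu \left(K^{-1}\right)_{\mu\nu} z_\nu \right),$$
and the chapter builds up to Wick's theorem, which expresses higher moments of a Gaussian as sums over pairings of the covariance.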
Chapter 2: Neural Networks
MLPs suffice as a minimal model for an effective theory of deep learning because other architectures, such as CNNs and transformers, can be interpreted as MLPs with constraints on the relationships among their weights. We will assume that the initial weights are drawn independently from a zero-mean Gaussian distribution.
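A minimal sketch of this initialization in NumPy (the $C_W/n$ and $C_b$ variance conventions, the `tanh` nonlinearity, and the helper names `init_mlp` and `forward` are my assumptions; the note above only specifies zero-mean Gaussian weights):

```python
import numpy as np

def init_mlp(widths, C_W=1.0, C_b=0.0, seed=0):
    """Draw MLP weights (and biases) i.i.d. from zero-mean Gaussians.

    widths = [n_0, n_1, ..., n_L]; the C_W/n_{l-1} weight variance and
    the C_b bias variance are assumed conventions, not fixed by the note.
    """
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(widths[:-1], widths[1:]):
        W = rng.normal(0.0, np.sqrt(C_W / n_in), size=(n_out, n_in))
        b = rng.normal(0.0, np.sqrt(C_b), size=n_out) if C_b > 0 else np.zeros(n_out)
        params.append((W, b))
    return params

def forward(params, x, activation=np.tanh):
    """Return the final-layer preactivation z^{(L)} for a single input x."""
    z = x
    for l, (W, b) in enumerate(params):
        z = W @ z + b
        if l < len(params) - 1:  # no nonlinearity after the final layer
            z = activation(z)
    return z
```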
The distribution of interest can be specified as $p(f^\ast) = p \left( \left. z^{(L)} \, \right\vert \, \theta^\ast, \mathcal{D} \right),$ where $z^{(L)}$ denotes the preactivations of the neurons in the $L$th (final) layer and $\mathcal{D}$ is the data set used during training.
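To build intuition for such distributions, one can already sample the final-layer preactivation over many random initializations (i.e., before training, so this is not yet conditioned on $\theta^\ast$). A sketch using the helpers above, with arbitrary layer widths and input:

```python
# Empirical distribution of z^(L) over random initializations.
widths = [10, 128, 128, 1]                    # arbitrary layer widths
x = np.ones(widths[0]) / np.sqrt(widths[0])   # arbitrary fixed input

samples = np.array([
    forward(init_mlp(widths, C_W=1.0, seed=s), x)[0]
    for s in range(2000)
])
# Empirical mean and variance of the scalar output z^(L) across initializations.
print(samples.mean(), samples.var())
```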
Chapter 3: Effective Theory of Deep Linear Networks at Initialization
(in progress)