[In Progress] Notes: The Principles of Deep Learning Theory

The Principles of Deep Learning Theory - Roberts, Yaida, Hanin

Chapter 0: Initialization

A network $f^\ast$ is constructed by sampling weights $\theta \sim p(\theta)$ and training $\theta \to \theta^\ast$ so that, for input data $x$, the network output $f^\ast(x) := f(x; \theta^\ast)$ approximates the truth $f(x)$. The distribution $p(f^\ast)$ of trained networks is of interest.
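
A minimal sketch of this pipeline, not from the book: the toy model, loss, and training loop below are my own illustrative assumptions. The point is only the procedure "sample $\theta \sim p(\theta)$, train to $\theta^\ast$, evaluate $f(x; \theta^\ast)$", repeated over seeds to build up an empirical picture of $p(f^\ast)$.

```python
# Minimal sketch, not from the book: the toy model, loss, and training loop
# are illustrative assumptions. The pipeline is "sample theta ~ p(theta),
# train to theta*, evaluate f(x; theta*)", repeated over many seeds to
# approximate the distribution p(f*) empirically.
import numpy as np

rng = np.random.default_rng(0)

def f(x, theta):
    # a deliberately tiny "network": one hidden tanh unit
    return theta[1] * np.tanh(theta[0] * x)

def loss(theta, xs, ys):
    return np.mean((f(xs, theta) - ys) ** 2)

def train(theta, xs, ys, lr=0.1, steps=200, eps=1e-5):
    # plain gradient descent with finite-difference gradients,
    # to keep the sketch dependency-free
    for _ in range(steps):
        grad = np.zeros_like(theta)
        for i in range(len(theta)):
            tp, tm = theta.copy(), theta.copy()
            tp[i] += eps
            tm[i] -= eps
            grad[i] = (loss(tp, xs, ys) - loss(tm, xs, ys)) / (2 * eps)
        theta = theta - lr * grad
    return theta

xs = np.linspace(-1.0, 1.0, 16)
ys = np.sin(xs)                  # stand-in for the truth f(x)
x_test = 0.5

# Each seed gives one draw theta ~ p(theta), hence one trained network f*
# and one sample f*(x_test); the ensemble approximates p(f*) at x_test.
outputs = [f(x_test, train(rng.normal(0.0, 1.0, size=2), xs, ys))
           for _ in range(100)]
print(np.mean(outputs), np.std(outputs))
```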

Chapter 1: Pretraining

Review of the Gaussian distribution.
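
As a quick reminder of the main objects from that chapter (standard facts stated in generic notation; the book's index conventions may differ), a zero-mean Gaussian distribution over an $N$-component variable $z_\mu$ with covariance matrix $K_{\mu\nu}$ has density

$$p(z) = \frac{1}{\sqrt{\left| 2\pi K \right|}} \exp\!\left( -\frac{1}{2} \sum_{\mu, \nu = 1}^{N} z_\mu \left( K^{-1} \right)_{\mu\nu} z_\nu \right),$$

and its moments follow from Wick's theorem: odd moments vanish, $\mathbb{E}\!\left[ z_{\mu_1} z_{\mu_2} \right] = K_{\mu_1 \mu_2}$, and higher even moments are sums over all pairings, e.g.

$$\mathbb{E}\!\left[ z_{\mu_1} z_{\mu_2} z_{\mu_3} z_{\mu_4} \right] = K_{\mu_1 \mu_2} K_{\mu_3 \mu_4} + K_{\mu_1 \mu_3} K_{\mu_2 \mu_4} + K_{\mu_1 \mu_4} K_{\mu_2 \mu_3}.$$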

Chapter 2: Neural Networks

MLPs suffice as a minimal model for an effective theory of deep learning because other architectures, such as CNNs and transformers, can be interpreted as MLPs with additional constraints on their weights (e.g., weight sharing). We will assume that initial weights are drawn independently from a zero-mean Gaussian distribution.
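
A minimal sketch of this initialization, assuming the common width-scaled convention in which layer-$l$ weights have variance $C_W / n_{l-1}$ and biases variance $C_b$; the specific values and the tanh activation here are illustrative choices, not the book's prescription.

```python
# Minimal sketch of an MLP whose weights and biases are drawn i.i.d. from
# zero-mean Gaussians. The width-scaled variances C_W / n_in for weights and
# C_b for biases, and the tanh activation, are assumptions for illustration.
import numpy as np

def init_mlp(widths, C_W=1.0, C_b=0.0, rng=None):
    rng = rng if rng is not None else np.random.default_rng(0)
    params = []
    for n_in, n_out in zip(widths[:-1], widths[1:]):
        W = rng.normal(0.0, np.sqrt(C_W / n_in), size=(n_out, n_in))
        b = rng.normal(0.0, np.sqrt(C_b), size=n_out)   # zero vector if C_b = 0
        params.append((W, b))
    return params

def forward(params, x, sigma=np.tanh):
    # returns the preactivations z^(1), ..., z^(L); the last entry is z^(L)
    W1, b1 = params[0]
    z = b1 + W1 @ x                    # z^(1) = b^(1) + W^(1) x
    zs = [z]
    for W, b in params[1:]:
        z = b + W @ sigma(z)           # z^(l) = b^(l) + W^(l) sigma(z^(l-1))
        zs.append(z)
    return zs

params = init_mlp([8, 64, 64, 2])      # n_0 = 8 inputs, two hidden layers, n_L = 2 outputs
z_L = forward(params, np.ones(8))[-1]
print(z_L)
```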

The distribution of interest can be specified as $p(f^\ast) = p \left( \left. z^{(L)} \, \right\vert \, \theta^\ast, \mathcal{D} \right)$, where $z^{(L)}$ denotes the preactivations of the neurons in the final ($L$th) layer and $\mathcal{D}$ is the data set used during training.
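
For concreteness, $z^{(L)}$ is built up by the standard MLP recursion; the index conventions below are the generic ones and may differ in detail from the book's Chapter 2:

$$z_i^{(1)}(x) = b_i^{(1)} + \sum_{j=1}^{n_0} W_{ij}^{(1)} x_j, \qquad z_i^{(l+1)}(x) = b_i^{(l+1)} + \sum_{j=1}^{n_l} W_{ij}^{(l+1)} \, \sigma\!\left( z_j^{(l)}(x) \right) \quad \text{for } l = 1, \ldots, L-1,$$

where $\sigma$ is the activation function and $n_l$ is the width of layer $l$, so that the network output is $f(x; \theta) = z^{(L)}(x)$.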

Chapter 3: Effective Theory of Deep Linear Networks at Initialization

(in progress)
