Dropout and model averaging

Here are a few results about dropout that are worth knowing.

Differences between dropout and bagging:

  • Dropout approximates model averaging for deep non-linear networks, whereas bagging averages its models explicitly,
  • Bagging usually combines predictions with the arithmetic mean, whereas dropout corresponds to the geometric mean,
  • In dropout, each model is trained on only a single data point (unless the model is very small, or training runs long enough for the same dropout mask to be sampled several times),
  • In dropout, all the models share the same parameters.
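
To make the arithmetic versus geometric distinction concrete, here is a minimal sketch in Python with NumPy. The three prediction vectors are made-up numbers for illustration, not results from any experiment, and the function names are hypothetical.

```python
import numpy as np

def arithmetic_mean(probs):
    """Average the class probabilities across models (bagging-style)."""
    return probs.mean(axis=0)

def normalized_geometric_mean(probs):
    """Geometric mean across models, renormalized so the classes sum to 1."""
    g = np.exp(np.log(probs).mean(axis=0))
    return g / g.sum()

# Illustrative (made-up) predictions: 3 models, 2 classes, one row per model.
probs = np.array([
    [0.9, 0.1],
    [0.6, 0.4],
    [0.2, 0.8],
])

arith = arithmetic_mean(probs)
geo = normalized_geometric_mean(probs)
print("arithmetic:", arith)
print("geometric: ", geo)
```

The two rules agree on the predicted class here but assign different probabilities. The geometric mean is pulled more strongly toward models that assign a very low probability to a class.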

Theoretical results:

  • Linear network: dropout is equivalent to arithmetic averaging,
  • Network with one hidden layer of logistic neurons and a linear output: dropout is equivalent to geometric averaging,
  • Deep non-linear network: dropout is an approximation of geometric averaging.
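
The first two results can be checked numerically on a toy case by enumerating every dropout mask. The sketch below is an illustration under stated assumptions, not the construction used in the paper. It applies dropout to the inputs of a single unit, uses a keep probability of 0.8, and draws random weights and inputs. It also uses a single logistic unit, which is simpler than the one-hidden-layer network in the list above. All names are hypothetical.

```python
import itertools
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def enumerate_masks(n, keep_prob):
    """Yield (mask, probability) for every binary dropout mask over n inputs."""
    for bits in itertools.product([0, 1], repeat=n):
        mask = np.array(bits, dtype=float)
        kept = int(mask.sum())
        yield mask, keep_prob ** kept * (1.0 - keep_prob) ** (n - kept)

rng = np.random.default_rng(0)
n, keep_prob = 5, 0.8            # assumed toy sizes
x = rng.normal(size=n)           # inputs
w = rng.normal(size=n)           # weights shared by all sub-models

masks, probs = zip(*enumerate_masks(n, keep_prob))
masks = np.array(masks)          # shape (2**n, n)
probs = np.array(probs)          # shape (2**n,), sums to 1

# Pre-activation of each of the 2**n sub-models.
s = masks @ (w * x)

# Linear unit: the arithmetic mean over all masks equals the
# forward pass with weights scaled by keep_prob.
linear_ensemble = np.sum(probs * s)
linear_scaled = keep_prob * np.dot(w, x)

# Logistic unit: the normalized weighted geometric mean over all masks
# equals the sigmoid of the scaled pre-activation.
o = sigmoid(s)
g = np.exp(np.sum(probs * np.log(o)))
g_comp = np.exp(np.sum(probs * np.log1p(-o)))
nwgm = g / (g + g_comp)
logistic_scaled = sigmoid(keep_prob * np.dot(w, x))

print(np.isclose(linear_ensemble, linear_scaled))
print(np.isclose(nwgm, logistic_scaled))
```

Both checks hold exactly here, up to floating-point error, because the odds o / (1 - o) of a logistic unit equal exp(s), so the weighted geometric mean of the odds is the exponential of the expected pre-activation. For deeper non-linear networks this identity no longer holds exactly, which is why the third result is only an approximation.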

Proofs and detailed results can be found in this paper.

Experimental results:

  • Dropout performs as well as geometric averaging,
  • Arithmetic average performs as well as geometric average,
  • Dropout performs better than averaging untied networks,
  • Averaging untied networks performs better than baseline SGD.

More experiments and results can be found in this paper.

Written on September 11, 2014