Batch normalization is used to improve training stability and speed in neural networks by standardizing the inputs to each layer within a batch. It helps address internal covariate shift, where changing parameters during training cause the distributions of each layer’s inputs to vary. This slows down training or leads to model instability. Batch normalization usually directly interacts with activation functions (like ReLU, sigmoid, and tanh) as it standardizes layer inputs before activation.
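For example, in a framework such as PyTorch, the normalization layer typically sits between a linear layer and its activation. The following is a minimal sketch (layer sizes are arbitrary and chosen only for illustration):

```python
import torch
import torch.nn as nn

# Batch normalization placed between the linear layer and the activation,
# so ReLU receives standardized inputs.
model = nn.Sequential(
    nn.Linear(64, 128),    # layer producing pre-activations
    nn.BatchNorm1d(128),   # standardizes each feature over the batch
    nn.ReLU(),             # activation applied to the normalized values
    nn.Linear(128, 10),
)

x = torch.randn(32, 64)    # a batch of 32 examples with 64 features
out = model(x)             # in training mode, BatchNorm uses per-batch statistics
print(out.shape)           # torch.Size([32, 10])
```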

  1. Compute the Mean and Variance: For each training batch, calculate the mean $\mu_B$ and variance $\sigma_B^2$ of each feature

  2. Normalize the Inputs: Standardize the inputs by subtracting the mean and dividing by the standard deviation (the square root of the variance, plus a small $\epsilon$ for numerical stability) to get zero mean and unit variance

  3. Scale and Shift: Scale by $\gamma$ and shift by $\beta$ (both learnable parameters) to retain network flexibility and allow it to learn the optimal distribution

Essentially, an input feature $x$ is normalized as $\hat{x} = \frac{x - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}$, and the layer output is then $y = \gamma \hat{x} + \beta$.
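To make the three steps concrete, here is a minimal NumPy sketch of the forward pass (the function name `batch_norm_forward` and the $\epsilon$ default are illustrative, not taken from a specific library):

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Batch normalization forward pass for a batch of shape (N, D)."""
    # 1. Compute the per-feature mean and variance over the batch
    mu = x.mean(axis=0)     # shape (D,)
    var = x.var(axis=0)     # shape (D,)

    # 2. Normalize: subtract the mean and divide by the standard deviation
    x_hat = (x - mu) / np.sqrt(var + eps)

    # 3. Scale and shift with the learnable parameters gamma and beta
    return gamma * x_hat + beta

# Example: a batch of 4 samples with 3 features
x = np.random.randn(4, 3) * 5.0 + 2.0
y = batch_norm_forward(x, gamma=np.ones(3), beta=np.zeros(3))
print(y.mean(axis=0))   # approximately 0 for each feature
print(y.std(axis=0))    # approximately 1 for each feature
```

With $\gamma = 1$ and $\beta = 0$ the output is simply the standardized batch; during training these parameters are learned, letting the network recover any scale and shift it finds useful.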