Batch normalization is used to improve training stability and speed in neural networks by standardizing the inputs to each layer within a batch. It helps address internal covariate shift, where changing parameters during training cause the distributions of each layer’s inputs to vary. This slows down training or leads to model instability. Batch normalization usually directly interacts with activation functions (like ReLU, sigmoid, and tanh) as it standardizes layer inputs before activation.
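For example, in a framework such as PyTorch, the normalization layer typically sits between a linear layer and its activation. The following is a minimal sketch (layer sizes are arbitrary and chosen only for illustration):

```python
import torch
import torch.nn as nn

# Batch normalization placed between the linear layer and the activation,
# so ReLU receives standardized inputs.
model = nn.Sequential(
    nn.Linear(64, 128),    # layer producing pre-activations
    nn.BatchNorm1d(128),   # standardizes each feature over the batch
    nn.ReLU(),             # activation applied to the normalized values
    nn.Linear(128, 10),
)

x = torch.randn(32, 64)    # a batch of 32 examples with 64 features
out = model(x)             # in training mode, BatchNorm uses per-batch statistics
print(out.shape)           # torch.Size([32, 10])
```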

  1. Compute the Mean and Variance: For each training batch, calculate the mean $\mu_B$ and variance $\sigma_B^2$ of each feature

  2. Normalize the Inputs: Standardize the inputs by subtracting the mean and dividing by the standard deviation (the square root of the variance, plus a small $\epsilon$ for numerical stability) to get zero mean and unit variance

  3. Scale and Shift: Scale by $\gamma$ and shift by $\beta$ (both learnable parameters) to retain network flexibility and allow it to learn the optimal distribution

Essentially, an input feature $x$ is normalized as $\hat{x} = \frac{x - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}$, and the layer output is then $y = \gamma \hat{x} + \beta$.
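To make the three steps concrete, here is a minimal NumPy sketch of the forward pass (the function name `batch_norm_forward` and the $\epsilon$ default are illustrative, not taken from a specific library):

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Batch normalization forward pass for a batch of shape (N, D)."""
    # 1. Compute the per-feature mean and variance over the batch
    mu = x.mean(axis=0)     # shape (D,)
    var = x.var(axis=0)     # shape (D,)

    # 2. Normalize: subtract the mean and divide by the standard deviation
    x_hat = (x - mu) / np.sqrt(var + eps)

    # 3. Scale and shift with the learnable parameters gamma and beta
    return gamma * x_hat + beta

# Example: a batch of 4 samples with 3 features
x = np.random.randn(4, 3) * 5.0 + 2.0
y = batch_norm_forward(x, gamma=np.ones(3), beta=np.zeros(3))
print(y.mean(axis=0))   # approximately 0 for each feature
print(y.std(axis=0))    # approximately 1 for each feature
```

With $\gamma = 1$ and $\beta = 0$ the output is simply the standardized batch; during training these parameters are learned, letting the network recover any scale and shift it finds useful.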