Hyperparameters are configuration values that define how a machine learning model is structured and trained; unlike model parameters, they are set before training rather than learned from the data. Properly tuning hyperparameters can significantly affect model performance, even with the same training data and architecture. Hyperparameters are typically chosen through experimentation, intuition, or automated search methods.
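As a rough illustration of an automated search, the sketch below runs a small grid search over two hyperparameters. The `train_and_evaluate` function is a placeholder standing in for a real training-and-validation loop, and the search values are illustrative assumptions, not recommendations.

```python
from itertools import product

def train_and_evaluate(lr, batch_size):
    """Placeholder: train a model with these hyperparameters and return a
    validation score. Swap in a real training loop here."""
    return -abs(lr - 0.01) - abs(batch_size - 64) / 1000  # dummy score

learning_rates = [1e-3, 1e-2, 1e-1]
batch_sizes = [32, 64, 128]

best_score, best_config = float("-inf"), None
for lr, bs in product(learning_rates, batch_sizes):
    score = train_and_evaluate(lr, bs)
    if score > best_score:
        best_score, best_config = score, (lr, bs)

print("Best hyperparameters (lr, batch size):", best_config)
```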

Model Architecture Hyperparameters

Model architecture hyperparameters define the structure of the model itself and usually include (a minimal code sketch follows the list):

  1. The number of layers in the network
  2. The number of neurons in each layer
  3. The type of activation function used (e.g., ReLU, sigmoid, tanh)
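As a concrete sketch, the snippet below builds a small fully connected network in PyTorch whose depth, width, and activation function are all architecture hyperparameters. The input size of 784 and output size of 10 are illustrative assumptions (an MNIST-like setup), not values from the text.

```python
import torch.nn as nn

# Architecture hyperparameters (illustrative values).
num_hidden_layers = 3     # number of layers in the network
hidden_units = 128        # number of neurons in each hidden layer
activation = nn.ReLU      # type of activation function

# Assumed sizes: 784 input features, 10 output classes.
layers = [nn.Linear(784, hidden_units), activation()]
for _ in range(num_hidden_layers - 1):
    layers += [nn.Linear(hidden_units, hidden_units), activation()]
layers.append(nn.Linear(hidden_units, 10))

model = nn.Sequential(*layers)
```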

Optimization Hyperparameters

Optimization hyperparameters define the training process (see the sketch after this list):

  1. Learning Rate: determines the step size taken during each gradient descent update. A high learning rate speeds up training but risks overshooting the optimal solution, while a low learning rate trains more slowly but often converges to a better solution
  2. Batch Size: the number of training samples processed before the model weights are updated; smaller batches provide more frequent updates but noisier gradient estimates
  3. Momentum: helps accelerate gradient descent by smoothing updates
  4. Weight Decay: a regularization technique that penalizes large weights to prevent overfitting
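A minimal sketch of how these hyperparameters might be wired together in PyTorch is shown below; the specific values, the dummy dataset, and the single linear model are illustrative assumptions rather than recommendations.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Optimization hyperparameters (illustrative values).
learning_rate = 0.01
batch_size = 64
momentum = 0.9
weight_decay = 1e-4

# Dummy dataset so the snippet runs on its own (assumed shapes).
dataset = TensorDataset(torch.randn(256, 784), torch.randint(0, 10, (256,)))
loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

model = torch.nn.Linear(784, 10)
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=learning_rate,           # step size of each gradient update
    momentum=momentum,          # smooths updates across batches
    weight_decay=weight_decay,  # penalizes large weights (L2-style)
)
```

Each pass over `loader` yields `batch_size` samples at a time, so the batch size directly controls how often the optimizer would step per epoch.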

Regularization Hyperparameters

Regularization hyperparameters also help prevent overfitting by constraining the model (a short sketch follows the list):

  1. Dropout Rate: the fraction of neurons randomly deactivated during training
  2. L1/L2 Regularization: penalties added to the loss function based on the magnitude of the weights (see regularization)
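The sketch below shows one way these might appear in PyTorch: a dropout layer with a configurable rate, plus an L1 penalty added to the loss by hand. The layer sizes, dropout rate, and penalty strength are illustrative assumptions.

```python
import torch
import torch.nn as nn

dropout_rate = 0.5   # fraction of neurons randomly deactivated during training
l1_lambda = 1e-5     # strength of the L1 penalty (illustrative value)

model = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Dropout(p=dropout_rate),  # zeroes activations at random while training
    nn.Linear(128, 10),
)

criterion = nn.CrossEntropyLoss()
inputs, targets = torch.randn(32, 784), torch.randint(0, 10, (32,))

loss = criterion(model(inputs), targets)
# L1 regularization: penalize the summed absolute values of the weights.
loss = loss + l1_lambda * sum(p.abs().sum() for p in model.parameters())
loss.backward()
```

L2 regularization, by contrast, is often applied through the optimizer's weight decay setting, as in the optimization example above.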