Long Short-Term Memory (LSTM) models are a type of Recurrent Neural Network (RNN) specifically designed to learn and remember patterns over time. They address the challenges faced by traditional RNNs, particularly the problem of vanishing and exploding gradients, which makes it difficult to learn long-term dependencies in sequential data. LSTMs are widely used for sequential data tasks such as Natural Language Processing, speech recognition, and time series prediction.
How LSTMs Work
LSTMs use a special structure known as a memory cell to store information over time. Each cell has gates that regulate the flow of information, and they decide what to keep, update, or discard. These gates are implemented using neural network layers with sigmoid or tanh activations.
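As a quick orientation before walking through the gates, here is a minimal usage sketch with PyTorch's `nn.LSTM` (the framework choice and the layer sizes are assumptions for illustration). It shows the per-step hidden states and the final hidden and cell states that the gated memory cell maintains.

```python
import torch
import torch.nn as nn

# Minimal sketch (assumed sizes): one LSTM layer, 10 input features, 20 hidden units
lstm = nn.LSTM(input_size=10, hidden_size=20, batch_first=True)

x = torch.randn(4, 7, 10)        # batch of 4 sequences, 7 time steps, 10 features each
output, (h_n, c_n) = lstm(x)     # output: hidden state at every time step

print(output.shape)  # torch.Size([4, 7, 20])
print(h_n.shape)     # torch.Size([1, 4, 20])  final hidden state
print(c_n.shape)     # torch.Size([1, 4, 20])  final cell state maintained by the gates
```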
Forget Gate ($f_t$)
Decides what information to discard from the cell state. It is represented by the formula (see the sketch after this list): $$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$ where:
- $W_f$ and $b_f$ are the weights and biases of the forget gate
- $\sigma$ is the Sigmoid Activation Function
- $h_{t-1}$ is the previous hidden state
- $x_t$ is the current input
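To make the formula concrete, here is a small NumPy sketch of a single forget-gate computation (the sizes and random weights are assumptions for illustration, not a trained model):

```python
import numpy as np

def sigmoid(z):
    # element-wise sigmoid, squashes values into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# toy sizes (assumed): 3 input features, 4 hidden units
input_size, hidden_size = 3, 4
rng = np.random.default_rng(0)

W_f = rng.standard_normal((hidden_size, hidden_size + input_size))  # weights for [h_{t-1}, x_t]
b_f = np.zeros(hidden_size)                                         # bias

h_prev = np.zeros(hidden_size)           # previous hidden state h_{t-1}
x_t = rng.standard_normal(input_size)    # current input x_t

# f_t = sigma(W_f . [h_{t-1}, x_t] + b_f); values near 0 mean "forget", near 1 mean "keep"
f_t = sigmoid(W_f @ np.concatenate([h_prev, x_t]) + b_f)
print(f_t)   # 4 values in (0, 1), one per cell-state component
```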
Input Gate ($i_t$)
Decides what information to add to the cell state. It is represented by the formulas: $$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$ $$\tilde{C}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)$$ where $\tilde{C}_t$ is the candidate cell state.
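In the same spirit, a NumPy sketch of the input gate and the candidate cell state (again with assumed toy sizes and random weights):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

input_size, hidden_size = 3, 4
rng = np.random.default_rng(0)

# separate weights and biases for the input gate and the candidate cell state
W_i = rng.standard_normal((hidden_size, hidden_size + input_size))
b_i = np.zeros(hidden_size)
W_c = rng.standard_normal((hidden_size, hidden_size + input_size))
b_c = np.zeros(hidden_size)

h_prev = np.zeros(hidden_size)
x_t = rng.standard_normal(input_size)
concat = np.concatenate([h_prev, x_t])   # [h_{t-1}, x_t]

i_t = sigmoid(W_i @ concat + b_i)        # how much of each candidate value to write, in (0, 1)
c_tilde = np.tanh(W_c @ concat + b_c)    # candidate cell state values, in (-1, 1)
```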
Update Cell State
Combines the previous cell state, the forget gate's decision, and the input gate's updates. It is represented by the formula: $$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$ where $\odot$ denotes element-wise multiplication.
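The update itself is just element-wise arithmetic. A short sketch with stand-in gate values (in a real step these come from the gate equations above):

```python
import numpy as np

hidden_size = 4
rng = np.random.default_rng(0)

# stand-in values for illustration only
C_prev = rng.standard_normal(hidden_size)            # previous cell state C_{t-1}
f_t = rng.uniform(size=hidden_size)                  # forget gate output, in (0, 1)
i_t = rng.uniform(size=hidden_size)                  # input gate output, in (0, 1)
c_tilde = np.tanh(rng.standard_normal(hidden_size))  # candidate cell state

# C_t = f_t * C_{t-1} + i_t * C~_t   (element-wise multiplication)
C_t = f_t * C_prev + i_t * c_tilde
print(C_t)
```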
Output Gate ($o_t$)
Decides what part of the cell state to output. It is represented by the formula: $$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$ and then the hidden state is calculated as: $$h_t = o_t \odot \tanh(C_t)$$
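Putting the four equations together, here is a self-contained NumPy sketch of one full LSTM time step (the sizes, random weights, and the `lstm_step` helper are assumptions for illustration, not a library API):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, p):
    """One LSTM time step implementing the four gate equations above."""
    concat = np.concatenate([h_prev, x_t])            # [h_{t-1}, x_t]
    f_t = sigmoid(p["W_f"] @ concat + p["b_f"])       # forget gate
    i_t = sigmoid(p["W_i"] @ concat + p["b_i"])       # input gate
    c_tilde = np.tanh(p["W_c"] @ concat + p["b_c"])   # candidate cell state
    C_t = f_t * C_prev + i_t * c_tilde                # update cell state
    o_t = sigmoid(p["W_o"] @ concat + p["b_o"])       # output gate
    h_t = o_t * np.tanh(C_t)                          # new hidden state
    return h_t, C_t

# toy setup (assumed sizes): 3 input features, 4 hidden units, random weights
input_size, hidden_size = 3, 4
rng = np.random.default_rng(0)
p = {}
for g in ("f", "i", "c", "o"):
    p[f"W_{g}"] = 0.1 * rng.standard_normal((hidden_size, hidden_size + input_size))
    p[f"b_{g}"] = np.zeros(hidden_size)

h, C = np.zeros(hidden_size), np.zeros(hidden_size)
for x_t in rng.standard_normal((5, input_size)):      # a toy sequence of 5 time steps
    h, C = lstm_step(x_t, h, C, p)
print(h.shape, C.shape)   # (4,) (4,)
```

In practice a framework layer such as the `nn.LSTM` example earlier handles these per-step computations (and their gradients) for you; the sketch only spells out the gate arithmetic.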