AlexNet is a convolutional neural network architecture for image classification that won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012. It was designed by Alex Krizhevsky, Ilya Sutskever and Geoffrey Hinton. The network achieved a top-5 error of 15.3%, more than 10.8 percentage points lower than that of the runner-up. The paper describing it is considered one of the most influential in computer vision, having spurred many later papers that use CNNs and GPUs to accelerate deep learning.
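The network has eight layers: five convolutional layers (some followed by max-pooling layers) and three fully connected layers. It can be depicted by:
$$(\text{CNN} \to \text{RN} \to \text{MP})^2 \to (\text{CNN}^3 \to \text{MP}) \to (\text{FC} \to \text{DO})^2 \to \text{Linear} \to \text{softmax}$$
where: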
- CNN: convolutional layer (ReLU)
- RN: local response normalization
- MP: max pooling
- FC: fully connected layer (ReLU)
- Linear: fully connected layer (no activation)
- DO: dropout layer
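The layer legend above can be turned into an explicit sequence and checked with a little arithmetic. The following is a minimal sketch in plain Python, not a trainable implementation. The layer ordering, kernel sizes, strides, paddings, channel counts and the 227×227 input are assumptions taken from the commonly cited single-stream description of AlexNet; none of them are stated in the text above. The names `conv_out`, `CONV_STACK`, `HEAD` and `trace` are hypothetical helpers introduced only for illustration.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Output width/height of a convolution or pooling window."""
    return (size + 2 * pad - kernel) // stride + 1

# Assumed hyperparameters: (kind, out_channels, kernel, stride, pad).
# RN (local response normalization) leaves the shape unchanged.
CONV_STACK = [
    ("CNN", 96, 11, 4, 0), ("RN",), ("MP", None, 3, 2, 0),
    ("CNN", 256, 5, 1, 2), ("RN",), ("MP", None, 3, 2, 0),
    ("CNN", 384, 3, 1, 1),
    ("CNN", 384, 3, 1, 1),
    ("CNN", 256, 3, 1, 1), ("MP", None, 3, 2, 0),
]
# Assumed classifier head: (kind, out_features). DO (dropout) keeps the shape.
HEAD = [("FC", 4096), ("DO",), ("FC", 4096), ("DO",), ("Linear", 1000)]

def trace(input_size=227, channels=3):
    """Return a list of (layer kind, output shape) for the assumed layout."""
    shapes = []
    size = input_size
    for layer in CONV_STACK:
        kind = layer[0]
        if kind == "CNN":
            _, channels, k, s, p = layer
            size = conv_out(size, k, s, p)
        elif kind == "MP":
            _, _, k, s, p = layer
            size = conv_out(size, k, s, p)
        shapes.append((kind, (channels, size, size)))
    # Flatten the last feature map before the fully connected layers.
    features = channels * size * size
    shapes.append(("Flatten", (features,)))
    for layer in HEAD:
        kind = layer[0]
        if kind in ("FC", "Linear"):
            features = layer[1]
        shapes.append((kind, (features,)))
    return shapes

if __name__ == "__main__":
    for kind, shape in trace():
        print(f"{kind:8s} {shape}")
    learned = [k for k, _ in trace() if k in ("CNN", "FC", "Linear")]
    print("layers with weights:", len(learned))
```

Counting only the entries that carry weights (CNN, FC and Linear) gives the eight layers mentioned above. Normalization, pooling and dropout are not counted because they have no learned parameters.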