Skip to content

Architecture

Convolutional Neural Network (CNN)

A neural network architecture that uses convolution layers to detect spatial patterns, dominant in image recognition tasks.

A Convolutional Neural Network (CNN) is a type of neural network designed to process grid-like data such as images. Instead of connecting every input pixel to every neuron, a CNN uses small filters (kernels) that slide across the input, detecting local patterns like edges, textures, and shapes. Stacking many such layers lets the network build up from simple features to complex objects. CNNs matter because they made modern computer vision practical. From 2012's AlexNet breakthrough on ImageNet through architectures like VGG, ResNet, and EfficientNet, CNNs powered face recognition, medical imaging, self-driving perception, and photo tagging for over a decade. Even today, when Vision Transformers grab headlines, CNNs remain the workhorse for many on-device and real-time vision tasks because they're efficient and translation-invariant by design. A useful analogy: imagine looking at a photo through a small magnifying glass, moving it across the picture and noting what you see in each spot. The first pass might pick out edges; a second pass over those notes spots corners; a third spots eyes and wheels; eventually you recognize "cat" or "car." Each "magnifying glass" is a learned convolutional filter, and pooling layers summarize regions so the network can focus on what matters regardless of exact position. Related concepts to explore next: pooling, ResNet, Vision Transformer (ViT), feature map, ImageNet, backpropagation.

Last updated: 2026-04-29

We use cookies

Anonymous analytics help us improve the site. You can opt out anytime. Learn more