Understanding Convolutional Neural Networks with CNNs

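In summary, the input grayscale image is passed through two convolutional layers, each with three kernels. The first layer produces 3 feature maps that are stacked into a 3D matrix. In a standard CNN, each kernel in the second layer spans all 3 incoming feature maps and sums its per-map results, so the second layer also produces 3 feature maps (not 9). These feature maps are then flattened into a 1D vector and fed into the input layer of the artificial neural network.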
  • #1
fog37
TL;DR Summary
Understand conv net layers in a CNN
Hello,
I have been learning about convolutional neural networks (CNNs) recently and wonder if I could get some help with a specific question:
  • Assume we start with an input grayscale image having size NxN pixels. The image is passed to the 1st convolutional layer which has 3 filters (kernels) of smaller size called K1, K2, K3.
  • Three convolutions are performed in this first conv layer: the 3 different kernels are sequentially applied to the input image to create the 3 different feature maps FP1, FP2, FP3 (the outputs of the convolution operations).
  • The 3 feature maps FP1, FP2, FP3 are then stacked in a 3D matrix called M1.
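The three steps above can be sketched in NumPy. This is only an illustration under assumed sizes (N = 6, 3x3 kernels, stride 1, no padding, random values); the helper `conv2d_valid` is a hypothetical name, and like most deep-learning libraries it computes a cross-correlation (the kernel is not flipped).

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide a 2D kernel over a 2D image with stride 1 and no padding."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the kernel with the patch under it
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

N, k = 6, 3                                   # assumed sizes, for illustration
rng = np.random.default_rng(0)
image = rng.random((N, N))                    # grayscale input, N x N
K1, K2, K3 = rng.standard_normal((3, k, k))   # three 2D kernels

# One convolution per kernel gives one feature map per kernel
FP1, FP2, FP3 = (conv2d_valid(image, K) for K in (K1, K2, K3))

# Stack the feature maps into the 3D matrix M1
M1 = np.stack([FP1, FP2, FP3])
print(M1.shape)                               # (3, 4, 4)
```

With no padding, each feature map has size (N - k + 1) x (N - k + 1), so M1 has shape (3, N - k + 1, N - k + 1).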
Assume there is also a 2nd convolutional layer with 3 more and different kernels K4, K5, K6.
How are the 3 kernels K4, K5, K6 in the 2nd conv layer applied to the 3 feature maps FP1, FP2, FP3 generated in the 1st conv layer?
Is K4 convolved with FP1, FP2, FP3, then K5 is convolved with FP1, FP2, FP3, and finally K6 is convolved with FP1, FP2, FP3? If so, we end up with a volume containing 9 new feature maps. Is that correct?


At the very end, the 9 new feature maps from the last convolutional layer are all flattened into a vector (1D array) with as many elements as there are nodes in the input layer of the artificial neural network: starting with the first feature map, its rows are concatenated one by one in a straight line, and this process continues for all other 8 feature maps. What we get is a very long 1D vector that is then fed into the input layer of the ANN...
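The flattening order described here (rows of the first map, then rows of the next map, and so on) matches NumPy's default row-major reshape. A small sketch, assuming a hypothetical stack of C maps of size H x W filled with consecutive integers so the order is easy to see:

```python
import numpy as np

C, H, W = 3, 2, 2                        # assumed sizes, for illustration
maps = np.arange(C * H * W).reshape(C, H, W)

# Manual version: concatenate every row of every map, in order
manual = np.concatenate([row for fmap in maps for row in fmap])

# Equivalent one-liner: row-major reshape into a 1D vector
flat = maps.reshape(-1)

print(flat.shape)                        # (12,)
print(np.array_equal(manual, flat))      # True
```

The length of the vector is C * H * W, which is the number of nodes the input layer of the following network needs to have.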

Thanks!
 
  • #2
Not quite. In a standard convolutional layer, each kernel in the 2nd conv layer is itself three-dimensional: K4, K5, and K6 each have a depth of 3, with one 2D slice per input feature map. To apply K4, its three slices are convolved with FP1, FP2, and FP3 respectively, and the three results are summed element-wise (plus a bias) into a single new feature map. The same happens for K5 and K6, so the 2nd layer outputs a volume of 3 feature maps, not 9. In general, the number of output feature maps equals the number of kernels in the layer, regardless of how many feature maps come in. Keeping all 9 pairwise results as separate maps is not what a standard convolutional layer does.
The flattening step works as you describe: the feature maps from the last convolutional layer are flattened into a vector (1D array) with as many elements as there are nodes in the input layer of the artificial neural network. Starting with the first feature map, its rows are concatenated one by one in a straight line, and this process continues for the remaining feature maps. This results in a long 1D vector that is then fed into the input layer of the ANN.
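For comparison, here is a minimal NumPy sketch of how a standard multi-channel convolutional layer (the usual convention in deep-learning libraries) combines its input maps: each kernel has one 2D slice per input feature map, and the per-slice results are summed into a single output map. The sizes, random values, and the helper names `conv2d_valid` and `conv_layer` are assumptions for illustration, and the bias term is omitted.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """2D cross-correlation with stride 1 and no padding."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def conv_layer(volume, kernels):
    """volume: (C_in, H, W); kernels: (C_out, C_in, kh, kw)."""
    maps = []
    for kernel in kernels:                    # one 3D kernel per output map
        # Convolve each 2D slice with its matching input map, then sum
        maps.append(sum(conv2d_valid(fp, ks) for fp, ks in zip(volume, kernel)))
    return np.stack(maps)

rng = np.random.default_rng(0)
M1 = rng.random((3, 4, 4))                    # stands in for FP1, FP2, FP3
K456 = rng.standard_normal((3, 3, 2, 2))      # K4, K5, K6, each of depth 3

M2 = conv_layer(M1, K456)
print(M2.shape)                               # (3, 3, 3): three output maps
```

Under this convention the second layer holds 9 two-dimensional kernel slices in total (3 kernels x 3 slices), but it still produces only one output map per kernel.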
 

FAQ: Understanding Convolutional Neural Networks with CNNs

What is a Convolutional Neural Network (CNN)?

A Convolutional Neural Network is a type of artificial neural network that is specifically designed to process and analyze visual data. It is inspired by the structure and functioning of the human visual system, and consists of multiple layers of interconnected neurons that are trained to recognize patterns in images.

How do CNNs work?

CNNs work by applying a series of mathematical operations to an input image: convolving it with a set of learnable filters, applying non-linear activation functions, and pooling (downsampling) the resulting feature maps. This sequence is repeated multiple times, and in a classification network the final layers turn the result into a probability distribution over the possible classes of the input image.
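The stages after the convolution can be sketched in a few lines of NumPy. This is a toy illustration, not a trained network: the feature map and the dense-layer weights are random, the 3-class output is an assumption, and the function names are hypothetical.

```python
import numpy as np

def relu(x):
    """Non-linear activation: negative values become zero."""
    return np.maximum(x, 0.0)

def max_pool2x2(fmap):
    """2x2 max pooling with stride 2 (trims an odd trailing row/column)."""
    h, w = fmap.shape[0] // 2 * 2, fmap.shape[1] // 2 * 2
    blocks = fmap[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

def softmax(z):
    """Turn raw scores into a probability distribution."""
    e = np.exp(z - z.max())                  # subtract max for stability
    return e / e.sum()

rng = np.random.default_rng(0)
fmap = rng.standard_normal((4, 4))           # stands in for a convolution output
pooled = max_pool2x2(relu(fmap))             # shape (2, 2)

W = rng.standard_normal((3, pooled.size))    # assumed dense layer, 3 classes
probs = softmax(W @ pooled.reshape(-1))
print(pooled.shape, probs.shape)             # (2, 2) (3,)
```

The entries of `probs` are non-negative and sum to 1, which is what lets them be read as class probabilities.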

What are the advantages of using CNNs?

CNNs have several advantages over traditional machine learning techniques when it comes to image recognition tasks. They are able to automatically extract features from images, reducing the need for manual feature engineering. Because each kernel's weights are shared across all positions in the image, they also need far fewer parameters than a fully connected network on the same input, making them more efficient in terms of memory and computation. Additionally, convolution responds to a pattern wherever it appears in the image, and pooling adds some tolerance to small shifts; robustness to changes in scale or rotation generally has to come from the training data or from data augmentation.
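A rough parameter count makes the efficiency point concrete. The sizes below (a 28x28 grayscale input, 3 kernels of size 3x3, no padding) are assumptions chosen only for illustration, and the function names are hypothetical.

```python
def conv_params(c_in, c_out, k):
    """Weights plus one bias per kernel in a convolutional layer."""
    return c_out * (c_in * k * k + 1)

def dense_params(n_in, n_out):
    """Weights plus one bias per unit in a fully connected layer."""
    return n_out * (n_in + 1)

N, k, n_kernels = 28, 3, 3                 # assumed sizes
out_side = N - k + 1                       # 26 with no padding, stride 1

conv = conv_params(1, n_kernels, k)
# A dense layer producing the same number of outputs from the same input
dense = dense_params(N * N, n_kernels * out_side * out_side)

print(conv)    # 30
print(dense)   # 1591980
```

The convolutional layer's count does not depend on the image size at all, while the dense layer's count grows with both the input and output sizes.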

What are some real-world applications of CNNs?

CNNs have been successfully applied in a variety of fields, including computer vision, natural language processing, and speech recognition. Some common applications include image classification, object detection, facial recognition, and medical image analysis. They are also used in self-driving cars, social media platforms, and recommendation systems.

How can I learn more about CNNs?

There are many resources available for learning more about CNNs, including online courses, tutorials, and books. Some popular online courses include "CS231n: Convolutional Neural Networks for Visual Recognition" from Stanford University and "Convolutional Neural Networks" by deeplearning.ai. Additionally, there are many research papers and articles published on CNNs that provide in-depth information on their architecture and applications.
