SENet Paper Review

SENet paper: arxiv.org/abs/1709.01507

The paper I'm reading this time is SENet: Squeeze-and-Excitation Networks.

SENet is a model that improves performance by focusing on the interaction between channels. This interaction is expressed as a per-channel weight: a channel with a large weight can be interpreted as carrying important features. SENet computes a weight for each channel of a feature map and then multiplies each channel by its weight. In other words, SENet can be seen as a model that improves performance by learning weights over channels. Now let's see how this weight is calculated.
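To make the channel weighting concrete, here is a minimal sketch in PyTorch of multiplying each channel of a feature map by its weight. The shapes and the weight values are made up for illustration:

```python
import torch

# A toy feature map: batch 1, C = 3 channels, 4x4 spatial size (illustrative shapes).
x = torch.randn(1, 3, 4, 4)

# Hypothetical per-channel weights, one scalar per channel.
w = torch.tensor([0.6, 0.1, 0.7]).view(1, 3, 1, 1)

# Broadcasting multiplies every pixel of channel i by w[i].
y = x * w
print(y.shape)  # torch.Size([1, 3, 4, 4])
```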

[Figure: SE Block computing per-channel weights and multiplying them with the feature map]

The figure shows how the SE Block (Squeeze + Excitation) calculates per-channel weights and multiplies them with the feature map. The weights are drawn in color, and after the multiplication the colors of the feature map channels change accordingly.

SE Block

SENet is a model that utilizes the SE Block. The SE Block is attached to CNN-based models: it can be combined with residual models or Inception models, and it can also be attached to VGGNet. In this sense, the SE Block is flexible. Interestingly, at low-level layers the SE Block extracts important features regardless of class, while at high-level layers it extracts class-specific features. A sketch of attaching it to a residual block follows below.
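The following is a minimal sketch, not the paper's exact configuration, of attaching an SE branch to a simple residual block. The layer sizes and the use of 1x1 convolutions in place of the FC layers are my assumptions:

```python
import torch
import torch.nn as nn

class SEResidualBlock(nn.Module):
    """Sketch: a residual block whose output channels are reweighted by an SE branch."""

    def __init__(self, channels: int, r: int = 16):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        # SE branch: squeeze (GAP) + excitation, implemented here with 1x1 convs.
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // r, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // r, channels, 1),
            nn.Sigmoid(),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out * self.se(out)      # reweight channels before the skip addition
        return self.relu(out + x)     # residual (skip) connection
```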

Let's take a closer look at the SE Block. It consists of two processes: Squeeze and Excitation.

[Figure: structure of the SE Block (Squeeze and Excitation)]

(1) Squeeze

To calculate a weight for each channel, each channel must first be reduced to a single value. For example, if there are 3 channels, the weights should be expressible as [0.6, 0.1, 0.7]. Squeeze is responsible for this compression of each channel into one dimension.

[Figure: the Squeeze operation (global average pooling)]

Squeeze takes as input a feature map produced by a conv operation. An H×W×C feature map is compressed to 1×1×C through a global average pooling operation: all the pixel values of one channel are summed and divided by H×W, compressing that channel to 1×1×1. Since the feature map has C channels, concatenating these values gives a 1×1×C vector.
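As a sketch, Squeeze is just a global average pool. In PyTorch (illustrative shapes) it can be written as follows:

```python
import torch
import torch.nn.functional as F

# Feature map of shape (N, C, H, W); the sizes here are illustrative.
x = torch.randn(8, 64, 32, 32)

# Squeeze: global average pooling compresses each HxW channel to a single value.
z = F.adaptive_avg_pool2d(x, 1)                    # shape (8, 64, 1, 1)

# Equivalently: sum all pixel values per channel and divide by H*W.
z_manual = x.sum(dim=(2, 3), keepdim=True) / (32 * 32)
assert torch.allclose(z, z_manual, atol=1e-6)
```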

(2) Excitation

Excitation takes the 1×1×C vector produced by Squeeze, normalizes it, and assigns the weights. It consists of FC1 - ReLU - FC2 - Sigmoid. The 1×1×C vector is input to FC1, which reduces the C channels to C/r channels; r is a hyperparameter, and the authors say this bottleneck structure was chosen to limit the amount of computation and for its generalization effect. The reduced 1×1×(C/r) vector passes through ReLU and then through FC2, which restores the number of channels to C. The sigmoid then maps the values into the range (0, 1). Finally, the result is multiplied with the feature map, weighting each of its channels.
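Putting the two steps together, a minimal SE Block sketch in PyTorch could look like this. The module and parameter names are my own, not from the paper's code; r = 16 is the paper's default reduction ratio:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Sketch of Squeeze-and-Excitation: GAP, then FC1 - ReLU - FC2 - Sigmoid."""

    def __init__(self, channels: int, r: int = 16):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // r)  # bottleneck: C -> C/r
        self.fc2 = nn.Linear(channels // r, channels)  # restore: C/r -> C

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, _, _ = x.shape
        s = x.mean(dim=(2, 3))               # Squeeze: (N, C)
        e = torch.relu(self.fc1(s))          # FC1 + ReLU: (N, C/r)
        e = torch.sigmoid(self.fc2(e))       # FC2 + Sigmoid: weights in (0, 1)
        return x * e.view(n, c, 1, 1)        # scale each channel of the feature map
```

As a quick check, `SEBlock(64)(torch.randn(2, 64, 8, 8))` returns a tensor of the same shape, with each channel rescaled by its learned weight.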

SENet

[Figure: SENet architecture]