YOLO V3 paper : arxiv.org/abs/1804.02767?e05802c1_page=1
YOLOv3 is a lot like YOLOv2, so I'm only going to write down the differences.
1. Abstract
Updated from previous versions (YOLO v1, v2), YOLO v3 performs better than before.
Accuracy is higher while the speed (FPS) stays fast.
With a 320 x 320 input, YOLOv3 reaches 28.2 mAP with an inference time of 22 ms.
2. Introduction
There's not much to say here; the authors just note that they made some improvements to the existing YOLO.
3.1 Bounding Box Prediction
YOLO uses clustering to generate anchor boxes (bounding boxes with predefined shapes) and predicts each bounding box relative to them (for YOLO: x_center, y_center, box_width, box_height).
The original YOLO predicted box coordinates directly. Since YOLOv2, the network instead predicts how far the box center is offset from the top-left corner of its grid cell (b_x, b_y), and how much the anchor box's width and height are scaled through an exponential (b_w, b_h).
Parameters related to Bounding Box Prediction
c_x, c_y : offset of the grid cell's top-left corner from the top-left of the image.
p_w, p_h : width and height of the anchor box.
t_x, t_y, t_w, t_h : the raw values YOLO predicts for each bounding box.
b_x, b_y, b_w, b_h : the final bounding box, obtained by adjusting the values above; this is the box compared against the GT (ground truth, label value) by IOU.
- b_x, b_y : squash t_x, t_y to values between 0 and 1 with the sigmoid function, then add c_x, c_y.
- b_w, b_h : multiply p_w, p_h by e^(t_w), e^(t_h).
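To make the parameters above concrete, here's a minimal sketch of the decoding step. The function name decode_box and the choice to work in grid-cell units are my own assumptions for illustration, not something taken from the Darknet code.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(t_x, t_y, t_w, t_h, c_x, c_y, p_w, p_h):
    """Turn raw predictions (t_*) into a final box (b_*), in grid-cell units.

    Hypothetical helper for illustration only.
    """
    b_x = sigmoid(t_x) + c_x   # center x: offset inside the cell + cell corner
    b_y = sigmoid(t_y) + c_y   # center y: same idea
    b_w = p_w * math.exp(t_w)  # anchor width scaled by e^(t_w)
    b_h = p_h * math.exp(t_h)  # anchor height scaled by e^(t_h)
    return b_x, b_y, b_w, b_h


# All-zero predictions put the center in the middle of cell (3, 4)
# and leave the anchor size unchanged.
print(decode_box(0.0, 0.0, 0.0, 0.0, c_x=3, c_y=4, p_w=2.0, p_h=1.5))
# (3.5, 4.5, 2.0, 1.5)
```

Because the sigmoid keeps the offset between 0 and 1, the predicted center can never leave the grid cell that is responsible for it.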
Bounding Box Prediction's Loss Function
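During training, YOLO uses a sum of squared error loss for the box coordinates.
YOLOv3 uses logistic regression to predict an objectness score for each bounding box.
The objectness target is 1 only for the anchor box with the highest IOU against a ground-truth object. Other boxes that are not the best but still overlap above the threshold are ignored.
0.5 is used as the IOU threshold.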
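Here's a rough sketch of how the objectness targets could be assigned from the rule above. It assumes a single ground-truth box and boxes given as (x_min, y_min, x_max, y_max); the function names and the use of None for "ignored" are my own choices, not the paper's.

```python
def iou(a, b):
    """IOU of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def objectness_targets(boxes, gt, ignore_thresh=0.5):
    """Return 1 (positive), 0 (negative) or None (ignored) per box.

    Simplification: only one ground-truth box is considered.
    """
    ious = [iou(b, gt) for b in boxes]
    best = max(range(len(boxes)), key=lambda i: ious[i])
    targets = []
    for i, value in enumerate(ious):
        if i == best:
            targets.append(1)      # highest IOU: responsible for the object
        elif value > ignore_thresh:
            targets.append(None)   # overlaps well but is not the best: ignored
        else:
            targets.append(0)      # background
    return targets


gt = (0, 0, 10, 10)
boxes = [(0, 0, 10, 10), (0, 0, 10, 8), (20, 20, 30, 30)]
print(objectness_targets(boxes, gt))
# [1, None, 0]
```

The second box has an IOU of 0.8 with the ground truth, so it is neither rewarded nor penalized.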
3.2 Class Prediction
Each box uses multi-label classification to predict the classes that the bounding box may contain.
Class prediction : independent logistic classifiers are used instead of softmax. -> Logistic classifier: a sigmoid applied to each class score separately
Therefore, the loss function is binary cross-entropy loss, not the softmax cross-entropy loss used in single-label classification.
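A small sketch of why sigmoid plus binary cross-entropy fits the multi-label case. The logits and class names are made up for illustration, and averaging over classes is my assumption (implementations may sum instead).

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def binary_cross_entropy(logits, targets, eps=1e-12):
    """Mean binary cross-entropy over independent per-class sigmoids."""
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        total += -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
    return total / len(logits)


# Hypothetical scores for the classes ("person", "woman", "car").
logits = [2.0, 1.5, -3.0]
probs = [sigmoid(z) for z in logits]

# Unlike softmax, the probabilities are independent and need not sum to 1,
# so "person" and "woman" can both be likely for the same box.
print(sum(probs) > 1.0)  # True

# Labels that agree with the scores give a lower loss than labels that don't.
print(binary_cross_entropy(logits, [1, 1, 0]) < binary_cross_entropy(logits, [0, 0, 1]))  # True
```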
3.3 Predictions Across Scales
YOLOv3 predicts boxes at three different scales
YOLOv3 extracts features at each of these scales
The network predicts 3 boxes at each scale
-> Tensor shape: N x N x [3 x (4 + 1 + 80)]
N : Grid size
3: Number of bounding boxes (#bb)
4 : 4 bounding box coordinates (offset: x, y, w, h)
1 : objectness
80 : The number of classes in the COCO dataset (class)
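To check the tensor shape arithmetic above, here's a tiny sketch. The strides 32, 16 and 8 are an assumption on my part (they are what YOLOv3 implementations commonly use), and output_shapes is a hypothetical helper name.

```python
def output_shapes(input_size=320, strides=(32, 16, 8), boxes_per_scale=3, num_classes=80):
    """Return the (N, N, channels) shape of the prediction tensor at each scale.

    The strides are an assumption, not something stated in these notes.
    """
    channels = boxes_per_scale * (4 + 1 + num_classes)  # 3 x (4 + 1 + 80) = 255
    return [(input_size // s, input_size // s, channels) for s in strides]


shapes = output_shapes()
print(shapes)
# [(10, 10, 255), (20, 20, 255), (40, 40, 255)]

# Total number of predicted boxes over the three scales.
total_boxes = sum(n * m * 3 for n, m, _ in shapes)
print(total_boxes)
# 6300
```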
Create Anchor box
K-means clustering (a type of unsupervised learning algorithm) is used to determine the anchor boxes.
A total of 9 clusters (anchor boxes) are selected, because there are 3 scales and 3 boxes are predicted at each scale.
Anchor boxes for the COCO dataset (width x height): (10 x 13), (16 x 30), (33 x 23), (30 x 61), (62 x 45), (59 x 119), (116 x 90), (156 x 198), (373 x 326).
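Here's a minimal sketch of the k-means step for anchor boxes. It assumes the YOLOv2-style idea of using IOU between (width, height) pairs as the similarity, and a deterministic initialization that I picked only so the example is reproducible. The toy box list is made up, so it gives 3 anchors, not the 9 COCO anchors.

```python
def wh_iou(a, b):
    """IOU of two boxes given only as (width, height), aligned at one corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(boxes, k, iters=100):
    """Cluster (width, height) pairs; highest IOU means closest center."""
    # Deterministic init (my assumption): evenly spaced picks from boxes sorted by area.
    ordered = sorted(boxes, key=lambda b: b[0] * b[1])
    step = len(ordered) / k
    centers = [ordered[int(i * step + step / 2)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            best = max(range(k), key=lambda i: wh_iou(b, centers[i]))
            clusters[best].append(b)
        new_centers = []
        for center, members in zip(centers, clusters):
            if members:
                w = sum(m[0] for m in members) / len(members)
                h = sum(m[1] for m in members) / len(members)
                new_centers.append((w, h))
            else:
                new_centers.append(center)  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers, key=lambda c: c[0] * c[1])


# Made-up label boxes (width, height) in pixels.
boxes = [(10, 12), (12, 10), (11, 11), (50, 60), (60, 50), (55, 55),
         (200, 180), (180, 200), (190, 190)]
print(kmeans_anchors(boxes, k=3))
# [(11.0, 11.0), (55.0, 55.0), (190.0, 190.0)]
```

With k=9 on a real dataset, the sorted result could then be split into three groups of three, one group per scale.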
3.4 Feature Extractor
Darknet-53 is used as the backbone.
It uses the shortcut connections that were introduced in ResNet.