YOLOv3

YOLOv3 is a lot like YOLOv2, so I'm only going to write down the differences.

1. Abstract

Updated from the previous versions (YOLO v1, v2), YOLOv3 performs better than before.
Accuracy is increased while the fast speed (FPS) is maintained.
With a 320 x 320 input, YOLOv3 reaches 28.2 mAP with an inference time of 22 ms.

img1.daumcdn.png

There's not much to say here: the authors write that they made some improvements to the existing YOLO.

YOLO uses clustering to generate anchor boxes (boxes with predefined shapes) and predicts bounding boxes relative to them (for YOLO: x_center, y_center, box_width, box_height).
Where the original YOLO predicted box coordinates directly, from YOLOv2 onward the network predicts how far the box center moves from the top-left corner of its grid cell (b_x, b_y), and how much the width and height of the anchor box are scaled through an exponential (b_w, b_h).

img1.daumcdn-1.png

c_x, c_y : Coordinates at the top left of the grid cell (offset).
p_w, p_h : width and height of the anchor box.
t_x, t_y, t_w, t_h : the raw values that YOLO predicts for the bounding box.
b_x, b_y, b_w, b_h : center coordinates, width and height of the final bounding box, obtained by adjusting the values above. This is the box compared against the GT (ground truth, label value) to compute the IOU.

b_x, b_y : t_x, t_y are squashed to values between 0 and 1 through the sigmoid function, then added to the cell offset c_x, c_y.
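To make the definitions above concrete, here is a minimal sketch of the decoding step, assuming the usual YOLOv2/YOLOv3 formulas (sigmoid on t_x, t_y plus the cell offset, exponential on t_w, t_h times the anchor size). The function name `decode_box` and the example numbers are made up for illustration.

```python
import math


def sigmoid(x):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(t_x, t_y, t_w, t_h, c_x, c_y, p_w, p_h):
    """Turn raw predictions (t_*) into a final box (b_*).

    c_x, c_y: top-left corner of the grid cell (in grid-cell units).
    p_w, p_h: width and height of the anchor box.
    """
    b_x = sigmoid(t_x) + c_x   # center stays inside the cell
    b_y = sigmoid(t_y) + c_y
    b_w = p_w * math.exp(t_w)  # anchor width scaled by e^t_w
    b_h = p_h * math.exp(t_h)  # anchor height scaled by e^t_h
    return b_x, b_y, b_w, b_h


# Hypothetical example: all raw outputs are 0, cell (3, 4), anchor 116 x 90.
print(decode_box(0.0, 0.0, 0.0, 0.0, c_x=3, c_y=4, p_w=116, p_h=90))
# (3.5, 4.5, 116.0, 90.0)
```

With all raw outputs at 0 the box sits in the middle of its cell and keeps the anchor's size, which is why the anchors act as a sensible starting guess.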

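During training, a sum of squared error loss is used for the box coordinates.
YOLOv3 uses logistic regression to predict an objectness score for each bounding box.
The score should be 1 for the anchor box that overlaps a ground truth object more than any other anchor box.
Other boxes that are not the best but whose IOU with the ground truth is above the threshold are ignored.
The IOU threshold used is 0.5.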
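The assignment rule can be sketched roughly as follows. This is only an illustration under the assumption that the IOU of every candidate box with one ground truth box has already been computed; the function name `assign_objectness` and the label strings are hypothetical.

```python
def assign_objectness(ious, ignore_thresh=0.5):
    """Label each box for one ground truth object.

    ious: IOU of each candidate box with the ground truth box.
    The box with the highest IOU is the positive (objectness target 1).
    Other boxes above the threshold are ignored (no objectness loss).
    Everything else is a negative (objectness target 0).
    """
    best = max(range(len(ious)), key=lambda i: ious[i])
    labels = []
    for i, iou in enumerate(ious):
        if i == best:
            labels.append("positive")
        elif iou > ignore_thresh:
            labels.append("ignore")
        else:
            labels.append("negative")
    return labels


print(assign_objectness([0.3, 0.8, 0.6]))
# ['negative', 'positive', 'ignore']
```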

Each box uses multi-label classification to predict the classes that the bounding box may contain.
Class prediction : independent logistic classifiers are used instead of softmax. -> Logistic classifier: a sigmoid applied to each class score separately.
Therefore, the class loss uses binary cross-entropy loss, not the softmax loss used in single-label classification.
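A small sketch of what this means in practice, assuming plain per-class sigmoids followed by a summed binary cross-entropy. The logits, targets, and the three-class setup are invented for the example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def binary_cross_entropy(logits, targets):
    """Sum of per-class binary cross-entropy terms.

    Each class is scored independently, so several classes
    can be "on" for the same box (multi-label).
    """
    eps = 1e-12  # avoids log(0)
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        total += -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
    return total


# Hypothetical box that carries two labels at once (classes 0 and 1).
logits = [4.0, 3.0, -5.0]
targets = [1, 1, 0]

probs = [sigmoid(z) for z in logits]
print(probs)                                  # per-class probabilities
print(sum(probs))                             # not forced to sum to 1, unlike softmax
print(binary_cross_entropy(logits, targets))  # small, since predictions match targets
```

Because the probabilities are not normalized across classes, two overlapping labels can both get a score close to 1, which softmax would not allow.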

2.3. Predictions Across Scales

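-> Output tensor shape at each scale: N x N x [3 x (4 + 1 + 80)], i.e. 3 boxes per grid cell, each with 4 box offsets, 1 objectness score, and 80 class scores (COCO).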
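As a quick check of the arithmetic, the sketch below computes the output shape for each of the 3 scales. The strides 32, 16, and 8 and the 416 x 416 input are assumptions taken from common YOLOv3 setups, not from the text above, and `output_shapes` is a made-up helper name.

```python
def output_shapes(input_size, num_classes=80, boxes_per_scale=3, strides=(32, 16, 8)):
    """Return (N, N, depth) for each scale.

    depth = boxes_per_scale * (4 box offsets + 1 objectness + num_classes)
    strides are assumed downsampling factors, one per scale.
    """
    depth = boxes_per_scale * (4 + 1 + num_classes)
    return [(input_size // s, input_size // s, depth) for s in strides]


print(output_shapes(416))
# [(13, 13, 255), (26, 26, 255), (52, 52, 255)]
```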

K-means clustering (a type of unsupervised learning algorithm) is used to determine the anchor boxes.
A total of 9 clusters (anchor boxes) are selected, because there are 3 scales and 3 boxes for each scale.
Anchor boxes for the COCO dataset: (10 x 13), (16 x 30), (33 x 23), (30 x 61), (62 x 45), (59 x 119), (116 x 90), (156 x 198), (373 x 326).
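For intuition, here is a rough sketch of how anchor boxes could be clustered from the (width, height) pairs of labeled boxes. It assumes the YOLOv2-style choice of assigning each box to the center with the highest IOU instead of the smallest Euclidean distance; the text above only says k-means is used, so treat that detail, the function names, and the toy data as assumptions.

```python
import random


def wh_iou(a, b):
    """IOU of two boxes given as (w, h), assuming they share the same center."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(boxes, k, iters=50, seed=0):
    """Cluster (w, h) pairs into k anchors, sorted from small to large area."""
    rng = random.Random(seed)
    centers = rng.sample(boxes, k)  # start from k random boxes
    for _ in range(iters):
        # Assignment step: each box joins the center it overlaps most.
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda j: wh_iou(box, centers[j]))
            clusters[best].append(box)
        # Update step: each center becomes the mean (w, h) of its members.
        new_centers = []
        for j, members in enumerate(clusters):
            if members:
                w = sum(m[0] for m in members) / len(members)
                h = sum(m[1] for m in members) / len(members)
                new_centers.append((w, h))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers, key=lambda c: c[0] * c[1])


# Toy (w, h) data, made up for illustration.
boxes = [(10, 12), (12, 14), (11, 13), (60, 45), (62, 50),
         (58, 47), (300, 320), (370, 330), (350, 310)]
anchors = kmeans_anchors(boxes, k=3)
print(anchors)
```

On a real dataset one would run this with k=9 and hand the three smallest, three middle, and three largest anchors to the three scales.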

img1.daumcdn-2.png