Yolo V3


YOLO V3 paper : arxiv.org/abs/1804.02767?e05802c1_page=1

YOLOv3 is a lot like YOLOv2, so I'm just going to write down the differences.

1. Abstract

  • YOLOv3 is an update to the previous versions (YOLO v1, v2) and performs better than they did.

  • Accuracy goes up while speed (FPS) stays high.

  • With a 320 x 320 input, YOLOv3 reaches 28.2 mAP with an inference time of 22 ms.

img1.daumcdn.png

2. Introduction

There isn't much to say here: the authors describe a set of small improvements to the existing YOLO.

3.1 Bounding Box Prediction

  • YOLO uses clustering to generate anchor boxes (bounding boxes with predefined shapes) and predicts each bounding box relative to them (for YOLO a box is x_center, y_center, box_width, box_height).

  • The first YOLO predicted box coordinates directly. From YOLOv2 onward, the network predicts how far the box center moves from the top-left corner of the grid cell (b_x, b_y) and how much the anchor box's width and height are scaled through an exponential (b_w, b_h).

img1.daumcdn-1.png

  1. c_x, c_y : coordinates of the top-left corner of the grid cell (offset).

  2. p_w, p_h : width and height of the anchor box.

  3. t_x, t_y, t_w, t_h : the raw values that YOLO predicts for the bounding box.

  4. b_x, b_y, b_w, b_h : the final bounding box (center, width, height), obtained by adjusting the values above; it is compared with the GT (ground truth, the label) by IOU.

  1. b_x, b_y : t_x, t_y are squashed to values between 0 and 1 by the sigmoid function, then added to c_x, c_y.
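To make the symbols above concrete, here is a minimal sketch of the decoding step as I understand it from the paper: b_x = sigmoid(t_x) + c_x, b_y = sigmoid(t_y) + c_y, b_w = p_w * exp(t_w), b_h = p_h * exp(t_h). The function name `decode_box` and the sample numbers are my own, not from the paper, and everything is in grid-cell units.

```python
import math


def sigmoid(x):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(t_x, t_y, t_w, t_h, c_x, c_y, p_w, p_h):
    """Turn raw predictions (t_*) into a final box (b_*), in grid-cell units.

    c_x, c_y: top-left corner of the grid cell.
    p_w, p_h: width and height of the anchor box.
    """
    b_x = sigmoid(t_x) + c_x   # center x stays inside the cell
    b_y = sigmoid(t_y) + c_y   # center y stays inside the cell
    b_w = p_w * math.exp(t_w)  # anchor width scaled by an exponential
    b_h = p_h * math.exp(t_h)  # anchor height scaled by an exponential
    return b_x, b_y, b_w, b_h


# Hypothetical example: all-zero predictions in cell (3, 4) with a 2 x 5 anchor.
# The center lands in the middle of the cell and the anchor size is unchanged.
box = decode_box(0.0, 0.0, 0.0, 0.0, c_x=3, c_y=4, p_w=2.0, p_h=5.0)
print(box)
```

Because the sigmoid keeps the center offset between 0 and 1, the predicted center can never leave the grid cell that is responsible for it.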

Bounding Box Prediction's Loss Function

  • During training, YOLO uses sum of squared error loss for the box coordinates.

  • YOLOv3 uses logistic regression to predict an objectness score for each bounding box.

  • Only the box with the highest IOU against a ground-truth object is assigned to it; other boxes that also overlap the object above the threshold are ignored.

  • The IOU threshold is 0.5.
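The assignment rule above can be sketched roughly as follows. This is my own illustration, not the authors' code: the function names, the corner-based box format (x1, y1, x2, y2), and the use of `None` to mark an ignored box are all assumptions, and only a single ground-truth box is handled.

```python
def iou(a, b):
    """IOU of two boxes given as (x1, y1, x2, y2) corners."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def objectness_targets(boxes, gt_box, ignore_thresh=0.5):
    """One target per box: 1 (best match), None (ignored), or 0 (no object)."""
    overlaps = [iou(b, gt_box) for b in boxes]
    best = max(range(len(boxes)), key=lambda i: overlaps[i])
    targets = []
    for i, overlap in enumerate(overlaps):
        if i == best:
            targets.append(1)       # the single box assigned to the object
        elif overlap > ignore_thresh:
            targets.append(None)    # good overlap but not the best: ignored
        else:
            targets.append(0)       # treated as background
    return targets


# Hypothetical boxes: a perfect match, a close second, and a far-away box.
gt = (0, 0, 10, 10)
candidates = [(0, 0, 10, 10), (0, 0, 10, 8), (20, 20, 30, 30)]
print(objectness_targets(candidates, gt))
```

The middle box overlaps the ground truth with IOU 0.8, which is above 0.5, so it contributes nothing to the objectness loss instead of being punished as a false positive.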

3.2 Class Prediction

  • Each box uses multi-label classification to predict the classes that the bounding box may contain.

  • Classification: independent logistic classifiers (one sigmoid per class) are used instead of softmax.

  • Therefore, the class loss is binary cross-entropy rather than the loss used for single-label classification (one class per box).
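To see why independent sigmoids fit multi-label data, here is a small sketch of per-class binary cross-entropy. The logits and labels are made-up numbers for illustration, and the function names are mine.

```python
import math


def sigmoid(x):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def binary_cross_entropy(logits, targets):
    """Sum of per-class BCE terms; each class is an independent yes/no."""
    loss = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return loss


# Hypothetical box whose first two labels are both true (overlapping labels).
logits = [2.0, 1.5, -3.0]
targets = [1, 1, 0]

probs = [sigmoid(z) for z in logits]
loss = binary_cross_entropy(logits, targets)
print(probs, loss)
```

Unlike softmax outputs, these probabilities do not have to sum to 1, so two classes can both score high for the same box.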

3.3 Predictions Across Scales

  • YOLOv3 predicts boxes at three different scales.

  • YOLOv3 extracts features from each of these scales.

  • The network predicts 3 boxes at each scale.

-> Tensor shape: N x N x [3 x (4 + 1 + 80)]

  • N : grid size

  • 3: Number of bounding boxes (#bb)

  • 4 : 4 bounding box coordinates (offset: x, y, w, h)

  • 1 : objectness score

  • 80 : The number of classes in the COCO dataset (class)
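As a quick arithmetic check of the tensor shape above, here is a small sketch. The 416 x 416 input size and the strides 32, 16, 8 are my assumptions (they are common in YOLOv3 implementations but are not stated in this post), and `output_shapes` is a hypothetical helper name.

```python
def output_shapes(input_size=416, strides=(32, 16, 8),
                  boxes_per_scale=3, num_classes=80):
    """Return the N x N x depth shape of the prediction tensor at each scale."""
    # 4 box coordinates + 1 objectness score + one score per class, per box
    depth = boxes_per_scale * (4 + 1 + num_classes)
    # Assumed: each scale downsamples the input by its stride
    return [(input_size // s, input_size // s, depth) for s in strides]


shapes = output_shapes()
print(shapes)  # [(13, 13, 255), (26, 26, 255), (52, 52, 255)]
```

So for COCO the last dimension is 3 x 85 = 255 at every scale; only the grid size N changes.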

Create Anchor box

  • Anchor boxes are determined with k-means clustering (an unsupervised learning algorithm).

  • A total of 9 clusters (anchor boxes) are selected: 3 scales with 3 boxes for each scale.

  • Anchor box sizes on the COCO dataset: (10 x 13), (16 x 30), (33 x 23), (30 x 61), (62 x 45), (59 x 119), (116 x 90), (156 x 198), (373 x 326).
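Here is a rough sketch of how anchor boxes could be found with k-means. I am assuming the distance 1 - IOU between (width, height) pairs, as described for YOLOv2, and a plain mean for the cluster update; the paper does not spell out these details for v3, and the toy box sizes below are made up.

```python
import random


def wh_iou(a, b):
    """IOU of two (width, height) pairs aligned at the same corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(boxes, k, iters=50, seed=0):
    """Cluster (width, height) pairs; return k anchors sorted by area."""
    rng = random.Random(seed)
    centers = rng.sample(boxes, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # Nearest center = highest IOU (i.e. smallest 1 - IOU)
            j = max(range(k), key=lambda i: wh_iou(box, centers[i]))
            clusters[j].append(box)
        new_centers = []
        for j, members in enumerate(clusters):
            if not members:
                new_centers.append(centers[j])  # keep an empty cluster's center
                continue
            w = sum(m[0] for m in members) / len(members)
            h = sum(m[1] for m in members) / len(members)
            new_centers.append((w, h))
        if new_centers == centers:
            break  # assignments stopped changing
        centers = new_centers
    return sorted(centers, key=lambda c: c[0] * c[1])


# Toy data: three small boxes and three large ones, clustered into 2 anchors.
toy_boxes = [(10, 12), (11, 13), (9, 14), (100, 90), (110, 95), (105, 100)]
anchors = kmeans_anchors(toy_boxes, k=2)
print(anchors)
```

With k = 9 on real label boxes, the sorted result could then be split into three groups of three, one group per scale.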

3.4 Feature Extractor

  • Darknet-53 is used as the backbone.

  • It uses shortcut connections, as introduced in ResNet.
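The idea of a shortcut connection can be sketched for a single pixel's channel vector. This is a heavily simplified illustration, not Darknet-53 itself: I replace the convolutions with per-pixel linear maps, and the leaky ReLU slope of 0.1 is an assumption on my part.

```python
def leaky_relu(v, slope=0.1):
    """Leaky ReLU applied element-wise (the slope value is an assumption)."""
    return [x if x > 0 else slope * x for x in v]


def linear(v, weights):
    """Per-pixel linear map over channels, standing in for a convolution."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_block(v, w_reduce, w_expand):
    """Reduce channels, expand them back, then add the input (the shortcut)."""
    h = leaky_relu(linear(v, w_reduce))
    h = leaky_relu(linear(h, w_expand))
    return [x + y for x, y in zip(v, h)]  # shortcut: output = input + F(input)


# Hypothetical 4-channel input with all-zero weights: the block is an identity.
v = [1.0, 2.0, 3.0, 4.0]
w_reduce = [[0.0] * 4 for _ in range(2)]  # 4 channels -> 2 channels
w_expand = [[0.0] * 2 for _ in range(4)]  # 2 channels -> 4 channels
out = residual_block(v, w_reduce, w_expand)
print(out)
```

Because the input is added back at the end, a block that has learned nothing still passes its input through unchanged, which is what makes very deep backbones easier to train.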

img1.daumcdn-2.png