# Notes

## You only look once (YOLO) -- (2)

YOLO has higher localization errors and the recall (measure how good to locate all objects) is lower, compared to SSD. YOLOv2 is the second version of the YOLO with the objective of improving the accuracy significantly while making it faster. The backbone network architecture of YOLO v2 is as follows: 1. Accuracy Improvements Batch Normalization Also removes the need of dropouts. mAP increases by 2%. High-resolution Classifier To generate predictions with shape of $7\times 7 \times 125$, we replace the final fully connected layers with a $3\times 3$ convolution layer each outputting 1024 output channels.

## You only look once (YOLO) -- (1)

You Only Look Once (YOLO) is an object detection system targeted for real-time processing. There are three versions of YOLO: YOLO, YOLOv2 (and YOLO9000) and YOLOv3. For this article, we mainly focus on YOLO first stage. 1. Introduction The target is to find out the bounding box (rectangular boundary frame) of all the objects in the picture and meanwhile judge the categories of them, where left top coordinate denoted by $(x,y)$, as well as the width and height of the rectangle bounding box by $(w,h)$.

## Pooling Layer in CNN (1)

Today I didn’t have the mood to continue my work on map merging of different cameras. So I read the paper from DeepMind of Learned Deformation Stability in Convolutional Neural Networks recommended by Wang Chen. 1. Convolution Operation Convolution operation is typically denoted with an asterisk1: $$s(t)=(x*w)(t)$$ In Convolutional network terminology, the x is referred to as the input, and the w as the kernel. The output is sometimes referred to as the feature map.