"Screw it " Project

Screw Classification Directory
174:/media/harddrive1/SGAO/cstor-projects/screw-it

Note
Need to dig deeper into Dropout, Xavier initialization, and softmax loss.

Data Augmentation

AlexNet (8 learned layers: 5 convolutional and 3 fully connected) has about 60 million parameters, while ImageNet has about 15 million high-resolution images. With several times more parameters than images, there is a good chance that the network overfits the dataset.

Data augmentation is one way to address this issue. Common techniques:

rotating
cropping
adding Gaussian noise
fancy PCA
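The first three techniques in the list can be sketched in a few lines of NumPy. This is only an illustration under assumptions: images are taken to be H x W x C float arrays in [0, 1], rotation is limited to multiples of 90 degrees, and the function names and parameter values are made up for the example rather than taken from the project code.

```python
import numpy as np

def rotate90(image, k=1):
    """Rotate an H x W x C image counter-clockwise by k * 90 degrees."""
    return np.rot90(image, k=k, axes=(0, 1))

def random_crop(image, crop_h, crop_w, rng):
    """Cut a random crop_h x crop_w window out of the image."""
    h, w = image.shape[:2]
    top = rng.integers(0, h - crop_h + 1)
    left = rng.integers(0, w - crop_w + 1)
    return image[top:top + crop_h, left:left + crop_w]

def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise and clip back to the [0, 1] range."""
    noisy = image + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)

rng = np.random.default_rng(0)
image = rng.random((64, 48, 3))               # stand-in for a real photo
rotated = rotate90(image)                     # shape becomes (48, 64, 3)
cropped = random_crop(image, 32, 32, rng)     # shape (32, 32, 3)
noisy = add_gaussian_noise(image, 0.05, rng)  # same shape as the input
```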

Fancy PCA

PCA color augmentation achieves data augmentation by altering the color balance of the image, i.e. adjusting the values of red, green and blue pixels in the image.

Specifically, the technique shifts the color values in proportion to how strongly each color is present in the image. An image with heavy red values and minimal green values will have its red values altered the most by fancy PCA.
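A minimal sketch of the idea, assuming the PCA is computed per image on an H x W x 3 float array in [0, 1] (the AlexNet paper computes it over the whole training set instead). The function name `fancy_pca` and the default `alpha_std=0.1` are choices made for this example.

```python
import numpy as np

def fancy_pca(image, alpha_std=0.1, rng=None):
    """Shift the colors of one H x W x 3 image along its RGB principal components."""
    rng = np.random.default_rng() if rng is None else rng
    pixels = image.reshape(-1, 3)
    centered = pixels - pixels.mean(axis=0)
    cov = np.cov(centered, rowvar=False)     # 3 x 3 covariance of R, G, B
    eigvals, eigvecs = np.linalg.eigh(cov)   # columns of eigvecs are the components
    alphas = rng.normal(0.0, alpha_std, size=3)
    # One RGB offset for the whole image, largest along the strongest component
    shift = eigvecs @ (alphas * eigvals)
    return np.clip(image + shift, 0.0, 1.0)

rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))                # stand-in for a real photo
augmented = fancy_pca(image, rng=rng)
```

Because each random factor is scaled by its eigenvalue, the color direction with the most variance in the image receives the largest shift.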

Classification Model Training

Classify screws into two classes -- tight and loose.

Dataset: 38,650 images for training, 4,560 for testing (after data augmentation)

Model Training

Start from ZFNet (baseline)

                 accuracy = 87.37%
                 accuracy = 89.52%

Move to VGGNet (16 layers)

At first, the model would not learn. The following changes got it to learn:

Change parameter initialization from Gaussian to Xavier
Increase mini batch size
Reduce the number of convolution filters
Reduce the number of neurons on the last two fully connected layers
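To illustrate why the first change can matter in a deep network, here is a small NumPy experiment. It is a sketch under assumptions, not the project's training setup: it uses the Glorot and Bengio uniform formula (frameworks differ in the exact variant), a fixed Gaussian standard deviation of 0.01, and a toy stack of 16 tanh layers of width 256.

```python
import numpy as np

def gaussian_init(fan_in, fan_out, std=0.01, rng=None):
    """Gaussian weights with a fixed standard deviation, independent of layer size."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.normal(0.0, std, size=(fan_in, fan_out))

def xavier_uniform_init(fan_in, fan_out, rng=None):
    """Xavier (Glorot) uniform weights: the range depends on fan-in and fan-out."""
    rng = np.random.default_rng() if rng is None else rng
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def activation_std(init_fn, depth=16, width=256, seed=0):
    """Push random inputs through `depth` tanh layers and return the final spread."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(128, width))
    for _ in range(depth):
        x = np.tanh(x @ init_fn(width, width, rng=rng))
    return float(x.std())

gaussian_std = activation_std(gaussian_init)
xavier_std = activation_std(xavier_uniform_init)
print(f"gaussian: {gaussian_std:.2e}, xavier: {xavier_std:.2e}")
```

With the small fixed Gaussian, the signal shrinks at every layer and is essentially zero by the last one, which is one plausible reason a 16-layer network fails to learn; the Xavier scaling keeps the activations at a usable magnitude.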

    accuracy = 92.55% @ fully_connected_layer_1: num_output=256
                        fully_connected_layer_2: num_output=256
    accuracy = 94.08% @ fully_connected_layer_1: num_output=512
                        fully_connected_layer_2: num_output=512
    accuracy = 96.51% @ fully_connected_layer_1: num_output=512
                        fully_connected_layer_2: num_output=256

Faster-RCNN

A fantastic explanation here.

Breakthrough: RPN and Fast RCNN share convolutional layers at test time, so the marginal cost of computing proposals is small.

Region Proposal Network

Objective: generate detection proposals, serving as the "attention" mechanism. The RPN predicts the probability that an anchor is foreground or background, and refines the anchor's box.

What is it: a fully convolutional network (FCN) that can be trained end-to-end

                Input: feature map
                Output: object bounds, objectness scores

How training data is made

We want to label the anchors using ground-truth boxes.

The idea is to label anchors that overlap heavily with a ground-truth box as foreground, and anchors with little overlap as background.
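The labeling rule can be sketched as follows. The thresholds are assumptions borrowed from the Faster R-CNN paper (IoU of at least 0.7 for foreground, below 0.3 for background, everything in between ignored); the paper additionally marks the best-overlapping anchor of each ground-truth box as foreground, which this sketch leaves out. Boxes are assumed to be `(x1, y1, x2, y2)` arrays and the function names are made up for the example.

```python
import numpy as np

def iou_matrix(anchors, gt_boxes):
    """Pairwise IoU between anchors (N x 4) and ground-truth boxes (M x 4)."""
    x1 = np.maximum(anchors[:, None, 0], gt_boxes[None, :, 0])
    y1 = np.maximum(anchors[:, None, 1], gt_boxes[None, :, 1])
    x2 = np.minimum(anchors[:, None, 2], gt_boxes[None, :, 2])
    y2 = np.minimum(anchors[:, None, 3], gt_boxes[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    area_g = (gt_boxes[:, 2] - gt_boxes[:, 0]) * (gt_boxes[:, 3] - gt_boxes[:, 1])
    return inter / (area_a[:, None] + area_g[None, :] - inter)

def label_anchors(anchors, gt_boxes, fg_thresh=0.7, bg_thresh=0.3):
    """Return 1 (foreground), 0 (background) or -1 (ignored) for each anchor."""
    best_iou = iou_matrix(anchors, gt_boxes).max(axis=1)
    labels = np.full(len(anchors), -1, dtype=np.int64)
    labels[best_iou < bg_thresh] = 0
    labels[best_iou >= fg_thresh] = 1
    return labels

anchors = np.array([[0, 0, 10, 10], [0, 0, 10, 8],
                    [20, 20, 30, 30], [0, 0, 10, 5]], dtype=float)
gt_boxes = np.array([[0, 0, 10, 10]], dtype=float)
labels = label_anchors(anchors, gt_boxes)  # IoUs are 1.0, 0.8, 0.0, 0.5
```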

ROI Pooling

After the RPN, the proposed regions have different sizes, and differently sized regions yield differently sized CNN feature maps.

Region of Interest (ROI) pooling simplifies the problem by reducing the feature maps to the same size.

Here is a good explanation about ROI Pooling.

It is used for object detection tasks;
It allows us to reuse the feature map from the convolutional network;
It can significantly speed up both training and test time;
It allows object detection systems to be trained in an end-to-end manner.
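The size reduction described above can be sketched as a max pool over a fixed grid of bins. This is a simplified illustration with assumptions: the ROI is already expressed in feature-map cells as end-exclusive `(x1, y1, x2, y2)`, it is at least as large as the output grid, and the name `roi_max_pool` is made up for the example.

```python
import numpy as np

def roi_max_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool one region of an H x W x C feature map down to a fixed grid."""
    x1, y1, x2, y2 = roi
    region = feature_map[y1:y2, x1:x2]
    out_h, out_w = output_size
    # Bin edges split the region as evenly as integer indices allow
    row_edges = np.linspace(0, region.shape[0], out_h + 1).astype(int)
    col_edges = np.linspace(0, region.shape[1], out_w + 1).astype(int)
    pooled = np.empty((out_h, out_w, feature_map.shape[2]), dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            cell = region[row_edges[i]:row_edges[i + 1],
                          col_edges[j]:col_edges[j + 1]]
            pooled[i, j] = cell.max(axis=(0, 1))
    return pooled

feature_map = np.arange(64, dtype=float).reshape(8, 8, 1)
small = roi_max_pool(feature_map, (0, 0, 4, 4))  # 4 x 4 region
large = roi_max_pool(feature_map, (1, 2, 8, 7))  # 5 x 7 region
# Both results have shape (2, 2, 1) despite the different region sizes
```

Since every region ends up with the same shape, the later fully connected layers can consume any proposal, and the convolutional feature map is computed only once per image.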

Fast RCNN

Objective: use the proposed regions to classify objects into categories or background

Training Scheme

Alternating fashion: RPN -> Fast RCNN -> RPN -> Fast RCNN

Implementation Details

Non-Maximum Suppression (NMS)

Since RPN proposals overlap heavily with each other, NMS is applied to the proposal regions based on their cls scores, i.e. objectness scores. With the IoU threshold fixed at 0.7, NMS leaves us about 2,000 proposal regions per image.
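A greedy NMS pass can be sketched as below. It is a plain NumPy illustration rather than the implementation used in any particular Faster-RCNN code base; boxes are assumed to be `(x1, y1, x2, y2)` and the default threshold matches the 0.7 mentioned above.

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.7):
    """Greedy NMS over N x 4 boxes; returns kept indices, best score first."""
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = np.argsort(scores)[::-1]  # highest score first
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        # Intersection of the best box with every remaining box
        x1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        iou = inter / (areas[best] + areas[rest] - inter)
        # Drop the boxes that overlap the best box too much
        order = rest[iou <= iou_threshold]
    return keep

boxes = np.array([[0, 0, 10, 10], [0, 1, 10, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores)  # box 1 is suppressed by box 0 (IoU = 90 / 110)
```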

RCNN (Region-based Convolutional Neural Network)

One major question:
To what extent do CNN classification results on ImageNet generalize to object detection results on the PASCAL VOC Challenge?

The work focused on two problems:

localizing objects with a deep network
training a high-capacity model with only a small quantity of annotated detection data

The second problem is addressed with transfer learning: the network is pre-trained on ImageNet classification and then fine-tuned on the smaller detection dataset.

Localizing Objects

The paper solves the CNN localization problem by operating within the "recognition using regions" paradigm.

Object Detection with R-CNN

The system consists of three modules:

Region proposals: the proposals define the set of candidate detections available to our detector.
Fixed-length feature vector extraction: extract features from each region.
A set of class-specific linear SVMs.

Non-Maximum Suppression (NMS)

Given all scored regions in an image, for each class independently, we reject a region if its intersection-over-union (IoU) overlap with a higher-scoring selected region is larger than a learned threshold.
