Face Identification / Verification / Embeddings ...
Probabilistic Face Embeddings
by Yichun Shi, Anil K. Jain, et al.
Idea
Represent face features as distributional estimates instead of point estimates, i.e. Probabilistic Face Embeddings.
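To make the distributional idea concrete: with Gaussian embeddings, PFE compares two faces with a mutual likelihood score rather than cosine distance. A minimal sketch, assuming diagonal covariances and dropping additive constants (the function name is mine, not the paper's):

```python
import numpy as np

def mutual_likelihood_score(mu1, var1, mu2, var2):
    """Log-likelihood (up to a constant) that two Gaussian
    embeddings (mu, var) share the same latent feature.
    Higher score = more likely the same identity."""
    var = var1 + var2
    return float(-0.5 * np.sum((mu1 - mu2) ** 2 / var + np.log(var)))
```

Note that, unlike cosine similarity, a large predicted variance (i.e. an uncertain, low-quality face) automatically discounts the mean-difference term.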
Related Topics
Uncertainty Learning
Probabilistic Face Representation
Quality-Aware Pooling
Various protocols of benchmarks, for example,
Protocols of IJB-S:
- surveillance-to-single;
- surveillance-to-booking;
- surveillance-to-surveillance;
- surveillance-to-still;
Papers to Read
Face Recognition with Image Sets using Manifold Density Divergence
low-dimensional manifolds, statistical formulation
Face Recognition from Long-Term Observations
Use a set of images to represent an identity
Log-Euclidean Metric Learning on Symmetric Positive Definite Manifold with Application to Image Set Classification
Video Face Recognition: Component-wise Feature Aggregation Network (C-FAN)
Neural Aggregation Network for Video Face Recognition
Thoughts and TODOs
TODO: Calculate cosine similarity between a high-quality face image and its degraded versions (Gaussian blur, motion blur and other currently used data augmentations), and similarities of degraded images of different identities.
- false accept of imposter low-quality pairs;
- false reject of genuine cross-quality pairs;
Thought: It is necessary for the network to see the same samples together with their degraded versions (via data augmentation) to address the second issue.
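A scaffold for the TODO above. The embedding model itself is out of scope here, so random vectors stand in for real embeddings of clean/degraded/imposter faces; only the comparison logic is shown:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-ins for model outputs: a clean embedding, a "degraded" version
# of the same identity (clean + noise), and a different identity.
rng = np.random.default_rng(0)
clean = rng.normal(size=512)
degraded = clean + rng.normal(scale=0.5, size=512)
other_identity = rng.normal(size=512)

print("genuine cross-quality:", cosine_similarity(clean, degraded))
print("imposter low-quality:", cosine_similarity(degraded, other_identity))
```

With real embeddings, the two experiments above would quantify the false-accept and false-reject failure modes listed.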
AdaCos: Adaptively Scaling Cosine Logits for Effectively Learning Deep Face Representations
by Xiao Zhang, et al.
Idea
Margin-s (scale) and margin-m are the two most influential hyperparameters in margin-based softmax methods, and setting them wisely is vital to effective training. The idea is to keep the softmax probability spanning the full (0, 1) range while still generating substantial gradients when cosine similarities are not close enough.
Effects of Margin-s and Margin-m
- Margin-s
Scale needs to be sufficiently large so that the softmax probability can reach 1. A margin-s that is too large, however, degrades performance, since the probability saturates and no noticeable gradient is generated even when the cosine similarity is not close enough.
- Margin-m
The value of m determines at how small an angle the softmax probability stops being zero.
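The scale effect above can be checked numerically: with many negative classes, a small s cannot push the target probability near 1 even for a perfect match, while a very large s saturates even at mediocre similarities, killing the gradient. A toy calculation (class count and similarity values are made up for illustration):

```python
import math

def softmax_prob(s, cos_target, cos_others):
    """Probability assigned to the target class under a scaled
    cosine softmax: exp(s*cos_target) / sum_j exp(s*cos_j)."""
    logits = [s * cos_target] + [s * c for c in cos_others]
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    return exps[0] / sum(exps)

# 1000 negative classes, all near-orthogonal (cos = 0) to the query.
others = [0.0] * 1000
print(softmax_prob(1.0, 1.0, others))   # small s: stays far below 1
print(softmax_prob(64.0, 1.0, others))  # large s: saturates near 1
print(softmax_prob(64.0, 0.5, others))  # large s saturates even here
```

The last line is the failure mode: at cos = 0.5 the probability is already ~1, so the loss produces almost no gradient to pull the sample closer.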
New Reflections and Takeaways
Vanilla softmax with cross-entropy loss does not optimize cosine similarity explicitly. Margin-based softmax, however, optimizes cosine distance directly.
Ideally, the probability P_ij should gradually increase from 0 to 1 as Theta_ij decreases from pi/2 to 0. Would a more linear relationship help? And what is so special about softmax compared with alternatives such as tanh, sigmoid, etc.?
Margin-based softmax (ArcFace) has the explicit optimization goal of reducing intra-class variance, while only implicitly increasing inter-class variance.
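To make the intra-class goal concrete: ArcFace adds an angular margin m inside the cosine of the target logit, so the network must shrink the angle Theta to recover the same probability as plain softmax. A sketch of the standard logit form (clamping added for numerical safety; not taken from any particular implementation):

```python
import math

def arcface_target_logit(cos_theta, m=0.5, s=64.0):
    """ArcFace target-class logit: s * cos(theta + m).
    The margin m penalizes the target class, forcing a smaller
    angle (tighter intra-class cluster) for the same loss."""
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))
    return s * math.cos(theta + m)

# The margin strictly lowers the target logit relative to s*cos(theta):
print(arcface_target_logit(0.8), 64.0 * 0.8)
```

Non-target logits stay at s * cos(theta), which is why the inter-class push is only implicit.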
Learning Discriminative Features via Weights-biased Softmax loss
The loss is designed to increase inter-class variance.
DocFace+: ID Document to Selfie Matching
Dynamic Weight Imprinting
Problem: with an SGD optimizer, in the two-shot classification case each weight vector receives attraction signals only twice per epoch. These sparse attraction signals make little difference to the classifier weights, which causes the classifier weights to underfit.
Dynamic Weight Imprinting replaces a class's weight vector with that class's feature from the current mini-batch?
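My reading of that replacement idea, sketched in NumPy. The function name and the normalized-class-mean choice are my own assumptions, not necessarily the paper's exact update rule:

```python
import numpy as np

def dynamic_weight_imprint(weights, features, labels):
    """For each class present in the mini-batch, overwrite its
    classifier weight vector with the L2-normalized mean of that
    class's features, instead of waiting for sparse SGD updates."""
    for c in np.unique(labels):
        f = features[labels == c].mean(axis=0)
        weights[c] = f / np.linalg.norm(f)
    return weights
```

This sidesteps the sparse-signal problem: the weight of a class seen in the batch jumps directly to its current feature location every step, regardless of the learning rate.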
Data Sampling
Sample ID-selfie pairs during training.
Parameter Sharing for Domain-Specific Representation
How does parameter sharing work?
Other Topics of Interest
- Heterogeneous Face Recognition
- Low-Shot Learning
- meta-learning, producing new classifiers
- Study of Weight Imprinting