Context

The combination of efficient data collection techniques and deep convolutional neural networks trained end-to-end has recently produced superhuman performance in face verification and clustering [1,2,3].

This new approach is enabling new types of studies. Below I briefly describe how to use a pre-trained network to label a new dataset automatically, what the first principal component of a standard dataset looks like in feature space, and which actresses and actors on the IMDb website look most like each other (according to a given network).

Note: these results were obtained in parallel to my actual work at iProov.

Automatic Labeling

1) Download a pre-trained network:

2) Download many pictures:

For instance by Googling “Bill Murray”.

3) Extract the faces:

For instance by using the simple Haar cascade face detector from OpenCV.

4) Remove the junk images:

The OpenCV face detector performs relatively poorly and returns many crops that contain no face. Fortunately, these “junk images” can easily be filtered out: their feature representations through the face network are “flatter” than those of actual faces (their norms d are smaller).
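The norm-based filter can be sketched in a few lines of NumPy. The sketch assumes the embeddings are taken before any L2 normalization (otherwise all norms are equal), and the threshold `min_norm` is a hypothetical parameter that would have to be chosen by looking at the histogram of norms for the network at hand.

```python
import numpy as np


def filter_by_norm(embeddings, min_norm):
    """Keep the rows whose L2 norm is at least min_norm.

    Returns the kept rows and the boolean mask that selected them.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    norms = np.linalg.norm(embeddings, axis=1)   # one norm d per crop
    keep = norms >= min_norm
    return embeddings[keep], keep
```

The returned mask can be applied to the list of crops as well, so that images and features stay aligned.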

5) Remove anyone who's not Bill Murray:

The “Bill Murray” cluster can be isolated by using a standard algorithm such as DBSCAN.
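A minimal sketch of this step with scikit-learn's `DBSCAN` follows. It assumes that the person searched for forms the most populated cluster, which seems reasonable for pictures returned by a name query but is my assumption; the `eps` and `min_samples` values are placeholders that depend on the scale of the network's feature space.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def largest_cluster_mask(embeddings, eps=0.5, min_samples=5):
    """Boolean mask of the rows that fall in the most populated DBSCAN cluster."""
    embeddings = np.asarray(embeddings, dtype=float)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(embeddings)
    clustered = labels[labels >= 0]              # label -1 marks noise points
    if clustered.size == 0:
        # No dense region found: nothing is kept.
        return np.zeros(len(embeddings), dtype=bool)
    main_label = np.bincount(clustered).argmax()
    return labels == main_label
```

Faces of other people tend to end up either in smaller clusters or as noise, so keeping only the main cluster discards them.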

First Principal Component

Face networks are typically trained by enforcing local constraints in feature space: two pictures of the same individual should have feature representations that are close to each other, and two pictures of different individuals should have feature representations that are far from each other. Here, we propose to study the global distribution resulting from these local constraints by computing the first principal component of the entire Labeled Faces in the Wild dataset.
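For reference, the first principal component of a feature matrix can be computed with a singular value decomposition of the centered data. The sketch below assumes `features` holds one row per picture; the function name is hypothetical.

```python
import numpy as np


def first_principal_component(features):
    """Return (unit direction, projections) for the top principal component."""
    features = np.asarray(features, dtype=float)
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    direction = vt[0]
    return direction, centered @ direction
```

The sign of the direction is arbitrary, so only the ordering of the projections along the axis is meaningful, not which end is labeled positive.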

Interestingly, we obtain a direction that separates individuals by gender fairly reliably (note that Labeled Faces in the Wild is strongly imbalanced in favor of men):

The network has spontaneously discovered the concept of gender: by learning to recognize identities, it has found that the population consists of two groups (women and men).

Face Similarities

We now compute the pairwise distances between the feature representations of the profile pictures of actresses and actors on the IMDb website (more precisely, the actresses and actors in the CASIA WebFace database).
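The computation can be sketched as follows, assuming one feature vector per profile picture and the Euclidean distance (the distance actually used is not stated here, so this is an assumption). The function name and the parameter `k` are illustrative.

```python
import numpy as np


def closest_pairs(features, k=5):
    """Return the k closest (i, j, distance) pairs with i < j."""
    features = np.asarray(features, dtype=float)
    # Squared distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq_norms = (features ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * features @ features.T
    np.maximum(sq_dists, 0.0, out=sq_dists)      # clip tiny negatives from rounding
    rows, cols = np.triu_indices(len(features), k=1)   # each unordered pair once
    pair_dists = np.sqrt(sq_dists[rows, cols])
    order = np.argsort(pair_dists)[:k]
    return [(int(rows[i]), int(cols[i]), float(pair_dists[i])) for i in order]
```

The full distance matrix grows quadratically with the number of pictures, so for a very large collection it may be necessary to process the rows in blocks.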

We show below the pairs of profile pictures corresponding to the smallest distances (i.e. the individuals who look most like each other according to our pre-trained network):

We can make two observations:

1) The two pairs of individuals who look most like each other are twins.
2) Most of the individuals in the pairs found are either women or belong to Black and Asian ethnic groups, even though these groups are under-represented in the CASIA WebFace database.

In other words, the pre-trained network is less sensitive to female and non-White facial features, a very serious algorithmic bias that is increasingly recognized:
Facial Recognition Is Accurate, if You’re a White Guy (New York Times)