4. Accuracy of facial recognition technology: assessing the risks of wrong identification
4.1. Technological developments and performance assessment
The high level of attention given to facial recognition technology in the recent past stems from strong accuracy gains achieved since 2014.36 The accuracy gains are mainly attributed to the availability of increased computational power, massive amounts of data (digital images of people and their faces), and the use of modern machine learning algorithms.37
Determining the necessary level of accuracy of facial recognition software is challenging: there are many different ways to evaluate and assess accuracy, also depending on the task, purpose and context of its use. When applying the technology in places visited by millions of people – such as train stations or airports – a relatively small proportion of errors (e.g. 0.01 %) still means that hundreds of people are wrongly flagged. In addition, certain categories of people may be more likely to be wrongly matched than others, as described in Section 3. There are different ways to calculate and interpret error rates, so caution is required.38
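To make the order of magnitude concrete, a minimal sketch (in Python) of this arithmetic follows. The 0.01 % rate is the example given above; the daily passenger volume is an assumed, illustrative figure, not one reported in this paper.

# Illustrative arithmetic only: a small error rate applied to a large crowd.
# The 0.01 % rate is taken from the example above; the daily passenger
# volume is an assumed figure, not one reported in this paper.
false_positive_rate = 0.0001        # 0.01 % expressed as a proportion
passengers_per_day = 500_000        # assumed volume for a major train station

wrongly_flagged_per_day = false_positive_rate * passengers_per_day
print(f"Expected wrongly flagged people per day: {wrongly_flagged_per_day:.0f}")
# -> 50 per day, i.e. hundreds of people within a single week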
In addition, when it comes to accuracy and errors, the question of how easily a system can be tricked by, for example, fake face images (called ‘spoofing’) is particularly important for law enforcement purposes.39
Facial recognition technologies, like other machine-learning algorithms, ultimately produce binary outcomes: a comparison either yields a match or it does not. It is therefore useful to distinguish between false positives and false negatives:
A ‘false positive’ refers to the situation where an image is falsely matched to another image on the watchlist. In the law enforcement context, this would mean that a person is wrongly identified by the system as being on the watchlist. This has crucial consequences for that person’s fundamental rights. The “false positive identification rate” gives the proportion of erroneously found matches (i.e. the number of people identified as being on the watchlist who are in fact not on it) among all those who are not on the watchlist.
‘False negatives’ are those who are deemed not to be matches (i.e. not on the watchlist), but in fact are matches. The corresponding “false negative identification rate”, or “miss rate”, indicates the proportion of those erroneously not identified among all those who should be identified.
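As a minimal sketch of how these two rates are computed (in Python, with invented counts used purely for illustration):

# Minimal sketch of the two identification error rates defined above.
# All counts are invented for illustration.
not_on_watchlist_total = 10_000   # people searched who are not on the watchlist
false_positives = 100             # of those, wrongly reported as matches

on_watchlist_total = 200          # people searched who are on the watchlist
false_negatives = 10              # of those, missed by the system

# Proportion of erroneous matches among all those not on the watchlist
fpir = false_positives / not_on_watchlist_total   # false positive identification rate
# Proportion of missed matches among all those who should be identified
fnir = false_negatives / on_watchlist_total       # false negative rate, or "miss rate"

print(f"FPIR: {fpir:.2%}  FNIR (miss rate): {fnir:.2%}")
# -> FPIR: 1.00%  FNIR (miss rate): 5.00%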
The issue of false positives and false negatives is also connected to data quality and to the accuracy of data processing. Addressing it requires regular correction and updating of the facial images stored in a watchlist, in order to ensure accurate processing.
When discussing error rates, three important considerations need to be kept in mind:
First, an algorithm never returns a definitive result, but only probabilities – for example, that with 80 % likelihood the person shown on one image is the person on another image on the watchlist. This means that thresholds or rank lists need to be defined for making decisions about matches.
Second, as a consequence, there is always a trade-off between false positives and false negatives, because both depend on the chosen probability threshold. If the threshold is higher, false positives will decrease, but false negatives will increase, and the other way round. This is why such rates are usually reported with the other rate held at a fixed level (e.g. the miss rate is reported at a fixed false positive identification rate of 0.01, i.e. 1 %).40 This trade-off is illustrated in the sketch after this list.
Third, the rates need to be evaluated with the quantities of real cases in mind. If a large number of people are checked en masse, a potentially small false positive identification rate still means that a considerable absolute number of people are wrongly flagged.
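A minimal sketch (in Python, with invented similarity scores rather than any real system’s output) can make the first two considerations concrete: a single threshold turns probabilistic scores into match decisions, and moving that threshold trades one error rate against the other.

# Minimal sketch of the threshold trade-off described above.
# Scores and labels are invented; real systems output a similarity
# score per comparison, which a chosen threshold turns into a decision.
impostor_scores = [0.31, 0.44, 0.52, 0.58, 0.63, 0.71]  # pairs that should NOT match
genuine_scores = [0.55, 0.68, 0.74, 0.81, 0.88, 0.93]   # pairs that SHOULD match

for threshold in (0.5, 0.6, 0.7, 0.8):
    # A score at or above the threshold is declared a match.
    false_positives = sum(s >= threshold for s in impostor_scores)
    false_negatives = sum(s < threshold for s in genuine_scores)
    fpir = false_positives / len(impostor_scores)
    fnir = false_negatives / len(genuine_scores)
    print(f"threshold {threshold:.1f}: FPIR {fpir:.2f}, miss rate {fnir:.2f}")
# As the threshold rises, FPIR falls while the miss rate grows,
# which is why one rate is reported with the other held fixed.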
36 See Grother, P., Ngan, M. and Hanaoka, K. (2018), Ongoing Face Recognition Vendor Test (FRVT) Part 2: Identification, NISTIR 8238; or Galbally, J., Ferrara, P., Haraksim, R., Psyllos, A. and Beslay, L. (2019), Study on Face Identification Technology for its Implementation in the Schengen Information System, Luxembourg, Publications Office, July 2019.
37 For facial image recognition, the success mostly stems from the use of deep convolutional neural networks. These algorithms learn generic patterns of images by splitting images into several areas.
38 For more detailed discussions of evaluation metrics, see Grother, P., Ngan, M. and Hanaoka, K. (2018), Ongoing Face Recognition Vendor Test (FRVT) Part 2: Identification, NISTIR 8238; or Galbally, J., Ferrara, P., Haraksim, R., Psyllos, A. and Beslay, L. (2019), Study on Face Identification Technology for its Implementation in the Schengen Information System, Luxembourg, Publications Office, July 2019.
39 See, for example, Parkin, A. and Grinchuk, O. (2019), Recognizing Multi-Modal Face Spoofing with Face Recognition Networks.
40 E.g. Grother, P., Ngan, M. and Hanaoka, K. (2018), Ongoing Face Recognition Vendor Test (FRVT) Part 2: Identification, NISTIR 8238.
