Customs and Border Protection responded to a NIST report that found racial and gender bias in facial recognition software by defending the use of the technology and highlighting the steps it takes to make the technology a more reliable identification tool.
Results captured in the report, Face Recognition Vendor Test (FRVT) Part 3: Demographic Effects (NISTIR 8280), are intended to inform policymakers and to help software developers better understand the performance of their algorithms, according to the study released in December by the National Institute of Standards and Technology. Face recognition technology has inspired public debate in part because of the need to understand the effect of demographics on face recognition algorithms.
“While it is usually incorrect to make statements across algorithms, we found empirical evidence for the existence of demographic differentials in the majority of the face recognition algorithms we studied,” said Patrick Grother, a NIST computer scientist and the report’s primary author. “While we do not explore what might cause these differentials, this data will be valuable to policymakers, developers and end users in thinking about the limitations and appropriate use of these algorithms.”
Tests showed a wide range in accuracy across developers, with the most accurate algorithms producing many fewer errors. While the study’s focus was on individual algorithms, Grother pointed out five broader findings:
- For one-to-one matching, the team saw higher rates of false positives for Asian and African American faces relative to images of Caucasians. The differentials often ranged from a factor of 10 to 100 times, depending on the individual algorithm. False positives might present a security concern to the system owner, as they may allow access to impostors.
- Among U.S.-developed algorithms, there were similar high rates of false positives in one-to-one matching for Asians, African Americans and native groups (which include Native American, American Indian, Alaskan Indian and Pacific Islanders). The American Indian demographic had the highest rates of false positives.
- However, a notable exception was for some algorithms developed in Asian countries. There was no such dramatic difference in false positives in one-to-one matching between Asian and Caucasian faces for algorithms developed in Asia. While Grother reiterated that the NIST study does not explore the relationship between cause and effect, one possible connection, and area for research, is the relationship between an algorithm’s performance and the data used to train it. “These results are an encouraging sign that more diverse training data may produce more equitable outcomes, should it be possible for developers to use such data,” he said.
- For one-to-many matching, the team saw higher rates of false positives for African American females. Differentials in false positives in one-to-many matching are particularly important because the consequences could include false accusations. (In this case, the test did not use the entire set of photos, but only one FBI database containing 1.6 million domestic mugshots.)
- However, not all algorithms give this high rate of false positives across demographics in one-to-many matching, and those that are the most equitable also rank among the most accurate. This last point underscores one overall message of the report: Different algorithms perform differently.
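The one-to-one findings above hinge on how a false positive rate is computed: an impostor comparison (two images of different people) counts as a false positive when its similarity score clears the system's decision threshold. The sketch below is purely illustrative, assuming a hypothetical threshold and made-up score distributions for two unnamed groups; none of the numbers come from the NIST data. It shows how a single global threshold can produce different false positive rates across groups, which is the kind of differential the report measures.

```python
import random

random.seed(0)

THRESHOLD = 0.6  # hypothetical decision threshold; real systems tune this per deployment


def one_to_one_false_positive_rate(impostor_scores, threshold):
    """Fraction of impostor comparisons (images of different people) whose
    similarity score meets the threshold -- i.e., the false positive rate."""
    false_positives = sum(1 for s in impostor_scores if s >= threshold)
    return false_positives / len(impostor_scores)


# Toy impostor score distributions for two hypothetical demographic groups.
# These numbers are illustrative only, NOT drawn from the NIST study.
group_a = [random.gauss(0.30, 0.12) for _ in range(10_000)]
group_b = [random.gauss(0.42, 0.12) for _ in range(10_000)]

fpr_a = one_to_one_false_positive_rate(group_a, THRESHOLD)
fpr_b = one_to_one_false_positive_rate(group_b, THRESHOLD)

# The same threshold yields very different false positive rates for the two
# groups -- a demographic differential, even though the algorithm is unchanged.
print(f"group A false positive rate: {fpr_a:.4f}")
print(f"group B false positive rate: {fpr_b:.4f}")
```

This also illustrates why the report's "different algorithms perform differently" message matters: an algorithm whose impostor score distributions are similar across groups will show little differential at any threshold, while one with shifted distributions will not.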
CBP said in a statement that the agency has partnered with NIST “to gain valuable information about the performance of face comparison technologies” and that it believes “the study supports what CBP has seen in its biometric matching operations — that when a high-quality facial comparison algorithm is used along with a high performing camera, proper lighting, and image quality controls, face matching technology can be highly accurate.”
“CBP is able to achieve high match rates because it uses a quality algorithm, NEC, that is accurate, and provides continual feedback to stakeholders regarding image quality and lighting conditions. Furthermore, CBP only compares a traveler’s photo to a very small set of images which travelers have already provided to obtain a passport and/or visa. This practice of using small, flight specific galleries helps to ensure high match rates and more efficient traveler processing,” the statement continued.
“CBP’s operational data demonstrates that there is virtually no measurable differential performance in matching based on demographic factors. In instances when an individual cannot be matched by the facial comparison service, the individual simply presents their travel document for manual inspection by an airline representative or CBP officer, just as they would have done before.”
CBP said it “will continue to partner with NIST and use their research to ensure continued optimal performance of the CBP face comparison service,” and the agency “is committed to implementing the biometric Entry/Exit mandate in a way that provides a secure and streamlined travel experience for all travelers.”