Hirotaka Nakasone, Senior Scientist, FBI Voice Recognition Program, examines the use and effectiveness of current speaker authentication technologies at the FBI. In this IDGA exclusive, Nakasone also highlights the various challenges that are unique to voice recognition, and discusses what plans are in place for capturing voice recordings in line with the FBI’s Next Generation Identification (NGI) project.
IDGA: Examine the use and effectiveness of current speaker authentication technologies at the FBI
The FBI’s use of speaker recognition technology dates back to the early 1960s. A team of FBI special agents and technical support personnel began to develop a protocol for performing voice comparison examinations using the sound spectrograph. This spectrographic technique, however, was only ever used as investigative guidance; it was never introduced in a court of law because of the inconclusive nature of the technology.
Concerned about the controversial nature of the technique and its inconsistent admissibility in criminal proceedings, in 1976 the FBI commissioned the National Research Council (NRC) of the National Academy of Sciences (NAS) to review and assess the status of spectrographic speaker recognition. The results of the NAS study were published in a 1979 report, On the Theory and Practice of Voice Identification. Following this report, the FBI decided to continue its original policy on spectrographic voice identification, that is, to use it only as investigative guidance. This practice prevailed for the next three decades.
In the late 1990s, the FBI began developing automated speaker recognition technology through leading research groups sponsored by the FBI and other US government agencies. This effort was accelerated by the sponsorship of the Biometric Center of Excellence (BCOE) in 2007. Currently, the FBI offers forensic speaker recognition analysis services, using automated speaker recognition technology, to its field offices within the US and abroad. Here are some highlights of the FBI’s current speaker recognition technology:
- Conducted within an FBI forensic unit of the Digital Evidence Laboratory that is accredited by the American Society of Crime Laboratory Directors – Laboratory Accreditation Board (ASCLD/LAB);
- Conducted by fully trained examiners with technical and engineering backgrounds;
- Conducted by using multiple sets of state-of-the-art speaker recognition algorithms;
- Conducted under standard operating procedures;
- Conducted only for investigative and intelligence purposes – not for courtroom purposes; and
- The primary speaker recognition system is capable of conducting channel-independent and language-independent recognition under a certain set of forensic conditions with known, reasonably acceptable levels of accuracy.
What biometric challenges are unique to voice recognition?
I want to address three challenges unique to voice recognition. Please note that these challenges were also recognized by the 2011 NSTC Biometric Challenge–Update.
Challenge #1 is the dynamic nature of human speech production, which changes constantly over time. It therefore requires a long sampling period, ranging from tens of seconds to a few minutes, to create a statistically meaningful speaker model. Voice is more time-intensive to capture than other biometric modalities such as fingerprint, iris, face, DNA, and SMT.
Challenge #2 is the fragility of human speech, which is susceptible to differences in the recording environment and the equipment used for capture.
Challenge #3 is the susceptibility of human speech to different emotional states and speaking styles. Researchers in the speaker recognition community are well aware of these challenges and have resolved some of them.
Despite these challenges, voice recognition has undeniable strengths as well. Automatic speaker recognition is a highly scientific forensic process built on solid mathematical and statistical foundations. In that sense, I do not foresee any serious issue in fusing voice with other modalities to create a multimodal biometric database. The FBI’s BCOE at the Criminal Justice Information Services Division has sponsored such projects to collect multimodal biometric data including face, iris, fingerprint and voice.
What plans are in place for capturing voice recordings in line with the FBI’s Next Generation Identification (NGI) project? What are the short- and long-term aims?
US government inter-agency collaboration on implementing voice data collection for voice biometric applications began about three years ago, in March 2009, with the establishment of the Symposium for Investigatory Voice Biometrics (SIVB). The SIVB’s activity over the past three years culminated in the drafting of the ANSI/NIST ITL Type-11 Voice Record, which is meant to enable the interoperability of voice records among laboratories, field offices, and government agencies for investigative and intelligence purposes.
Type-11 still needs to go through the open vetting process of the ANSI/NIST Standards office before it becomes an A/N standard. It is anticipated that Type-11 will be ratified within six months to a year. The short-term and long-term aims described next presume the successful completion of Type-11.
As short-term aims, we will (1) complete the Type-11 Voice Record, (2) develop best practices for voice data transactions using Type-11, other ANSI/NIST Types, and the Electronic Biometric Transmission Specification (EBTS), (3) build community consensus, (4) implement an interoperability concept of operations within a scaled-down pilot study, and (5) establish a new Scientific Working Group for Voice to guide engineers and scientists in creating consensus standards for voice biometrics for the intelligence and law enforcement communities. I envision that it will take three to five years before we see the fruits of this work.
As long-term aims, we plan to (1) design and study voice collection from individuals at booking stations, during criminal interviews, in prisons, etc., and (2) maintain those voice databases in a centralized repository (the FBI’s NGI) for future searches.
How do you foresee voice biometrics being used within the military and commercially in the next decade? Do you think voice will be more or less popular than other biometric indicators?
Military applications can be seen in a variety of voice biometrics initiatives and projects across many US DoD agencies, such as the Army, Air Force, and Navy, for identity management, force protection, or counterterrorism purposes. I foresee a similar growth rate of voice biometric exploitation in the military as in non-DoD government agencies.
In the commercial domain, I foresee more applications of speech recognition technology than of voice biometrics technology. I tend to think that voice biometrics will find its best use within the government and military as an investigative or intelligence tool, and will be less popular in the commercial world, where the privacy of innocent, non-criminal citizens is more often at stake. Voice is currently not as popular as other modalities (like fingerprint, DNA, iris and face), but the gap is expected to narrow due to increasing interest and other factors, such as the voice transaction record and the incremental maturity that comes with rigorous voice biometric research.
Hirotaka Nakasone is the FBI Senior Scientist, Voice Recognition Program.