Foreign Language Translation for the IC Gets a Machine Learning Boost from IARPA

October 23, 2021

Some of the hottest languages right now are Kazakh, Swahili and Pashto. Well, at least for the U.S. Intelligence Community (IC).

It’s probably safe to say that no organization is more interested in what foreign nationals are saying and writing than the IC. This is especially true for what’s being said in widely spoken languages of U.S. adversaries, like China and Russia. However, it’s also the case for “low resource” languages that are spoken by much smaller populations around the globe, like Kazakh, Swahili and Pashto.

The perennial challenge for the IC has been how to interpret those lesser-used languages, or indeed any language, quickly and accurately.

Using human translators for the quadrillions of words written and spoken by people around the world every day would be an incredibly time-intensive and expensive endeavor. Fortunately, with its Machine Translation for English Retrieval of Information in Any Language (MATERIAL) program, IARPA is revolutionizing the way the IC consumes foreign language information.

Because MATERIAL uses machine learning to turn multilingual text and speech into usable intelligence for analysts, regardless of their language expertise, the need for human translation is substantially waning.

“The MATERIAL program has really altered the landscape by making it possible for anyone to efficiently find information in low resource languages,” said MATERIAL Program Manager Dr. Carl Rubino. “This is a game-changer for the IC, revolutionizing the way we access important foreign language data.”

Under MATERIAL, which launched in October 2017, program performers, including Johns Hopkins University, Raytheon BBN Technologies, Columbia University and the University of Southern California Information Sciences Institute, were charged with building robust, automated language capabilities over a four-year period. MATERIAL's ultimate goal was to build Cross-Language Information Retrieval (CLIR) systems that would find speech and text content in diverse low-resource languages using only English search queries, and then succinctly relay the relevant retrieved foreign language information in English. Performers exceeded expectations and have done just that.

The CLIR systems the performers developed include state-of-the-art automatic speech recognition and machine translation systems and models not only for Kazakh, Swahili and Pashto but also for other languages such as Tagalog, Somali, Lithuanian, Georgian, Bulgarian and Farsi.

MATERIAL technologies were recently deployed at SCALE 2021, a multinational summer workshop at Johns Hopkins University devoted to exploring topics in human language technology. This summer's topic was Cross-Language Information Retrieval. Using lessons learned and baseline models from the program, SCALE scientists developed customized CLIR capabilities for Chinese, Russian and Farsi.

“I’m thrilled this technology is taking root,” Dr. Rubino said. “With continued IC investment and championship, this relatively novel approach for data discovery should soon be a standard and reliable tool for our analysts.”

Read the announcement at IARPA

Homeland Security Today

The Government Technology & Services Coalition's Homeland Security Today (HSToday) is the premier news and information resource for the homeland security community, dedicated to elevating the discussions and insights that can support a safe and secure nation. A non-profit magazine and media platform, HSToday provides readers with the whole story, placing facts and comments in context to inform debate and drive realistic solutions to some of the nation’s most vexing security challenges.
