Tool kit would work for every language (all 7,000 of them)
- By Kevin McCaney
- Oct 13, 2015
U.S. forces arriving in foreign lands bring all kinds of communications equipment with them for sharing voice, video, imagery and other data. But that equipment only allows them to talk to each other. In some circumstances, particularly in disaster relief and other emergency responses, effective operations also rely on talking to the locals. And that can be a problem.
Considering the global reach of the military and its willingness to respond to disasters, health crises and other incidents, it’s not uncommon that they arrive without the ability to speak the local language. Human translators aren’t always available, and computer translators don’t cover all of the world’s languages, especially those that are rare.
The Defense Advanced Research Projects Agency wants to get around the dilemma with a system that would find and interpret elements that rare languages—what it calls low-resource languages—have in common to offer a basic understanding. The Low Resource Languages for Emergent Incidents, or LORELEI, program isn’t looking to comprehensively translate those languages, DARPA said, but to “provide situational awareness by identifying and correlating elements of information in foreign-language and English sources.”
Comprehensive language translators are fine for common, widely spoken languages, but those programs can take years and millions of dollars to develop, an approach that isn’t practical for rare languages that are spoken by relatively few people.
DARPA said there are more than 7,000 languages spoken around the world. The Linguistic Society of America, citing research done in 2009, put the number (at that time, anyway) at 6,909 distinct languages. It also noted how diverse some regions are. In Papua New Guinea, for instance, an estimated 832 languages are spoken among a population of about 3.9 million people, so each language is spoken by an average of fewer than 5,000 people.
So DARPA isn’t looking to become fluent in all those languages but to be able to find enough common ground to let U.S. personnel coordinate with local organizations during humanitarian assistance, disaster relief, peacekeeping, infectious disease response and other missions.
The agency isn’t specifying which technologies are to go into LORELEI; automated speech recognition or machine translation, for instance, might or might not be part of the finished product. The idea is to be able to correlate information in low-resource foreign languages and English.
“Through LORELEI, we envision a system that could quickly pick out key information—things such as names, events, sentiment and relationships—from public news and social media sources in any language, based on the system’s understanding of other languages,” said Boyan Onyshkevych, DARPA program manager. “The goal is to provide immediate, evolving situational awareness that helps decision makers assess and respond as intelligently as possible to dynamic, difficult situations.”
The research agency has awarded Phase 1 contracts to 13 organizations:
- Carnegie Mellon University
- Columbia University
- Johns Hopkins University
- Next Century Corporation
- Raytheon BBN
- University of Illinois Urbana-Champaign
- University of Massachusetts
- University of Pennsylvania
- University of Pennsylvania Linguistic Data Consortium
- University of Texas El Paso
- University of Washington
- University of Southern California Information Sciences Institute
Phase 1 will focus on three principal areas:
- Algorithm Research and Development Environment, which aims to reduce the reliance on large language libraries and instead focus on what languages have in common.
- Run-time Framework Development, a prototype tool to combine open-source data feeds in English and other languages.
- Linguistic Resource Creation, which would combine resources ranging from dictionaries to emergency response terminology in support of the first two technical areas.
DARPA is hoping that LORELEI will be able to present useful response information in an easy-to-use interface within 24 hours of an incident and deliver fully automated language capabilities within days or weeks.
Kevin McCaney is a former editor of Defense Systems and GCN.