New search engine making strides in exploring the Deep, Dark Web
- By Kevin McCaney
- Feb 12, 2015
The Pentagon’s project to develop a new search engine that sifts through information and finds connections in the deepest corners of the Web kicked off only a year ago. It has already advanced far enough to contribute to a kidnapping and sexual assault conviction in New York City, and last weekend it was demonstrated on national TV.
The goal of the project, called Memex (a combination of “memory” and “index”), is to reach into the vast terrain of the Internet, an estimated 90 percent to 95 percent, that commercial search engines don’t cover. Beyond search, the project is developing tools for quickly creating subsets of information and analyzing those subsets for potential links. On 60 Minutes, the Defense Advanced Research Projects Agency, which is leading a team of 17 contractors on the research, discussed the value of being able to access that otherwise undiscovered country.
DARPA refers to that unindexed terrain as the Deep Web, which is often called by other names, most commonly the Dark Web or the Shadow Web. Although the terms are sometimes used interchangeably, Web intelligence companies caution that they don’t really mean the same thing. BrightPlanet, which says it coined “Deep Web” a dozen years ago, says that term refers to databases of non-indexed information that Google, Bing and Yahoo don’t reach. The Dark, or Shadow, Web is the Barbary Coast of the Internet: a subset of the Deep Web where nefarious enterprises, like the now-defunct Silk Road drug marketplace or human trafficking rings, are run from domains such as the Tor network’s Hidden Services.
The Memex program would explore both, though DARPA said when it announced the program that its initial focus would be helping law enforcement agencies investigate human trafficking.
In the New York case, a 28-year-old woman was held captive for two days in November 2012 and sexually abused by a group of men before she jumped from a sixth-floor window to escape, according to a report in Scientific American. Four months ago, New York County prosecutors won a conviction against one of the men. Although both prosecutors and DARPA officials declined to discuss Memex’s involvement publicly, they confirmed to Scientific American that the search engine, still nascent at the time, was key to the conviction.
On the record, Manhattan District Attorney Cyrus R. Vance, Jr. did say his office is using Memex in every one of its human trafficking cases.
Search engines like Google, Bing and Yahoo cover, by some estimates, as little as 5 percent of what’s on the Internet, Chris White, the program manager for Memex, told 60 Minutes. DARPA wants to get past those commercial results, which are based (in Google’s case) on advertising and on rankings determined by Google’s algorithms. Google is awfully good at what it does, but its searches still mostly find sites and pages that want to be found. Memex would look for, sort and analyze data on all those other pages, including temporary documents, data located behind forms, and shared and other content. It would also be able to return to and build on earlier searches, finding connections as they crop up.
“The main issue we’re trying to address is the one-size-fits-all approach to the internet where [search results are] based on consumer advertising and ranking,” White said.
Beyond going after traffickers, terrorists and other high-profile targets, a tool like Memex could help military, intelligence, law enforcement and health agencies, as well as commercial interests, find publicly available, mission-critical data after events such as an Ebola outbreak, a political upheaval or a natural disaster.
When it announced the program, DARPA said it was interested only in publicly available information and was “specifically not interested” in identifying anonymous services, servers or IP addresses. The goal is simply to make connections using the information that is already out there, especially information that is otherwise hard to find.
Kevin McCaney is a former editor of Defense Systems and GCN.