Memex: The next generation of deep-Web search?
Web search engines are a great way to find information quickly, and they’re always improving the quality of their results. Google “Winter Olympics” and you get 1.69 billion results in 0.29 seconds, along with the schedule for the day’s events in Sochi and the current medal standings right there on the results page.
But those results would be the same whether you’re a government employee monitoring security at the games or a third grader wondering what time ice skating will be on. And it would take anyone a long time to go through those 1.69 billion results, which, despite their number, don’t include information the search engine hasn’t indexed — what’s called the deep Web.
The Defense Advanced Research Projects Agency wants to change that by developing a domain-specific search engine that would be useful for government projects or missions. It would expand the reach of search capabilities and sort, organize and store results according to a specific interest. With the new program, called Memex, DARPA wants to create the next generation of search technology, giving agencies better access to mission-critical information, the agency said in announcing the program.
“We’re envisioning a new paradigm for search that would tailor indexed content, search results and interface tools to individual users and specific subject areas, and not the other way around,” DARPA program manager Chris White said. “By inventing better methods for interacting with and sharing information, we want to improve search for everybody and individualize access to information. Ease of use for non-programmers is essential.”
In a Broad Agency Announcement, DARPA said it is looking to surpass the current scope of indexed search results, which often miss temporary pages, pages behind forms, content shared across pages and other types of content. The agency also wants users to be able to return to searches and build on them over time, rather than having each search be a single instance, as it is on commercial search engines.
DARPA said it envisions Memex eventually being used for any public-domain content, but it will first be used to counter human trafficking, which DOD sees as an important mission. Human trafficking, which has a strong online element, plays into many military, intelligence and law enforcement investigations, DARPA said, and better search and analysis could help combat it.
Memex will include three technical areas:
1. Domain-Specific Indexing, which would include a scalable, adaptable Web crawling infrastructure with features such as natural language processing, image analysis and multimedia extraction. It also should be resistant to counter-crawling measures, bot detection and other barriers to the extraction of information.
2. Domain-Specific Search, which would include a domain-specific interface that can be configured for a person, specific types of content, locations, entity movement and other factors. DARPA would also expect developers working in these first two areas to develop a query language capable of directing the searches.
3. Applications, which involve developing applications to support technical areas 1 and 2, starting with apps that support efforts to counter human trafficking.
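To make the first two technical areas and the idea of returning to a search more concrete, here is a minimal Python sketch. It is an illustration only and not DARPA's design: the class names `DomainIndex` and `SearchSession`, the keyword-based domain filter and the example URLs are all assumptions made for this example. The index admits only pages that mention a domain vocabulary, and the session remembers earlier queries so that a later query reports only results the user has not yet seen.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DomainIndex:
    """Toy inverted index that only admits pages matching a domain vocabulary.

    Hypothetical stand-in for 'domain-specific indexing'; a real system
    would use a crawler and far richer content analysis.
    """
    domain_terms: set
    postings: dict = field(default_factory=lambda: defaultdict(set))

    def add_page(self, url, text):
        """Index the page if it mentions any domain term; return True if indexed."""
        tokens = set(text.lower().split())
        if not tokens & self.domain_terms:
            return False  # off-domain page, skipped
        for token in tokens:
            self.postings[token].add(url)
        return True

    def search(self, query):
        """Return URLs of pages containing every word in the query."""
        terms = query.lower().split()
        if not terms:
            return set()
        results = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            results &= self.postings.get(term, set())
        return results


@dataclass
class SearchSession:
    """Keeps query history and accumulated hits so searches build on each other."""
    index: DomainIndex
    history: list = field(default_factory=list)
    collected: set = field(default_factory=set)

    def run(self, query):
        """Run a query and return only the hits not seen earlier in this session."""
        hits = self.index.search(query)
        self.history.append(query)
        new_hits = hits - self.collected
        self.collected |= hits
        return new_hits


# Example usage with made-up pages and a made-up domain vocabulary.
index = DomainIndex(domain_terms={"skating", "medal"})
index.add_page("a.example", "Ice skating schedule today")
index.add_page("b.example", "Stock market report")  # off-domain, not indexed
index.add_page("c.example", "Medal standings and skating results")

session = SearchSession(index)
first = session.run("skating")   # both on-domain pages are new
second = session.run("medal")    # c.example was already seen, so nothing new
```

The session object is what separates this from a one-shot search: its history and collected results persist between queries, so each new query is answered relative to what has already been found.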
DARPA will hold a proposer’s day Feb. 18, 2014, in Arlington, Va. Registration closes on Feb. 13 at 5 p.m.
Kevin McCaney is a former editor of Defense Systems and GCN.