Big data really needs a 'big mechanism,' DARPA says
- By George Leopold
- Aug 06, 2014
A Defense Department research program seeks to leapfrog advanced big data analytics by developing automated search technologies that could help explain the causes and effects that drive complex systems.
The Defense Advanced Research Projects Agency launched its "big mechanism" initiative earlier this year with the goal of developing automated tools that could uncover causal models hidden in big data.
The classic example of uncovering a big mechanism is physician John Snow's 1854 map of London, which showed the association between a cholera outbreak and a polluted public water pump. The discovery overturned the prevailing opinion that diseases travelled through the air. Now, the relentless waves of scientific data make it nearly impossible to bridge the gap between tracking associated data points and discovering the cause-and-effect mechanisms behind big data.
"Having big data about complicated economic, biological, neural and climate systems isn't the same as understanding the dense webs of causes and effects – what we call 'big mechanisms' – in these systems," DARPA Program Manager Paul Cohen said in launching the research effort in February.
"Unfortunately, what we know about big mechanisms is contained in enormous, fragmentary and sometimes contradictory literatures and databases, so no single human can understand a really complicated system in its entirety," Cohen added. "So computers must help us."
DARPA's Information Innovation Office released a preliminary request for proposals earlier this year seeking technologies that could, for example, scour research papers for details that might eventually help explain cause-and-effect relationships.
The DARPA office plans to initially use big mechanism tools to study the complex molecular interactions that cause cells to become cancerous. The proposed methodology includes using computers to scan research papers on cancer biology to extract data on cancer pathways. The data fragments could then be assembled into complete pathways of "unprecedented scale and accuracy," the agency claimed, to determine how pathways interact.
In the last step, automation tools could help determine causes and effects that could be manipulated to develop potential cancer treatments.
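The three steps described above (read, assemble, explain) can be illustrated with a toy Python sketch. It is not DARPA's method: the verb list, the function names and the placeholder sentences about GeneA through GeneD are all assumptions made for illustration, and a real system would need far more capable language processing than a single regular expression.

```python
import re
from collections import defaultdict

# Hypothetical verb list; real papers use far richer causal language.
CAUSAL_VERBS = ("activates", "inhibits", "phosphorylates")
PATTERN = re.compile(r"\b(\w+) (" + "|".join(CAUSAL_VERBS) + r") (\w+)\b")


def extract_fragments(sentences):
    """Step 1: return (cause, verb, effect, source_index) tuples found in the text."""
    fragments = []
    for index, sentence in enumerate(sentences):
        for cause, verb, effect in PATTERN.findall(sentence):
            fragments.append((cause, verb, effect, index))
    return fragments


def assemble_pathways(fragments):
    """Step 2: merge fragments into a directed graph, keeping the source of each edge."""
    graph = defaultdict(lambda: defaultdict(list))
    for cause, verb, effect, source in fragments:
        graph[cause][effect].append((verb, source))
    return graph


def upstream_causes(graph, target):
    """Step 3: collect every node that has a directed path leading to the target."""
    parents = defaultdict(set)
    for cause, effects in graph.items():
        for effect in effects:
            parents[effect].add(cause)
    seen, stack = set(), [target]
    while stack:
        node = stack.pop()
        for parent in parents[node]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Placeholder sentences standing in for text pulled from separate papers.
papers = [
    "GeneA activates GeneB in tumor cells.",
    "GeneB activates GeneC.",
    "GeneC inhibits GeneD.",
]
pathways = assemble_pathways(extract_fragments(papers))
print(sorted(upstream_causes(pathways, "GeneD")))
```

Each edge keeps the index of the sentence it came from, so a claim in the assembled pathway can be traced back to the text that produced it.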
"The language of molecular biology and the cancer literature emphasizes mechanisms,” Cohen said. “Papers describe how proteins affect the expression of other proteins, and how these effects have biological consequences. Computers should be able to identify causes and effects in cancer biology papers."
More broadly, big mechanism tools could help understand complicated systems while aiding researchers struggling to keep up with a relentless stream of data generated by scientific journals. Researchers who are forced to specialize in narrow areas of science could use big mechanism tools to expand their perspective.
Under a proposed DARPA scheme, scientific journals would become part of a big mechanism database. "Every aspect of a big mechanism would be tied to the data that supports it or contradicts it," the agency said.
"By emphasizing causal models and explanation, big mechanism may be the future of science," Cohen asserted.