http://www.wsj.com/articles/sleuthing-search-engine-even-better-than-google-1423703464?mod=WSJ_hp_RightTopStories
In the run-up to Super Bowl XLIX, a team of social workers in Glendale, Ariz., spent two weeks combing through local classified-ad sites. They were looking for listings posted by sex traffickers.
Criminal networks that exploit women often advertise on local sites around events that draw large numbers of transient visitors. “It’s like a flood,” said Dominique Roe-Sepowitz, who headed the Glendale effort.
Dr. Roe-Sepowitz is director of the Office of Sex Trafficking Intervention Research at Arizona State University. She has worked for five years with authorities in Houston, Las Vegas and Phoenix to find and hunt down traffickers.
In the past, she painstakingly copied and pasted suspicious URLs into a document and looked for patterns that suggested a trafficking ring. This year, she analyzed criminal networks using visual displays from a powerful data-mining tool, one whose capabilities hint at the future of investigations into online criminal networks.
The program, a tool called Memex developed by the U.S. military’s research and development arm, is a search engine on steroids. Rather than endless pages of Web links, it returns sophisticated infographics that represent the relationships between Web pages, including many that a Google search would miss.
For instance, searching the name and phone number that appear in a suspicious ad would result in a diagram that showed separate constellations of dots, representing links to ads that contain the name, the phone number, or both. Such results could suggest a ring in which the same phone number was associated with different women. Clicking on a dot can reveal the physical location of the device that posted the ad and the time it was posted. Another click, and it shows a map of the locations from which the ads were posted. Capabilities like this make it possible to identify criminal networks and understand their operations in powerful new ways.
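The grouping idea behind those constellations can be sketched in a few lines of Python. This is only an illustration of linking ads that share a name or a phone number, not Memex's actual code; the function `group_ads`, the record fields `id`, `name` and `phone`, and the sample data are all hypothetical.

```python
from collections import defaultdict


def group_ads(ads):
    """Group ad ids into clusters of ads linked by a shared name or phone.

    Hypothetical record shape: {"id": ..., "name": ..., "phone": ...}.
    """
    # Union-find: each ad starts as its own cluster.
    parent = {ad["id"]: ad["id"] for ad in ads}

    def find(x):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Remember the first ad seen for each (field, value) pair and
    # merge any later ad carrying the same value into its cluster.
    first_seen = {}
    for ad in ads:
        for field in ("name", "phone"):
            value = ad.get(field)
            if value is None:
                continue
            key = (field, value)
            if key in first_seen:
                union(first_seen[key], ad["id"])
            else:
                first_seen[key] = ad["id"]

    clusters = defaultdict(list)
    for ad in ads:
        clusters[find(ad["id"])].append(ad["id"])
    return sorted(sorted(ids) for ids in clusters.values())


# Made-up sample data: ads 1 and 2 share a phone, ads 1 and 3 share a name.
sample_ads = [
    {"id": 1, "name": "name_a", "phone": "555-0101"},
    {"id": 2, "name": "name_b", "phone": "555-0101"},
    {"id": 3, "name": "name_a", "phone": "555-0199"},
    {"id": 4, "name": "name_c", "phone": "555-0123"},
]
print(group_ads(sample_ads))
```

In this sample, ads 1, 2 and 3 land in one cluster even though ads 2 and 3 share nothing directly, which is the kind of indirect link that a diagram of this sort would make visible.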
Unlike a Google search, Memex can search not only for text but also for images and latitude/longitude coordinates encoded in photos. It can decipher numbers that are part of an image, including handwritten numbers in a photo, a technique traffickers often use to mask their contact information. It also recognizes photo backgrounds independently of their subjects, so it can identify pictures of different women that share the same backdrop, such as a hotel room—a telltale sign of sex trafficking, experts say.
Also unlike Google, it can look into, and spot relationships among, not only run-of-the-mill Web pages but online databases such as those offered by government agencies and within online forums (the so-called deep Web) and networks like Tor, whose server addresses are obscured (the so-called dark Web).
Since its release a year ago, Memex has had notable successes in sex-trafficking investigations. New York County District Attorney Cyrus Vance said Memex has generated leads in 20 investigations and has been used in eight trials prosecuted by the county’s sex-trafficking division. Last June, Mr. Vance said, Memex’s ability to search the posting times of ads that had been taken down helped in a case that ended with a trafficker being sentenced to 50 years to life in prison.
The creator of Memex is Christopher White, a Harvard-trained electrical engineer who runs big-data projects for the Defense Advanced Research Projects Agency, or Darpa. The Defense Department’s center of forward-looking research and development, Darpa put between $10 million and $20 million into building Memex. (The precise amount isn’t disclosed.) Although the tool can be used in any Web-based investigation, Dr. White started with the sex trade because the Defense Department believed its proceeds finance other illegal activities.
Memex is part of a wave of software tools that visualize and organize the rising tide of online information. Unlike many other tools, though, it is free of charge for those who want to download, distribute and modify. Dr. White said he wanted Memex to be free “because taxpayers are paying for it.” Federal agencies have more money to spend, but local law-enforcement agencies often can’t afford the most sophisticated tools, even as more criminal activity moves online.
Among tools used by law-enforcement agencies, Memex would compete with software from Giant Oak, Decision Lens and Centrifuge Systems. The leader in the field is Palantir Technologies, whose software costs $10 million to $100 million per installation and draws from the user’s proprietary databases rather than from the Web. Palantir didn’t immediately reply to a request for comment.
Advertisements posted by sex traffickers amount to between $90,000 and $500,000 daily in total revenue to a variety of outlets, according to Darpa.
Memex and similar tools raise serious questions about privacy. Marc Rotenberg, president and executive director of the Electronic Privacy Information Center in Washington, D.C., said that when law-enforcement authorities start using powerful data-mining software, “the question that moves in the background is how much of this is actually lawful.” Data-visualization tools like Memex enable enforcers to combine vast amounts of public and private information, but the implications haven’t been fully examined, he said.
Dr. White said he drew a “bright line” around online privacy, designing Memex to index only publicly available information. In anonymous networks like Tor, which hosts many sex ads, Memex finds only the public pages. But since the tool isn't technically controlled by Darpa, independent developers could add capabilities that would make it more invasive, he acknowledged.
Another big question is whether sex traffickers and other malefactors will thwart Memex by changing their tactics. For example, they might blur out photo backgrounds if they knew law-enforcement officials were searching for them. For this reason, law-enforcement users will withhold some of the proprietary data they developed while using Memex. “We want it to be free,” said Dr. White. “But there’s always this tension between knowing what people are doing…and alerting them to that fact so they change their behavior.”
Dr. White is starting to test other uses for Memex with law enforcement and government partners, he said, including recognizing connections between shell companies, following the chains of recruitment for foreign fighters drawn to the terrorist group ISIS, mapping the spread of epidemics, and following ads for labor and goods to understand supply chains involved in money laundering.
Write to Elizabeth Dwoskin at elizabeth.dwoskin@wsj.com