A single medium, with a single search engine, & a single info source

From Nicholas Carr’s “All hail the information triumvirate!” (Rough Type: 22 January 2009):

Today, another year having passed, I did the searches [on Google] again. And guess what:

World War II: #1
Israel: #1
George Washington: #1
Genome: #1
Agriculture: #1
Herman Melville: #1
Internet: #1
Magna Carta: #1
Evolution: #1
Epilepsy: #1

Yes, it’s a clean sweep for Wikipedia.

The first thing to be said is: Congratulations, Wikipedians. You rule. Seriously, it’s a remarkable achievement. Who would have thought that a rag-tag band of anonymous volunteers could achieve what amounts to hegemony over the results of the most popular search engine, at least when it comes to searches for common topics?

The next thing to be said is: what we seem to have here is evidence of a fundamental failure of the Web as an information-delivery service. Three things have happened, in a blink of history’s eye: (1) a single medium, the Web, has come to dominate the storage and supply of information, (2) a single search engine, Google, has come to dominate the navigation of that medium, and (3) a single information source, Wikipedia, has come to dominate the results served up by that search engine. Even if you adore the Web, Google, and Wikipedia – and I admit there’s much to adore – you have to wonder if the transformation of the Net from a radically heterogeneous information source to a radically homogeneous one is a good thing. Is culture best served by an information triumvirate?

It’s hard to imagine that Wikipedia articles are actually the very best source of information for all of the many thousands of topics on which they now appear as the top Google search result. What’s much more likely is that the Web, through its links, and Google, through its search algorithms, have inadvertently set into motion a very strong feedback loop that amplifies popularity and, in the end, leads us all, lemminglike, down the same well-trod path – the path of least resistance. You might call this the triumph of the wisdom of the crowd. I would suggest that it would be more accurately described as the triumph of the wisdom of the mob. The former sounds benign; the latter, less so.
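
The feedback loop described above can be illustrated with a toy model. This is only a sketch and not Google's algorithm: it assumes, purely for illustration, that pages are ranked by inbound-link count, that the result at position p attracts clicks in proportion to 1/p, and that a fixed fraction of visitors go on to link to the page they visited. The page names and all numbers are hypothetical.

```python
def simulate_feedback(links, rounds=20, clicks_per_round=1000, link_rate=0.01):
    """Toy rich-get-richer loop: rank by links, clicks follow rank, links follow clicks."""
    links = dict(links)  # copy so the caller's data is untouched
    for _ in range(rounds):
        # Rank pages by current inbound-link count (highest first).
        ranked = sorted(links, key=links.get, reverse=True)
        # Assumed click model: the result at position p gets weight 1/p.
        weights = [1.0 / (pos + 1) for pos in range(len(ranked))]
        total = sum(weights)
        for page, weight in zip(ranked, weights):
            clicks = clicks_per_round * weight / total
            # Assumed: a fixed fraction of visitors add a link to the page.
            links[page] += clicks * link_rate
    return links


def share(links, page):
    """Fraction of all inbound links that point to `page`."""
    return links[page] / sum(links.values())


# Hypothetical starting point: one page has a slight lead in links.
start = {"encyclopedia": 105, "specialist_site": 100, "museum_site": 100}
end = simulate_feedback(start)
print(round(share(start, "encyclopedia"), 3), round(share(end, "encyclopedia"), 3))
```

Under these assumptions a small initial lead keeps compounding, because the top-ranked page collects the most clicks and therefore the most new links in every round.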

Anonymity and Netflix

From Bruce Schneier’s “Anonymity and the Netflix Dataset” (Crypto-Gram: 15 January 2008):

The point of the research was to demonstrate how little information is required to de-anonymize information in the Netflix dataset.

What the University of Texas researchers demonstrate is that this process isn’t hard, and doesn’t require a lot of data. It turns out that if you eliminate the top 100 movies everyone watches, our movie-watching habits are all pretty individual. This would certainly hold true for our book reading habits, our internet shopping habits, our telephone habits and our web searching habits.
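
A minimal sketch of the idea, assuming a toy dataset: the attacker knows a handful of a person's ratings from some public source and looks for anonymized records that agree with all of them, ignoring movies that nearly everyone has rated. This is not the researchers' actual algorithm; the function name, the movie titles, and the data are hypothetical.

```python
def candidates(dataset, known_ratings, popular=frozenset()):
    """Return anonymized IDs whose records agree with every known rating.

    Ratings of movies in `popular` are ignored, since nearly everyone has them.
    """
    clues = {m: r for m, r in known_ratings.items() if m not in popular}
    return [
        user_id
        for user_id, ratings in dataset.items()
        if all(ratings.get(movie) == rating for movie, rating in clues.items())
    ]


# Hypothetical "anonymized" release: names replaced by opaque IDs.
dataset = {
    "u1": {"Blockbuster A": 5, "Obscure Documentary": 4, "Cult Horror": 2},
    "u2": {"Blockbuster A": 5, "Obscure Documentary": 1, "Indie Drama": 5},
    "u3": {"Blockbuster A": 5, "Cult Horror": 2, "Indie Drama": 3},
}
popular = {"Blockbuster A"}

# A popular movie alone narrows nothing down.
print(candidates(dataset, {"Blockbuster A": 5}, popular))  # ['u1', 'u2', 'u3']
# Two less common ratings single out one record.
print(candidates(dataset, {"Obscure Documentary": 4, "Cult Horror": 2}, popular))  # ['u1']
```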

Other research reaches the same conclusion. Using public anonymous data from the 1990 census, Latanya Sweeney found that 87 percent of the population in the United States, 216 million of 248 million, could likely be uniquely identified by their five-digit ZIP code, combined with their gender and date of birth. About half of the U.S. population is likely identifiable by gender, date of birth and the city, town or municipality in which the person resides. Expanding the geographic scope to an entire county reduces that to a still-significant 18 percent. “In general,” the researchers wrote, “few characteristics are needed to uniquely identify a person.”
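
The kind of measurement behind these percentages can be sketched as follows: count how many records have a combination of attributes that no other record shares. The five records below are fabricated for illustration, and `unique_fraction` is a hypothetical helper, not Sweeney's method or data. The last line simply checks the 87 percent figure against the population numbers quoted above.

```python
from collections import Counter


def unique_fraction(records, keys):
    """Fraction of records whose combination of `keys` appears exactly once."""
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    unique = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)


# Fabricated records, for illustration only.
people = [
    {"zip": "78701", "county": "Travis", "gender": "F", "dob": "1970-03-14"},
    {"zip": "78701", "county": "Travis", "gender": "F", "dob": "1970-03-14"},
    {"zip": "78701", "county": "Travis", "gender": "M", "dob": "1982-11-02"},
    {"zip": "78702", "county": "Travis", "gender": "M", "dob": "1982-11-02"},
    {"zip": "78702", "county": "Travis", "gender": "F", "dob": "1991-07-30"},
]

print(unique_fraction(people, ["zip", "gender", "dob"]))     # 0.6
print(unique_fraction(people, ["county", "gender", "dob"]))  # 0.2
print(round(216 / 248, 2))                                   # 0.87
```

Widening the geographic attribute from ZIP code to county merges records into larger groups, which is why the unique fraction drops.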

Stanford University researchers reported similar results using 2000 census data. It turns out that date of birth, which (unlike birthday month and day alone) sorts people into thousands of different buckets, is incredibly valuable in disambiguating people.
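
A rough worked example of the bucket counts, assuming for illustration that birth dates span the 80 years from 1920 through 1999 (the span and the population of one million are assumptions, not figures from the research):

```python
from datetime import date


def bucket_count(first, last):
    """Number of distinct calendar dates from `first` to `last`, inclusive."""
    return (last - first).days + 1


birthday_buckets = 366  # month and day only, counting 29 February
dob_buckets = bucket_count(date(1920, 1, 1), date(1999, 12, 31))  # assumed 80-year span

print(birthday_buckets, dob_buckets)  # 366 29220
# Average people per bucket in a hypothetical population of one million:
print(round(1_000_000 / birthday_buckets), round(1_000_000 / dob_buckets))  # 2732 34
```

With the year included, each bucket holds so few people that one or two additional attributes are often enough to isolate an individual.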

Remote fingerprinting of devices connected to the Net

Anonymous Internet access is now a thing of the past. A doctoral student at the University of California has conclusively fingerprinted computer hardware remotely, allowing it to be tracked wherever it is on the Internet.

In a paper on his research, primary author and Ph.D. student Tadayoshi Kohno said: “There are now a number of powerful techniques for remote operating system fingerprinting, that is, remotely determining the operating systems of devices on the Internet. We push this idea further and introduce the notion of remote physical device fingerprinting … without the fingerprinted device’s known cooperation.”

The potential applications for Kohno’s technique are impressive. For example, “tracking, with some probability, a physical device as it connects to the Internet from different access points, counting the number of devices behind a NAT even when the devices use constant or random IP identifications, remotely probing a block of addresses to determine if the addresses correspond to virtual hosts (for example, as part of a virtual honeynet), and unanonymising anonymised network traces.” …

Another application for Kohno’s technique is to “obtain information about whether two devices on the Internet, possibly shifted in time or IP addresses, are actually the same physical device.”

The technique works by “exploiting small, microscopic deviations in device hardware: clock skews.” In practice, Kohno’s paper says, his techniques “exploit the fact that most modern TCP stacks implement the TCP timestamps option from RFC 1323 whereby, for performance purposes, each party in a TCP flow includes information about its perception of time in each outgoing packet. A fingerprinter can use the information contained within the TCP headers to estimate a device’s clock skew and thereby fingerprint a physical device.”
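
The idea can be sketched as follows: compare the time reported in a device's TCP timestamps with the time that actually elapsed at the measurer, and fit a line to the growing difference. The slope of that line is the clock skew. This sketch uses an ordinary least-squares fit as a simple stand-in and is not claimed to be the paper's estimator; the function names, the 1000 Hz tick rate, and the synthetic observations are all assumptions made for illustration.

```python
def estimate_skew_ppm(samples, hz):
    """Estimate clock skew in parts per million from (local_time_s, remote_ticks) pairs.

    `hz` is the assumed tick rate of the remote TCP timestamp clock.
    """
    t0, ticks0 = samples[0]
    # Elapsed time according to the measurer's clock.
    xs = [t - t0 for t, _ in samples]
    # Offset: remote elapsed time minus local elapsed time, in seconds.
    ys = [(ticks - ticks0) / hz - x for (_, ticks), x in zip(samples, xs)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Least-squares slope of offset against elapsed time (seconds gained per second).
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope * 1e6


def synthetic_samples(skew_ppm, hz=1000, interval_s=60, count=30):
    """Fabricated observations for a device whose clock runs `skew_ppm` fast."""
    return [
        (i * interval_s, round(i * interval_s * hz * (1 + skew_ppm / 1e6)))
        for i in range(count)
    ]


device_a = estimate_skew_ppm(synthetic_samples(50.0), hz=1000)
device_b = estimate_skew_ppm(synthetic_samples(-20.0), hz=1000)
print(round(device_a, 1), round(device_b, 1))
```

Two traces whose estimated skews agree closely are candidates for being the same physical device; traces with clearly different skews are not. Real measurements also contain network delay, which this sketch ignores.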

Kohno goes on to say: “Our techniques report consistent measurements when the measurer is thousands of miles, multiple hops, and tens of milliseconds away from the fingerprinted device, and when the fingerprinted device is connected to the Internet from different locations and via different access technologies. Further, one can apply our passive and semi-passive techniques when the fingerprinted device is behind a NAT or firewall.”

And the paper stresses that “For all our methods, we stress that the fingerprinter does not require any modification to or cooperation from the fingerprintee.” Kohno and his team tested their techniques on many operating systems, including Windows XP and 2000, Mac OS X Panther, Red Hat and Debian Linux, FreeBSD, OpenBSD and even Windows for Pocket PCs 2002.

Credit cards sold in the Underground

From David Kirkpatrick’s “The Net’s not-so-secret economy of crime” (Fortune: 15 May 2006):

Raze Software offers a product called CC2Bank 1.3, available in freeware form – if you like it, please pay for it. …

But CC2Bank’s purpose is the management of stolen credit cards. Release 1.3 enables you to type in any credit card number and learn the type of card, name of the issuing bank, the bank’s phone number and the country where the card was issued, among other info. …
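
Part of this information is encoded in the card number itself, which is why it can be recovered from the digits alone. The sketch below shows the two publicly documented pieces: the leading digits indicate the card network, and the final digit is a Luhn checksum. It is not CC2Bank's code. The prefix table is a deliberately tiny subset, and mapping a number to a specific issuing bank, phone number, or country would need a lookup database that is not reproduced here.

```python
def luhn_valid(number):
    """Check the Luhn checksum used by payment card numbers."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:  # double every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return bool(digits) and total % 10 == 0


def card_network(number):
    """Guess the card network from the leading digits (small illustrative subset)."""
    digits = "".join(ch for ch in number if ch.isdigit())
    if digits.startswith("4"):
        return "Visa"
    if digits[:2] in {"34", "37"}:
        return "American Express"
    if digits[:2] in {"51", "52", "53", "54", "55"}:
        return "Mastercard"
    return "unknown"


test_number = "4111 1111 1111 1111"  # widely published test number, not a real account
print(card_network(test_number), luhn_valid(test_number))  # Visa True
```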

Says Marc Gaffan, a marketer at RSA: “There’s an organized industry out there with defined roles and specialties. There are means of communications, rules of engagement, and even ethics. It’s a whole value chain of facilitating fraud, and only the last steps of the chain are actually dedicated to translating activity into money.”

This ecosystem of support for crime includes services and tools to make theft simpler, harder to detect, and more lucrative. …

… a site called TalkCash. It’s a members-only forum, for both verified and non-verified members. To verify a new member, the administrators of the site must do due diligence, for example by requiring the applicant to turn over a few credit card numbers to demonstrate that they work.

It’s an honorable exchange for dishonorable information. “I’m proud to be a vendor here,” writes one seller.

“Have a good carding day and good luck,” writes another seller …

These sleazeballs don’t just deal in card numbers, but also in so-called “CVV” numbers. That’s the Creditcard Validation Value – an extra three- or four-digit number on the front or back of a card that’s supposed to prove the user has physical possession of the card.

On TalkCash you can buy CVVs for card numbers you already have, or you can buy card numbers with CVVs included. (That costs more, of course.)

“All CVV are guaranteed: fresh and valid,” writes one dealer, who charges $3 per CVV, or $20 for a card number with CVV and the user’s date of birth. “Meet me at ICQ: 264535650,” he writes, referring to the instant message service (owned by AOL) where he conducts business. …

Gaffan says these credit card numbers and data are almost never obtained by criminals as a result of legitimate online card use. More often the fraudsters get them through offline credit card number thefts in places like restaurants, when computer tapes are stolen or lost, or using “pharming” sites, which mimic a genuine bank site and dupe cardholders into entering precious private information. Another source of credit card data is the very common “phishing” scam, in which an e-mail that looks like it’s from a bank prompts someone to hand over personal data.

Also available on TalkCash is access to hijacked home broadband computers – many of them in the United States – which can be used to host various kinds of criminal exploits, including phishing e-mails and pharming sites.

Matching identities across databases, anonymously

From MIT Technology Review’s “Blindfolding Big Brother, Sort of”:

In 1983, entrepreneur Jeff Jonas founded Systems Research and Development (SRD), a firm that provided software to identify people and determine who was in their circle of friends. In the early 1990s, the company moved to Las Vegas, where it worked on security software for casinos. Then, in January 2005, IBM acquired SRD and Jonas became chief scientist in the company’s Entity Analytic Solutions group.

His newest technology, which allows entities such as government agencies to match an individual found in one database to that same person in another database, is getting a lot of attention from governments, banks, health-care providers, and, of course, privacy advocates. Jonas claims that his technology is as good at protecting privacy as it is at finding important information. …

JJ: The technique that we have created allows the bank to anonymize its customer data. When I say “anonymize,” I mean it changes the name and address and date of birth, or whatever data they have about an identity, into a numeric value that is nonhuman readable and nonreversible. You can’t run the math backwards and compute from the anonymized value what the original input value was. …

Here’s the scenario: The government has a list of people we should never let into the country. It’s a secret. They don’t want people in other countries to know. And the government tends to not share this list with corporate America. Now, if you have a cruise line, you want to make sure you don’t have people getting on your boat who shouldn’t even be in the United States in the first place. Prior to the U.S. Patriot Act, the government couldn’t go and subpoena 100,000 records every day from every company. Usually, the government would have to go to a cruise line and have a subpoena for a record. Section 215 [of the Patriot Act] allows the government to go to a business entity and say, “We want all your records.” Now, the Fourth Amendment, which is “search and seizure,” has a legal test called “reasonable and particular.” Some might argue that if a government goes to a cruise line and says, “Give us all your data,” it is hard to envision that this would be reasonable and particular.

But what other solution do they have? There was no other solution. Our Anonymous Resolution technology would allow a government to take its secret list and anonymize it, allow a cruise line to anonymize their passenger list, and then when there’s a match it would tell the government: “record 123.” So they’d look it up and say, “My goodness, it’s Majed Moqed.” And it would tell them which record to subpoena from which organization. Now it’s back to reasonable and particular. …
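
The matching step Jonas describes can be sketched with a keyed one-way hash: each party converts its own records into tokens, and only the tokens are compared. This is a minimal illustration and not IBM's Anonymous Resolution implementation. The use of HMAC-SHA256, the shared key, the normalization rule, and all names and records are assumptions. Exact token matching also catches only trivial variations such as case and spacing, whereas the interview describes handling far more variability.

```python
import hashlib
import hmac


def anonymize(name, dob, key):
    """Turn an identity into a one-way token; equal normalized inputs give equal tokens."""
    normalized = " ".join(name.lower().split()) + "|" + dob
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


key = b"shared-secret-for-illustration"  # hypothetical key agreed on by both parties

# Government side: anonymize the secret list (fabricated entry).
watch_tokens = {anonymize("Example Person", "1970-01-01", key)}

# Cruise line side: anonymize the passenger list, keeping only record numbers.
passengers = {
    122: ("Another Traveler", "1985-06-15"),
    123: ("  example   PERSON ", "1970-01-01"),
}
passenger_tokens = {
    record_no: anonymize(name, dob, key) for record_no, (name, dob) in passengers.items()
}

# Comparison step: sees tokens and record numbers only, never names or dates.
matches = [no for no, token in passenger_tokens.items() if token in watch_tokens]
print(matches)  # [123]
```

The output is just a record number, which corresponds to the "record 123" in the scenario above: it tells the government which single record to request, without either list being disclosed.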

TR: How is this based on earlier work you did for Las Vegas casinos?

JJ: The ability to figure out if two people are the same despite all the natural variability of how people express their identity is something we really got a good understanding of assisting the gaming industry. We also learned how people try to fabricate fake identities and how they try to evade systems. It was learning how to do that at high speed that opened the door to make this next thing possible. Had we not solved that in the 1990s, we would not have been able to conjure up a method to do anonymous resolution.