Thoughts on tagging/folksonomy

From Ulises Ali Mejias’ “A del.icio.us study: Bookmark, Classify and Share: A mini-ethnography of social practices in a distributed classification community”:

This principle of distribution is at work in socio-technical systems that allow users to collaboratively organize a shared set of resources by assigning classifiers, or tags, to each item. The practice is coming to be known as free tagging, open tagging, ethnoclassification, folksonomy, or faceted hierarchy (henceforth referred to in this study as distributed classification) …

One important feature of systems such as these is that they do not impose a rigid taxonomy. Instead, they allow users to assign whatever classifiers they choose. Although this might sound counter-productive to the ultimate goal of organizing content, in practice it seems to work rather well, although it does present some drawbacks. For example, most people will probably classify pictures of cats by using the tag ‘cats.’ But what happens when some individuals use ‘cat’ or ‘feline’ or ‘meowmeow’ …

It seems that while most people might not be motivated to contribute to a pre-established system of classification that may not meet their needs, or to devise new and complex taxonomies of their own, they are quite happy to use distributed systems of classification that are quick and able to accommodate their personal (and ever changing) systems of classification. …

But distributed classification does not accrue benefits only to the individual. It is a very social endeavor in which the community as a whole can benefit. Jon Udell describes some of the individual and social possibilities of this method of classification:

These systems offer lots of ways to visualize and refine the tag space. It’s easy to know whether a tag you’ve used is unique or, conversely, popular. It’s easy to rename a tag across a set of items. It’s easy to perform queries that combine tags. Armed with such powerful tools, people can collectively enrich shared data. (Udell 2004) …
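To make Udell's list concrete, here is a minimal sketch of the three operations he mentions: checking how popular a tag is, renaming a tag across items, and querying on a combination of tags. The `TagStore` class and its method names are my own invention for illustration; this is not how del.icio.us or any real service is implemented.

```python
from collections import defaultdict


class TagStore:
    """Toy in-memory store (hypothetical): maps each item to a set of tags."""

    def __init__(self):
        self.tags_by_item = defaultdict(set)

    def tag(self, item, *tags):
        """Attach one or more free-form tags to an item."""
        self.tags_by_item[item].update(tags)

    def tag_counts(self):
        """Return {tag: number of items carrying it}; a count of 1 means unique."""
        counts = defaultdict(int)
        for tags in self.tags_by_item.values():
            for t in tags:
                counts[t] += 1
        return dict(counts)

    def rename(self, old, new):
        """Rename a tag on every item that carries it."""
        for tags in self.tags_by_item.values():
            if old in tags:
                tags.discard(old)
                tags.add(new)

    def query(self, *tags):
        """Return the items that carry ALL of the given tags, sorted."""
        wanted = set(tags)
        return sorted(
            item for item, ts in self.tags_by_item.items() if wanted <= ts
        )
```

Because every operation is a few lines over a dictionary of sets, the "powerful tools" Udell describes are cheap to offer, which may be part of why these systems expose them so freely.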

Set this [an imposed taxonomy] against the idea of allowing a user to add tags to any given document in the corpus. Like Del.icio.us, there needn’t be a pre-defined hierarchy or lexicon of terms to use; one can simply lean on the power of ethnoclassification to build that lexicon dynamically. As such, it will dynamically evolve as usages change and shift, even as needs change and shift. (Williams, 2004)

The primary benefit of free tagging is that we know the classification makes sense to users… For a content creator who is uploading information into such a system, being able to freely list subjects, instead of choosing from a pre-approved “pick list,” makes tagging content much easier. This, in turn, makes it more likely that users will take time to classify their contributions. (Merholz, 2004)

Folksonomies work best when a number of users all describe the same piece of information. For instance, on del.icio.us, many people have bookmarked wikipedia (http://del.icio.us/url/bca8b85b54a7e6c01a1bcfaf15be1df5), each with a different set of words to describe it. Among the various tags used, del.icio.us shows that reference, wiki, and encyclopedia are the most popular. (Wikipedia entry for folksonomy, retrieved December 15, 2004 from http://en.wikipedia.org/wiki/Folksonomy)
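The aggregation described in that passage can be sketched in a few lines. This is only an assumption about how such a ranking might be computed, not a description of what del.icio.us actually does; the function name `popular_tags` and the sample data in the comment are hypothetical.

```python
from collections import Counter


def popular_tags(bookmarks, top_n=3):
    """Rank tags by how many users applied them to the same URL.

    `bookmarks` is an iterable of tag lists, one list per user.
    """
    counts = Counter()
    for tags in bookmarks:
        # Count each tag at most once per user, even if they repeated it.
        counts.update(set(tags))
    return counts.most_common(top_n)


# Made-up example: three users bookmarking the same page.
# popular_tags([["reference", "wiki"], ["wiki", "encyclopedia"], ["wiki", "reference"]])
```

The more users who describe the same item, the more the counts separate, which is one way to read the claim that folksonomies work best with many describers.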

Of course, this approach is not without its potential problems:

With no one controlling the vocabulary, users develop multiple terms for identical concepts. For example, if you want to find all references to New York City on Del.icio.us, you’ll have to look through “nyc,” “newyork,” and “newyorkcity.” You may also encounter the inverse problem — users employing the same term for disparate concepts. (Merholz, 2004) …

But as Clay Shirky remarks, this solution might diminish some of the benefits that we can derive from folksonomies:

Synonym control is not as wonderful as is often supposed, because synonyms often aren’t. Even closely related terms like movies, films, flicks, and cinema cannot be trivially collapsed into a single word without loss of meaning, and of social context … (Shirky, 2004) …
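A small sketch may help show what is at stake in Shirky's objection, using Merholz's New York example. It contrasts collapsing synonyms when tags are stored (the original wording is lost) with expanding a query when tags are searched (the original wording is kept). The synonym map and both function names are hypothetical, and deciding which terms really are synonyms is exactly the judgment call Shirky questions.

```python
# Hypothetical synonym map: variant tag -> canonical tag.
SYNONYMS = {"newyork": "nyc", "newyorkcity": "nyc"}


def collapse(tags, synonyms=SYNONYMS):
    """Write-time synonym control: replace each tag with its canonical form.

    After this step nobody can tell which variant the user actually typed.
    """
    return {synonyms.get(t, t) for t in tags}


def expand(tag, synonyms=SYNONYMS):
    """Query-time alternative: every stored tag that should match `tag`.

    The stored tags are left untouched; only the search is widened.
    """
    canonical = synonyms.get(tag, tag)
    return {canonical} | {t for t, c in synonyms.items() if c == canonical}
```

With `collapse`, a map that folded "films", "flicks" and "cinema" into "movies" would erase the distinctions Shirky cares about; with `expand`, they survive in the data even if a search treats them alike.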

The choice of tags [in the entire del.icio.us system] follows something resembling the Zipf or power law curve often seen in web-related traffic. Just six tags (python, delicious/del.icio.us, programming, hacks, tools, and web) account for 80% of all the tags chosen, and a long tail of 58 other tags make up the remaining 20%, with most occurring just once or twice … In the del.icio.us community, the rich get richer and the poor stay poor via http://del.icio.us/popular. Links noted by enough users within a short space of time get listed here, and many del.icio.us users use it to keep up with the zeitgeist. (Biddulph, 2004) …
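For anyone who wants to check a claim like "six tags account for 80%" against their own bookmarks, here is a rough sketch. It assumes you already have a mapping from each tag to the number of times it was used; the function name `tags_needed_for_share` is my own, and the calculation is a plain cumulative count rather than anything taken from Biddulph's analysis.

```python
from collections import Counter


def tags_needed_for_share(tag_counts, share=0.8):
    """Return how many of the most-used tags are needed to cover `share`
    of all tag uses (0 if there are no tags at all)."""
    total = sum(tag_counts.values())
    running = 0
    # Walk the tags from most used to least used, accumulating their counts.
    for n, (_, count) in enumerate(Counter(tag_counts).most_common(), start=1):
        running += count
        if running >= share * total:
            return n
    return 0
```

A small return value relative to the total number of distinct tags is the signature of the long tail described in the quote.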