Folksonomy and Google bombs

Folksonomy or social tagging

A folksonomy is a system that consists of 2 elements:

The classic examples are of course and Flickr.

Some people regard folksonomies as just another web craze: fashionable for a while, irrelevant afterwards. I would argue that we have been using folksonomy-like logic for a long time already, and will probably continue to do so for much longer. Two important examples:

Folksonomy already powers contextual ads

Think about how Google AdWords/AdSense works: an advertiser bids on several keywords (and keyword combinations). When one of these keywords features on a web page that displays AdSense ads, the advertiser’s message might show up (depending on how much he is prepared to pay). Well, a keyword is a tag, is a category! These tags may not be added explicitly, but Google, Yahoo and MSN use technology to extract relevant keywords from the page’s content. The whole contextual ad market is driven by folksonomy-like tagging: the buyers (advertisers) use explicit tags, the sellers (websites) have keywords extracted from their content, and the two are matched by looking at where both sets overlap.
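To make the “overlapping sets” idea concrete, here is a toy sketch in Python. It is not how AdSense actually works: keyword extraction is a naive lowercase-and-drop-stopwords step, the highest bid simply wins, and the function names, advertisers and bids are all hypothetical.

```python
import re


def extract_keywords(text, stopwords):
    """Naive keyword extraction: lowercase words, minus common stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in stopwords}


def matching_ads(page_text, ads, stopwords):
    """Return (advertiser, bid, overlapping keywords), highest bid first.

    `ads` maps an advertiser name to (bid, set_of_keywords).
    """
    page_tags = extract_keywords(page_text, stopwords)
    matches = []
    for advertiser, (bid, keywords) in ads.items():
        overlap = page_tags & keywords  # keywords that both sides share
        if overlap:
            matches.append((advertiser, bid, overlap))
    # Stand-in for a real auction: simply prefer the highest bid.
    matches.sort(key=lambda match: match[1], reverse=True)
    return matches


STOPWORDS = {"a", "and", "how", "the", "to", "with"}
ads = {
    "PodHost": (0.50, {"podcast", "hosting"}),
    "CamStore": (0.80, {"camera", "lens"}),
    "BlogTools": (0.30, {"blogger", "template"}),
}
page = "How to create a podcast with Blogger"
for advertiser, bid, overlap in matching_ads(page, ads, STOPWORDS):
    print(advertiser, bid, sorted(overlap))
# PodHost 0.5 ['podcast']
# BlogTools 0.3 ['blogger']
```

The camera shop bids the most but never appears, because none of its tags overlap with the page’s extracted keywords.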

How does search work? People use one or more words to describe what they are looking for, and the search engine tries to come up with the web pages most relevant to those search terms. Some folks prefer asking questions in natural language (e.g. “how can I create a podcast with Blogger?”), but most seasoned users would probably search for “podcast blogger”. As with contextual ads, the buyers (of information) provide explicit tags or keywords, and the web pages have their content processed and reduced to a set of keywords. A search term is a tag! Supply and demand are matched through keywords, or tags.
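The same matching can be sketched as a tiny inverted index, assuming pages and queries are plain strings and that relevance is simply the number of query terms a page shares. Real search engines add stemming, stopword handling and far richer ranking; `build_index`, `search` and the sample pages are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase a string and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Inverted index: map each keyword ("tag") to the IDs of pages containing it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in tokenize(text):
            index[word].add(page_id)
    return index


def search(index, query):
    """Rank page IDs by how many query terms (tags) they share, best first."""
    scores = defaultdict(int)
    for term in set(tokenize(query)):
        for page_id in index.get(term, ()):
            scores[page_id] += 1
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(scores, key=lambda page_id: (-scores[page_id], page_id))


pages = {
    "a": "How to publish a podcast on Blogger",
    "b": "Blogger templates and themes",
    "c": "Record your first podcast",
}
index = build_index(pages)
print(search(index, "podcast blogger"))  # ['a', 'b', 'c']
```

Page “a” carries both tags of the query, so it ranks first; the other two each match one tag.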

Fractional tags or tag relevancy

To extend the analogy a bit further, one could say that search engines create ‘fractional’ tags. What do I mean by that? In the, Flickr or Technorati tag universe, content providers choose the tags of an item explicitly. A word is either a tag (100% relevant) or not a tag (0% relevant); there is nothing in between. Search engines, on the other hand, crawl and index web pages and try to guess which words are most representative of each page. If the word “context” appears in the <title> or <h1> of a page, it is more relevant than if it were buried in a 1500-word text. So, depending on the engine’s algorithms, a keyword can be a 5% or a 99% relevant tag for a given page.
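One way to picture fractional tags is a small scoring function. This is a hypothetical sketch, not any search engine’s real algorithm: each word’s share of a field (title, h1, body) is multiplied by an assumed field weight, the results are summed, and everything is scaled so that the strongest keyword on the page scores 1.0 (a “100%” tag).

```python
import re
from collections import Counter

# Hypothetical field weights: a word in <title> counts more than one in <h1>,
# which in turn counts more than one in the body text.
FIELD_WEIGHTS = {"title": 3.0, "h1": 2.0, "body": 1.0}


def fractional_tags(fields):
    """Return {keyword: relevance between 0 and 1} for a single page.

    `fields` maps a field name ("title", "h1", "body") to that field's text.
    """
    raw = Counter()
    for field, text in fields.items():
        words = re.findall(r"[a-z0-9]+", text.lower())
        if not words:
            continue
        weight = FIELD_WEIGHTS.get(field, 1.0)
        for word, count in Counter(words).items():
            # The word's share of this field, boosted by the field's weight.
            raw[word] += weight * count / len(words)
    if not raw:
        return {}
    top = max(raw.values())
    # Scale so the most relevant keyword on the page becomes a "100%" tag.
    return {word: round(score / top, 3) for word, score in raw.items()}


page = {
    "title": "Podcast guide",
    "body": "This guide explains how to record a podcast and publish it online",
}
tags = fractional_tags(page)
print(tags["podcast"], tags["record"])  # 1.0 0.053
```

With this weighting, a word that fills half of a short title scores far higher than one that appears once in a 1500-word body, which is the intuition behind ‘fractional’ tags.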

Google bombs are folksonomy at work

A word can be a relevant tag for a page even if it does not appear on that page, as Google bombs prove. The way Google works, every time a link appears on a web page, its link text (and its “title” attribute) becomes a fractional tag for the page the link points to. If enough people use the same words, those fractional tags add up to a high relevancy for that combination of words, and you get, for example, “miserable failure” pointing to the biography of George Bush.
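The anchor-text effect behind a Google bomb can be sketched the same way. In this hypothetical model each linking page casts one vote for the phrase it uses as link text, and a target’s tag weights are the vote shares. The URLs are placeholders, the “title” attribute is left out for brevity, and real ranking also weighs things like the importance of the linking page.

```python
import re
from collections import Counter, defaultdict


def anchor_tags(links):
    """Turn (source, target, anchor_text) links into per-target tag weights.

    Each linking page casts one vote per phrase it uses for a target, so many
    independent pages using the same words push that phrase up, even if the
    target page never contains those words itself.
    """
    votes = defaultdict(Counter)
    seen = set()
    for source, target, text in links:
        # Normalise the link text: lowercase words joined by single spaces.
        phrase = " ".join(re.findall(r"[a-z0-9]+", text.lower()))
        if not phrase or (source, target, phrase) in seen:
            continue  # skip empty link text and repeat links from the same page
        seen.add((source, target, phrase))
        votes[target][phrase] += 1

    weights = {}
    for target, counter in votes.items():
        total = sum(counter.values())
        weights[target] = {phrase: n / total for phrase, n in counter.items()}
    return weights


links = [
    ("blog1.example", "bio.example", "miserable failure"),
    ("blog2.example", "bio.example", "Miserable Failure"),
    ("blog3.example", "bio.example", "miserable failure"),
    ("blog1.example", "bio.example", "miserable failure"),  # duplicate, ignored
    ("news.example", "bio.example", "official biography"),
]
print(anchor_tags(links)["bio.example"])
# {'miserable failure': 0.75, 'official biography': 0.25}
```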

My point is: even though social tagging services are less than 5 years old, the concept of categorisation (tagging/keyword extraction) has been instrumental to the Web for several years now. It is an intuitive way of organising and searching information. Not the only way, not the best way, but a crucial one.
(Note to self: I have to rewrite my Sorted/Categorised/Indexed article at some point, it should be ‘classification’ for a taxonomy)

