“Interestingness” for Google web search

Interesting Flickr

Last year, when Flickr wanted to create a ranking system for its pictures, they developed an algorithm for “interestingness”.

Sound of cause
Flickr photo by tkproject2004

There are lots of things that make a photo ‘interesting’ (or not) in the Flickr. Where the clickthroughs are coming from; who comments on it and when; who marks it as a favorite; its tags and many more things which are constantly changing. Interestingness changes over time, as more and more fantastic photos and stories are added to Flickr.
from About Interestingness

It takes into account views, comments, notes, favorites and user reputations, which makes it an advanced wisdom-of-the-crowds, long-tail recommendation engine. The exact formula is unknown, and the individual ‘interestingness’ score of a photo cannot be displayed. Just as with Google PageRank, people try to guess how it works internally.

I believe interestingness works by a combination of the following things:
1) (…) If there are two pictures which have the same number of favorites the one that has been less seen seems to be more interesting. (…)
2) Favorites seem to have more weight than comments
3) (…) a person who is known by the system to create interesting content is given greater power to judge content as interesting.
comment by Alex Andronov on Flickr and Interestingness
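
To make those three guesses concrete, here is a minimal sketch in Python. It is not Flickr's formula, which is unknown: the `Vote` class, the two weights and the division by views are all assumptions chosen only to reproduce the behaviour described in the comment above.

```python
from dataclasses import dataclass
from typing import Sequence

# Illustrative weights only (assumption): favorites count more than comments.
FAVORITE_WEIGHT = 3.0
COMMENT_WEIGHT = 1.0


@dataclass
class Vote:
    kind: str          # "favorite" or "comment"
    reputation: float  # voter's reputation; 1.0 stands for an average user


def interestingness(votes: Sequence[Vote], views: int) -> float:
    """Reputation-weighted votes, divided by how often the photo was seen."""
    weights = {"favorite": FAVORITE_WEIGHT, "comment": COMMENT_WEIGHT}
    weighted = sum(weights[v.kind] * v.reputation for v in votes)
    # The +1 avoids a division by zero for a photo nobody has viewed yet.
    return weighted / (views + 1)
```

With this sketch, two photos with the same favorites but fewer views for one of them rank the less-seen photo higher, a favorite outweighs a comment, and a vote from a user with a higher reputation counts for more.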

Could Google do that too?

Where Google only takes into account links between pages – and one link is one ‘vote’ of a certain weight – Flickr uses multiple sources of interaction data. The reason, obviously, is that they have it: all ‘view’, ‘note’, ‘comment’ and ‘favorite’ votes go through their website. Google does not have this luxury. So where could they get more data from to build a PageInterest ranking?

  • VIEWS: how could Google know which pages are viewed?
    1. They could use the Alexa data (although, maybe Amazon won’t give it).
    2. They could use the data coming in from their own Google Toolbar (to show the PageRank for each page, a query is sent to the Google PageRank servers with the URL).
    3. They could use the data coming in from their Google Analytics clients (there aren’t so many of them, granted).
    4. They could use the AdSense impressions (a lot of sites have those).

    All these methods sound kind of ‘Big Brother’, don’t they?

  • COMMENTS/NOTES: how can Google know on which URLs people react?
    1. An obvious source is blogs. A link from a blog could be considered a ‘comment’. Strictly speaking, they already use this data to calculate PageRank.
    2. They could look for URLs in Gmail, but that would incite spammers to start sending bogus mails between Gmail accounts. Bad idea.
    3. Count # of comments on blog posts? Nah. Authors would start spamming their own blog posts.
    4. Digg data? Digg’s focus is still limited to technology for now, but maybe in the future. StumbleUpon?

    So this turns out to be a hard part.

  • FAVOURITES: how could Google know what URLs are preferred – or stored for posterity – by surfers?
    1. del.icio.us would be a great source of info, or ma.gnolia
    2. a bookmark backup/sync provider, like Google Bookmarks.

Looks like both Yahoo and Google have a lot of the necessary data sources to create an “interestingness” rating for web pages.
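
As a rough sketch of what such a PageInterest score could look like, the snippet below combines the three kinds of signal listed above into one number per URL. Everything in it is hypothetical: the `WEIGHTS` values, the `page_interest` and `rank` names, and the choice of a logarithm to dampen large counts are assumptions for illustration, not anything Google or Yahoo is known to do.

```python
import math
from typing import List, Mapping

# Hypothetical weights (assumption): a favourite is a stronger signal than a view.
WEIGHTS = {"views": 1.0, "comments": 2.0, "favourites": 4.0}


def page_interest(signals: Mapping[str, int]) -> float:
    """Combine per-URL counts into a single score.

    log1p dampens large counts so that one huge signal (typically views)
    cannot drown out the others. Missing signals count as zero.
    """
    return sum(w * math.log1p(signals.get(name, 0)) for name, w in WEIGHTS.items())


def rank(pages: Mapping[str, Mapping[str, int]]) -> List[str]:
    """Return URLs sorted from most to least interesting."""
    return sorted(pages, key=lambda url: page_interest(pages[url]), reverse=True)
```

The counts themselves would have to come from the sources above (toolbar or AdSense data for views, blog links for comments, bookmark services for favourites); the sketch only shows the combining step.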

Interesting movies

Another place where such an index would come in handy is YouTube: they now have “most recent”, “most viewed”, “top rated”, “most discussed” … listings, but if they could combine all those into a single “most interesting” listing, and also sort their search results by interestingness, that would be a really neat feature.

3 thoughts on ““Interestingness” for Google web search”

  1. Google is already using some sort of interestingness. They take into account the number of clicks on a search result, and the time spent on the site.

    Data is coming from both the search engine pages and the Google toolbar.

  2. Some more ideas:

    Clickthroughs (as Bart already mentioned)

    The strange thing for me is that I can’t seem to discover a pattern as to when a search result is a click-through:
    peter+forret sometimes gives the naked http://www.forret.com/ , sometimes a click-through URL like google.be/url?q=http://www.forret.com/(…)
    It is not unimaginable that they could even demote click-throughs that are quickly followed by a second click-through, when the user hits the back button quickly and goes for an alternative result.

    Google account and search:

    If you use a service linked to a Google account, such as Gmail (running on the same http://www.google.com domain), you remain logged in for searches on http://www.google.com (a reason for me to switch to google.be or google.co.uk for search…). Click-through URLs show up here as well (regardless of whether you use personalised search http://www.google.com/psearch or not), and the link with the account makes it all the more interesting.

    Referer information:

    Google Analytics definitely records it; not sure about AdSense here.
