“Interestingness” for Google web search
17 Aug 2006
Interesting Flickr
Last year, when Flickr wanted to create a ranking system for its pictures, they developed an algorithm for “interestingness”.
There are lots of things that make a photo ‘interesting’ (or not) in the Flickr. Where the clickthroughs are coming from; who comments on it and when; who marks it as a favorite; its tags and many more things which are constantly changing. Interestingness changes over time, as more and more fantastic photos and stories are added to Flickr.
from About Interestingness
Taking into account views, comments, notes, favorites and user reputations, it is an advanced wisdom-of-the-crowds long-tail recommendation engine. The exact formula has not been published, and the individual ‘interestingness’ score of a photo is not shown anywhere. Just like with Google PageRank, people try to guess how it works internally.
I believe interestingness works by a combination of the following things:
1) (…) If there are two pictures which have the same number of favorites the one that has been less seen seems to be more interesting. (…)
2) Favorites seem to have more weight than comments
3) (…) a person who is known by the system to create interesting content is given greater power to judge content as interesting.
comment by Alex Andronov on Flickr and Interestingness
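The three guesses quoted above can be turned into a toy scoring function. This is only a sketch of the guesses, not Flickr's formula, which is not public: the `Photo` class, the `interestingness` function, the two weights and the idea of dividing by views are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed weights: guess 2 only says favorites count for more than comments.
FAVORITE_WEIGHT = 3.0
COMMENT_WEIGHT = 1.0


@dataclass
class Photo:
    views: int
    # One entry per favorite/comment: the reputation of the user who made it
    # (guess 3: users known for interesting content get a stronger vote).
    favorite_reputations: List[float] = field(default_factory=list)
    comment_reputations: List[float] = field(default_factory=list)


def interestingness(photo: Photo) -> float:
    """Hypothetical score: reputation-weighted votes per view."""
    weighted_votes = (
        FAVORITE_WEIGHT * sum(photo.favorite_reputations)
        + COMMENT_WEIGHT * sum(photo.comment_reputations)
    )
    # Guess 1: with equal favorites, the less-seen photo scores higher.
    # The +1 avoids dividing by zero for a photo nobody has viewed yet.
    return weighted_votes / (1 + photo.views)


rarely_seen = Photo(views=100, favorite_reputations=[1.0, 1.0])
widely_seen = Photo(views=1000, favorite_reputations=[1.0, 1.0])
print(interestingness(rarely_seen) > interestingness(widely_seen))  # True
```

A real system would presumably also let scores decay or shift over time, since the Flickr description says interestingness changes as new photos arrive; that part is left out here.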
Could Google do that too?
Where Google only takes into account links between pages – and one link is one ‘vote’ of a certain weight – Flickr uses multiple sources of interaction data. The obvious reason is that they have it: all ‘view’, ‘note’, ‘comment’ and ‘favorite’ votes go through their website. Google does not have this luxury. So where could they get more data from to build a PageInterest ranking?
- VIEWS: how could Google know which pages are viewed?
  - They could use the Alexa data (although Amazon might not share it).
  - They could use the data coming in from their own Google Toolbar (to show the PageRank for each page, the toolbar sends a query with the URL to the Google PageRank servers).
  - They could use the data coming in from their Google Analytics clients (granted, there aren’t that many of them).
  - They could use the AdSense impressions (a lot of sites have those).
  All these methods sound kind of ‘Big Brother’, don’t they?
- COMMENTS/NOTES: how can Google know which URLs people react to?
  - An obvious source is blogs. A link from a blog could be considered a ‘comment’. Strictly speaking, Google already uses this data to calculate PageRank.
  - They could look for URLs in Gmail, but that would incite spammers to start sending bogus mails between Gmail accounts. Bad idea.
  - Count the number of comments on blog posts? Nah. Authors would start spamming their own blog posts.
  - Digg data? Digg’s focus is still limited to technology for now, but maybe in the future. StumbleUpon?
  So this turns out to be the hard part.
- FAVOURITES: how could Google know which URLs are preferred – or stored for posterity – by surfers?
  - del.icio.us would be a great source of info, or ma.gnolia.
  - A bookmark backup/sync provider, like Google Bookmarks.
Looks like both Yahoo and Google have a lot of the necessary data sources to create an “interestingness” rating for web pages.
Interesting movies
Another place where such an index would come in handy is YouTube: they now have “most recent”, “most viewed”, “top rated”, “most discussed” … listings, but if they could combine all those into a single “most interesting” listing, and also sort their search results by interestingness, that would be a really neat feature.
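Merging several sorted listings into one can be sketched with a Borda count, a simple rank-aggregation method. This is just one possible approach, not something YouTube is known to use; the function name `combine_listings` and the sample video ids are made up for illustration.

```python
from collections import defaultdict


def combine_listings(listings):
    """Merge several best-first listings into one using a Borda count.

    `listings` maps a listing name to a list of video ids, best first.
    In a listing of n videos, the top one earns n points and the last
    one earns 1; videos missing from a listing earn nothing from it.
    """
    scores = defaultdict(int)
    for ordered_ids in listings.values():
        n = len(ordered_ids)
        for position, video_id in enumerate(ordered_ids):
            scores[video_id] += n - position
    # Highest total first; ties are broken alphabetically by id.
    return sorted(scores, key=lambda video_id: (-scores[video_id], video_id))


listings = {
    "most viewed": ["a", "b", "c"],
    "top rated": ["b", "c", "a"],
    "most discussed": ["b", "a", "c"],
}
print(combine_listings(listings))  # ['b', 'a', 'c']
```

Every listing counts equally here. Giving “top rated” more say than “most viewed”, for example, would only take a per-listing multiplier on the points.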