Monthly Archive for December, 2005


Google experiments with inline revisions

I don’t recall having seen this before: within the SERP (Search Engine Result Page) of a keyword X, Google puts the top 3 results for a keyword Y.
Google alternate results
The exact details:

  • I did a search for “FYD” on google.com (I won’t tell you why, but I think Ine might have an idea)
  • I got results for FYD (“Results 1 – 10 of about 280,000 for FYD”)
  • right after the first 3 results, Google inserts a block with the first 3 results for “FTD” (4,150,000 results).
  • Google does not suggest that I made a typo by stating something like “Did you mean ‘FTD’?”
  • There is a title line “See results for: ftd” with the addition of oi=revisions_inline in the query string. So I guess they call it “inline revisions”.
  • the first three results for “FTD” are all homepages of domains with FTD in them: ftd.com, ftdi.com, ftd.de . That might be a coincidence, since these are the actual first 3 results for “ftd”.
  • Knowing how Google generally works, this seems like an organic search feature. I don’t think any of those 3 “ftd” domains paid for the revision.
  • Google tracks click-through on these revisions: they first send the visitor to http://www.google.com/url?q=http://www.ftd.com/ with the parameters: sa=X & revid=889895241 (changes with every query refresh) & qpos=0 & upos=0 (position of result: 0/1/2) & oi=revisions_inline
  • I don’t get these revisions when I search on google.be. I don’t get them when I use an extra keyword in my search. But I can reproduce the results for “FYD” from another location.
  • Some other queries that have these inline revisions: PDZ (See results for: perfect dark zero), ADZ (See results for: adze), UGE (See results for: universal game editor).
    So it’s not just for typos, but also for ‘lesser known’ acronyms.
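
For anyone who wants to poke at these click-tracking links, here is a minimal Python sketch. The parameter names and values (q, sa, revid, qpos, upos, oi) come from the observation above; the way the URL is assembled and the helper name parse_tracking_url are my own assumptions, not anything Google documents.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Parameters as observed in the post; the exact URL layout is an assumption.
params = {
    "q": "http://www.ftd.com/",   # the real destination
    "sa": "X",
    "revid": "889895241",         # changed with every query refresh
    "qpos": "0",
    "upos": "0",                  # position of the result: 0, 1 or 2
    "oi": "revisions_inline",
}
tracking_url = "http://www.google.com/url?" + urlencode(params)


def parse_tracking_url(url):
    """Hypothetical helper: pull the interesting fields out of a redirect URL."""
    query = parse_qs(urlsplit(url).query)
    flat = {key: values[0] for key, values in query.items()}
    return {
        "target": flat.get("q"),
        "is_inline_revision": flat.get("oi") == "revisions_inline",
        "result_position": int(flat.get("upos", -1)),
        "revision_id": flat.get("revid"),
    }


info = parse_tracking_url(tracking_url)
print(info)
```

The oi value is what marks a click as coming from an inline revision, so filtering on it is enough to separate these clicks from normal result clicks.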

UPDATE: penalty points for sloppy research: Google was already testing this in August. Here’s an article on seo-consulting.de (in German; the search term was ‘COLA’, for which I also get the revisions) and one on fuzzyfreaky.typepad.com, and it was mentioned again in October on forums.digitalpoint.com. The feature was announced on August 19 by Googler Matt Cutts. Scusi.


The MPA and other people’s money

Like the RIAA, the MPA has the logical reaction to disruptive forces: send out the lawyers.

Suing file-sharers is apparently so 18-months-ago that the music industry, in dire need of something new to justify their hefty legal retainers, has taken aim at sites that offer “unauthorized” lyrics and unlicensed song scores. The Music Publishers’ Association (MPA), which represents US sheet music companies, said it will launch its first campaign against such sites in 2006. MPA president Lauren Keiser told the BBC that shuttering websites and imposing fines aren’t quite sufficient, saying if authorities can “throw in some jail time I think we’ll be a little more effective.” Ho, ho, ho.
from google.weblogsinc.com

The main issue here seems to be that because music lovers exchange and download lyrics from websites, there is no longer a market for selling books of lyrics, a market that had already suffered under the evil influence of the Xerox photocopier. I have two words for that: buggy whips!

Lawrence Garfield: You know, at one time there must’ve been dozens of companies making buggy whips. And I’ll bet the last company around was the one that made the best goddamn buggy whip you ever saw. Now how would you have liked to have been a stockholder in that company? You invested in a business and this business is dead. Let’s have the intelligence, let’s have the DECENCY to sign the death certificate, collect the insurance, and invest in something with a future.
from Other People’s Money, starring Danny DeVito

Again we have to look at the Electronic Frontier Foundation to talk some sense into the big guys with piles of money but no clue how to adapt.


RFM for RSS feeds: Recency, Frequency, Momentary Value

I’ve been throwing an idea around in my head for a while: how the RFM method for analyzing and predicting customer behaviour could be applied to RSS feeds (blogs, podcasts, …).

Recency, Frequency, Monetary Value – customer segmentation

What does RFM do: it analyses 3 parameters for each customer:

  • date of last purchase (recency)
  • # purchases per month/quarter (frequency)
  • average amount of money spent per purchase (monetary value)

It then does a cluster analysis of the numbers (or in the simple version: a marketing guy decides based on gut feeling) and defines boundaries for each parameter, in order to split the customers up into categories.

Example:
Recency: R1 is everyone who purchased in the last 2 months, R2 is everyone who bought in the last year and R3 is the rest.
Frequency: F1 is every customer that purchased on average 3 or more times per quarter, F2 purchased at least 1 time per quarter and F3 is the rest.
Monetary Value: M1 are those who spent more than €500 per visit and M2 are the rest.

In this scenario you have split up your heterogeneous customer group into 18 (3x3x2) more or less homogeneous subgroups that you can address in different ways. Your supercustomers R1-F1-M1 don’t need the same approach as the R3-F2-M1 (the big spenders that haven’t been around to your shop in the last year). And you hope you can predict the behaviour of each customer by analyzing his past behaviour.
(Side note: I learned this stuff while working in Sopres for Stefaan Vermeiren, who’s now teaching the Kiwis to do online banking)
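
The example boundaries above can be written down as a small Python sketch. The function name rfm_segment, the 61-day approximation of "2 months" and the plain number used for the spending threshold are my own assumptions for illustration; a real RFM setup would derive the boundaries from the cluster analysis.

```python
from datetime import date


def rfm_segment(last_purchase, purchases_per_quarter, avg_spend, today):
    """Return a label like 'R1-F1-M1' using the example boundaries from the text."""
    days_since = (today - last_purchase).days

    # Recency: last 2 months (approximated as 61 days), last year, or older.
    if days_since <= 61:
        r = "R1"
    elif days_since <= 365:
        r = "R2"
    else:
        r = "R3"

    # Frequency: 3 or more per quarter, at least 1 per quarter, or less.
    if purchases_per_quarter >= 3:
        f = "F1"
    elif purchases_per_quarter >= 1:
        f = "F2"
    else:
        f = "F3"

    # Monetary value: more than 500 per visit, or not.
    m = "M1" if avg_spend > 500 else "M2"

    return f"{r}-{f}-{m}"


today = date(2005, 12, 15)
print(rfm_segment(date(2005, 11, 20), 4, 650, today))  # a supercustomer
print(rfm_segment(date(2004, 6, 1), 1, 800, today))    # a lapsed big spender
```

Three recency classes times three frequency classes times two monetary classes gives the 18 subgroups mentioned above.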

RFM for RSS – feed segmentation

 

Now how would this work for RSS feeds?
RFM analysis for RSS feeds

  • Recency: date of last post
  • Frequency: average # posts per month, or mean-time-between-posts (important is that you only take into account the period from the first to the last post: if the feed contains 1 item per week but the last one was 1 year ago, the frequency is still 1/week i.e. around 4/month)
  • Momentary Value: (I know ‘momentary’ is not a great term, just couldn’t come up with a better 4-syllable alternative for ‘monetary’ yet) this is the most creative part: you can count the # of words, # of links, # images, filesize of the podcast audio or the video file, …
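
To make the three parameters concrete, here is a rough Python sketch that computes them from a list of posts. It assumes each post has already been reduced to a (publication date, word count) pair; the function name feed_rfm, the 30-day month and the choice of word count as the "momentary value" are my own assumptions, and parsing the actual feed is left out.

```python
from datetime import datetime
from statistics import mean


def feed_rfm(posts, now):
    """posts: list of (published: datetime, word_count: int), at least 2 items."""
    dates = sorted(published for published, _ in posts)

    # Recency: days since the last post.
    recency_days = (now - dates[-1]).days

    # Frequency: only the period from the first to the last post counts,
    # so a weekly feed that stopped a year ago still scores about 4/month.
    span_days = (dates[-1] - dates[0]).total_seconds() / 86400
    mtbp_days = span_days / (len(dates) - 1)   # mean time between posts
    posts_per_month = 30 / mtbp_days if mtbp_days > 0 else None

    # Momentary value: here simply the average number of words per post.
    avg_words = mean(words for _, words in posts)

    return {
        "recency_days": recency_days,
        "mtbp_days": mtbp_days,
        "posts_per_month": posts_per_month,
        "avg_words": avg_words,
    }


weekly = [(datetime(2005, 1, day), 300) for day in (3, 10, 17, 24, 31)]
stats = feed_rfm(weekly, now=datetime(2005, 2, 10))
print(stats)
```

Swapping the word count for the number of links, images or the enclosure file size only changes the last step.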

What can you do with this kind of statistic? Well, I see some applications:

  • is a blog ‘alive’? When do you decide that a blog is no longer active? It will be a combination of recency and frequency. If someone posted 1/week and there has not been any activity for 2 months: probably (momentarily) dead. If someone posts 1/quarter and there has been no activity for 2 months: perfectly normal. In statistical terms: calculate the mean-time-between-posts MTBP and its standard deviation STDEV. Assuming the gaps between posts are roughly normally distributed: if the last post was MTBP days ago, there is a 50% chance that the feed is no longer updated. If it was (MTBP + STDEV) days ago, the chance is 84%. At (MTBP + 2 * STDEV) it is about 98%, etc …
  • what kind of blog is it? if average # words/post is low, and # links per post is around 1 (and frequency is 1/day): it’s probably a linkblog (like e.g. bnox). If the #words/post is high, the MTBP is 1 month with a very low STDEV, it is probably a monthly newsletter.
  • do I have time for this blog? Now you subscribe to a blog without an idea of how often the author posts, and how long the articles are. With an RFM analysis, the blog could be marked as ‘low traffic’ (2 posts of 500 words per month) or ‘high maintenance’ (60 posts of 300 words per month).
  • how much data does this podcast deliver? There is a big difference between a show like DailySourcecode (about 20 podcasts of 40 MB per month: 800 MB/month) or IT Conversations (2 posts/day of 14 MB each: 840 MB/month) and a humble effort like my Mash-up podcast (2 to 5 posts per year of 4.5 MB: 1.5 MB/month). For a mobile device, where storage and bandwidth aren’t so readily available (nor cheap), this is an important distinction.
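
Two of these applications fit in a few lines of Python. This is only a sketch: it assumes the gaps between posts follow a normal distribution (real posting behaviour probably doesn't, so treat the percentages as rough), and the function names are made up for the example. Under that assumption the three points mentioned above come out at roughly 50%, 84% and 98%.

```python
from statistics import NormalDist, mean, stdev


def chance_feed_is_dead(gaps_days, days_since_last_post):
    """Share of past gaps expected to be shorter than the current silence.

    gaps_days: days between consecutive posts (at least 2 values).
    Assumes normally distributed gaps, which is a simplification.
    """
    mtbp = mean(gaps_days)
    sd = stdev(gaps_days)
    if sd == 0:
        # Perfectly regular feed: any delay beyond the usual gap is suspicious.
        return 1.0 if days_since_last_post > mtbp else 0.0
    return NormalDist(mtbp, sd).cdf(days_since_last_post)


def monthly_megabytes(posts_per_month, mb_per_post):
    """Data volume a podcast delivers per month."""
    return posts_per_month * mb_per_post


gaps = [5, 7, 9]  # MTBP = 7 days, STDEV = 2 days
for silence in (7, 9, 11):
    print(silence, round(chance_feed_is_dead(gaps, silence), 2))

print(monthly_megabytes(20, 40))  # DailySourcecode-style numbers
print(monthly_megabytes(60, 14))  # 2 posts/day of 14 MB
```

A feed reader could use the first number to grey out feeds that look abandoned, and the second to warn before subscribing on a mobile device.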

This RFM analysis could be done by a company like Technorati, Bloglines or Feedburner, and they could combine it with language, location, topic and popularity stats to create an excellent segmentation of blogs. Or does someone else feel tempted to set it up?

Two days at LesBlogs Paris

What I saw

I just spent 2 days at the LesBlogs conference in Paris, organised by Loïc Le Meur/SixApart Europe. A gathering of geeks, entrepreneurs, venture capitalists and web architects, focused on “Blogs 2.0”. Days filled with speeches, panels, chats, networking lunch, dinner and drinks, sometimes dull, sometimes highly amusing and generally quite interesting.
Why not throw everything into a meme-map (Web 2.0 style):
LesBlogs Paris Meme-Map

What I liked

What speakers made the biggest impression:

  1. Ben Hammersley: one of the rare presentations (as opposed to panels) but undoubtedly the best. Thought-provoking (title of the speech: “Eight ideas that will really revolutionize the 21st century (and why blogging isn’t one of them)”), hilarious (“To improve horses, you have to put a fast stallion and a fast mare together, they have to shag … it’s all very technical!”) and thoroughly energising (“Remember: we are all first-row witnesses of a new renaissance, the roommates of Leonardo Da Vinci”).
  2. David Sifry: apart from the admiration I have for his technical skills (his most recent start-up is Technorati), he also had some pertinent things to say about ethical engineering, the responsibility of pioneers and how technology can change the world.
  3. Ethan Zuckerman: the uber-geek and social entrepreneur who started GeekCorps and Global Voices. A man with a mission. And long hair.
  4. Thomas Crampton: journalist with the International Herald Tribune, and guest blogger at Joi Ito’s site, with excellent British wit and some strong ideas on the role of journalism.
  5. Ben Metcalfe: a.k.a. ‘dotBen’, of BBC Backstage fame, for standing up and defending himself in a correct manner when he was attacked by Mena Trott.

And of course the networking was great: I met a lot of people while hunting for food and drinks. One particularly nice chap was FactoryJoe, one of the Flockies. I was also very lucky to hang out with my fellow Belgians Francois and Denis (they would be the Shoobies), which made for interesting conversations, encounters and lots of background stories.

What I disliked

  • food: it’s a detail, I know, but both ‘networking’ lunches consisted entirely of tiny, one-bite, probably way overpriced toasts. It was a common grudge during the lunch and on the backchannel. Sandwiches would have been a great idea.
  • panel: in some cases the ‘panel’ formula was useful, but compare that to the Ben Hammersley slideshow … I would have liked more thought-provoking and futuristic presentations than just some light chatter. What’s happening in Scandinavia? What’s happening in Japan? Some new applications for podcasting and videoblogging? Brand new mashups of RSS and some other acronyms? Not everyone is a Hammersley, of course, as the lady from Edelman adequately demonstrated.
  • backchannel: a back-channel is like a bar: people grab a pint around the counter and start chatting away. The person who invited everyone has no control over the direction of the conversation. People at the bar behave differently than they would if they were conscious that the whole scene was being filmed and shown to an audience simultaneously. My opinion: 1) having a backchannel is good, both for disruptivity and networking, 2) the language in a backchannel will be different from the language on stage, on a blog or in a 1-to-1 conversation, 3) showing the backchannel on screen is not always a good idea: it makes it harder to concentrate and the tone of voice might be out of place.
  • not enough time: there are plenty of people I would have liked to talk to, but the occasion didn’t present itself: Photomatt, Kevin Marks, dltq, Steve Olechowsky, Marc Canter, David Sifry, Mark Fletcher, Ewan McIntosh, Anina(:-), …
