Monthly Archive for December, 2005


Google experiments with inline revisions

I don’t recall having seen this before: within the SERP (Search Engine Result Page) of a keyword X, Google puts the top 3 results for a keyword Y.
[Screenshot: Google alternate results]
The exact details:

  • I did a search for “FYD” on google.com (I won’t tell you why, but I think Ine might have an idea)
  • I got results for FYD (“Results 1 – 10 of about 280,000 for FYD”)
  • right after the first 3 results, Google inserts a block with the first 3 results for “FTD” (4,150,000 results).
  • Google does not suggest that I made a typo by stating something like “Did you mean ‘FTD’?”
  • There is a title line “See results for: ftd” with the addition of oi=revisions_inline in the query string. So I guess they call it “inline revisions”.
  • the first three results for “FTD” are all homepages of domains with FTD in them: ftd.com, ftdi.com, ftd.de. That might be a coincidence, since these are the actual first 3 results for “ftd”.
  • Knowing how Google generally works, this seems like an organic search feature. I don’t think any of those 3 “ftd” domains paid for having the revision.
  • Google tracks click-through on these revisions: they first send the visitor to http://www.google.com/url?q=http://www.ftd.com/ with the parameters: sa=X & revid=889895241 (changes with every query refresh) & qpos=0 & upos=0 (position of result: 0/1/2) & oi=revisions_inline
  • I don’t get these revisions when I search on google.be. I don’t get them when I use an extra keyword in my search. But I can reproduce the results for “FYD” from another location.
  • Some other queries that have these inline revisions: PDZ (See results for: perfect dark zero), ADZ (See results for: adze), UGE (See results for: universal game editor).
    So it’s not just for typos, but also for ‘lesser known’ acronyms.
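
The click-tracking URL from the list above can be taken apart with a few lines of Python. This is only a sketch: the full URL below is reassembled from the parameters listed in this post (the exact order and encoding Google used is an assumption), and `describe_revision_click` is a made-up helper name.

```python
from urllib.parse import urlsplit, parse_qs

# Reassembled from the parameters listed in the post; the ordering is an assumption.
TRACKING_URL = (
    "http://www.google.com/url?q=http://www.ftd.com/"
    "&sa=X&revid=889895241&qpos=0&upos=0&oi=revisions_inline"
)

def describe_revision_click(url):
    """Return the interesting parts of a tracking URL as a dict (hypothetical helper)."""
    params = parse_qs(urlsplit(url).query)

    def first(key):
        # parse_qs returns a list per key; take the first value or None.
        return params.get(key, [None])[0]

    upos = first("upos")
    return {
        "target": first("q"),                      # the page the visitor ends up on
        "revision_id": first("revid"),             # changes with every query refresh
        "position": int(upos) if upos is not None else None,  # 0, 1 or 2
        "is_inline_revision": first("oi") == "revisions_inline",
    }

info = describe_revision_click(TRACKING_URL)
print(info)
```

The `oi` parameter is the only thing that marks the click as coming from an inline revision, so that is what the sketch checks for.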

UPDATE: penalty points for sloppy research: Google was already testing this in August. Here’s an article on seo-consulting.de (in German; the search term was ‘COLA’, and I also get the revisions for that), another one on fuzzyfreaky.typepad.com, and it was mentioned again in October on forums.digitalpoint.com. The feature was announced on August 19 by Googler Matt Cutts. Scusi.


The MPA and other people’s money

Like the RIAA, the MPA has the logical reaction to disruptive forces: send out the lawyers.

Suing file-sharers is apparently so 18-months-ago that the music industry, in dire need of something new to justify their hefty legal retainers, has taken aim at sites that offer “unauthorized” lyrics and unlicensed song scores. The Music Publishers’ Association (MPA), which represents US sheet music companies, said it will launch its first campaign against such sites in 2006. MPA president Lauren Keiser told the BBC that shuttering websites and imposing fines aren’t quite sufficient, saying if authorities can “throw in some jail time I think we’ll be a little more effective.” Ho, ho, ho.
from google.weblogsinc.com

The main issue here seems to be that, because music lovers exchange and download lyrics from websites, there is no market anymore for selling books with lyrics, a market that had already suffered under the evil influence of the Xerox photocopier. I have two words for that: buggy whips!

Lawrence Garfield: You know, at one time there must’ve been dozens of companies making buggy whips. And I’ll bet the last company around was the one that made the best goddamn buggy whip you ever saw. Now how would you have liked to have been a stockholder in that company? You invested in a business and this business is dead. Let’s have the intelligence, let’s have the DECENCY to sign the death certificate, collect the insurance, and invest in something with a future.
from Other People’s Money (Danny DeVito as Lawrence Garfield)

Again we have to look at the Electronic Frontier Foundation to talk some sense into the big guys with piles of money but no clue how to adapt.


RFM for RSS feeds: Recency, Frequency, Momentary Value

I’ve been throwing an idea around in my head for a while: how the RFM method for analyzing and predicting customer behaviour could be applied to RSS feeds (blogs, podcasts, …).

Recency, Frequency, Monetary Value – customer segmentation

What does RFM do: it analyses 3 parameters for each customer:

  • date of last purchase (recency)
  • # purchases per month/quarter (frequency)
  • average amount of money spent per purchase (monetary value)

It then does a cluster analysis of the numbers (or in the simple version: a marketing guy decides based on gut feeling) and defines boundaries for each parameter, in order to split them up into categories.

Example:
Recency: R1 is everyone who purchased in the last 2 months, R2 is everyone who bought in the last year and R3 is the rest.
Frequency: F1 is every customer that purchased on average 3 or more times per quarter, F2 purchased at least once per quarter and F3 is the rest.
Monetary Value: M1 are those who spent more than €500 per visit and M2 are the rest.

In this scenario you have split up your heterogeneous customer group into 18 (3×3×2) more or less homogeneous subgroups that you can address in different ways. Your supercustomers R1-F1-M1 don’t need the same approach as the R3-F2-M1 (the big spenders that haven’t been around to your shop in the last year). And you hope you can predict the behaviour of each customer by analyzing his past behaviour.
(Side note: I learned this stuff while working in Sopres for Stefaan Vermeiren, who’s now teaching the Kiwis to do online banking)
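
The example segmentation above can be sketched in a few lines of Python. The thresholds come straight from the example; the function name `rfm_segment` and the way “2 months” and “a year” are turned into days are my own assumptions.

```python
from datetime import date

def rfm_segment(last_purchase, purchases_per_quarter, avg_spend, today):
    """Return a label like 'R1-F1-M1' using the example thresholds from the text."""
    days_since = (today - last_purchase).days

    # Recency: R1 = last 2 months, R2 = last year, R3 = the rest.
    # Assumption: "2 months" is approximated as 61 days, "a year" as 365 days.
    if days_since <= 61:
        r = "R1"
    elif days_since <= 365:
        r = "R2"
    else:
        r = "R3"

    # Frequency: F1 = 3 or more per quarter, F2 = at least 1 per quarter, F3 = the rest.
    if purchases_per_quarter >= 3:
        f = "F1"
    elif purchases_per_quarter >= 1:
        f = "F2"
    else:
        f = "F3"

    # Monetary value: M1 = more than 500 per visit, M2 = the rest.
    m = "M1" if avg_spend > 500 else "M2"
    return f"{r}-{f}-{m}"

today = date(2005, 12, 15)
supercustomer = rfm_segment(date(2005, 11, 20), 4, 650.0, today)
lapsed_big_spender = rfm_segment(date(2004, 6, 1), 1.5, 800.0, today)
print(supercustomer, lapsed_big_spender)
```

In the real method the boundaries would come out of a cluster analysis rather than being hard-coded like this.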

RFM for RSS – feed segmentation

Now how would this work for RSS feeds?
[Image: RFM analysis for RSS feeds]

  • Recency: date of last post
  • Frequency: average # posts per month, or mean-time-between-posts (important is that you only take into account the period from the first to the last post: if the feed contains 1 item per week but the last one was 1 year ago, the frequency is still 1/week i.e. around 4/month)
  • Momentary Value: (I know ‘momentary’ is not a great term, just couldn’t come up with a better 4-syllable alternative for ‘monetary’ yet) this is the most creative part: you can count the # of words, # of links, # images, filesize of the podcast audio or the video file, …
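
To make the three parameters above concrete, here is a minimal sketch that computes them for a list of posts. It assumes the feed has already been parsed into (date, word count) pairs; `feed_rfm` is a hypothetical name, and word count is just one of the possible ‘momentary value’ measures.

```python
from datetime import date, timedelta

def feed_rfm(posts):
    """posts: list of (post_date, word_count). Returns (recency, posts_per_month, avg_words)."""
    dates = sorted(post_date for post_date, _ in posts)
    recency = dates[-1]  # date of the last post

    # Frequency only looks at the period from the first to the last post,
    # so a feed that went silent a year ago keeps its old frequency.
    span_days = (dates[-1] - dates[0]).days
    gaps = len(dates) - 1
    # 30.44 is the average length of a month in days.
    posts_per_month = gaps / span_days * 30.44 if span_days > 0 else 0.0

    # Momentary value: here simply the average number of words per post.
    avg_words = sum(words for _, words in posts) / len(posts)
    return recency, posts_per_month, avg_words

# A feed with one post per week, five posts in total.
weekly = [(date(2005, 1, 3) + timedelta(weeks=i), words)
          for i, words in enumerate([200, 300, 250, 250, 500])]
recency, per_month, avg_words = feed_rfm(weekly)
print(recency, round(per_month, 1), avg_words)
```

For the weekly feed this gives a bit more than 4 posts per month, no matter how long ago the last post was.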

What can you do with this kind of statistic? Well, I see some applications:

  • is a blog ‘alive’? When do you decide that a blog is no longer active? It will be a combination of recency and frequency. If someone posted 1/week and there has not been any activity for 2 months: probably (momentarily) dead. If someone posts 1/quarter and there has been no activity for 2 months: perfectly normal. In statistical terms: calculate the mean-time-between-posts (MTBP) and its standard deviation (STDEV). Assuming the gaps between posts are roughly normally distributed: if the last post was MTBP days ago, there is a 50% chance that the feed is no longer updated. If it was (MTBP + STDEV) days ago, the chance is 84%; at (MTBP + 2 * STDEV) it is about 98%, etc.
  • what kind of blog is it? if average # words/post is low, and # links per post is around 1 (and frequency is 1/day): it’s probably a linkblog (like e.g. bnox). If the #words/post is high, the MTBP is 1 month with a very low STDEV, it is probably a monthly newsletter.
  • do I have time for this blog? Now you subscribe to a blog without an idea of how often the author posts, and how long the articles are. With an RFM analysis, the blog could be marked as ‘low traffic’ (2 posts of 500 words per month) or ‘high maintenance’ (60 posts of 300 words per month).
  • how much data does this podcast deliver? There is a big difference between a show like DailySourcecode (about 20 podcasts of 40 MB per month: 800 MB/month) or IT Conversations (2 posts/day of 14 MB each: 840 MB/month) and a humble effort like my Mash-up podcast (2 to 5 posts per year of 4.5 MB: about 1.5 MB/month). For a mobile device, where storage and bandwidth aren’t so readily available (nor cheap), this is an important distinction.
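
The ‘is a blog alive?’ rule of thumb from the list above can be sketched as follows. It assumes the gaps between posts follow a normal distribution (real posting behaviour is probably more skewed), and `chance_feed_is_dead` is a made-up name; `NormalDist` comes from Python’s standard `statistics` module.

```python
from statistics import NormalDist, mean, stdev

def chance_feed_is_dead(gaps_in_days, days_since_last_post):
    """Normal-model estimate: how unusual is the current silence for this feed?

    gaps_in_days needs at least two different values, otherwise the
    standard deviation is zero and the model breaks down.
    """
    mtbp = mean(gaps_in_days)   # mean-time-between-posts
    sd = stdev(gaps_in_days)    # sample standard deviation of the gaps
    # Probability that a gap this long or shorter would occur in a live feed.
    return NormalDist(mtbp, sd).cdf(days_since_last_post)

# A blog that posts about once a week (gaps between posts, in days).
gaps = [6, 7, 8, 7, 6, 8, 7]
sd = stdev(gaps)
for silence in (7, 7 + sd, 7 + 2 * sd):
    print(round(silence, 1), round(chance_feed_is_dead(gaps, silence), 2))
```

A silence of exactly MTBP days gives 0.5, and each extra standard deviation pushes the number up along the normal curve.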

This RFM analysis could be done by a company like Technorati, Bloglines or Feedburner, and they could combine it with language, location, topic and popularity stats to create an excellent segmentation of blogs. Or does someone else feel tempted to set it up?

Two days at LesBlogs Paris

What I saw

I just spent 2 days at the LesBlogs conference in Paris, organised by Loïc Le Meur/SixApart Europe. A gathering of geeks, entrepreneurs, venture capitalists and web architects, focused on “Blogs 2.0”. Days filled with speeches, panels, chats, networking lunch, dinner and drinks, sometimes dull, sometimes highly amusing and generally quite interesting.
Why not throw everything into a meme-map (Web 2.0 style):
[Image: LesBlogs Paris Meme-Map]

What I liked

What speakers made the biggest impression:

  1. Ben Hammersley: one of the rare presentations (as opposed to panels) but undoubtedly the best. Thought-provoking (title of the speech: “Eight ideas that will really revolutionize the 21st century (and why blogging isn’t one of them)”), hilarious (“To improve horses, you have to put a fast stallion and a fast mare together, they have to shag … it’s all very technical!”) and thoroughly energising (“Remember: we are all first-row witnesses of a new renaissance, the roommates of Leonardo Da Vinci”).
  2. David Sifry: apart from the admiration I have for his technical skills (his most recent start-up is Technorati), he also had some pertinent things to say about ethical engineering, the responsibility of pioneers and how technology can change the world.
  3. Ethan Zuckerman: the uber-geek and social entrepreneur who started GeekCorps and Global Voices. A man with a mission. And long hair.
  4. Thomas Crampton: journalist with the International Herald Tribune, and guest blogger at Joi Ito’s site, with excellent British wit and some strong ideas on the role of journalism.
  5. Ben Metcalfe: a.k.a. ‘dotBen’, of BBC Backstage fame, for standing up and defending himself in a correct manner when he was attacked by Mena Trott.

And of course the networking was great: I met a lot of people while hunting for food and drinks. One particularly nice chap was FactoryJoe, one of the Flockies. I was also very lucky to hang out with my fellow Belgians Francois and Denis (they would be the Shoobies), which made for interesting conversations, encounters and lots of background stories.

What I disliked

  • food: it’s a detail, I know, but both ‘networking’ lunches consisted entirely of tiny, one-bite, probably way overpriced toasts. It was a common grudge during the lunch and on the backchannel. Sandwiches would have been a great idea.
  • panel: in some cases the ‘panel’ formula was useful, but compare that to the Ben Hammersley slideshow … I would have liked more thought-provoking and futuristic presentations than just some light chatter. What’s happening in Scandinavia? What’s happening in Japan? Some new applications for podcasting and videoblogging? Brand new mashups of RSS and some other acronyms? Not everyone is a Hammersley, of course, as the lady from Edelman adequately demonstrated.
  • backchannel: a backchannel is like a bar: people grab a pint around the counter and start chatting away. The person who invited everyone has no control over the direction of the conversation. People at the bar behave differently than they would if they were conscious that the whole scene was being filmed and shown to an audience simultaneously. My opinion: 1) having a backchannel is good, both for disruptivity and networking, 2) the language in a backchannel will be different from the language on stage, on a blog or in a 1-to-1 conversation, 3) showing the backchannel on screen is not always a good idea: it makes it harder to concentrate and the tone of voice might be out of place.
  • not enough time: there are plenty of people I would have liked to talk to, but the occasion didn’t present itself: Photomatt, Kevin Marks, dltq, Steve Olechowsky, Marc Canter, David Sifry, Mark Fletcher, Ewan McIntosh, Anina(:-), …


Let’s get rid of podkeyword.com

Bad wake-up call: theregister.co.uk reports on Erik Marcus, a podcaster who has had his podcast feed hijacked by Podkeyword.com (no link, you know why). Why am I concerned? Guess under what name my Smoothpod Mashup podcast is registered in iTunes?

What is podcast hijacking

(…) it merely involves finding a target podcast, and creating a new unique URL for it on a website you control. You then point your URL to the RSS feed of the target podcast. Next, you do what it takes to make sure that as new podcast search engines come to market, the page each engine creates for your target podcast points to your URL instead of the podcast creator’s official URL. (Colette Vogele)
So years can go by and then the hijacker strikes: “At some point, [the hijacker] can then spring out of the woodwork and demand payment from [the] target [podcaster].” The podcaster is “supremely vulnerable”, because the hijacker can at any moment change URL pointer to any other show of the hijacker’s desire and the target podcaster’s audience will “vanish.” (corante)

So someone provides a mirror service for your podcast feed, gets it registered with major podcast directories and search engines and can then choose whether to just mirror your feed, alter it (e.g. insert advertising), or replace it with whatever he feels like. That, in short, is the business plan of podkeyword.

Who is behind podkeyword?

Some research shows that a George Lambert from Nashua (NH) is the owner of podkeyword.com (registered in Oct 2004). He also has Goldenware Travel Technologies (goldenware.com), providing airline timetable services (so he’s used to repackaging other people’s data). Another of his projects is cashcowmarketingplan.com, a spam-infested blog on “getting rich quick on the Internet”. This is worrying!

Let’s get rid of podkeyword

Here is what has to be done:

by every podcaster
check if you are affected: search for your podcast to see if it has been hijacked:
on iTunes: subscribe to your own feed, because you won’t be able to see the actual feed URL unless you’re subscribed. If it’s a podkeyword URL, click the “Report a concern” button, tell Apple this is a wrong feed URL and give them the right one.
on Yahoo: search for it and if both your real feed and the podkeyword feed are present (I found 2 podkeyword feeds for my own podcast), give the hijacked ones a bad review (give it 1 star and write a review about the hijack)
by every podcast directory/search engine (iTunes, Yahoo, you listening?)
restore hijacked feeds (Remark: the following is NOT real code, just some pseudo code to clearly explain what should be done)

for $victimFeedURL in (*.podkeyword.com feeds){
    # get the content of the (mirrored) feed
    $victimFeedXML = getHTTP($victimFeedURL);
    # every feed contains the URL of the homepage
    $victimSiteURL = parseRSS($victimFeedXML, "channel.link");
    # get the HTML of the homepage
    $victimSiteHTML = getHTTP($victimSiteURL);
    # get the URL of the feed the author has specified
    $victimRealFeed = parseHTML($victimSiteHTML,
        "head.link('application/rss+xml')");
    # only act if the homepage names a feed and it differs from the listed one
    if($victimRealFeed != "" AND $victimRealFeed != $victimFeedURL){
        # replace *.podkeyword.com by real feed URL
        $victimFeedURL = $victimRealFeed;
    }
}
by Feedburner (since they are an important podcast feed provider)
detect the feeds that are being queried by podkeyword (I don’t see them showing up as a separate UserAgent in my Feedburner stats, but Eric Lunt and his gang won’t have too much trouble finding them anyway) and (a) warn the feed owners that they might be hijacked, (b) offer the feed owners the option to include an extra post in their feed to alert their subscribers to switch to the real feed.
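
For directories that want something closer to working code than the pseudo code above, here is a rough Python sketch of the same check, using only the standard library. It works on strings that have already been downloaded (the fetching itself, e.g. with `urllib.request.urlopen`, is left out), all function names are made up for illustration, and the example.org and hijacker.example URLs are placeholders.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

class FeedLinkFinder(HTMLParser):
    """Remembers the href of the first <link type="application/rss+xml"> tag."""
    def __init__(self):
        super().__init__()
        self.feed_url = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if (tag == "link" and self.feed_url is None
                and attrs.get("type") == "application/rss+xml"):
            self.feed_url = attrs.get("href")

def homepage_from_feed(feed_xml):
    """Every RSS feed names its homepage in <channel><link>."""
    link = ET.fromstring(feed_xml).findtext("channel/link")
    return link.strip() if link else None

def declared_feed(homepage_html):
    """The feed URL the author has specified in the homepage's <head>, if any."""
    finder = FeedLinkFinder()
    finder.feed(homepage_html)
    return finder.feed_url

def restored_feed_url(listed_url, homepage_html):
    """Return the author's own feed URL if it differs from the listed one."""
    real = declared_feed(homepage_html)
    if real and real != listed_url:
        return real
    return listed_url

FEED = ("<rss version='2.0'><channel><title>Example show</title>"
        "<link>http://example.org/</link></channel></rss>")
HOME = ('<html><head><link rel="alternate" type="application/rss+xml" '
        'href="http://example.org/feed.xml"></head><body></body></html>')

print(homepage_from_feed(FEED))
print(restored_feed_url("http://hijacker.example/show.xml", HOME))
```

If the homepage does not declare a feed at all, the listed URL is left alone, just like in the pseudo code.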

Doesn’t Feedburner do the same kind of thing?

Not at all. Feedburner also mirrors and alters RSS feeds but there are some really big differences:

  1. Feedburner is not evil. That is my opinion and I might be wrong, but I’ve had contact with Feedburner on several occasions and they seem to be a bunch of intelligent and down-to-earth geeks. Plus they have found a way to make money with Feedburner without taking advantage of people.
  2. Feedburner has added value: they convert a feed to a valid podcast feed, they can splice different feeds together, they provide essential stats. They deserve their place as a middleman.
  3. Most importantly: I have voluntarily chosen Feedburner as my service provider! I configured it myself and I added the Feedburner RSS feed link to my blog’s HTML template. On the other hand, I didn’t contact podkeyword, I have never asked them to do anything for me.


Municipal WiFi: requirements for success

[Image: Double Wifi: prototype]
Municipal Wifi is gaining speed. Some of the efforts are institutional (Joi Ito joins the FON advisory board, networks are being installed in San Francisco and New Orleans) and some are grassroots (John is setting up a Wifi cloud in Rio …)

I’ve looked at the models and tools of providers like FON and WifiDog/OpenWRT (any Linux), and I’ve done some testing as a provider myself. We’re not there yet.

Wifi checklist

For grass-roots municipal WiFi to really take off, we need the following:

SECURITY
Provider:
- separate VLANs for internal and external PCs
- standard firewall profiles (e.g. allow web, mail; disallow audio streaming, BitTorrent)
- accountability: some kind of authentication
Client:
- protection from other (rogue) clients
- preferably some kind of VPN (no sniffing)
- indication of connection security

BANDWIDTH
Provider:
- guaranteed personal bandwidth
- traffic shaping for each connection (e.g. each PC
Client:
- guaranteed minimum bandwidth
- clear info on what is allowed (BitTorrent or not)

CONVENIENCE
Provider:
- cross-platform (e.g. not Linksys only)
- wizard install (Next-Next-Finish)
- outsourced authentication (like FON)
- uptime tracking and ‘customer’ feedback – to distinguish between live, working access points and dead ones (e.g. Plazes)
Client:
- single sign-on (same password everywhere)
- easy connect, log-on and surf away
- easy detection of ‘friendly’ access points
- global coverage
- map overview of all available points (like WifiDog/Plazes)
