Archive for the 'spam' Category

Idea: hosted classification service

Yesterday evening I watched “How to Replace Yourself with a Very Small Shell Script” by Hilary Mason.

[youtube width="500" height="360"]http://www.youtube.com/watch?v=IoQ4tka1zNk[/youtube]

In short: she uses some scripts to process incoming mail and send outgoing reminders. The part that really interested me is the one where she uses classification, probably naive Bayes, to extract topics from the tweets of her friends.

That made me think about Paul Graham’s famous spam essay (2002), which boosted the development of Bayesian spam filters for email. A Bayesian spam filter will, in very broad terms, analyze the words in a message, compare them to words typically used in a ‘spam’ or ‘ham’ collection, and come back with either a binary classification (spam/ham) or a spamminess score. The first time I read that article must have been back in 2003 or 2004. I recall installing one of the early versions of POPFile, a spam filter written in Perl. It worked as a POP3 proxy and did a pretty good job. POP3 made sense, because at that time, the only spam we had was email spam. Now there’s blog spam, comment spam, trackback spam, Twitter spam …
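To make the "compare words against a spam and a ham collection" idea concrete, here is a minimal sketch of a textbook naive Bayes filter with add-one smoothing. It is not the exact formula from the essay, nor what POPFile does internally. The class name, the tokenizer and the 0.5 threshold are all assumptions made for illustration, and it expects at least one training document per class.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    """Toy two-class filter ('spam' versus 'ham'); illustrative only."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Add one labelled message to the corpus of its class."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def _log_score(self, tokens, label):
        counts = self.word_counts[label]
        total = sum(counts.values())
        vocab = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        # Class prior: the share of training messages carrying this label.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for token in tokens:
            # Add-one smoothing keeps unseen words from zeroing the product.
            score += math.log((counts[token] + 1) / (total + vocab))
        return score

    def spamminess(self, text):
        """Return P(spam | text) as a score between 0 and 1."""
        tokens = tokenize(text)
        diff = self._log_score(tokens, "ham") - self._log_score(tokens, "spam")
        diff = max(min(diff, 700.0), -700.0)  # avoid overflow in exp()
        return 1.0 / (1.0 + math.exp(diff))

    def classify(self, text, threshold=0.5):
        """Binary decision derived from the spamminess score."""
        return "spam" if self.spamminess(text) >= threshold else "ham"
```

Train it with a handful of calls such as `f.train("cheap pills buy now", "spam")` and `f.train("meeting notes for monday", "ham")`, then ask for `f.spamminess(...)` to get the score or `f.classify(...)` to get the binary answer.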

But these are the cloud days, right? If you think about it, Akismet (WordPress) and Mollom (Drupal) offer cloud-based spam filtering. Before them, Postini (now part of Google) offered hosted spam filtering services for email. But would it be possible to offer a fully generic document classification service, exposed as a web service? Imagine the service classifier.com:

  • you register and get your own subdomain at myapp.classifier.com
  • you choose whether your service returns one of a number of classes (ham/spam or urgent/normal/ignore …) or a numerical score
  • you choose a tokenizer, which defines what words will be extracted from your input: e.g. you can ignore, include or reformat email headers, you can ignore or transform HTML code, …
  • you create a corpus per category, and the service tells you whether you have enough input
  • you call the service via HTTP POST with an API key and the new document content to be classified, and you get back the result (in JSON/XML)
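The last step could look roughly like this from the client side. This is only a sketch of an imagined service: the `/classify` path, the `X-Api-Key` header name and the `class` / `score` JSON fields are all made up here, since no such API is specified anywhere. The request is built but not sent.

```python
import json
import urllib.request


def build_classify_request(subdomain, api_key, document):
    """Build (but do not send) the HTTP POST for the imagined service."""
    # Hypothetical endpoint: the '/classify' path is an assumption.
    url = f"http://{subdomain}.classifier.com/classify"
    payload = json.dumps({"document": document}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "X-Api-Key": api_key,  # hypothetical header name
        },
        method="POST",
    )


def parse_result(body):
    """Interpret a JSON reply as either a class label or a numeric score."""
    result = json.loads(body)
    if "class" in result:
        return ("class", result["class"])
    return ("score", float(result["score"]))
```

If the service existed, passing the request to `urllib.request.urlopen()` and feeding the response body to `parse_result()` would be the whole integration, which is about as easy as I would want it to be.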

Sounds like something Google would offer? Well, they do, in a way: http://code.google.com/apis/predict/ Now if only someone would develop a nice and easy interface around it …

Facebook tricked me into my own spam FAIL

facebook spam

So I decided to let Facebook check my Gmail contact list to see if I had missed some contacts (people using aliases, etc.). After carefully selecting a couple of FB friends to invite (a buddy from the army, …), I clicked ‘Select’ and then ‘OK’ on the next screen, which I assumed was a ‘Confirm’ window. I didn’t even read what was written on it. Some minutes later I saw emails starting to come in on the different email aliases I had created in all my years of Internet activity. Apparently I had allowed Facebook to send email messages to all Gmail contacts whose email addresses were not yet ‘known’ to Facebook. I have about 1500 addresses in my Gmail; let’s say some 500 already have a FB profile: so I just allowed Facebook to send out about 1000 ‘unsolicited commercial emails’, or *spam*, on my behalf. There is no way for me to know how many emails went out, nor to whom. I feel deeply embarrassed, since I have been a strong opponent of spam for years, and since I have no idea whom I have bothered with this bulk mail.

A company like Facebook probably has a whole team dedicated to user experience and workflow streamlining, so I can only assume that this strategy is by design. They probably have to keep up the monthly exponential growth numbers, so they use every opportunity to collect new email addresses. This is plain wrong. The default should be ‘opt in’, not ‘opt out’ (that is, select those you want to invite instead of deselecting those you don’t want to invite).

So dear Christopher Cox and/or Chamath Palihapitiya at Facebook, while you will probably say that ‘but it is clearly written on the page that they’re about to send an invitation to (in my case, 1000??) contacts‘, you know that you are wrong on this one. You’re spamming. Big time, like real jerks. Since you’re probably not going to do anything about it, Google: any ideas?

http://www.google.com/support/forum/p/gmail/thread?tid=46004a5733eee4f0&hl=en

http://blogs.zdnet.com/social/?p=266

http://www.smartmobs.com/2007/09/02/facebook-friending-spam/

Twitter spammers: Clickbank/Keynetics affiliates

I’ve experimented with Twitter a couple of times, and one of the results, the FM Brussel Live playlist twitter bot, seems to be rather popular. I get a couple of new followers per day. But recently they’re almost all of the form [name of girl][number of 2 to 4 digits]. This is what they look like:

Twitter followers: suspicious lot
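That [name of girl][number of 2 to 4 digits] pattern is easy to test for. Below is a rough sketch; the function name and the optional list of first names are assumptions of mine, and a match is only a hint that an account deserves a closer look, not proof that it is a spammer.

```python
import re

# Letters followed by 2 to 4 digits, and nothing else.
SUSPICIOUS_HANDLE = re.compile(r"([A-Za-z]+)(\d{2,4})")


def looks_generated(handle, known_names=None):
    """Return True if the handle looks like [name][2 to 4 digits].

    known_names is an optional set of lowercase first names (hypothetical
    data); when given, the letter part must be one of them.
    """
    match = SUSPICIOUS_HANDLE.fullmatch(handle)
    if match is None:
        return False
    if known_names is None:
        return True
    return match.group(1).lower() in known_names
```

Without a name list the check is deliberately loose: any word followed by 2 to 4 digits is flagged, so plenty of real people would match too.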


Bob Flora is a spammer

Linkeroever spam

Dear Mr. Bob Flora,

you are probably a collaborator on the “Linkeroever” movie. I see you were a sound designer for “Dju!” by Daniel Lamberts, who’s a friend of mine. So you’re connected to the Belgian cinema scene. But may I point out that we have never talked or met in person? So when you send me an email like the one above, that is not only impolite, it is also spam.

First off: didn’t your mom teach you proper manners? You don’t address me, you don’t introduce yourself, you just start shouting “Check it, rate it, forward it!”. Do you think that exclamation mark is gonna convince me? Never heard of the word ‘please’?

Secondly: where did you get my email address? I sure never gave it to you. I’m gonna reply to you to request to be removed from your spamming list, and it would be a good idea to comply with that.

Finally: do you think you’re doing Linkeroever or Pieter Van Hees a favour with stunts like that? I’m not linking to the movie or the Youtube trailer, as you might notice. That’s because you pissed me off. Your marketing skills are severely underdeveloped. Do something about it, or stick to designing sound.

Regards,

Peter