Idea: hosted classification service

Yesterday evening I was watching “How to replace yourself with a very small shell script” by Hilary Mason.

In short: she uses some scripts to process incoming mail and send outgoing reminders. The part that really interested me is the one where she uses classification, probably naive Bayes, to extract topics from the tweets of her friends.

That made me think about Paul Graham’s famous spam essay (2002), which boosted the development of Bayesian spam filters for email. A Bayesian spam filter will, in very broad terms, analyze the words in a message, compare them to words typically used in a ‘spam’ or ‘ham’ collection, and come back with either a binary classification (spam/ham) or a spamminess score. The first time I read that article must have been back in 2003 or 2004. I recall installing one of the early versions of POPFile, a spam filter written in Perl. It worked as a POP3 proxy and did a pretty good job. POP3 made sense, because at that time, the only spam we had was email spam. Now there’s blog spam, comment spam, trackback spam, Twitter spam …
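To make the "compare words against a spam and a ham collection" idea concrete, here is a minimal sketch of a two-class naive Bayes scorer. It is not the formula from Graham's essay nor POPFile's implementation: the class name, the tokenizer and the add-one smoothing are illustrative assumptions, and the training sentences are made up.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9$']+", text.lower())


class NaiveBayesFilter:
    """Toy spam/ham classifier (hypothetical, for illustration only)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Add one labelled message to the 'spam' or 'ham' collection."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def spam_score(self, text):
        """Return a spamminess score between 0 and 1.

        Assumes at least one message was trained for each class.
        """
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_prob = {}
        for label in ("spam", "ham"):
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Prior: share of training messages carrying this label.
            score = math.log(self.doc_counts[label] / total_docs)
            for token in tokenize(text):
                # Add-one smoothing so unseen words do not zero the product.
                score += math.log((counts[token] + 1) / (total_words + len(vocab)))
            log_prob[label] = score
        # Turn the two log scores into a normalized probability of spam.
        diff = max(min(log_prob["ham"] - log_prob["spam"], 700), -700)
        return 1 / (1 + math.exp(diff))


spam_filter = NaiveBayesFilter()
spam_filter.train("cheap pills buy now", "spam")
spam_filter.train("win money now", "spam")
spam_filter.train("meeting notes for monday", "ham")
spam_filter.train("lunch on monday?", "ham")
print(spam_filter.spam_score("buy cheap pills"))   # above 0.5: leans spam
print(spam_filter.spam_score("notes for monday"))  # below 0.5: leans ham
```

A binary spam/ham answer is then just a threshold on the score.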

But these are the cloud days, right? If you think about it, Akismet (WordPress) and Mollom (Drupal) offer cloud-based spam filtering. Before them, Postini (now part of Google) offered hosted spam filtering services for email. But would it be possible to offer a very generic document classification web service? Imagine the service classifier.com:

  • you register and get your own subdomain at myapp.classifier.com
  • you choose whether your service will return one of a number of classes (ham/spam or urgent/normal/ignore …) or a numerical score.
  • you choose a tokenizer, which defines what words will be extracted from your input: e.g. you can ignore, include or reformat email headers, you can ignore or transform HTML code, …
  • you create a corpus per category; the service will tell you if you have enough input
  • you call the service with an HTTP POST with an API key and the new document content to be classified, and you get back (in JSON/XML) the result
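As a rough sketch of what that last step could look like from the client side: the service is imaginary, so the /classify path, the X-Api-Key header and the JSON field names below are all assumptions made up for illustration. The code only builds the request and parses a sample reply; it does not contact any server.

```python
import json
import urllib.request


def build_classify_request(subdomain, api_key, document):
    """Build (but do not send) the HTTP POST for one document."""
    # Hypothetical endpoint on the imagined service.
    url = f"https://{subdomain}.classifier.com/classify"
    payload = json.dumps({"document": document}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-Api-Key": api_key,  # hypothetical header name
        },
    )


def parse_classify_response(body):
    """Parse a JSON reply of the assumed form {"class": ..., "score": ...}."""
    result = json.loads(body)
    return result.get("class"), result.get("score")


request = build_classify_request("myapp", "my-api-key", "Buy cheap pills now!")
print(request.get_method(), request.full_url)
print(parse_classify_response('{"class": "spam", "score": 0.97}'))
```

A service configured to return classes would fill in "class", one configured for a numerical score would fill in "score".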

Sounds like something Google would offer? Well, they do, in some way: http://code.google.com/apis/predict/ Now if someone would develop a nice and easy interface around it …

Facebook tricked me into my own spam FAIL

facebook spam

So I decided to let Facebook check my Gmail contact list to see if I had missed some contacts (people using aliases, etc …). After carefully selecting a couple of FB friends to invite (a buddy from the army, …), I clicked ‘Select’ and then ‘OK’ on the next screen that I supposed was a ‘Confirm’ window. I didn’t even read what was written on it. Some minutes later I saw emails starting to come in on different email aliases I had created in all my years of Internet activity. Apparently I allowed Facebook to send email messages to all Gmail contacts with email addresses that were not yet ‘known’ in Facebook. I have about 1500 addresses in my Gmail, let’s say some 500 already have a FB profile: so I just allowed Facebook to send out 1000 ‘unsolicited commercial emails’ or *spam* on my behalf. There is no way for me to know how many emails went out, nor to whom. I feel deeply embarrassed, since I have been a strong opponent of spam for years, and since I have no idea who I have bothered with this bulk mail.

A company like Facebook probably has a whole team concentrated on user experience and workflow streamlining, so I can only assume that this strategy is by design. They probably have to keep up the monthly exponential growth numbers, so they use every opportunity to collect new email addresses. This is plain wrong. The default should be ‘opt in‘, not ‘opt out‘ (that is, select those you want to invite instead of unselect those you don’t want to invite).

So dear Christopher Cox and/or Chamath Palihapitiya at Facebook, while you will probably say that ‘but it is clearly written on the page that they’re about to send an invitation to (in my case, 1000??) contacts‘, you know that you are wrong on this one. You’re spamming. Big time, like real jerks. Since you’re probably not going to do anything about it, Google: any ideas?

http://www.google.com/support/forum/p/gmail/thread?tid=46004a5733eee4f0&hl=en

http://blogs.zdnet.com/social/?p=266

http://www.smartmobs.com/2007/09/02/facebook-friending-spam/

Twitter spammers: Clickbank/Keynetics affiliates

I’ve been experimenting with Twitter a couple of times, and one of the results, the FM Brussel Live playlist twitter bot, seems to be rather popular. I get a couple of subscriptions per day. But recently they’re almost all of the form [name of girl][number of 2 – 4 digits]. This is what they look like:

Twitter followers: suspicious lot
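The [name][2 to 4 digits] pattern is easy to test for. A minimal sketch, assuming the handles consist of letters followed directly by the digits; the function name and the sample handles are made up, and a stricter check would also compare the letter part against a list of first names:

```python
import re

# Letters followed by 2 to 4 digits, e.g. "jessica482" (a made-up handle).
SUSPICIOUS_HANDLE = re.compile(r"[A-Za-z]+\d{2,4}")


def looks_suspicious(handle):
    """Return True if the whole handle is letters followed by 2-4 digits."""
    return SUSPICIOUS_HANDLE.fullmatch(handle) is not None


for handle in ("jessica482", "fmbrussel", "anna7", "kate12345"):
    print(handle, looks_suspicious(handle))
```

This will also flag legitimate accounts that happen to end in a number, so it is a hint for manual review, not a verdict.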


Bob Flora is a spammer

Linkeroever spam

Dear Mr. Bob Flora,

you are probably a collaborator for the “Linkeroever” movie. I see you were a sound designer for “Dju!” by Daniel Lamberts, who’s a friend of mine. So you’re connected to the Belgian cinema scene. But might I point out that we have never talked or met in person. So when you send me an email like the one above, that is not only impolite, it is also spam.

First off: didn’t your mom teach you proper manners? You don’t address me, you don’t introduce yourself, you just start shouting “Check it, rate it, forward it!”. Do you think that exclamation mark is gonna convince me? Never heard of the word ‘please’?

Secondly: where did you get my email address? I sure never gave it to you. I’m gonna reply to you to request to be removed from your spamming list, and it would be a good idea to comply with that.

Finally: do you think you’re doing Linkeroever or Pieter Van Hees a favour with stunts like that? I’m not linking to the movie or the Youtube trailer, as you might notice. That’s because you pissed me off. Your marketing skills are severely underdeveloped. Do something about it, or stick to designing sound.

 Regards,

Peter

The sneaky shall inherit the earth

Ge moet maar durven (= “you’ve got to have the nerve”)

“Wie niet waagt, blijft maagd” (= “who doesn’t dare, stays a virgin”), as they say over here. This guy promises to ask his spam buddies to stop comment-spamming, as long as you put a link to his site. A while ago, he used to promise only “If you dont like advertising comments please send me an email with your site address to tedirectory(at)yahoo(dot)com and I will not write on your site” (cf yahooinsiders), but now he seems to have expanded his influence. He is spamming several of my websites continuously. The source seems to be some people over at Global Net Access, Atlanta (via spam.tinyweb.net).

Which makes me dream of ‘Big Spammer’, a TV-show where known spammers are followed by a hidden camera for a couple of weeks (‘see how he has been wearing the same shorts for a whole week now’) after which they are sued, convicted and dragged to jail, while all their computers are crushed by a huge truck. Mmmm, revenge …

My first Facebook spam

facebook-spam
The Belgian Facebook community is growing and I was expecting to have a first ‘spam’ invitation eventually. It came a couple of days ago, from a group that goes by the name “In Loving Memory of Juliane Angel”. It seems to be created by a guy in remembrance of a girl that died some time ago (Juliane Angel). Even if this sad story is not a marketing ploy (which I don’t think it is), the invitations are still ‘unsolicited’. Whether they are ‘commercial’ is hard to say now; that might only become apparent when there are enough members (only two for the moment).

When I joined Facebook, there were only a few people on it that I knew. The last couple of weeks, there have been invitations coming in on a daily basis from people that I (sort-of) know. The typical early adopters, of course, but also some outsiders. Juliane is the first spam attempt I get – not a full-fledged Nigerian scam or erection drug, I admit, but still. I hope Facebook can control misuse, because MySpace surely can’t.

Other articles on Facebook spam: Marketing Sherpa, Organized Confusion

Govern yourselves accordingly

I just received the following email:

Attention Mr. Forret,

It has been brought to our attention that you published or caused to be published an e-mail communication and/or internet bulletin containing words that are false, misleading and defamatory to our firm. More specifically, these publications can be found at:

blog.forret.com/2004/12/domain-registry-of-america-scam/

More specifically your statement “Domain Registry of America scam”. That statement is false and misleading in many ways:
1) The title of the publication accuses Domain Registry of America as being involved or perpetrating some type of scam, which his false.
2) Domain Registry of America’s mailings do not “urge” or “scare” anyone to take any action towards the mailing. It informs domain name holders that they now have the option to “transfer and renew” their domain name with any Registrar of their choice and take advantage of lower pricing and better service.
3) Your use of the phrase “it’s a scam”
4) Unaware to you, this mailing has been approved by the Federal Trade Commission as clearly describing it meaning and purpose. (PF: Actually they’ve done quite the opposite)

Your publication has caused and continues to cause Domain Registry of America irreparable damages and we intend to hold you responsible for these damages both past and present. You are hereby notified that we demand these false, misleading and defamatory statements mentioned above that you have published or have caused to be published be removed by no later than 15 days of your receipt of this notice.

If we do not receive written notification that these publications have been removed by the above deadline we will without further warning, advise our lawyers to commence a lawsuit in an Ontario court for damages and a permanent and interlocutory injunction restraining you, your employees, agents and representatives from making and publishing such publications. Domain Registry of America/Canada has successfully taken legal action in the past against other publishers of similar false, misleading and defamatory statements.

Govern yourselves accordingly,

Domain Registry of America/Canada
Relations Department
legal@droa.com


Viral in the bad sense: MessengerChecker

I just received an email on my Hotmail account from someone that normally never contacts me. The email itself is clearly generated by an automatic process:
msnchecker2

When I take a look at the website that was cited (I won’t link to it), it is not clear what the service is actually about: I’m guessing it is to see who blocked you (MSN contacts that look off-line to you, but that are actually online). To check if one of your MSN buddies has blocked you, you ‘only’ have to fill in your Hotmail username and password. This should already make you nervous: you should never give those credentials to a site that’s not Hotmail or MSN.
messengerchecker: viral in the bad sense
Take a look under the “Verder” (= “Continue”) button. In a very light gray (#dfdfdf to be exact), there is the option to email all your MSN buddies, and by default it’s ON. Since it is hardly visible, I guess most people who try out the service leave it like that and as such ‘give permission’ to send out a couple of dozen to several hundreds of emails. You only need a few gullible recipients to create a ‘viral’ effect.
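How invisible is #dfdfdf? A quick sketch using the WCAG 2.0 contrast-ratio formula, assuming the text sits on a plain white background (the post does not say what the page background was, so that is a guess):

```python
def relative_luminance(hex_color):
    """Relative luminance of an sRGB color such as '#dfdfdf' (WCAG 2.0)."""
    hex_color = hex_color.lstrip("#")
    channels = [int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo the sRGB gamma curve for each channel.
    linear = [
        c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    red, green, blue = linear
    return 0.2126 * red + 0.7152 * green + 0.0722 * blue


def contrast_ratio(color_a, color_b):
    """WCAG contrast ratio, from 1 (identical) to 21 (black on white)."""
    lum_a = relative_luminance(color_a)
    lum_b = relative_luminance(color_b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


# Assumed background: white. The text color comes from the page itself.
print(round(contrast_ratio("#dfdfdf", "#ffffff"), 2))
```

Under that assumption the ratio comes out at roughly 1.3, far below the 4.5 that WCAG recommends as a minimum for normal text.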

In the terms and conditions on the bottom of the (very long) page, you’ll find:

6. The user of this service is personally responsible for carefully reviewing the options before pressing the [ continue ] button.
7. You have to untick the option [ mail to your MSN friends ] yourself if you do not want to email your friends.
8. No claims can be made regarding the functioning of our services, because we do not host the checker system ourselves. We are only a technical party that tries to make sure you can get in contact with the MSN server. The MSN server/checker can sometimes be offline. We absolutely do not send mail ourselves. All emails are sent by the user. He or she is therefore also responsible for them. In case of excessive use you can email the person from whom you received the email. By using our service you indemnify us against any damage to third parties.

In short:
#6: the user is responsible for verifying all options before clicking [ Continue ]
#7: you should disable the option [ send mail to all MSN friends ] if you don’t want to send those messages
#8: we don’t send the emails, the user does. If you have complaints, contact that person, not us.

I certainly don’t agree with their point #8. Technically, they send the messages. They could claim the user ‘requested it’. In any case: it’s spam!

The person responsible for the site is already known as the “Mongool van scripthosting“: a certain P.J. (Peter) Bierling from Groningen.

MySpace: bulletin and other spam

MySpace spam

MySpace is a vast collection of web real estate begging to be spammed. I keep receiving spam bulletins from some of my MySpace friends, so this is a little explanation of what MySpace spam is and how it can be fixed:

Tricks used by MySpace spammers

Trick #1: hidden bulletin post form
As described by ericis.com, MySpace did not protect the bulletin submission page enough. Bulletins could be sent by an unsuspecting logged-in user through a hidden form, instead of only through the official submission form. So you might click on what seems to be just a link to a site/profile, but you are really sending a bulletin to all your friends. This bulletin might invite them to click on a link which hides another hidden form and …
STATUS: This vulnerability has been addressed by MySpace, but whether it is completely fixed is another question.
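The usual defence against this kind of hidden-form trick is to have the official form carry a secret token tied to the user's session, which a third-party page cannot guess. The sketch below shows that general technique; it is not a description of what MySpace actually did, and the function names and the way the secret is stored are assumptions.

```python
import hashlib
import hmac
import secrets

# Hypothetical per-deployment secret, known only to the server.
SERVER_SECRET = secrets.token_bytes(32)


def issue_token(session_id):
    """Token to embed as a hidden field in the genuine bulletin form."""
    return hmac.new(
        SERVER_SECRET, session_id.encode("utf-8"), hashlib.sha256
    ).hexdigest()


def is_valid_submission(session_id, submitted_token):
    """Accept a bulletin POST only if it carries this session's token."""
    expected = issue_token(session_id).encode("utf-8")
    supplied = (submitted_token or "").encode("utf-8")
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, supplied)


token = issue_token("session-of-alice")
print(is_valid_submission("session-of-alice", token))  # genuine form
print(is_valid_submission("session-of-alice", None))   # hidden form, no token
```

A hidden form on a spammer's page can still make the browser send the logged-in user's cookies, but it has no way to fill in the right token, so the bulletin is rejected.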

Trick #2: man-in-the-middle password theft


Colorbar: Belgian spam

In the last three days I have received 3 mails from Colorbar, a “lively private club for colorful people”. The first one didn’t trigger my suspicion, since I am subscribed to some music-related mailing lists. The next two mails came to 2 @forret.com aliases that I am certain were never subscribed to any list. So I took a closer look at the email. No contact details are given, no indication of where the email addresses came from, no possibility to unsubscribe, i.e. it’s a spam mail. To be even more specific: a Belgian spam message.