Idea: hosted classification service

Yesterday evening I was watching “How to Replace Yourself with a Very Small Shell Script” by Hilary Mason.

In short: she uses some scripts to process incoming mail and send outgoing reminders. The part that really interested me is the one where she uses classification, probably naive Bayes, to extract topics from the tweets of her friends.

That made me think about Paul Graham’s famous spam essay (2002), which boosted the development of Bayesian spam filters for email. A Bayesian spam filter will, in very broad terms, analyze the words in a message, compare them to the words typically found in a ‘spam’ or a ‘ham’ collection, and come back with either a binary classification (spam/ham) or a spamminess score. The first time I read that article must have been back in 2003 or 2004. I recall installing one of the early versions of POPFile, a spam filter written in Perl. It worked as a POP3 proxy and did a pretty good job. POP3 made sense, because at that time, the only spam we had was email spam. Now there’s blog spam, comment spam, trackback spam, Twitter spam …
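To make the “compare the words to a spam and a ham collection” idea concrete, here is a minimal sketch of a textbook multinomial naive Bayes classifier with add-one smoothing. It is not the exact formula from Graham’s essay, nor POPFile’s implementation; the class name, the tokenizer and the tiny training set are all made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Textbook multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of tokens seen for that label
        self.doc_counts = Counter()  # label -> number of training documents

    def train(self, label, text):
        self.word_counts.setdefault(label, Counter()).update(tokenize(text))
        self.doc_counts[label] += 1

    def log_scores(self, text):
        """Return {label: log prior + sum of log word likelihoods}."""
        vocabulary = set()
        for counts in self.word_counts.values():
            vocabulary.update(counts)
        total_docs = sum(self.doc_counts.values())
        tokens = tokenize(text)
        scores = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[label] / total_docs)
            for token in tokens:
                score += math.log((counts[token] + 1) / (total_words + len(vocabulary)))
            scores[label] = score
        return scores

    def classify(self, text):
        """Binary (or multi-class) answer: the label with the highest score."""
        scores = self.log_scores(text)
        return max(scores, key=scores.get)

    def spamminess(self, text, spam_label="spam"):
        """Normalised probability of spam_label, between 0 and 1."""
        scores = self.log_scores(text)
        top = max(scores.values())
        weights = {label: math.exp(s - top) for label, s in scores.items()}
        return weights[spam_label] / sum(weights.values())


# Toy corpus, invented for this example.
nb = NaiveBayes()
nb.train("spam", "cheap pills buy now")
nb.train("spam", "win money now")
nb.train("ham", "meeting notes for tomorrow")
nb.train("ham", "lunch tomorrow with the team")

print(nb.classify("buy cheap pills"))
print(nb.classify("lunch tomorrow"))
```

The same object gives both kinds of answer mentioned above: `classify()` for the spam/ham decision and `spamminess()` for a score.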

But these are the cloud days, right? If you think about it, Akismet (WordPress) and Mollom (Drupal) offer cloud-based spam filtering. Before them, Postini (now part of Google) offered hosted spam filtering for email. But would it be possible to offer a very generic document classification service, exposed as a web service? Imagine the service classifier.com:

  • you register and get your own subdomain at myapp.classifier.com
  • you choose whether your service will return one of a number of classes (ham/spam or urgent/normal/ignore …) or a numerical score
  • you choose a tokenizer, which defines which words are extracted from your input: e.g. you can ignore, include or reformat email headers, you can ignore or transform HTML code, …
  • you create a corpus per category, and the service tells you whether you have enough input
  • you call the service with an HTTP POST containing an API key and the new document to be classified, and you get the result back (in JSON/XML)
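The last step could look roughly like the sketch below. The service does not exist, so everything here is an assumption: the `/classify` path, the `X-Api-Key` header name and the shape of the JSON reply are invented for illustration. The request is only built, not sent.

```python
import json
import urllib.request


def build_classify_request(subdomain, api_key, document):
    """Build (but do not send) the POST request for one document."""
    # Hypothetical endpoint on the imagined classifier.com service.
    url = f"https://{subdomain}.classifier.com/classify"
    body = json.dumps({"document": document}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-Api-Key": api_key,  # hypothetical header name
        },
    )


def parse_classify_response(raw_json):
    """Return (label, score) from an assumed reply such as
    {"class": "spam", "score": 0.97}."""
    reply = json.loads(raw_json)
    return reply.get("class"), reply.get("score")


request = build_classify_request("myapp", "secret-key", "Cheap pills, buy now!")
print(request.get_method(), request.full_url)
print(parse_classify_response('{"class": "spam", "score": 0.97}'))
```

A real client would pass the request to `urllib.request.urlopen()` and feed the response body to `parse_classify_response()`.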

Sounds like something Google would offer? Well, they do, in some way: http://code.google.com/apis/predict/. Now if someone would develop a nice and easy interface around it …

Not happy with the Canon 500D

Couleur Cafe 2006
In June 2006 I bought my first reflex camera: a Canon 350D. About the same time I started taking pictures of tango (above: my first tango picture, at Couleur Cafe 2006). And it was the start of an exciting journey. Concerts, milongas, tango festivals, portraits: I discovered the joy of creating – or recording – beauty. It has become a passion, and a privilege to do. I love the concentration, the play with light, the search for the right frame, the waiting for the perfect moment and then, every now and then, the joy of seeing that you’ve created an image that actually IS worth a thousand words.

Along the way, my 350D was my trusted accomplice. I took it everywhere, first in a simple black camera bag and later, when I started buying more lenses, in a Lowepro backpack. Even though the screen on the back of the camera was small, it gave me enough feedback to know whether I was taking pictures the right way, and allowed me to fine-tune ISO, white balance and shutter speed. It sometimes felt like the extension of my hand, of my eye. I just loved that camera.

But then, at the end of 2009, it started breaking down. First random power issues, then just dead. It was sent to Canon, and they said: completely oxidized, we have to replace the whole interior. So I needed a new camera. I hesitated a lot: should I take the 500D, its successor, or the 5D Mk II? In car terms: should I stay in the BMW 3 series, or move up to the 5? After some weeks of hesitation (“that 5D is a lot of money”), I finally settled for the 500D. Boy, have I regretted that.

Continue reading ‘Not happy with the Canon 500D’

Focal length for the common man: “portrait distance”

I remember that before I started photography on a serious level, I had some understanding of shutter speed, but none of aperture and focal length. Even when I read what they meant, I still couldn’t ‘picture’ it; I had no feeling for the numbers. Let’s leave ‘aperture’ for another time and just concentrate for now on the concept of “focal length”.

First of all, the focal length of a lens is not the same as the actual physical length of the lens. Yes, 200mm and 300mm lenses (telephoto lenses) tend to be longer, but they’re not exactly 200mm and 300mm long. For instance, the Sigma 55-200mm F4-5.6 DC HSM is 85mm (3.3″) long, while the 70-200mm F2.8 II EX DG lens is 184mm (7.2″). Same maximum focal length, but more than twice as long.

So what is focal length? I could explain that it is “the distance from the center of the lens to the principal foci (or focal points) of the lens”, but that wouldn’t make it more comprehensible, would it? Well, I read through the theory, with the tangent of the viewing angle and stuff, and I think I understand it (I’m an engineer, I actually like trigonometry). A 200mm lens gives a viewing angle of 12° on the diagonal. Still not clear? That’s when I thought: let’s invent something more tangible: the “portrait distance”. Say you need a surface of about 72cm x 48cm (28″ x 19″) to make a portrait of a person (not just a headshot, but with some torso in it too). See some examples below:

Vriendschap foto's voor Erfgoeddag Sandy @ Chaff Brussels Tango Festival - Day 1 ¿Que? Fado & Tango - Dirk

Well, the distance between the camera and the person you’re making the portrait of will be roughly 20 times the focal length.
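The factor of 20 can be checked with a little arithmetic. The sketch below assumes a full-frame sensor of 36 x 24 mm (the text does not say which sensor size is meant, but 72cm x 48cm is exactly 20 times 36mm x 24mm) and uses plain similar triangles, which is only an approximation for a subject that is far away compared to the focal length. The function names are made up for this example.

```python
import math

# Assumption: full-frame (35mm) sensor. A smaller sensor gives a larger factor.
SENSOR_WIDTH_MM = 36.0
SENSOR_HEIGHT_MM = 24.0


def diagonal_angle_of_view(focal_length_mm):
    """Diagonal angle of view in degrees for a lens focused at infinity."""
    diagonal = math.hypot(SENSOR_WIDTH_MM, SENSOR_HEIGHT_MM)
    return math.degrees(2 * math.atan(diagonal / (2 * focal_length_mm)))


def portrait_distance_m(focal_length_mm, subject_width_mm=720.0):
    """Approximate camera-to-subject distance in metres (similar triangles)."""
    ratio = subject_width_mm / SENSOR_WIDTH_MM  # 720 / 36 = 20
    return ratio * focal_length_mm / 1000.0


for focal_length in (50, 85, 200):
    print(
        f"{focal_length}mm: portrait distance {portrait_distance_m(focal_length):.1f} m, "
        f"diagonal angle {diagonal_angle_of_view(focal_length):.1f} degrees"
    )
```

So with a 200mm lens you would stand about 4 m from your subject, and with a 50mm lens about 1 m.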

Continue reading ‘Focal length for the common man: “portrait distance”’

“I will you in the night” – Idool 2003

At the Pixagogo reunion dinner the other evening, I was reminded by one of my ex-colleagues, Steven (‘Beukie’), that back in 2003 I was having some fun with remixes/mashups. More specifically, I took some vocals from the Belgian “Idool 2003” preselections and added music to them. To make the exercise more fun, I took samples from the candidates who were really musically challenged.

So I went back in my archives and here are the three that I found:

  • “I will you in the night”
    Marnik had translated a Flemish song into his own ‘impoverisation’, as he proudly announces. Unfortunately, the Dutch “Ik wil je” (I want you) does not normally translate into the English “I will you”.

    I also found the original clip on YouTube (via partybrigade):
  • “But if I let you go”
    This ‘Pieter’ was officially called the worst candidate by the jury, and that decision is not impossible to understand. He had no tone, no rhythm and bad English. “There snow one like you!” He needed a lot of input from Madonna to make it bearable.
  • “Killing me softly”
    She was not that bad a singer, but her timing was awful. I remember having to cut and trim a lot to align her words to a steady beat. I made it a slow jazzy version with a lot of echo.

Out of that edition of Idool came Hadise, Brahim and Natalia, so it wasn’t all that bad. Still, there was also the girl with the wobbly hands:

Fax 2.0: because fax won’t die in the internet age

In one corner of my apartment: my fixed telephone line. In another my printer/scanner/fax device. Challenge: run a wire from one to the other, every time you rearrange the furniture.

Recently I investigated web fax services like eFax, WebFax and RingCentral, but for a low-volume user like me they’re too expensive. You pay a lot of money for a dedicated phone number, regardless of the number of faxes you send or receive. But I already have a dedicated telephone number, only it is completely disconnected from my ‘normal’ workflow: email, web, news reader. I would like to receive my faxes in my Gmail, because I never delete mails. With 7GB+ of email storage, I don’t need to.

So what I would like to have, and what I don’t think exists yet: a Fax 2.0 device at home, let’s call it the FaxaPorta. It needs power and a phone connection, and … that’s all. So let’s make it look like this (not uninfluenced by the Apple Airport Express):

Faxaporta mockup

Here’s how it works:

  • You plug the Faxaporta in a power outlet and connect to the phone plug.
  • The device has built-in wifi and will connect to the internet in that way.
  • You associate the device with your account on the Faxaporta website.
  • Now you can configure how it is supposed to work:
    • Incoming fax: send it to an email address as a PDF file, print it (you can connect a printer to the USB port)
    • Incoming voice call: take a voice mail and send it to an email address as an MP3 file, forward the call via Skype
    • Outgoing fax: behave like a network printer, or you upload a PDF file to the Faxaporta web site (it is then downloaded by your own Faxaporta device and sent over your own phone line).
  • But because your fax is now part of your web-connected world, you can do cool stuff like:
    • When you get a fax/voice call, the Caller ID (phone number of the sender) is matched with your Google contacts to add the name, company and email of the sender.
    • The faxes you receive pass through the Faxaporta service and are OCR’ed so that you can copy/paste the text on them (cf. the ScanR service).
    • The voicemails are run through a speech recognition service so that you get a text transcript together with the MP3 file (Google Voice has this).
    • The whole configuration of the fax/voice service is no longer done on a silly small screen on the fax machine with 15 cryptic buttons, but online, from anywhere you want. New response message? Upload the MP3 file! New cover sheet for outgoing faxes? Create it in a WYSIWYG editor!
    • You have an RSS feed for your incoming fax messages, and one for your incoming voicemails.
  • You could even make a ‘better’ (more expensive) service for companies:
    • try to route a fax to the right person (depending on who sent it, on names that were OCR’ed in the document)
    • set up an Interactive Voice Response system through the browser (“For Sales, press 1”).
    • create a searchable fax archive
    • How about a fax ‘out-of-office’ service?

    Does the Faxaporta exist already?

    Dissection of the Phantom Menace

    Via hackerfactor I came across this gem: a 7-episode dissection of just how bad the 1999 film Star Wars: The Phantom Menace was. The guy who made it has a very specific style: insightful, funny, but sometimes quite disturbing.

    Try episode one:

    Continue reading ‘Dissection of the Phantom Menace’

    Newscorp is indeed dropping out of Google

    The big disappearing act

    When Rupert Murdoch announced that he would remove his sites from Google (in order to make a deal with Microsoft, so that only Bing would have the NewsCorp pages, as we now assume), he apparently wasn’t kidding. Although all Google web sites still indicate that e.g. MySpace has 179 million pages in the index, the Google API is currently returning another number for that: only 7 million. The total number of NewsCorp pages (a sum of MySpace, IGN, RottenTomatoes, …) has dropped from 192 million to 12 million.

    Newscorp is dropping out of Google

    (trend via http://trend.visualizor.com/g/1011 )

    Which sites are Newscorp?

    Let me give you some of his ‘big’ sites and how their number of indexed pages has dropped:

    • MySpace: from 179 million to 7 million
    • RottenTomatoes: from 4 million to 100,000
    • IGN: from 4 million to 300,000
    • Stats.com: from 2.4 million to 50,000
    • News.com.au: from 1.2 million to 70,000
    • Sky.com: from 1.4 million to 85,000

    I suspect the Fox, National Geographic, Daily Telegraph, and other sites will soon follow.

    Did he send in the robots?

    I checked to see if NewsCorp finally started using the robots.txt file, because that’s the way you’re supposed to remove content from Google, not with press conferences.

    Myspace:

    User-agent: *
    Disallow:

    RottenTomatoes:

    User-agent: Mediapartners-Google
    Disallow:

    And the answer there is “no”: an empty Disallow line blocks nothing, so both files still allow every crawler in. So I’m not sure how they tell the Google crawler to stay out.
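The two robots.txt snippets above can be checked with Python’s standard `urllib.robotparser`. This is only a sketch: the snippets are pasted in as strings rather than fetched, the URLs are just the sites’ home pages, and the third file is a made-up example of what a real block would look like.

```python
from urllib.robotparser import RobotFileParser

# The two files as quoted above.
MYSPACE_ROBOTS = """\
User-agent: *
Disallow:
"""

ROTTENTOMATOES_ROBOTS = """\
User-agent: Mediapartners-Google
Disallow:
"""

# Hypothetical file, for contrast: this is how Googlebot would be kept out.
BLOCKING_ROBOTS = """\
User-agent: Googlebot
Disallow: /
"""


def googlebot_allowed(robots_txt, url):
    """Return True if the given robots.txt text lets Googlebot fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)


print(googlebot_allowed(MYSPACE_ROBOTS, "http://www.myspace.com/"))
print(googlebot_allowed(ROTTENTOMATOES_ROBOTS, "http://www.rottentomatoes.com/"))
print(googlebot_allowed(BLOCKING_ROBOTS, "http://www.example.com/"))
```

The first two calls report that Googlebot is allowed; only the invented third file keeps it out.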

    — UPDATE —

    Source of the data:

    The numbers come from http://tools.forret.com/newscorp/, which uses the Google Search API. I double-checked the replies from the API: for MySpace.com I get "estimatedResultCount": "6950000" so 7 million, not 179 million. If there’s an error, it’s in the Googleplex.

    iPhone bandwidth: orders of magnitude

    I did a bandwidth test the other day with the iPhone SpeedTest tool. I wanted to compare the speed using (standard) GPRS, using 3G and using my own Wifi. The results were each a factor of ten apart:

    • iPhone on Proximus GPRS: 35 kbps (download & upload)
    • iPhone on Proximus 3G: 350 kbps (download & upload)
    • iPhone via Wifi: 3500 kbps (download; upload is about 300 kbps)

    The real reason for the test is that I wanted to see how fast I would use up my Proximus data plan (200MB per month). The answer: with GPRS I would need more than 12 hours of continuous downloading, with 3G I could do it in less than 2 hours. So GPRS is pretty safe, and it’s also easier on your battery, but you have to live with slow, pre-1996 modem-like performance. The latency (the time it takes to get your first byte after requesting a URL) is easily 10 to 50 seconds. Not milliseconds, seconds!
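The “12 hours versus 2 hours” estimate follows directly from the measured speeds. A small sketch of the arithmetic, assuming 1 MB = 1000 kB and 1 kB = 8 kbit (the function name is made up for this example):

```python
def hours_to_exhaust(plan_megabytes, speed_kbps):
    """Hours of continuous downloading needed to use up the data plan."""
    plan_kilobits = plan_megabytes * 1000 * 8  # assumes 1 MB = 1000 kB
    seconds = plan_kilobits / speed_kbps
    return seconds / 3600


# Speeds as measured above; 200 MB is the monthly plan size.
for name, kbps in (("GPRS", 35), ("3G", 350), ("Wifi", 3500)):
    print(f"{name}: {hours_to_exhaust(200, kbps):.1f} hours")
```

At 35 kbps the 200MB plan lasts about 12.7 hours of non-stop downloading, and at 350 kbps about 1.3 hours.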

    As a side note: do not take a time-based data subscription, certainly not with the iPhone. My first post-iPhone Proximus invoice was 800 euro, which is more than the price of my iPhone! When I contacted them about that, they immediately offered to reimburse it and advised me to switch to a volume-based plan. I guess I was not the first one …



