Repairing Amazon S3 downloads for IE

I use Amazon S3 for cloud storage of big digital-cinema files (up to 3GB) for distribution. It works fine most of the time, but I kept getting the odd complaint: “I can’t download on my PC, I get an error”. Every time I asked what browser they were using, it was Internet Explorer. I am a Google Chrome man, and I almost never do anything with IE, but still, the customer is king, so let’s see what could be wrong. I tested it myself with IE and yes, most files could be downloaded, but some couldn’t. Sometimes I would get an empty page, sometimes the following error: “XML 5619: Incorrect document syntax”.

So I fired up Fiddler2, an invaluable tool to see what’s going on under the hood of the communication between your web browser and the web server. I looked at the client and server HTTP headers and saw something interesting:

1) Download via Chrome

CLIENT:

User-Agent: Mozilla/5.0 (Windows NT 6.0) AppleWebKit/534.30 (KHTML, like Gecko) Chrome/12.0.742.122 Safari/534.30
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip,deflate,sdch

SERVER:

Content-Type: binary/octet-stream
Content-Length: 26176425
Server: AmazonS3

2) Download via IE for a file that can be downloaded:

CLIENT:

User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)
Accept-Encoding: gzip, deflate

SERVER:

Content-Type: binary/octet-stream
Content-Length: 26176425
Server: AmazonS3

3) Download via IE for a file that can NOT be downloaded:

CLIENT:

User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)
Accept-Encoding: gzip, deflate

SERVER:

Content-Type: application/x-zip-compressed
Content-Length: 687411306
Server: AmazonS3

It was a consistent pattern: every time the Content-Type of a file was application/x-zip-compressed, I couldn’t download it with IE. It might have something to do with MS KB 841120, where the server recompresses .zip files with gzip and the browser misinterprets the result.

Anyway, I used CloudBerry S3 Explorer to go and explicitly change every file’s HTTP headers, and now I can download all files with IE. If I ever forget about this IE quirk, I’ve now written down the solution!
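
For anyone who would rather script this than click through a GUI, here is a minimal sketch in Python. It is not what I actually ran: it assumes the boto3 S3 client, a hypothetical bucket name, and that the right fix is to switch the Content-Type to binary/octet-stream (the type the working downloads had). The function name fix_content_types is made up for this example.

```python
# Sketch only: `s3` is expected to behave like boto3.client("s3").
# The target type is an assumption based on the headers of the working files.

BAD_TYPE = "application/x-zip-compressed"  # type that failed in IE
SAFE_TYPE = "binary/octet-stream"          # type seen on the working downloads


def fix_content_types(s3, bucket, bad_type=BAD_TYPE, safe_type=SAFE_TYPE):
    """Rewrite the Content-Type of every object currently served as bad_type.

    Returns the list of keys that were changed.
    """
    fixed = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            head = s3.head_object(Bucket=bucket, Key=key)
            if head.get("ContentType") != bad_type:
                continue
            # S3 headers cannot be edited in place: copy the object onto
            # itself and replace its metadata in the same request.
            s3.copy_object(
                Bucket=bucket,
                Key=key,
                CopySource={"Bucket": bucket, "Key": key},
                ContentType=safe_type,
                Metadata=head.get("Metadata", {}),
                MetadataDirective="REPLACE",
            )
            fixed.append(key)
    return fixed


# Usage (needs boto3 and AWS credentials; the bucket name is hypothetical):
#   import boto3
#   print(fix_content_types(boto3.client("s3"), "my-cinema-bucket"))
```

A single copy request like this is limited to objects of 5 GB or less, which is enough for my 3GB files; anything bigger would need a multipart copy.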

Review: Synology DS410 8TB NAS

Storage vendors should come to me for heavy-duty testing: I have had way too many hard disks break down on me. Last year my 4TB Lacie drive died. It’s a good thing I’m paranoid about data storage and had 2 copies of my photo archive (now about 1.4 TB) elsewhere. Although my Lacie drive had ‘protected’ storage, after repair it came back reformatted. I decided to never buy Lacie again: I have had a 1TB, a 2TB and a 4TB drive and they have all broken down at some point. My next storage solution would be a stand-alone NAS with 4 disks!

After reading some reviews on QNAP and Synology, I decided on the Synology DS410. I ordered it at Memoryshop for a decent price and some days later it was shipped to me together with 4 Samsung 2TB drives. Installation was swift and uneventful. I configured it as one big 6TB RAID-5 volume and started copying all my pictures, music and movies. The device comes with the shares /music, /video and /pictures preconfigured, and copying to these folders makes sense, because then the music appears in the handy iTunes server, and all media shows up in the DLNA Media Server.

The advantage of a Linux-powered NAS is that it comes with a number of easy-to-install applications (Torrent Client, MySQL, LAMP stack web server) and you can even install, through ipkg, lots of standard Linux packages. On the QNAP server at the office, I have file sync tasks running at regular intervals and it works flawlessly.

If you’re serious about your storage (because you need it for your work), don’t be content with just an external USB drive. Invest a bit more to have a NAS you can trust. And also: never trust it 100%. I now have about 16TB of storage at home so that I have multiple copies of everything, and I also use Mozy cloud storage for my exported pictures (‘only’ 12 GB for the moment).

Review: Panasonic TX-L42E30E LCD TV

I have bought and/or used quite a lot of new gear over the last couple of months. I’ve been meaning to write about my experiences but never got around to actually starting. Because I like reading other people’s reviews before I buy anything, I’ll start writing my own now! Let’s start with the biggest one:

Why I ‘needed’ a new TV

When digital TVs came out, the price for Full HD (1080 lines instead of 720) was very high. I remember seeing all those 2000€+ beasts and it just didn’t make sense to switch yet, especially since there was almost no source of Full HD video. Blu-ray was launched in 2006 and seemed more like a ploy to make you buy all your old films again in a slightly better format. However, the prices of Full HD sets have dropped a lot, and I had more and more devices at home with HDMI and 1080p output that I could not show in their full glory. So I started shopping around at the beginning of this year.

I knew I wanted an A-brand, which boiled down to Sony, Samsung, Panasonic, Philips, LG or Toshiba. Shopping for a digital TV is exhausting. Each brand seems to have 150 different models which are sometimes hard to tell apart. I remember thinking during the shopping: I wish Apple made TV screens. They would have a 32, a 44 and a 56″ model: just pick one. I wish I was able to say that I made a huge spreadsheet with all models, features and prices, created an N-variable price model and chose based on that, but honestly, I just picked one that looked nice in the shop (Vandenborre) and seemed like a good bargain. So I now have the Panasonic 42″ Viera TV. Vandenborre offered to deliver and install it, but I opted for picking it up in the shop.


Experimenting with movie hashing/fingerprinting

MEDIA FINGERPRINTING
Neal Krawetz wrote an interesting article on image fingerprinting, or how to search for images that are similar. He proposes an algorithm to do image fingerprinting and reproduce the functionality of TinEye, a service that lets you submit one image and get back all the web pages where that picture, or a slightly modified version of it, is included. By resizing the image to 8×8 pixels, converting it to grayscale, and then to a binary version (only black or white pixels, no grey levels), he reduces a picture to a hash that is 8 x 8 = 64 bits. This can then be compared to a database of hashes of millions of other pictures found on the web (by calculating the “Hamming distance”, read the article for details).
Something similar can be done on segments of audio. YouTube has been doing it for years (using technology from Audible Magic) and recently The Echo Nest released Echoprint, a music fingerprinting and identification service that does the same thing for free.
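
To make the image hash and the Hamming distance a bit more concrete, here is a minimal sketch in plain Python. It assumes the picture has already been resized to 8×8 and converted to grayscale values (0-255); the function names are my own, and the exact bit convention (1 = brighter than average) is an assumption rather than a quote from the article.

```python
def average_hash(gray):
    """Reduce an 8x8 grid of grayscale values to a 64-bit integer hash.

    Each pixel contributes one bit: 1 if it is brighter than the mean
    of all pixels, 0 otherwise.
    """
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a, b):
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


# A synthetic 8x8 picture: values run from 0 (top left) to 63 (bottom right).
gradient = [[row * 8 + col for col in range(8)] for row in range(8)]
brighter = [[p + 100 for p in row] for row in gradient]  # same picture, lighter
inverted = [[63 - p for p in row] for row in gradient]   # negative of the picture

print(hex(average_hash(gradient)))
print(hamming_distance(average_hash(gradient), average_hash(brighter)))
print(hamming_distance(average_hash(gradient), average_hash(inverted)))
```

The brightened copy gives exactly the same hash (distance 0), while the negative differs in all 64 bits. Real near-duplicates end up somewhere in between, and a small distance means "probably the same picture".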

VIDEO FINGERPRINTING
Since I work (and play) a lot with video, I was thinking about how to extend the ideas Neal proposes to video. Video material consists of audio (which I will ignore for now) and a sequence of images (typically between 24 and 30 per second). I’ve taken a video clip from YouTube as inspiration, but I won’t tell you yet which one. Let’s discover it while I create the fingerprint of one of the frames, using the method described by Neal:

first we reduce the frame to a small format, e.g. 16×8 pixels
then we desaturate: make the picture grayscale
then we normalize it: we maximize the contrast, so the darkest pixel becomes the new black and the lightest becomes the new white
we calculate the average darkness, and any pixel that is darker we make black, while the others become white. This is what is proposed in the article. Total size of fingerprint: 16 pix * 8 pix * 1 bit = 128 bits = 16 bytes
I found the details in the above fingerprint too coarse, so I used 4 color levels instead of 2. Total size of fingerprint = 16 pix * 8 pix * 2 bits = 32 bytes
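
The steps above can be sketched in plain Python as follows. This is an illustration, not the code I used for the pictures: it assumes the frame has already been scaled down to 16×8 and is given as a flat list of (r, g, b) tuples, it desaturates with a simple average of the three channels, and it splits the 4 levels into equal-width bands. All function names are made up for this example.

```python
WIDTH, HEIGHT = 16, 8


def desaturate(rgb_pixels):
    """Grayscale via a plain channel average (assumption; weighted luma also works)."""
    return [(r + g + b) / 3 for r, g, b in rgb_pixels]


def normalize(gray):
    """Stretch contrast so the darkest pixel maps to 0.0 and the lightest to 1.0."""
    lo, hi = min(gray), max(gray)
    if hi == lo:  # flat frame: nothing to stretch
        return [0.0] * len(gray)
    return [(p - lo) / (hi - lo) for p in gray]


def pack(levels, bits_per_pixel):
    """Pack small integers into bytes, first pixel in the most significant bits."""
    value = 0
    for level in levels:
        value = (value << bits_per_pixel) | level
    return value.to_bytes(len(levels) * bits_per_pixel // 8, "big")


def fingerprint_1bit(norm):
    """Darker than average becomes black (0), everything else white (1)."""
    mean = sum(norm) / len(norm)
    return pack([0 if p < mean else 1 for p in norm], 1)


def fingerprint_2bit(norm):
    """Four grey levels (0..3) in equal-width bands (assumed banding)."""
    return pack([min(int(p * 4), 3) for p in norm], 2)


# Synthetic test frame: a left-to-right gradient from black to white.
frame = [(x * 17, x * 17, x * 17) for _ in range(HEIGHT) for x in range(WIDTH)]
norm = normalize(desaturate(frame))

print(len(fingerprint_1bit(norm)), fingerprint_1bit(norm).hex())
print(len(fingerprint_2bit(norm)), fingerprint_2bit(norm).hex())
```

The two fingerprints come out at 16 and 32 bytes, matching the sizes calculated above. Two frames can then be compared by counting differing bits (for the 1-bit version) or by summing the per-pixel level differences (for the 2-bit version).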
