Repairing Amazon S3 downloads for IE

I use Amazon S3 for cloud storage of big digital-cinema files (up to 3GB) for distribution. It works fine most of the time, but I kept getting the odd complaint: “I can’t download on my PC, I get an error”. Every time I asked what browser they were using, it was Internet Explorer. I am a Google Chrome man, and I almost never do anything with IE, but still, customer is king, let’s see what could be wrong. So I tested it myself with IE and yes, most files could be downloaded, but some couldn’t. Sometimes I would get an empty page, sometimes the following: “XML 5619: Incorrect document syntax”.

So I fired up Fiddler2, an invaluable tool to see what’s going on under the hood of the communication between your web browser and the web server. I looked at the client and server HTTP headers and saw something interesting:

1) Download via Chrome

CLIENT:

User-Agent: Mozilla/5.0 (Windows NT 6.0) AppleWebKit/534.30 (KHTML, like Gecko) Chrome/12.0.742.122 Safari/534.30
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip,deflate,sdch

SERVER:

Content-Type: binary/octet-stream
Content-Length: 26176425
Server: AmazonS3

2) Download via IE for a file that can be downloaded:

CLIENT:

User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)
Accept-Encoding: gzip, deflate

SERVER:

Content-Type: binary/octet-stream
Content-Length: 26176425
Server: AmazonS3

3) Download via IE for a file that can NOT be downloaded:

CLIENT:

User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)
Accept-Encoding: gzip, deflate

SERVER:

Content-Type: application/x-zip-compressed
Content-Length: 687411306
Server: AmazonS3

It was a consistent pattern: every time the Content-Type of a file was application/x-zip-compressed, I couldn’t download it. It might have something to do with MS KB 841120: the server recompresses .zip files with gzip, and the browser misinterprets the result.
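If you don’t have Fiddler at hand, the same check can be roughly sketched in Python. This is only an illustration: the names `SUSPECT_TYPES`, `is_suspect` and `content_type_of` are made up for this example, the list of suspect types contains just the one value I saw failing, and the HEAD request assumes the file is publicly readable.

```python
import urllib.request

# Content types seen to fail in IE in the captures above. This is an
# assumption drawn from that one pattern, not an exhaustive list.
SUSPECT_TYPES = {"application/x-zip-compressed"}


def is_suspect(content_type):
    """Return True if the Content-Type matches the failing pattern."""
    # Drop parameters such as "; charset=..." and normalise the case.
    media_type = content_type.split(";")[0].strip().lower()
    return media_type in SUSPECT_TYPES


def content_type_of(url):
    """Fetch only the headers of `url` and return its Content-Type."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        return response.headers.get("Content-Type", "")


# Hypothetical usage (the URL is a placeholder):
# print(is_suspect(content_type_of("https://example-bucket.s3.amazonaws.com/film.zip")))
```

Here `is_suspect("binary/octet-stream")` is False, which matches the two downloads that worked.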

Anyway, I used CloudBerry S3 Explorer to go and explicitly change every file’s HTTP headers, and now I can download all files with IE. If I ever forget about this IE quirk, now I’ve written down the solution!
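For a bucket with many files, the same fix could probably be scripted instead of clicked. Below is a minimal sketch, not what CloudBerry does internally. It assumes the header that matters is Content-Type, that binary/octet-stream (the type that worked above) is the right replacement, and that `client` is a boto3 S3 client. The function name `fix_content_types` and the bucket name are made up. In S3 an object’s headers are changed by copying the object onto itself with the metadata replaced.

```python
def fix_content_types(client, bucket,
                      bad_type="application/x-zip-compressed",
                      good_type="binary/octet-stream"):
    """Rewrite the Content-Type of every object served as `bad_type`.

    `client` is expected to behave like a boto3 S3 client.
    Returns the list of keys that were changed.
    """
    changed = []
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        # An empty bucket (or page) has no "Contents" entry.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            head = client.head_object(Bucket=bucket, Key=key)
            if head.get("ContentType") != bad_type:
                continue
            # Copy the object onto itself. REPLACE discards the old
            # metadata, so the user metadata is passed along again.
            client.copy_object(
                Bucket=bucket,
                Key=key,
                CopySource={"Bucket": bucket, "Key": key},
                ContentType=good_type,
                Metadata=head.get("Metadata", {}),
                MetadataDirective="REPLACE",
            )
            changed.append(key)
    return changed


# Hypothetical usage (needs the third-party boto3 package and credentials):
# import boto3
# print(fix_content_types(boto3.client("s3"), "my-example-bucket"))
```

Two caveats: the REPLACE directive also resets other headers such as Cache-Control or Content-Disposition unless you pass them again, and a single copy operation is limited to 5GB, which is enough for my files of up to 3GB.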

The early days of (e)book piracy

I was thinking about this the other day. Piracy is really big for CDs and DVDs. One of the main reasons is that both media are so easy to digitize. Pop in a CD and in 6 minutes you have everything in MP3 files. Converting a DVD to XVID takes a bit longer and is slightly more complex, but not that much. Once they’re (unprotected) files, you can swap away. But books, we’ve always bought them in analog, paper form. Digitizing meant scanning them, and that was just too much work.

Now that’s changing. Amazon is selling digital books on their Kindle device (240,000 devices sold in Aug 2008, 12% of books offered in both digital & analog are sold digitally), Sony has a digital book reader (the PRS-505-SC), iRex has the iLiad. There will be more and more books available in digital format, and those will inevitably become a target for piracy.

The Kindle has its own AZW digital eBook format, but this is probably derived from the Mobipocket MOBI/PRC format. Mobipocket was taken over by Amazon in 2005. AZW/PRC support DRM (Digital Rights Management – a.k.a. you can’t read it unless I allow you to) for eBooks. Sony has its own (of course) format which is called BBeB (Broadband eBook), which also has DRM. Most readers also read PDF files.

My guess is that as more books are offered in digital format, there will be an increased interest in the DRM security behind the file formats, and hackers will find ways to convert full books to an unencrypted format. This might be PDF or PRC/MOBI. And these files will be exchanged in the same way as some people exchange music and movies. You will have a tab “eBooks” on thepiratebay, and youngsters will say “I have all Stephen King’s books – downloaded of course, duh!” My guess is also that publishers will start blaming Amazon, and start suing their own customers, like the RIAA and MPAA are still doing. And it will take years for them to figure out that DRM is not a good thing, and that it is possible to make money by selling things that can be copied. And they’ll probably arrive at conclusions that Seth Godin has been talking about for years now.