Youtube for PDF: embedding documents
31 Aug 2006 | PDF Documents
Playing around with embedded Google calendars and reading “Google Apps and the power of embedded functionality”, I got to thinking: what other types of documents would be good candidates for a 1-click-embedding provider? I wondered, for example, whether there was something like Youtube for audio (yes, there is: GoEar.com is one example). But then I thought of a document type we have all learned to love and hate: Adobe’s Portable Document Format, or PDF.
The idea behind the PDF file format is valuable: create a cross-platform standard for exchanging and printing documents that includes the text, images, fonts and their layout. The Acrobat Reader is a free application, and while Adobe’s PDF Writer is a commercial product (from $299), there are enough free alternatives to counter that. The thing is: PDF is great for printing, but not for browsing.
When you click on a PDF link, one of several things might happen:
- If the web server does not send a “Content-Type: application/pdf” header, your browser has no clue what to do with the file: it will let you download the file and that’s all.
- If Acrobat is not installed, you can also only download the file, because your browser will not know what to do with files of type application/pdf.
- If Acrobat is installed, but you work with a browser that is not tightly linked to the OS (e.g. Firefox on Windows), it still might not open in Acrobat.
- If your browser has configured Acrobat as a helper application, the file will download and will then be opened with the reader. So you will have 2 applications open: your browser and Acrobat Reader. This is actually the best method.
- With Internet Explorer on Windows, Acrobat will open inside your browser, and your menu bar will become an interesting mix of IE and Acrobat options. (Where is the print button? Ah-ha!!) When you close the browser, a copy of Acrobat will keep running invisibly in the background, taking up some 32MB of memory.
- If you had no indication how big the PDF file was, you might be twiddling your thumbs for the next 5 minutes while the document is being downloaded.
- If the document uses fonts that you don’t have, you might be looking at a very weird layout.
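Two of the cases above (a missing Content-Type and an unknown file size) depend on information a server can announce in its response headers before any of the document is transferred. As a rough sketch, a link checker could summarise those headers as follows. The function name `describe_link` and the sample header values are hypothetical and only serve as illustration.

```python
def describe_link(headers: dict[str, str]) -> str:
    """Summarise what the response headers reveal about a linked file."""
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}

    # Drop parameters such as "; charset=..." and compare in lower case.
    media_type = lowered.get("content-type", "").split(";")[0].strip().lower()
    if media_type == "application/pdf":
        kind = "PDF"
    else:
        kind = "unknown type (download only)"

    # Content-Length is optional; without it the size cannot be shown.
    length = lowered.get("content-length", "").strip()
    if length.isdigit():
        size = f"{int(length) / 1_000_000:.1f} MB"
    else:
        size = "size unknown"

    return f"{kind}, {size}"


# Sample values, not measured from a real server.
print(describe_link({"Content-Type": "application/pdf", "Content-Length": "13800000"}))
print(describe_link({}))
```

In practice the headers would come from an HTTP HEAD request, so nothing is downloaded until the reader decides the document is worth it.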
So you are screwed if you don’t have Acrobat, if it is a really big document or if you access through a misconfigured browser. This is the equivalent of clicking on a .AVI movie file without a clue of how big the movie is, whether you have the necessary audio and video codecs to see it, and whether it’s worth it. If that was largely solved by a service provider like Youtube, what would a similar service for .PDF files look like?
Youtube/Flickr for PDF files
Let’s call this web site PDFViewr. When you want to embed a PDF into your web site / blog, you first upload it to PDFViewr. There the document information is extracted (title, author, # pages, filesize, fonts, …) and a preview of the first page is generated as a JPG/PNG image. You can edit that info and add tags or notes (e.g. “Skip the first 15 pages, the juicy bits are page 16 to 23”). You are then given a short piece of code that you can embed into your own site. The result would be like this (taking an example from a 13MB PDF from taschen.com):
| Title: | TASCHEN Magazine Summer 2006 |
| Author: | Taschen.com |
| Info: | 92 pages, 13.8MB, full color |
| Tags: | taschen, magazine, germany, art, erotic |
| Link: | |
| Embed: | |

Read in Acrobat – Read as HTML – Slideshow
Download PDF – Download ZIP – Save for later
Send to friend – Print & bind
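A card like the one above could be generated from the extracted document information. The following is a minimal sketch, assuming the metadata has already been extracted. The `PdfInfo` record, the `embed_snippet` function, the `pdfviewr.example` address and the `/doc/<id>` URL scheme are all made up for illustration, since PDFViewr itself is hypothetical.

```python
from dataclasses import dataclass, field
from html import escape
from urllib.parse import quote


@dataclass
class PdfInfo:
    """Document information shown on the embedded card."""
    title: str
    author: str
    pages: int
    size_mb: float
    tags: list[str] = field(default_factory=list)


def embed_snippet(info: PdfInfo, doc_id: str,
                  base_url: str = "https://pdfviewr.example") -> str:
    """Return an HTML fragment a blogger could paste into a page."""
    # Percent-encode the id so it is safe inside a URL path.
    link = f"{base_url}/doc/{quote(doc_id, safe='')}"
    rows = [
        ("Title", info.title),
        ("Author", info.author),
        ("Info", f"{info.pages} pages, {info.size_mb:.1f}MB"),
        ("Tags", ", ".join(info.tags)),
    ]
    # Escape every value: titles and tags are user-supplied text.
    body = "".join(
        f"<tr><th>{escape(label)}:</th><td>{escape(value)}</td></tr>"
        for label, value in rows
    )
    return (
        f'<table class="pdfviewr">{body}</table>'
        f'<a href="{escape(link)}">Read as HTML</a>'
    )


info = PdfInfo("TASCHEN Magazine Summer 2006", "Taschen.com", 92, 13.8,
               ["taschen", "magazine", "germany", "art", "erotic"])
print(embed_snippet(info, "abc123"))
```

Escaping matters here because the snippet ends up on other people's sites: a title containing markup should be shown as text, not interpreted by the host page.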
Actually it wouldn’t be that hard to set up:
- The difference between automatically downloading or opening a document can be controlled with the Content-Type HTTP header (combined with Content-Disposition).
- The JPG preview of a PDF file can be created with ImageMagick + GhostScript (free).
- You could easily add the same services for remote PDFs, i.e. the customer gives a URL instead of uploading a document. There is a whole copyright minefield there that I will wisely ignore.
- Since we will have stored or cached each PDF file, it’s easy to let users add PDFs to their own PDFViewr storage account.
- A connection to a remote print and bind service like Print(fu) is very easy to make. It’s a pity Print(fu) does not ship to Europe yet, because I would surely use them to have e.g. the DCI specs (176 pages) printed in a nice booklet.
- The equivalent of Youtube’s video format conversion (Quicktime, MPEG4, AVI … to Flash video) is our PDF conversion to HTML or JPG.
- Since documents are normally formatted in portrait orientation (higher than wide) and computer screens are normally in landscape orientation (wider than high), they are not a natural match. To view a PDF document on screen, one could use a two-facing-pages layout for large screens, or a half-a-page-at-a-time approach for smaller screens.
- In all our web 2.0 enthusiasm, we could add folksonomy (tags), comments, ratings … so that “good content” would float up.
- Monetisation? Well, add a payment system for commercial documents.
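Three of the points above are mechanical enough to sketch in a few lines: the response headers for "download" versus "open", the preview command, and the choice of on-screen layout. This is only an illustration under stated assumptions. The function names, the 72 dpi default and the 600-pixel readability threshold are invented here. The sketch pairs Content-Type with Content-Disposition, the header that marks a response as an attachment or as inline content, and it relies on ImageMagick's `convert` syntax in which `file.pdf[0]` selects the first page.

```python
def pdf_response_headers(filename: str, download: bool) -> dict[str, str]:
    """Headers asking the browser to save the PDF or to display it."""
    # Assumes a plain ASCII filename without quotes or backslashes.
    disposition = "attachment" if download else "inline"
    return {
        "Content-Type": "application/pdf",
        "Content-Disposition": f'{disposition}; filename="{filename}"',
    }


def preview_command(pdf_path: str, image_path: str, dpi: int = 72) -> list[str]:
    """Argument list for ImageMagick's convert; "[0]" picks the first page."""
    # ImageMagick hands PDF rendering over to Ghostscript, which must be installed.
    return ["convert", "-density", str(dpi), f"{pdf_path}[0]", image_path]


def choose_layout(screen_width_px: int, min_page_width_px: int = 600) -> str:
    """Two facing pages if both fit at a readable width, else half a page."""
    # The 600-pixel threshold is an arbitrary assumption, not a measured value.
    if screen_width_px >= 2 * min_page_width_px:
        return "two-facing-pages"
    return "half-page"


print(pdf_response_headers("magazine.pdf", download=True))
print(preview_command("magazine.pdf", "preview.png"))
print(choose_layout(1920), choose_layout(1024))
```

On a machine with ImageMagick and GhostScript installed, the list returned by `preview_command` could be passed to `subprocess.run(..., check=True)` to actually render the first page.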
Any ideas?