I store quite a lot of info in Google Spreadsheets, for the obvious reasons:
- anyone can edit from any place, even at the same time
- the servers are more reliable than a server at the office
- I can use the info (with CSV/Excel export) in other programs through a web link
But there is a problem popping up at random moments with that last export or ‘publish’ functionality. Sometimes when you download the published link of a CSV export (through curl), you get an error ‘Moved Temporarily – The document has moved‘ with a redirect to a www.google.com address. And if you don’t follow HTTP 302 redirects, you can’t get to the actual content. In the past I’ve always worked around it or waited until the error went away, but today I searched a bit further. So for those who have the same question: read and learn!
The redirect is actually for authentication. Although I publish without requiring signing in, so one would expect no authentication process, there actually is one. See what it does (I used wget in verbose mode to get the HTTP headers):
>>>:~$ wget -v “https://spreadsheets.google.com/(…)&output=csv”
Location: https://www.google.com/... (first redirect)
Location: https://spreadsheets.google.com/... (second redirect)
Saving to: ...
So what is the solution: just add “&ndplr=1” to your URL and you will skip the authentication redirect. I’m not sure what the NDPLR parameter name stands for, let’s just call it: “Never Do Published Link Redirection“.
I use Google Calendar as a vital piece of milonga.be: me and some 20 other editors keep an up-to-date calendar of tango events in Belgium. We’ve been doing that for the last 3 years, so there was a lot of old, no-longer-relevant data in the agenda. The way I use the calendar on the site is that I download all the appointments as a .ICS (iCal/gCal) file and then format/display it with another program. But with all the old data still present, that ICS file had grown to more than 1MB, and this size slowed down the updates (I download the whole thing every 30 min). So I decided to delete all old data (2007 – 2009). Not that easy.
Google Calendar’s web interface doesn’t really allow you to bulk delete. There is no way to select several dozens of appointments and delete them in one go. But I found a way that works (suggested here):
- Install Mozilla Thunderbird (desktop email client)
- Install Mozilla Lightning (calendar plugin for Thunderbird)
- Install Provider for Google Calendar (Gcal plugin for Lightning)
- Look up the Google Calendar Private iCal URL of your calendar (something like
- Add it to Thunderbird with FILE/NEW/CALENDAR/NETWORK/GOOGLE CALENDAR
- You now have a read/write connection to your Google Calendar!
Select the appointments you want to delete, hit the ‘Del’ button and see them disappear one by one.
The big disappearing act
When Rupert Murdoch announced that he would remove his sites from Google (in order to make a deal with Microsoft, so that only Bing would have the NewsCorp pages, as we now assume), he apparently wasn’t kidding. Although all Google web sites still indicate that e.g. MySpace has 179 million pages in the index, the Google API is currently returning another number for that: only 7 million. The total number of NewsCorp pages (a sum of MySpace, IGN, RottenTomatoes, …) has dropped from 192 million to 12 million.
(trend via http://trend.visualizor.com/g/1011 )
Which sites are Newscorp?
Let me give you some of his ‘big’ sites and how their # indexed pages have dropped:
- Myspace: from 179 mio to 7 mio
- RottenTomatoes: from 4 mio to 100.000
- IGN: from 4 mio to 300.000
- Stats.com: from 2.4 mio to 50.000
- News.com.au: from 1.2 mio to 70.000
- Sky.com: from 1.4 mio to 85.000
I suspect the Fox, National Geographic, Daily Telegraph, and other sites will soon follow.
Did he send in the robots?
I checked to see if NewsCorp finally started using the
robots.txt file, because that’s the way you’re supposed to remove content from Google, not with press conferences.
And the answer there is “no”. So I’m not sure how they tell the Google crawler to stay out.
— UPDATE —
Source of the data:
The numbers come from http://tools.forret.com/newscorp/, which uses the Google Search API. I double-checked the replies from the API: for MySpace.com I get
"estimatedResultCount": "6950000" so 7 million, not 179 million. If there’s an error, it’s in the Googleplex.
So I decided to let Facebook check my Gmail contact list to see if I had missed some contacts (people using aliases, etc …). After carefully selecting a couple of FB friends to invite (a buddy from the army, …), I clicked ‘Select’ and then ‘OK’ on the next screen that I supposed was a ‘Confirm’ window. I didn’t even read what was written on it. Some minutes later I saw emails starting to come in on different email aliases I had created in all my years of Internet activity. Apparently I allowed Facebook to send email messages to all Gmail contacts with email addresses that were not yet ‘known’ in Facebook. I have about 1500 addresses in my Gmail, let’s say some 500 already have a FB profile: so I just allowed Facebook to send out 1000 ‘unsollicited commercial emails’ or *spam* on my behalf. There is no way for me to know how many emails went out, nor to whom. I feel strongly embarrased, since I have been a strong opponent of spam for years, and since I have no idea who I have bothered with this bulk mail.
A company like Facebook probably has a whole team concentrated on user experience and workflow streamlining, so I can only assume that this strategy is by design. They probably have to keep the monthly exponential growth numbers so they use every opportunity to collect new email addresses. This is plain wrong. The default should be ‘opt in‘, not ‘opt out‘ (that is, select those you want to invite instead of unselect those you don’t wanto to invite).
So dear Christopher Cox and/or Chamath Palihapitiya at Facebook, while you will probably say that ‘but it is clearly written on the page that they’re about to send an invitation to (in my case, 1000??) contacts‘, you know that you are wrong on this one. You’re spamming. Big time, like real jerks. Since you’re probably not going to do anything about it, Google: any ideas?
I am currently using my experience with milonga.be to build a similar site for Tango in Bulgaria. One of the major components of the site is the tango calendar. In this case I have chosen not to use a special iCal visualisation tool (more on that later in a series posts on Tango2.0), but just the standard Google Calendar IFRAME-based widget.
It’s not a bad widget, but it’s too limited. You can only display “Day/Week/Month/Agenda” style, the colors and fonts are fixed and it does funny stuff for events that continue after 12:00AM (which tango events regularly do, believe me).
I’ve already talked about the fact that iCal is a sissy format and that Gcal needs some more features. I was just thinking that it would be nice if some company would jump on that and provide the whistles and bells for iCal/vCal feeds (like those of Google Calendar), just like Feedburner did with RSS/podcast feeds (and they got bought by Google, so maybe their idea wasn’t half bad). So I introduce the following concept: CalendarBurner (since the Calburner/iCalburner domains are taken).
Continue reading CalendarBurner: Feedburner for iCal calendars
Wouldn’t this be a great idea: the Brussels public transport not mapped by MIVB‘s horribly unpractical route planner, but by Google’s Transit maps. You just need to get an export of the stops, the routes and the times, and they can be shown on Google Maps just like that. Where should we start looking for the source data? Then create
agency.txt, stops.txt, routes.txt, trips.txt, stop_times.txt, calendar.txt and that’s it.
Google Transit Feed
I am using Google Agenda as the central repository for the milonga.be Belgian tango agenda, which I edit together with half a dozen other tango enthusiasts. While the principle of a central, hosted calendar storage works wonderfully, I (have to) use a modified PHPiCalendar to display different views on the agenda (‘only Brussels’, ‘only workshops’, ‘1 week in advance’, ‘1 month in advance’, …). There are actually a couple of features that I’d like to see in Google Agenda, and what better place to list them but here:
Currently an event in the agenda has the fields Title, Date/time (with recurrency, if any) , Location and Description. What I really miss is Tags (or categories, keywords, whatever you want to call them). Tags would allow me to attribute events to categories so that I can easily slice and dice them: only display the “milonga’s”, the events in Antwerp, the events in a specific place. Now I had to write a modified ‘filtered printable view’ for PHPiCalendar so that I can search on specific words in the event title, but that is really a hack. E.g. I now ask every editor to create the event titles as
“[TYPE]: [name of the event] @ [LOCATION]”
so that I can filter on “CONCERT:” or “@ Gent”. With the tags “concert, gent, polariteit, openair” it would be so much easier.
The iCalendar specification even mentions a ‘Categories’ field, although Google Agenda currently does not use it.
I have been working a bit on Google Maps visualisations for my milonga.be tango site, to show an overview of all Belgian tango sites. I did it the following way:
- I use Google Maps‘ “My Maps” to create a collection of pointers on a map. I called this map “milonga.be tango venues”. It’s not complete yet, but I have about 25 locations in it for the moment. I can easily link to this page so anyone can see it: Belgian tango venues.
IFRAME). I decided to use the frame approach and build a generic KML visualizor.
So how can you use it to show any KML/GeoRSS feed on your website?
- Go to the forret.com Google Maps visualizor tool
- Copy/paste the KML feed URL. Example 1: the KML link from Google Maps:
Example 2: Flickr feeds also have a Flickr GeoRSS format which is compatible (now also KML).
- Choose the appropriate center point. Currently you have to copy/paste it from Google Maps or another application, I still have to add some interface magic to do it in the page.
- Press “Show!” and copy/paste the resulting
IFRAME HTML code. Voila!
Continue reading Web tool: visualize on Google Maps
This is what a YouTube clip page looks like now:
And this is what YouTube is developing as a new ‘beta’ version:
- they got rid of the large banner (” Use Quicklists!”) which was basically a waste of space.
- all clip info is now under the video instead of to the right. All info on and action on a video are now in one place. I like it.
- Youtube likes you to check out other videos. Most newly freed space is dedicated to thumbnails of other (‘related’/’promoted’) clips: you now see 13 of them on the frist page instead of 5.
Resurrection of milonga.be
When I started dancing argentine tango, there were two sites that gave you an update of where and when you could dance tango in Belgium. The first one was tango.be, with a frame-based layout that I don’t find the most user-friendly nor visually pleasing, and the second www.milonga.be, with a Flash-based agenda that was quite easy to use. Unfortunately the editor of the latter had to stop the site due to lack of time. Two weeks ago I noticed that he had even let the domain name expire and it was free again. Five minutes later I was the new owner of milonga.be. My goal: to make it again into a comprehensive overview of where to take tango courses and dance tango in Belgium.
Oh, what can I say, I know WordPress so well now, I use it wherever I can. So yes, it’s a WordPress site, with the K2 template, but with (currently) only static pages and no posts. I’ve divided the site into 2 parts: where to follow classes, and where to go dancing (practicas, milongas, salons, workshop). I’m obviously going to sprinkle some Web2.0 gold dust on the project. One example of this: Google Calendar.
Continue reading Creating a tango calendar