Archive for the 'Google' Category

Google Docs’ infamous “Moved Temporarily” error – fixed!


I store quite a lot of info in Google Spreadsheets, for the obvious reasons:

  • anyone can edit from any place, even at the same time
  • the servers are more reliable than a server at the office
  • I can use the info (with CSV/Excel export) in other programs through a web link

But there is a problem popping up at random moments with that last export or ‘publish’ functionality. Sometimes when you download the published link of a CSV export (through curl), you get an error ‘Moved Temporarily - The document has moved’ with a redirect to a www.google.com address. If you don’t follow HTTP 302 redirects, you can’t get to the actual content. In the past I’ve always worked around it or waited until the error went away, but today I dug a bit deeper. So for those who have the same problem: read and learn!

The redirect is actually for authentication. I publish without requiring sign-in, so one would expect no authentication at all, but there is an authentication step anyway. See what happens (I used wget in verbose mode to see the HTTP headers):

>>>:~$ wget -v "https://spreadsheets.google.com/(…)&output=csv"
-- https://spreadsheets.google.com/(...)&output=csv
(...)
Location: https://www.google.com/... (first redirect)
-- https://www.google.com/(...)/ServiceLogin?=...
(…)
Location: https://spreadsheets.google.com/... (second redirect)
-- https://spreadsheets.google.com/(...)&output=csv&ndplr=1
(...)
Saving to: ...

So what is the solution? Just add “&ndplr=1” to your URL and you will skip the authentication redirect. I’m not sure what the parameter name ndplr stands for, so let’s just call it “Never Do Published Link Redirection”.
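If you build these export links in a script, the workaround can be applied automatically. The following is a minimal Python sketch using only the standard library. It appends ndplr=1 to a published-link URL unless the parameter is already present. The example URL path and its key value are made-up placeholders. The sketch cannot tell you whether Google still honours ndplr; it only edits the URL.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def add_ndplr(url: str) -> str:
    """Return the URL with ndplr=1 appended to its query string (idempotent)."""
    parts = urlparse(url)
    # keep_blank_values so parameters such as "foo=" survive the round trip
    query = parse_qsl(parts.query, keep_blank_values=True)
    if not any(key == "ndplr" for key, _ in query):
        query.append(("ndplr", "1"))
    return urlunparse(parts._replace(query=urlencode(query)))


# Hypothetical published-CSV link; "abc123" is a placeholder key.
url = "https://spreadsheets.google.com/pub?key=abc123&output=csv"
print(add_ndplr(url))
# https://spreadsheets.google.com/pub?key=abc123&output=csv&ndplr=1
```

Because the function checks for an existing ndplr parameter, calling it twice on the same link does not add the parameter a second time.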

How to delete old data in Google Calendar

I use Google Calendar as a vital piece of milonga.be: some 20 other editors and I keep an up-to-date calendar of tango events in Belgium. We’ve been doing that for the last 3 years, so there was a lot of old, no-longer-relevant data in the agenda. On the site, I download all the appointments as an .ICS (iCal/gCal) file and then format and display it with another program. But with all the old data still present, that ICS file had grown to more than 1 MB, and its size slowed down the updates (I download the whole thing every 30 minutes). So I decided to delete all old data (2007 – 2009). Not that easy.

Google Calendar’s web interface doesn’t really allow bulk deletion: there is no way to select several dozen appointments and delete them in one go. But I found a way that works (suggested here):

  • Install Mozilla Thunderbird (desktop email client)
  • Install Mozilla Lightning (calendar plugin for Thunderbird)
  • Install Provider for Google Calendar (Gcal plugin for Lightning)
  • Look up the Google Calendar Private iCal URL of your calendar (something like http://www.google.com/calendar/ical/...%40group.calendar.google.com/private-.../basic.ics)
  • Add it to Thunderbird with FILE/NEW/CALENDAR/NETWORK/GOOGLE CALENDAR
  • You now have a read/write connection to your Google Calendar!

Select the appointments you want to delete, hit the ‘Del’ button and see them disappear one by one.
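Once the old appointments are gone, you can check that the downloaded ICS file has really been trimmed by counting its events per year. This is a minimal Python sketch for that check and is not part of the Thunderbird method above. It is a simplified reader: it ignores iCalendar line folding, and it only looks at DTSTART lines inside VEVENT blocks. The sample calendar text is made up.

```python
from collections import Counter


def events_per_year(ics_text: str) -> Counter:
    """Count VEVENT blocks in iCalendar text by the year of their DTSTART.

    Simplified: ignores line folding and assumes the DTSTART value starts
    with a four-digit year (true for both DATE and DATE-TIME values).
    """
    counts = Counter()
    in_event = False
    for raw in ics_text.splitlines():
        line = raw.rstrip()
        if line == "BEGIN:VEVENT":
            in_event = True
        elif line == "END:VEVENT":
            in_event = False
        # Only DTSTART inside a VEVENT counts; VTIMEZONE blocks also contain DTSTART.
        elif in_event and line.startswith(("DTSTART:", "DTSTART;")):
            # The value follows the first colon, e.g. "DTSTART;VALUE=DATE:20090101".
            value = line.split(":", 1)[1]
            counts[int(value[:4])] += 1
    return counts


sample = "\r\n".join([
    "BEGIN:VCALENDAR",
    "BEGIN:VEVENT",
    "DTSTART:20080314T200000Z",
    "SUMMARY:Milonga",
    "END:VEVENT",
    "BEGIN:VEVENT",
    "DTSTART;VALUE=DATE:20090101",
    "END:VEVENT",
    "BEGIN:VEVENT",
    "DTSTART;TZID=Europe/Brussels:20100205T210000",
    "END:VEVENT",
    "END:VCALENDAR",
])
counts = events_per_year(sample)
old = sum(n for year, n in counts.items() if year <= 2009)
print(dict(counts), old)  # {2008: 1, 2009: 1, 2010: 1} 2
```

After a successful cleanup, the count of events from 2009 or earlier should drop to zero.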

Newscorp is indeed dropping out of Google

The big disappearing act

When Rupert Murdoch announced that he would remove his sites from Google (presumably to strike a deal with Microsoft, so that only Bing would have the NewsCorp pages), he apparently wasn’t kidding. All Google websites still indicate that, e.g., MySpace has 179 million pages in the index, but the Google API currently returns a very different number: only 7 million. The total number of NewsCorp pages (the sum of MySpace, IGN, RottenTomatoes, …) has dropped from 192 million to 12 million.

Chart: Newscorp is dropping out of Google (trend via http://trend.visualizor.com/g/1011)

Which sites are Newscorp?

Here are some of his ‘big’ sites and how their number of indexed pages has dropped:

  • MySpace: from 179 million to 7 million
  • RottenTomatoes: from 4 million to 100,000
  • IGN: from 4 million to 300,000
  • Stats.com: from 2.4 million to 50,000
  • News.com.au: from 1.2 million to 70,000
  • Sky.com: from 1.4 million to 85,000

I suspect the Fox, National Geographic, Daily Telegraph, and other sites will soon follow.
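As a quick check of the figures in the list, the short Python snippet below adds up the numbers and computes each site's relative drop. It uses only the figures from the list and adds no new data. The six listed sites sum to 192 million indexed pages before, which matches the total quoted earlier. After the drop they sum to only about 7.6 million, which is less than the 12 million "after" total quoted earlier. Every listed site lost more than 90% of its indexed pages.

```python
# Indexed pages in millions, (before, after), as listed in the post.
SITES = {
    "MySpace": (179, 7),
    "RottenTomatoes": (4, 0.1),
    "IGN": (4, 0.3),
    "Stats.com": (2.4, 0.05),
    "News.com.au": (1.2, 0.07),
    "Sky.com": (1.4, 0.085),
}


def totals(sites):
    """Sum the before and after page counts over all sites."""
    before = sum(b for b, _ in sites.values())
    after = sum(a for _, a in sites.values())
    return before, after


def drop_percent(before, after):
    """Relative drop in indexed pages, as a percentage."""
    return 100 * (1 - after / before)


before, after = totals(SITES)
print(f"listed sites: {before:.1f} million -> {after:.1f} million")
for name, (b, a) in SITES.items():
    print(f"{name}: -{drop_percent(b, a):.1f}%")
```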

Did he send in the robots?

I checked whether NewsCorp had finally started using the robots.txt file, because that’s how you’re supposed to keep content out of Google, not with press conferences.

Myspace:

User-agent: *
Disallow:

RottenTomatoes:

User-agent: Mediapartners-Google
Disallow:

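So the answer is “no”. MySpace’s empty Disallow line for all user agents allows everything. RottenTomatoes only has a group for Mediapartners-Google (the AdSense crawler), again with nothing disallowed. So I’m not sure how they tell the Google crawler to stay out.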
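Python’s standard urllib.robotparser module confirms that reading. An empty Disallow line permits everything. A group for Mediapartners-Google does not apply to Googlebot (the user-agent token of Google’s web crawler) at all. The third file in the sketch below is a hypothetical example of what actually blocking Googlebot would look like. The page URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots.txt text and ask whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


myspace = "User-agent: *\nDisallow:\n"
rottentomatoes = "User-agent: Mediapartners-Google\nDisallow:\n"
# Hypothetical file that would actually keep Googlebot out of the whole site.
blocking = "User-agent: Googlebot\nDisallow: /\n"

print(allowed(myspace, "Googlebot", "https://www.myspace.com/some/page"))                # True
print(allowed(rottentomatoes, "Googlebot", "https://www.rottentomatoes.com/some/page"))  # True
print(allowed(blocking, "Googlebot", "https://www.myspace.com/some/page"))               # False
```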

— UPDATE —

Source of the data:

The numbers come from http://tools.forret.com/newscorp/, which uses the Google Search API. I double-checked the replies from the API: for MySpace.com I get "estimatedResultCount": "6950000", so about 7 million, not 179 million. If there’s an error, it’s in the Googleplex.
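Note that the count comes back as a quoted string, so it has to be converted to a number before comparing. The Python sketch below searches a parsed JSON reply for the estimatedResultCount key wherever it is nested. The sample reply is a hypothetical, trimmed-down stand-in, not a real API response. Only the field name and value come from this post; the surrounding nesting is an assumption.

```python
import json


def find_key(node, key):
    """Depth-first search for the first value stored under `key` in parsed JSON."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


# Hypothetical, trimmed-down reply; the nesting is assumed, not taken from the API.
reply = json.loads('{"responseData": {"cursor": {"estimatedResultCount": "6950000"}}}')
count = int(find_key(reply, "estimatedResultCount"))
print(f"about {round(count / 1e6)} million")  # about 7 million
```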

Facebook tricked me into my own spam FAIL


So I decided to let Facebook check my Gmail contact list to see if I had missed some contacts (people using aliases, etc.). After carefully selecting a couple of FB friends to invite (a buddy from the army, …), I clicked ‘Select’ and then ‘OK’ on the next screen, which I assumed was a ‘Confirm’ window. I didn’t even read what was written on it. A few minutes later, emails started arriving at the various email aliases I had created over my years of Internet activity. Apparently I had allowed Facebook to email every Gmail contact whose address was not yet ‘known’ to Facebook. I have about 1500 addresses in my Gmail, and let’s say some 500 of them already have an FB profile. So I had just allowed Facebook to send out some 1000 ‘unsolicited commercial emails’, or *spam*, on my behalf. There is no way for me to know how many emails went out, nor to whom. I feel deeply embarrassed: I have been a strong opponent of spam for years, and I have no idea whom I have bothered with this bulk mail.

A company like Facebook probably has a whole team devoted to user experience and workflow streamlining, so I can only assume that this is by design. They probably need to keep up their exponential monthly growth numbers, so they use every opportunity to collect new email addresses. This is plain wrong. The default should be ‘opt in’, not ‘opt out’ (that is, select those you want to invite instead of deselecting those you don’t want to invite).

So dear Christopher Cox and/or Chamath Palihapitiya at Facebook: you will probably say ‘but it is clearly written on the page that they’re about to send an invitation to (in my case, 1000??) contacts’, but you know that you are wrong on this one. You’re spamming. Big time, like real jerks. Since you’re probably not going to do anything about it, Google: any ideas?

http://www.google.com/support/forum/p/gmail/thread?tid=46004a5733eee4f0&hl=en

http://blogs.zdnet.com/social/?p=266

http://www.smartmobs.com/2007/09/02/facebook-friending-spam/