Newscorp is indeed dropping out of Google01 Dec 2009
The big disappearing act
When Rupert Murdoch announced that he would remove his sites from Google (in order to make a deal with Microsoft, so that only Bing would have the NewsCorp pages, as we now assume), he apparently wasn’t kidding. Although all Google web sites still indicate that e.g. MySpace has 179 million pages in the index, the Google API is currently returning another number for that: only 7 million. The total number of NewsCorp pages (a sum of MySpace, IGN, RottenTomatoes, …) has dropped from 192 million to 12 million.
Which sites are Newscorp?
Let me give you some of his ‘big’ sites and how their # indexed pages have dropped:
- Myspace: from 179 mio to 7 mio
- RottenTomatoes: from 4 mio to 100.000
- IGN: from 4 mio to 300.000
- Stats.com: from 2.4 mio to 50.000
- News.com.au: from 1.2 mio to 70.000
- Sky.com: from 1.4 mio to 85.000
I suspect the Fox, National Geographic, Daily Telegraph, and other sites will soon follow.
Did he send in the robots?
I checked to see if NewsCorp finally started using the
robots.txt file, because that’s the way you’re supposed to remove content from Google, not with press conferences.
User-agent: * Disallow:
User-agent: Mediapartners-Google Disallow:
And the answer there is “no”. So I’m not sure how they tell the Google crawler to stay out.
— UPDATE —
Source of the data:
The numbers come from http://tools.forret.com/newscorp/, which uses the Google Search API. I double-checked the replies from the API: for MySpace.com I get
"estimatedResultCount": "6950000" so 7 million, not 179 million. If there’s an error, it’s in the Googleplex.