The big disappearing act
When Rupert Murdoch announced that he would remove his sites from Google (in order to make a deal with Microsoft, so that only Bing would have the NewsCorp pages, as we now assume), he apparently wasn’t kidding. Although all Google web sites still indicate that e.g. MySpace has 179 million pages in the index, the Google API is currently returning another number for that: only 7 million. The total number of NewsCorp pages (a sum of MySpace, IGN, RottenTomatoes, …) has dropped from 192 million to 12 million.
(trend via http://trend.visualizor.com/g/1011 )
Which sites are NewsCorp?
Here are some of his ‘big’ sites and how far their indexed page counts have dropped:
- MySpace: from 179 million to 7 million
- RottenTomatoes: from 4 million to 100,000
- IGN: from 4 million to 300,000
- Stats.com: from 2.4 million to 50,000
- News.com.au: from 1.2 million to 70,000
- Sky.com: from 1.4 million to 85,000
I suspect the Fox, National Geographic, Daily Telegraph, and other sites will soon follow.
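To put the per-site figures above on a common scale, here is a minimal sketch that turns each before/after pair into a percentage drop. The counts are the rounded numbers from the list; the names `counts` and `percent_drop` are only illustrative.

```python
# Indexed-page counts (before, after), rounded as in the list above.
counts = {
    "MySpace": (179_000_000, 7_000_000),
    "RottenTomatoes": (4_000_000, 100_000),
    "IGN": (4_000_000, 300_000),
    "Stats.com": (2_400_000, 50_000),
    "News.com.au": (1_200_000, 70_000),
    "Sky.com": (1_400_000, 85_000),
}


def percent_drop(before, after):
    """Return the drop from `before` to `after` as a percentage of `before`."""
    return (before - after) / before * 100


drops = {site: percent_drop(before, after) for site, (before, after) in counts.items()}

for site, drop in drops.items():
    print(f"{site}: {drop:.1f}% fewer indexed pages")
```

With these inputs every listed site loses more than 90% of its reported pages.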
Did he send in the robots?
I checked to see if NewsCorp finally started using the robots.txt file, because that’s the way you’re supposed to remove content from Google, not with press conferences.
MySpace:
User-agent: *
Disallow:
RottenTomatoes:
User-agent: Mediapartners-Google
Disallow:
And the answer there is “no”: an empty Disallow line blocks nothing, so neither file keeps crawlers out. So I’m not sure how they tell the Google crawler to stay out.
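The same check can be done mechanically. This is a minimal sketch using Python's standard `urllib.robotparser`; the two rule sets are the ones quoted above, while the helper name `googlebot_allowed`, the site URLs, and the `blocking_rules` example are assumptions added for illustration, not anything NewsCorp actually published.

```python
from urllib.robotparser import RobotFileParser


def googlebot_allowed(robots_txt, url, agent="Googlebot"):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Rules as quoted in the post.
myspace_rules = "User-agent: *\nDisallow:"
rottentomatoes_rules = "User-agent: Mediapartners-Google\nDisallow:"

# Hypothetical rules that WOULD keep every crawler out, for contrast.
blocking_rules = "User-agent: *\nDisallow: /"

print(googlebot_allowed(myspace_rules, "http://www.myspace.com/"))
print(googlebot_allowed(rottentomatoes_rules, "http://www.rottentomatoes.com/"))
print(googlebot_allowed(blocking_rules, "http://www.myspace.com/"))
```

The first two calls report that Googlebot is allowed, and only the hypothetical `Disallow: /` rule blocks it.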
— UPDATE —
Source of the data:
The numbers come from http://tools.forret.com/newscorp/, which uses the Google Search API. I double-checked the replies from the API: for MySpace.com I get "estimatedResultCount": "6950000", so 7 million, not 179 million. If there’s an error, it’s in the Googleplex.
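For readers who want to repeat the double-check, here is a rough sketch of reading that field out of a JSON reply. Only the field name `estimatedResultCount` and its value come from the reply quoted above; the surrounding `responseData`/`cursor` nesting in `sample_reply` is an assumption about the reply's shape, and `estimated_count` is a hypothetical helper.

```python
import json

# Assumed reply shape; only the estimatedResultCount field is from the post.
sample_reply = '{"responseData": {"cursor": {"estimatedResultCount": "6950000"}}}'


def estimated_count(reply_text):
    """Extract the estimated result count from a JSON reply as an integer."""
    data = json.loads(reply_text)
    # The API returns the count as a string, so convert it explicitly.
    return int(data["responseData"]["cursor"]["estimatedResultCount"])


count = estimated_count(sample_reply)
millions = round(count / 1_000_000)
print(f"{count:,} pages, roughly {millions} million")
```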
It’s a fake graph, or a bug.
It might be a bug in the Google Search API results. But the graph is not fake.
Could this have anything to do with it..?
http://www.seomoz.org/blog/googles-indexation-cap
Have you been monitoring any non-NewsCorp sites as a “control” sample, to make sure it’s not just a new algorithm from Google?
Interesting chart – going from 192m to 12m is significant. I wonder how difficult it is going to be to do the necessary work when they decide to go in the opposite direction…
This decision is going to simultaneously destroy and create careers, and at some point the careers it creates in the beginning will themselves be destroyed when the decision is reversed.
A few of us at the Googleplex are checking this out and not seeing anything very unusual on our end (search results estimates are just that–estimates). As Paul Lomax asked, what sort of control sample are you using of regular sites?
The estimatedResultCount is “a rough estimate and is not reliable enough to be used for research/exact calculations. It is subject to frequent change as our index changes.”
http://ferodynamics.com/google-delists-myspace/
@Matt: I have made a comparison between the numbers coming back from the API and the ‘normal’ website:
http://tools.forret.com/newscorp/reference.php
For a few web sites (e.g. Yahoo!) the numbers are the same. For most of them they are hugely different.