Avoiding wiki spam in MediaWiki
01 Nov 2005
The great thing about wikis is that everyone can edit them. The problem is that this attracts a new strain of spam morons: the wiki spammers. My Tango Wiki has been spammed several times a day since I launched it. A page gets changed to a list of URLs for various drugs, mostly ‘male performance related’, let’s say.
A sample of the IP addresses of the offenders (basically all over the place):
- 67.86.88.200 (optonline.net – Brooklyn, USA)
- 80.58.4.46 (rima-tde.net – Madrid, Spain)
- 80.58.50.107 (rima-tde.net – Madrid, Spain)
- 81.56.56.80 (proxad.net – Paris, France)
- 81.138.240.12 (btopenworld.com – Watford, UK)
- 82.131.14.62 (starman.ee – Saue, Estonia)
- 82.92.4.145 (xs4all.nl – Amsterdam, Netherlands)
- 140.116.39.112 (ncku.edu.tw – Taipei, Taiwan)
- 195.175.37.7 (ttnet.net.tr – Istanbul, Turkey)
- 195.175.37.70 (ttnet.net.tr – Istanbul, Turkey)
- 196.7.0.160 (alter.net – Cape Town, South Africa)
- 210.55.18.80 (global-gateway.net.nz – Auckland, New Zealand)
- 212.138.47.21 (isu.net.sa – Riyadh, Saudi Arabia)
- 212.190.198.36 (uunet.be – Belgium)
They try to hide the spam by putting it inside a <div style="height: 1px"> (CSS hidden spam), so the links are not visible to visitors but get picked up by Google anyway. The goal, just as with splogs, is to create Google juice, not to get read or clicked on.
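To make the trick concrete, here is a minimal PHP sketch of how a filter could flag such an edit. It is an illustration only, not MediaWiki code: the function name looks_like_hidden_spam and both sample texts are made up, and the only assumption is that the spam is wrapped in a div tag as described above.

```php
<?php
// Hypothetical helper for illustration only; not part of MediaWiki.
// Returns true when the submitted text contains an opening <div> tag,
// which is how the hidden link blocks described above are wrapped.
function looks_like_hidden_spam(string $text): bool
{
    // \b keeps "<divider" from matching; the "i" flag also catches <DIV ...>.
    return preg_match('/<div\b/i', $text) === 1;
}

// Made-up sample edits.
$spam = '<div style="height: 1px">[http://example.com/pills cheap pills]</div>';
$ham  = 'The ocho is one of the first figures a tango dancer learns.';

var_dump(looks_like_hidden_spam($spam)); // bool(true)
var_dump(looks_like_hidden_spam($ham));  // bool(false)
```

A check this blunt also rejects legitimate edits that use a div for layout, which is an acceptable trade-off on a small wiki where hardly anyone needs raw HTML.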
The ways to fight this abuse are based on the following techniques:
- editor whitelist: only certain IP addresses, or only logged-in users, can edit the pages
- editor blacklist: certain IP addresses are blocked (e.g. those of anonymous proxies – often used by spammers)
- spam word detection: when the text contains certain words, the edit is not accepted
- spam link detection: when outgoing links contain certain words, the edit is not accepted
- rel=nofollow: makes the outgoing links valueless for Google
A lot of valuable information on fighting wiki spam can be found on chongqed.org and on meta.wikimedia.org. What I have already done on my MediaWiki installation is to add the following to LocalSettings.php:
$wgShowIPinHeader = false; # so no information on IP addresses can be added
$wgSpamRegex="/<div/"; # so the hidden CSS trick does not work
$wgWhitelistEdit = true; # so only logged-in users can edit
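Some of the other techniques from the list can be expressed as LocalSettings.php settings too. The fragment below is a sketch rather than a description of the installation above: the settings are named as the MediaWiki manual documents them, but which of them your release supports, and with what defaults, is an assumption to check against the documentation for your version.

```php
# Sketch only; verify each setting against the manual for your MediaWiki version.

# Case-insensitive variant of the regex above, so <DIV ...> is rejected too.
$wgSpamRegex = "/<div/i";

# rel=nofollow on external links, so they carry no value for Google.
$wgNoFollowLinks = true;

# Group-permission form of "only logged-in users can edit":
# the '*' group covers everyone, including anonymous visitors.
$wgGroupPermissions['*']['edit'] = false;
```

The regex is matched against the submitted text, so anything more specific than a bare tag name (drug names, for instance) can be added to the same pattern with the usual | alternation.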
Technorati: <a href="http://technorati.com/tag/wiki" rel="tag">wiki</a> – <a href="http://technorati.com/tag/spam" rel="tag">spam</a> – <a href="http://technorati.com/tag/chongqed" rel="tag">chongqed</a>