The great thing about wikis is that everyone can edit them. The problem is that this attracts a new strain of spam morons: the wiki spammers. My Tango Wiki has been spammed several times a day since I launched it. A page gets changed to a list of URLs for various drugs, mostly ‘male performance related’, let’s say.
A sample of the IP addresses of the offenders (basically all over the place):
- 67.86.88.200 (optonline.net – Brooklyn, USA)
- 80.58.4.46 (rima-tde.net – Madrid, Spain)
- 80.58.50.107 (rima-tde.net – Madrid, Spain)
- 81.56.56.80 (proxad.net – Paris, France)
- 81.138.240.12 (btopenworld.com – Watford, UK)
- 82.131.14.62 (starman.ee – Saue, Estonia)
- 82.92.4.145 (xs4all.nl – Amsterdam, Netherlands)
- 140.116.39.112 (ncku.edu.tw – Taipei, Taiwan)
- 195.175.37.7 (ttnet.net.tr – Istanbul, Turkey)
- 195.175.37.70 (ttnet.net.tr – Istanbul, Turkey)
- 196.7.0.160 (alter.net – Cape Town, South Africa)
- 210.55.18.80 (global-gateway.net.nz – Auckland, New Zealand)
- 212.138.47.21 (isu.net.sa – Riyadh, Saudi Arabia)
- 212.190.198.36 (uunet.be – Belgium)
They try to hide the spam by putting it inside a <div style="height: 1px"> (CSS hidden spam), so the links are not visible to visitors but get picked up by Google anyway. The goal, just as with splogs, is to create Google juice, not to get read or clicked on.
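To make the hidden-div trick concrete, here is a minimal sketch of how such markup could be detected before an edit is saved. The function name looksLikeHiddenCssSpam is hypothetical and not part of MediaWiki, and the list of "invisible" styles is an assumption, not an exhaustive catalogue. It assumes PHP 7 or later.

```php
<?php
// Hypothetical helper (not a MediaWiki function): returns true when the text
// contains a <div> whose inline style makes its content effectively invisible.
function looksLikeHiddenCssSpam(string $text): bool
{
    // An opening <div ...> tag with a style attribute that sets
    // height to 0 or 1 (optionally "px"), display:none, or visibility:hidden.
    // The lookbehind keeps "line-height: 1" from being flagged.
    $pattern = '/<div\b[^>]*style\s*=\s*["\'][^"\']*'
             . '((?<![-\w])height\s*:\s*[01](px)?\b'
             . '|display\s*:\s*none'
             . '|visibility\s*:\s*hidden)/i';

    return preg_match($pattern, $text) === 1;
}

// Example: the kind of markup described above.
var_dump(looksLikeHiddenCssSpam('<div style="height: 1px">cheap pills</div>'));
```

A narrower pattern like this one rejects fewer legitimate edits than blocking every div, at the cost of missing hiding tricks it does not list.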
The ways to fight this abuse are based on the following techniques:
- editor whitelist: only certain IP addresses, or only logged-in users, can edit the pages
- editor blacklist: certain IP addresses are blocked (e.g. those of anonymous proxies – often used by spammers)
- spam word detection: when the text contains certain words, the edit is not accepted
- spam link detection: when outgoing links contain certain words, the edit is not accepted
- rel=nofollow: makes the outgoing links valueless for Google
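The word, link and nofollow techniques from this list can be sketched in a few lines of standalone PHP. This is an illustration only, not how MediaWiki implements them: the word lists and the function names (containsSpamWord, containsSpamLink, isEditAccepted, renderExternalLink) are all made up for the example, and it assumes PHP 7 or later.

```php
<?php
// Hypothetical word lists, for illustration only.
const SPAM_WORDS = ['viagra', 'casino'];
const SPAM_LINK_WORDS = ['pills', 'pharmacy'];

// Spam word detection: true when any listed word appears as a whole word.
function containsSpamWord(string $text, array $words): bool
{
    foreach ($words as $word) {
        if (preg_match('/\b' . preg_quote($word, '/') . '\b/i', $text) === 1) {
            return true;
        }
    }
    return false;
}

// Spam link detection: true when any http(s) URL in the text contains a listed word.
function containsSpamLink(string $text, array $words): bool
{
    preg_match_all('~https?://[^\s<>"\'\]]+~i', $text, $matches);
    foreach ($matches[0] as $url) {
        foreach ($words as $word) {
            if (stripos($url, $word) !== false) {
                return true;
            }
        }
    }
    return false;
}

// An edit is accepted only when both checks come back clean.
function isEditAccepted(string $text): bool
{
    return !containsSpamWord($text, SPAM_WORDS)
        && !containsSpamLink($text, SPAM_LINK_WORDS);
}

// rel=nofollow: render an outgoing link that search engines should not count.
function renderExternalLink(string $url, string $label): string
{
    return '<a href="' . htmlspecialchars($url, ENT_QUOTES) . '" rel="nofollow">'
         . htmlspecialchars($label, ENT_QUOTES) . '</a>';
}
```

Word lists like these need regular maintenance, and they will occasionally reject a legitimate edit that happens to mention a listed word.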
A lot of valuable information on fighting wiki spam can be found on chongqed.org and on meta.wikimedia.org. What I have already done on my MediaWiki installation is add the following to LocalSettings.php:
$wgShowIPinHeader = false; # so no information on IP addresses can be added
$wgSpamRegex = "/<div/"; # so the hidden CSS trick does not work
$wgWhitelistEdit = true; # so only logged-in users can edit
This has changed in MediaWiki 1.5 and no longer works as shown (and the open source community wonders why it is not taken seriously in the professional world).
See http://meta.wikimedia.org/wiki/Access_Restrictions
Correct, I am using the 1.4.7 version.
In v1.5 it becomes:
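# This replaces wgWhitelistAccount and wgWhitelistEdit
$wgGroupPermissions = array(); # clear the default permissions first
$wgGroupPermissions['*']['createaccount'] = false; # anonymous visitors cannot create accounts
$wgGroupPermissions['*']['read'] = true; # everyone can read
$wgGroupPermissions['*']['edit'] = false; # anonymous visitors cannot edit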
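Because the snippet above starts from an empty array, it only sets rights for anonymous visitors ('*'). For logged-in users to edit, as the old $wgWhitelistEdit setting allowed, they presumably need rights of their own. A possible continuation for LocalSettings.php, assuming the stock 'user' and 'sysop' groups of MediaWiki 1.5 (check the Access_Restrictions page for your version):

```php
# Assumption: 'user' is the built-in group for every logged-in account.
$wgGroupPermissions['user']['read'] = true;
$wgGroupPermissions['user']['edit'] = true;

# Assumption: since anonymous account creation is switched off above,
# a sysop creates accounts on request.
$wgGroupPermissions['sysop']['createaccount'] = true;
```

Any other default rights that were wiped by the reset (for example the remaining sysop rights) would have to be added back in the same way.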
Hi Peter, I have now got my own MediaWiki going to play with and think I have come up with a better regex to block CSS Hidden Spam.
$wgSpamRegex = "//";
Please see http://wiki.evernex.com/index.php?title=Blocking_Spam_in_Mediawiki for a HOWTO on blocking spam in MediaWiki.
I keep it updated fairly often and run a fairly active small wiki on which I’ve successfully been able to go back to an open edit (no account required) policy. Successful spam attacks in the last 6 months: 0.