Avoiding wiki spam in MediaWiki


The great thing about wikis is that everyone can edit them. The problem is that this attracts a new strain of spam morons: the wiki spammers. My Tango Wiki has been spammed several times per day since I launched it: a page gets replaced with a list of URLs for various drugs, mostly ‘male performance related’, let’s say.

A sample of the IP addresses of the offenders (basically all over the place):

  • 67.86.88.200 (optonline.net – Brooklyn, USA)
  • 80.58.4.46 (rima-tde.net – Madrid, Spain)
  • 80.58.50.107 (rima-tde.net – Madrid, Spain)
  • 81.56.56.80 (proxad.net – Paris, France)
  • 81.138.240.12 (btopenworld.com – Watford, UK)
  • 82.131.14.62 (starman.ee – Saue, Estonia)
  • 82.92.4.145 (xs4all.nl – Amsterdam, Netherlands)
  • 140.116.39.112 (ncku.edu.tw – Taipei, Taiwan)
  • 195.175.37.7 (ttnet.net.tr – Istanbul, Turkey)
  • 195.175.37.70 (ttnet.net.tr – Istanbul, Turkey)
  • 196.7.0.160 (alter.net – Cape Town, South Africa)
  • 210.55.18.80 (global-gateway.net.nz – Auckland, New Zealand)
  • 212.138.47.21 (isu.net.sa – Riyadh, Saudi Arabia)
  • 212.190.198.36 (uunet.be – Belgium)

They try to hide the spam by putting it inside a <div style="height: 1px"> (CSS-hidden spam), so the links are invisible to visitors but still get picked up by Google. The goal, just as with splogs, is to build "Google juice" (search ranking), not to be read or clicked on.

The ways to fight this abuse are based on the following techniques:

  • editor whitelist: only certain IP addresses, or only logged-in users, can edit the pages
  • editor blacklist: certain IP addresses are blocked (e.g. those of anonymous proxies, which spammers often use)
  • spam word detection: when the text contains certain words, the edit is not accepted
  • spam link detection: when outgoing links contain certain words, the edit is not accepted
  • rel=nofollow: tells Google not to pass ranking credit through outgoing links, so spam links lose their value
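Spam word detection and spam link detection both come down to scanning the submitted text before it is saved. Here is a minimal sketch in plain PHP, independent of MediaWiki; the function names (containsSpamWord, containsSpamLink, isSpamEdit) and the word lists are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical spam filter sketch (not MediaWiki code).

// Spam word detection: true if any listed word occurs anywhere in the text.
function containsSpamWord(string $text, array $spamWords): bool
{
    foreach ($spamWords as $word) {
        if (stripos($text, $word) !== false) {   // case-insensitive substring search
            return true;
        }
    }
    return false;
}

// Spam link detection: true if any http(s) URL in the text contains a listed word.
function containsSpamLink(string $text, array $linkWords): bool
{
    // Collect everything that looks like an http:// or https:// URL.
    preg_match_all('~https?://[^\s\]\[<>"]+~i', $text, $matches);
    foreach ($matches[0] as $url) {
        foreach ($linkWords as $word) {
            if (stripos($url, $word) !== false) {
                return true;
            }
        }
    }
    return false;
}

// Reject the edit if either check fires.
function isSpamEdit(string $text, array $spamWords, array $linkWords): bool
{
    return containsSpamWord($text, $spamWords) || containsSpamLink($text, $linkWords);
}

$spamWords = ['viagra', 'cialis'];   // illustrative only
$linkWords = ['pharmacy', 'pills'];  // illustrative only

var_dump(isSpamEdit('Tango festival in [http://example.org Buenos Aires]', $spamWords, $linkWords)); // bool(false)
var_dump(isSpamEdit('Cheap [http://example.com/pills meds]', $spamWords, $linkWords));               // bool(true)
```

Plain substring matching is crude: "specialist" contains "cialis", so a real word list needs word boundaries (e.g. a regex with \b) or careful choice of terms to avoid rejecting honest edits.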

A lot of valuable information on fighting wiki spam can be found on chongqed.org and meta.wikimedia.org. What I have already done on my MediaWiki installation is add the following to LocalSettings.php:

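$wgShowIPinHeader = false;  # don't show anonymous visitors their IP address (and its talk-page link) in the header
$wgSpamRegex = "/<div/i";   # reject edits containing a <div tag in any case, so the hidden-CSS trick does not work (legitimate divs are blocked too)
$wgWhitelistEdit = true;    # only logged-in users can edit (MediaWiki 1.4)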
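As far as I can tell, MediaWiki applies $wgSpamRegex with PHP's preg_match() to the submitted text and refuses to save when it matches. The standalone sketch below (the helper isBlockedBySpamRegex and the sample spam are hypothetical) shows why the i modifier matters: HTML tag names are not case-sensitive, so a pattern like /<div/ lets <DIV slip through.

```php
<?php
// Standalone sketch of a spam-regex check (not MediaWiki's own code).
// Returns true when the pattern matches somewhere in the submitted text.
function isBlockedBySpamRegex(string $pattern, string $text): bool
{
    return preg_match($pattern, $text) === 1;
}

// Hypothetical CSS-hidden spam, once with lower-case tags, once with upper-case tags.
$lower = '<div style="height: 1px">[http://example.com/pills cheap pills]</div>';
$upper = '<DIV style="height: 1px">[http://example.com/pills cheap pills]</DIV>';

var_dump(isBlockedBySpamRegex('/<div/', $lower));  // bool(true)
var_dump(isBlockedBySpamRegex('/<div/', $upper));  // bool(false): pattern is case-sensitive
var_dump(isBlockedBySpamRegex('/<div/i', $upper)); // bool(true): i makes it case-insensitive
```

The trade-off of such a blunt pattern is that every legitimate <div> is rejected as well; on a small wiki that rarely needs divs, that is an acceptable price.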


4 thoughts on “Avoiding wiki spam in MediaWiki”

  1. Correct, I am using the 1.4.7 version.
    In v1.5 it becomes:

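    # This replaces $wgWhitelistAccount and $wgWhitelistEdit.
    # Set individual entries only: reassigning the whole array with
    # "$wgGroupPermissions = array();" would also wipe the default
    # rights of logged-in users and sysops, so nobody could edit.
    $wgGroupPermissions['*']['createaccount'] = false;
    $wgGroupPermissions['*']['read'] = true;
    $wgGroupPermissions['*']['edit'] = false;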

  2. Hi Peter, I now have my own MediaWiki running to play with, and I think I have come up with a better regex to block CSS-hidden spam.


    $wgSpamRegex = "//";
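For MediaWiki 1.5, the group-permission settings from the first comment can also be written so that logged-in users keep their edit right explicitly while registration stays open. A minimal LocalSettings.php sketch, assuming the default 'user' group defined in DefaultSettings.php; it only overrides single entries instead of resetting the whole array.

```php
# LocalSettings.php (MediaWiki 1.5+), sketch: only logged-in users may edit.
# Override single entries; reassigning $wgGroupPermissions as a whole
# would also drop the defaults for the 'user' and 'sysop' groups.
$wgGroupPermissions['*']['read']          = true;   # anonymous visitors can read
$wgGroupPermissions['*']['edit']          = false;  # ...but not edit
$wgGroupPermissions['*']['createaccount'] = true;   # ...and may still register
$wgGroupPermissions['user']['edit']       = true;   # logged-in users can edit
```

Leaving account creation open is friendlier to new contributors, but spammers can register too, so this works best combined with a spam regex.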
