Archive for the 'spam' Category


Who knows a spam pigeon?

Don't try this at home
I wrote about the economics of spam earlier:

P$ = [N * (I% * S% * W% * B% * M$)] – (N * E$) – (L% * C% * R$)
where
P$ = profit, bottom-line

N = number of emails sent (can be millions!)
I% = % of addresses that are valid/correct
S% = % of emails that are not intercepted by anti-spam software
W% = % of delivered emails that get the receiver to visit the website
B% = % of site visitors that actually buy the product
M$ = margin per product sold

E$ = cost of sending 1 email

L% = risk of having legal action taken against you
C% = risk of getting convicted when you’re in court
R$ = average fine you would have to pay
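
To make the formula concrete, here is a minimal Python sketch of it. The function and parameter names are my own shorthand for the symbols above, and every number in the example call is invented purely to show the arithmetic; none of them is a measured value.

```python
def spam_profit(n, valid, delivered, visit, buy, margin,
                cost_per_mail, sued, convicted, fine):
    """Profit per the formula above; all rates are fractions between 0 and 1."""
    revenue = n * valid * delivered * visit * buy * margin   # N * (I% * S% * W% * B% * M$)
    sending_cost = n * cost_per_mail                         # N * E$
    legal_risk = sued * convicted * fine                     # L% * C% * R$
    return revenue - sending_cost - legal_risk


# Hypothetical inputs, chosen only to illustrate the calculation.
profit = spam_profit(
    n=10_000_000, valid=0.5, delivered=0.2, visit=0.01, buy=0.01,
    margin=20.0, cost_per_mail=0.00001, sued=0.01, convicted=0.5,
    fine=100_000.0,
)
print(round(profit, 2))
```

With these made-up inputs, 10 million emails lead to only 100 sales, which shows how sensitive the bottom line is to B%.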

What I often wonder about is the B%: the people who actually buy the product. Do people really order blue pills after receiving emails like the one above? Do people believe they have been chosen to transfer millions of dollars out of some banana republic?

Why spam opt-out lists won’t work


I was reading about a technique to discourage spammers: let an organised mob fill in thousands of fake submissions so that there is no way to distinguish them from real responses. They targeted a known spammer, Alex Polyakov, currently #8 in the Spamhaus top 10, and he did feel the pain.

During the 13-minute call, Polyakov claims that his “interest is only to make honest dollars.” As a peace offering, Polyakov proposes to create a global opt-out list, “the anti list of all anti lists.” Polyakov says he has no interest in sending spam to people who don’t want to receive it, and he guarantees that he will persuade all his spam-business associates to clean their mailing lists.
from Spamkings blog via digg.com

Let’s consider such a global opt-out list:
DISTRIBUTED OPT-OUT LIST

  • Let’s say the list holds something like 1 million addresses (just a ballpark figure). All in lower case, with no funny characters.
  • In order to make sure the list is not used as a spamming list itself (since these guys are not known for playing by the rules), it should be communicated not as email addresses, but as a list of hashes (e.g. MD5/SHA-1) of email addresses (which means you cannot compute the email address back from the hash).
  • SHA-1 is 160 bits or 20 bytes per address. MD5 is 128 bits or 16 bytes per address. MD5 is less secure but for this purpose, who cares (false positives are not a big issue).
  • The size of the list would be 16 bytes x 1 million = 16MB, which is manageable for daily/weekly updates.
  • One could accept domain wildcards (*@example.com) but since Hotmail, Yahoo, Gmail … would want to add a wildcard for their users, this would kill the spammers’ lists so no one would use it. Plus, some people might object to the fact that they are not kept up-to-date with the latest Ci@lis/Vi@gr@ prices.
  • Let’s say a spammer would use a target list of 100 million addresses. This means 100 million email addresses of something like 30 bytes on average (high estimate, I know). So he would need to calculate the MD5 for 100,000,000 x 30 bytes or 3GB. Looking at some MD5 throughput stats (20MB/s), that takes about 150 seconds: a matter of minutes, not hours.
  • Then the spammer has to remove all addresses that feature in the opt-out list. This can easily be done as a merge of 2 sorted lists. The overhead is negligible.
  • If the opt-out list grows to 100 million addresses, and the size to 1.6 GB, download is still done in less than 1 hour over ADSL.
  • HOWEVER: dictionary attack! I am a ruthless spammer and I just got a list of 1 million hashes? Mmm … I could create a dictionary of probable email addresses and check whether their hashes are on the list! The part of an email address before the ‘@’ sign consists of the letters [a-z], the digits [0-9] and the characters [-._]. So all combinations of up to 10 chars number around 40^10 (gross simplification, I know) or 10^16, and if I filter out the incorrect ones (44444444444@) and use the billion most probable ones (e.g. “jill.jackson@” is more probable than “a77..-_-8@”), combined with the postfixes hotmail.com, yahoo.com, comcast.com, … I could probably find some addresses of notorious anti-spammers, send them loads of email and destroy the credibility of the opt-out list immediately.
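
The steps in this list can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names are invented, the example.com and example.org addresses are placeholders, and a set lookup stands in for the merge of two sorted lists described above (same result, simpler code).

```python
import hashlib


def normalise(address):
    """Lower-case and trim, matching the 'all in lower case' rule above."""
    return address.strip().lower()


def address_digest(address):
    """16-byte MD5 digest of a normalised address."""
    return hashlib.md5(normalise(address).encode("utf-8")).digest()


def remove_opted_out(targets, optout_digests):
    """Keep only the target addresses whose digest is not on the opt-out list."""
    blocked = set(optout_digests)
    return [a for a in targets if address_digest(a) not in blocked]


def dictionary_attack(optout_digests, local_parts, domains):
    """Recover addresses by hashing guesses and looking them up in the list."""
    blocked = set(optout_digests)
    found = []
    for local in local_parts:
        for domain in domains:
            guess = f"{local}@{domain}"
            if address_digest(guess) in blocked:
                found.append(guess)
    return found


# Placeholder data, not real addresses.
optout = [address_digest(a) for a in ("Jill.Jackson@example.com", "bob@example.org")]
targets = ["jill.jackson@example.com", "carol@example.com"]

cleaned = remove_opted_out(targets, optout)
recovered = dictionary_attack(optout, ["jill.jackson", "a77"], ["example.com", "example.org"])
list_size_bytes = len(optout[0]) * 1_000_000   # 1 million MD5 digests

print(cleaned)
print(recovered)
print(list_size_bytes)
```

The same `address_digest` function serves both the honest use (cleaning a target list) and the attack (confirming that a guessed address exists), which is exactly why hashing alone does not protect the people on the list.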

EMAIL SERVICE PROVIDER

  • someone that sends email on behalf of spammers, that always uses the opt-out list, and that because of this admirable behaviour gets treated more leniently by anti-spam software.
  • Advantage: the opt-out list never has to be sent to spammers, and no mails go to the opt-outers.
  • Disadvantage: ain’t never gonna happen. Spammers would have to pay for this service and they won’t. The service would also have to be operated by a trusted 3rd party, but who would want to do that?

SELF-REGULATION
The American Direct Marketing Association (DMA) has the e-Mail Preference Service (e-MPS), the Belgian Direct Marketing Association has the Robinson-list. As I recall from my Direct Marketing days, the Robinson list was always used to clean up addresses.

But getting the emailers in the DMA to use a global opt-out list will help very little. They’re not the real problem. The real problem is the Russian/American villains on the Spamhaus top 10.

Conclusion
I would have to agree with Spamhaus:

1. For-a-fee Address Remove Lists are operated by conmen.
2. No legitimate marketing firm sends Unsolicited Bulk Email in the first place.
3. Can you imagine spammers doing this?
4. All spammers believe their junk is different from the junk other spammers send.
from spamhaus.org


Blogspot splogs in Technorati

For some reason, if I search for “baeyens” on Technorati (sorry, John), all I get is a list of splogs (spam blogs). The first ‘real’ result is somewhere at #50, drowned between WEBCAM, CAMERA and PHONE CARD splogs.

Technorati splog results

They all have the same characteristics:

  • all on Blogger’s blogspot.com
  • post title is up to three spam words in upper case
  • blog title is up to three spam words in lower case
  • blog post content is a sequence of words without any meaning (apparently ‘baeyens’ has become part of a standard splog dictionary)
  • at the end of the blog post is an iframe part
  • the iframe inserts code from www.webs-search.com in the page that also redirects the browser to e.g. http://www.webs-search.com/search.php?key=guns (if the blog topic was ‘guns’)
  • that page is filled with ads that go through www.peakclick.com, an Austrian PPC site

What I mean is: Dave, you guys should be able to filter this scum out! And Matt, can’t you give the Blogger team a hand in attacking the splog problem from their side? We don’t want Technorati installing an if (domain ends in "blogspot.com") {/* treat as splog */ ... } rule, do we? Or do we?
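
A less blunt filter could combine several of the characteristics listed above instead of judging on the domain alone. The following Python sketch is only a guess at how that might look: the function names, the signal names and the threshold of three are my own assumptions, and the blog URLs in the example are made up.

```python
import re
from urllib.parse import urlparse


def splog_signals(url, post_title, post_html):
    """Return the names of the splog characteristics that a post matches."""
    signals = []
    host = urlparse(url).hostname or ""
    if host == "blogspot.com" or host.endswith(".blogspot.com"):
        signals.append("blogspot")
    words = post_title.split()
    has_letters = any(c.isalpha() for c in post_title)
    if 1 <= len(words) <= 3 and has_letters and post_title == post_title.upper():
        signals.append("short-uppercase-title")
    if re.search(r"<iframe\b", post_html, re.IGNORECASE):
        signals.append("iframe")
    return signals


def looks_like_splog(url, post_title, post_html, threshold=3):
    """Flag a post only when several signals agree (threshold is an assumption)."""
    return len(splog_signals(url, post_title, post_html)) >= threshold


# Made-up examples: one post with all three signals, one ordinary Blogger post.
spammy = looks_like_splog(
    "http://splog-example.blogspot.com/2005/11/guns.html",
    "GUNS WEBCAM",
    "random words <iframe src='http://www.webs-search.com/'></iframe>",
)
honest = looks_like_splog(
    "http://honest-example.blogspot.com/2005/11/hello.html",
    "My first post",
    "<p>Hello world</p>",
)
print(spammy, honest)
```

Requiring all signals to agree keeps an ordinary blogspot.com blog from being penalised just for its host.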

webs-search.com

  • domain is registered by a “Anrev, Kovacz contact@mwayc.com – 1003 Star Street – Novambark, na 88737363 – CA”
  • the registration address for mwayc.com is “41 State Street – New York, NY 12345 – US”
  • domain is hosted on an EV1 server: ev1s-67-15-104-73.ev1servers.net [67.15.104.73]
  • the page title is ‘Licht und tonanlage’, which could mean that either the above Kovacz speaks German or, more probably, that the site’s code was delivered by the Austrian PPC site.


Avoiding wiki spam in Mediawiki


The great thing about wikis is that everyone can edit them. The problem is that this attracts a new strain of spam morons: the wiki spammers. My Tango Wiki has been spammed several times a day since I launched it. A page gets changed to a list of URLs for various drugs, mostly ‘male performance related’, let’s say.

A sample of the IP addresses of the offenders (basically all over the place):

  • 67.86.88.200 (optonline.net – Brooklyn, USA)
  • 80.58.4.46 (rima-tde.net – Madrid, Spain)
  • 80.58.50.107 (rima-tde.net – Madrid, Spain)
  • 81.56.56.80 (proxad.net – Paris, France)
  • 81.138.240.12 (btopenworld.com – Watford, UK)
  • 82.131.14.62 (starman.ee – Saue, Estonia)
  • 82.92.4.145 (xs4all.nl – Amsterdam, Netherlands)
  • 140.116.39.112 (ncku.edu.tw – Taipei, Taiwan)
  • 195.175.37.7 (ttnet.net.tr – Istanbul, Turkey)
  • 195.175.37.70 (ttnet.net.tr – Istanbul, Turkey)
  • 196.7.0.160 (alter.net – Cape Town, South Africa)
  • 210.55.18.80 (global-gateway.net.nz – Auckland, New-Zealand)
  • 212.138.47.21 (isu.net.sa – Riyadh, Saudi-Arabia)
  • 212.190.198.36 (uunet.be – Belgium)

They try to hide the spam by putting it inside a <div style="height: 1px"> (CSS hidden spam) so it is not visible to visitors, but still gets picked up by Google. The goal, just as with splogs, is to create Google juice, not to get read or clicked on.

The ways to fight this abuse are based on the following techniques:

  • editor whitelist: only certain IP addresses, or only logged-in users, can edit the pages
  • editor blacklist: certain IP addresses are blocked (e.g. those of anonymous proxies – often used by spammers)
  • spam word detection: when the text contains certain words, the edit is not accepted
  • spam link detection: when outgoing links contain certain words, the edit is not accepted
  • rel=nofollow: makes the outgoing links valueless for Google
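
Roughly, these techniques could be combined as in the Python sketch below. It is not how MediaWiki implements them; the function names, the word lists and the blocked IP address (taken from a documentation-only range) are all placeholder assumptions.

```python
import re

# All lists below are placeholder assumptions, not a recommended rule set.
BLOCKED_IPS = {"192.0.2.10"}
SPAM_WORDS = ("viagra", "cialis")
SPAM_LINK_WORDS = ("pills", "casino")
URL_PATTERN = re.compile(r"https?://\S+")


def reject_reason(ip, logged_in, text):
    """Return why an edit is refused, or None if it is accepted."""
    if not logged_in:
        return "login required"        # editor whitelist
    if ip in BLOCKED_IPS:
        return "blocked address"       # editor blacklist
    lowered = text.lower()
    if any(word in lowered for word in SPAM_WORDS):
        return "spam word"             # spam word detection
    for url in URL_PATTERN.findall(lowered):
        if any(word in url for word in SPAM_LINK_WORDS):
            return "spam link"         # spam link detection
    return None


def add_nofollow(html):
    """Add rel="nofollow" to anchor tags that have no rel attribute yet."""
    return re.sub(r"<a\b(?![^>]*\brel=)", '<a rel="nofollow"', html,
                  flags=re.IGNORECASE)


print(reject_reason("198.51.100.7", True, "see http://cheap-pills.example/"))
print(add_nofollow('<a href="http://example.com/">x</a>'))
```

The first four checks stop an edit before it is saved, while `add_nofollow` only removes the incentive afterwards, so the two approaches complement each other.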

A lot of valuable information on fighting wiki spam can be found on: chongqed.org and on meta.wikimedia.org. What I have already done on my MediaWiki installation is to add the following to LocalSettings.php:

$wgShowIPinHeader = false;  # do not show the visitor's IP address in the page header
$wgSpamRegex = "/<div/";    # reject edits containing "<div", so the hidden CSS trick does not work
$wgWhitelistEdit = true;    # so only logged-in users can edit
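
One caveat about the $wgSpamRegex value above: assuming it is applied as an ordinary case-sensitive regular expression, a spammer could slip past it by writing the tag in upper case. The short Python sketch below illustrates the difference. Python's re module stands in for PHP's regex engine here, and the case-insensitive variant (which would correspond to "/<div/i") is my own suggestion, not something from the MediaWiki documentation.

```python
import re

# Hypothetical spam edit that uses an upper-case tag.
HIDDEN_SPAM = '<DIV style="height: 1px">cheap pills</DIV>'

# Same pattern as the $wgSpamRegex value above, without PHP's delimiters.
strict = re.compile(r"<div")
# Assumed variant: case-insensitive, like adding the "i" modifier in PHP.
relaxed = re.compile(r"<div", re.IGNORECASE)

strict_hit = bool(strict.search(HIDDEN_SPAM))
relaxed_hit = bool(relaxed.search(HIDDEN_SPAM))
print(strict_hit, relaxed_hit)
```

Only the case-insensitive pattern catches the upper-case tag, so adding the modifier is probably worth it.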
