Why spam opt-out lists won’t work
24 Jan 2006
I was reading about a technique to discourage spammers: let an organised mob fill in thousands of fake submissions so that there is no way to tell them apart from real responses. They targeted a known spammer, Alex Polyakov, currently #8 in the Spamhaus top 10, and he did feel the pain.
During the 13-minute call, Polyakov claims that his “interest is only to make honest dollars.” As a peace offering, Polyakov proposes to create a global opt-out list, “the anti list of all anti lists.” Polyakov says he has no interest in sending spam to people who don’t want to receive it, and he guarantees that he will persuade all his spam-business associates to clean their mailing lists.
from Spamkings blog via digg.com
Let’s consider such a global opt-out list:
DISTRIBUTED OPT-OUT LIST
- let’s say it would be something like 1 million addresses (just a ballpark figure). All in lower case, with no funny characters.
- In order to make sure the list is not used as a spamming list itself (since these guys are not known for playing by the rules), it should be communicated not as email addresses, but as a list of hashes (e.g. MD5/SHA-1) of email addresses. (Which means you cannot get back the email addresses from the hash)
- SHA-1 is 160 bits or 20 bytes per address. MD5 is 128 bits or 16 bytes per address. MD5 is less secure but for this purpose, who cares (false positives are not a big issue).
- The size of the list would be 16 bytes x 1 million = 16MB, which is manageable for daily/weekly updates.
- One could accept domain wildcards (*@example.com) but since Hotmail, Yahoo, Gmail … would want to add a wildcard for their users, this would kill the spammers’ lists so no one would use it. Plus, some people might object to the fact that they are not kept up-to-date with the latest Ci@lis/Vi@gr@ prices.
- Let’s say a spammer uses a target list of 100 million addresses. This means 100 million email addresses of something like 30 bytes on average (high estimate, I know). So he would need to calculate the MD5 for 100.000.000 x 30 bytes or 3GB. Looking at some MD5 throughput stats (20MB/s), that takes about 150 seconds: a matter of minutes, not hours.
- Then the spammer has to remove all addresses that feature in the opt-out list. This can easily be done as a merge of 2 sorted lists. The overhead is negligible.
- If the opt-out list grows to 100 million addresses, and its size to 1.6 GB, the download is still done in less than 1 hour over ADSL.
- HOWEVER: dictionary attack! I am a ruthless spammer and I just got a list of 1 million hashes? Mmm … I could create a dictionary of probable email addresses and see if they actually exist! The part of an email address before the ‘@’ sign consists of the letters [a-z], the digits [0-9] and the characters [-._]. So all combinations up to 10 chars come to around 40^10 (gross simplification, I know) or 10^16. If I filter out the implausible ones (44444444444@) and keep the billion most probable ones (e.g. “jill.jackson@” is more probable than “a77..-_-8@”), combined with the domains hotmail.com, yahoo.com, comcast.net, … I could probably find some addresses of notorious anti-spammers, send them loads of email and destroy the credibility of the opt-out list immediately.
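To make the list above concrete, here is a minimal Python sketch of the three steps: hashing addresses, scrubbing a target list with a merge over two sorted lists, and the dictionary attack. The function names, the sample addresses and the tiny guess lists are all made up for illustration; a real attack would use a far larger dictionary, and nothing here is taken from an actual opt-out service.

```python
import hashlib
from itertools import product


def email_hash(address: str) -> bytes:
    """Return the 16-byte MD5 digest of a trimmed, lower-cased address."""
    return hashlib.md5(address.strip().lower().encode("utf-8")).digest()


def scrub(targets, optout_sorted):
    """Merge-style pass over two hash-sorted lists; keep targets not opted out."""
    hashed = sorted((email_hash(a), a) for a in targets)
    kept, i = [], 0
    for h, address in hashed:
        # Advance the opt-out cursor until it reaches or passes this hash.
        while i < len(optout_sorted) and optout_sorted[i] < h:
            i += 1
        if i == len(optout_sorted) or optout_sorted[i] != h:
            kept.append(address)
    return kept


def dictionary_attack(optout_hashes, local_parts, domains):
    """Recover opted-out addresses by hashing guesses and looking them up."""
    recovered = []
    for local, domain in product(local_parts, domains):
        guess = f"{local}@{domain}"
        if email_hash(guess) in optout_hashes:
            recovered.append(guess)
    return recovered


# Hypothetical opt-out list: only the hashes would ever be published.
optout = {email_hash(a) for a in ["Jill.Jackson@example.com", "bob@example.org"]}
optout_sorted = sorted(optout)

# Intended use: clean a mailing list without seeing the opted-out addresses.
targets = ["jill.jackson@example.com", "carol@example.net"]
cleaned = scrub(targets, optout_sorted)
print(cleaned)

# Abuse: hash probable addresses and check them against the published list.
recovered = dictionary_attack(
    optout,
    local_parts=["jill.jackson", "bob", "alice"],
    domains=["example.com", "example.org"],
)
print(recovered)
```

With these made-up inputs, the scrub keeps only carol@example.net, while the attack recovers both opted-out addresses from nothing but their hashes. The hash hides an address only from someone who cannot guess it.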
EMAIL SERVICE PROVIDER
- someone that sends email on behalf of spammers, that always uses the opt-out list, and that because of this admirable behaviour gets treated more leniently by anti-spam software.
- Advantage: the opt-out list never has to be sent to spammers, and no mails go to the opt-outers.
- Disadvantage: ain’t never gonna happen. Spammers would have to pay for this service and they won’t. The service would also have to be operated by a trusted 3rd party, but who would want to do that?
The American Direct Marketing Association (DMA) has the e-Mail Preference Service (e-MPS); the Belgian Direct Marketing Association has the Robinson list. As I recall from my Direct Marketing days, the Robinson list was always used to clean up addresses.
But getting the emailers in the DMA to use a global opt-out list will help only a little. They’re not the real problem. The real problem is the Russian/American villains on the Spamhaus top 10.
I would have to agree with Spamhaus:
- For-a-fee Address Remove Lists are operated by conmen.
- No legitimate marketing firm sends Unsolicited Bulk Email in the first place.
- Can you imagine spammers doing this?
- All spammers believe their junk is different from the junk other spammers send.