Fighting trackback spam on PEARWeb

Trackback spammers needed only a couple of weeks to discover PEARWeb's new trackback feature and put it to their own use. Of course I had to do something about that, which sadly took a long time because I've been much too busy. Last week I finally managed to release Services_Trackback 0.5.0 and migrate the PEARWeb code to the new version. So far, trackback spam seems to have stopped: PEARWeb used to receive a huge number of new spam trackbacks every day, but since last Tuesday there has not been a single one.

Since the new spam check features in Services_Trackback bundle some of the most common methods against trackback spam, the process PEARWeb performs to check for spam is very simple:

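```php
<?php
require_once 'Services/Trackback.php';

// $trackback is the Services_Trackback object for the trackback just received.
// Register three spam check modules, each with its default configuration.
$trackback->createSpamCheck('Wordlist');
$trackback->createSpamCheck('DNSBL');
$trackback->createSpamCheck('SURBL');

// checkSpam() returns true as soon as one module reports spam.
$res = $trackback->checkSpam();
if ($res) {
    // Answer with a trackback error response and stop processing.
    echo Services_Trackback::getResponseError('Your trackback seems to be spam. If it is not, please contact the webmaster of this site.', 1);
    exit;
}
```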

The first three lines create three spam check modules and attach them to the trackback just received. Services_Trackback also supports giving each check a priority to control the order of execution; I don't use that here, so the checks run in the order they were added. All checks are used with their default configuration.
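The post does not show how priorities are passed to Services_Trackback, so the following is only a hypothetical PHP sketch of the ordering rule itself, not the library's API. The class `PrioritizedChecks`, its `add()` and `executionOrder()` methods, and the convention that a lower priority number runs first are all assumptions. Checks with equal priority keep their insertion order, which is the situation described above.

```php
<?php
// Hypothetical sketch: NOT the Services_Trackback API.
// Checks are stored with a priority; lower numbers run first (an assumption),
// and checks with equal priority keep the order in which they were added.
final class PrioritizedChecks
{
    /** @var array<int, array{priority: int, seq: int, name: string}> */
    private array $entries = [];
    private int $seq = 0;

    public function add(string $name, int $priority = 0): void
    {
        $this->entries[] = ['priority' => $priority, 'seq' => $this->seq++, 'name' => $name];
    }

    /** @return string[] check names in execution order */
    public function executionOrder(): array
    {
        $entries = $this->entries;
        usort($entries, function (array $a, array $b): int {
            // Compare by priority first, then by insertion sequence.
            return [$a['priority'], $a['seq']] <=> [$b['priority'], $b['seq']];
        });
        return array_column($entries, 'name');
    }
}

// All checks use the default priority, so insertion order is kept.
$checks = new PrioritizedChecks();
$checks->add('Wordlist');
$checks->add('DNSBL');
$checks->add('SURBL');
$order = $checks->executionOrder(); // ['Wordlist', 'DNSBL', 'SURBL']
```

Calling e.g. `add('SURBL', -1)` would move the SURBL check to the front. Sorting on the pair (priority, insertion sequence) keeps the result stable even on PHP versions before 8.0, whose `usort()` is not guaranteed to be stable.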

The checkSpam() method runs the added spam checks one after another, stops as soon as one of them reports spam and returns true (the trackback is considered spam). If none of the modules indicates spam, checkSpam() returns false.
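To make the short-circuit behaviour concrete, here is a minimal PHP sketch, assuming each check can be modelled as a callable that takes the trackback data and returns `true` for spam. The function `checkSpamSequentially()`, the array keys and the `'casino'` word test are hypothetical illustrations, not Services_Trackback code; the `$executed` parameter only records which checks actually ran.

```php
<?php
/**
 * Hypothetical sketch: runs the checks in order and stops at the first
 * one reporting spam.
 *
 * @param array<string, callable> $checks    name => callable(array): bool
 * @param array                   $trackback trackback data (e.g. 'excerpt')
 * @param string[]                $executed  receives the names of checks that ran
 * @return bool true if any check reported spam
 */
function checkSpamSequentially(array $checks, array $trackback, array &$executed = []): bool
{
    foreach ($checks as $name => $check) {
        $executed[] = $name;
        if ($check($trackback)) {
            return true; // first positive result ends the run
        }
    }
    return false; // no check flagged the trackback
}

// Stand-in checks; the DNSBL and SURBL ones would normally do DNS lookups.
$checks = [
    'Wordlist' => fn(array $tb): bool => stripos($tb['excerpt'], 'casino') !== false,
    'DNSBL'    => fn(array $tb): bool => false,
    'SURBL'    => fn(array $tb): bool => false,
];

$ran = [];
$isSpam = checkSpamSequentially($checks, ['excerpt' => 'Best online casino!'], $ran);
// $isSpam === true and $ran === ['Wordlist']: the later checks never ran.
```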

In practice, this means the trackback is first scanned against the integrated "bad word list", which should filter out almost 80% of trackback spam. Next, the host sending the trackback is checked against the Spamcop DNS blacklist (DNSBL), which should catch another 5-10% of spam trackbacks. Finally, a SURBL check is performed on the links contained in the trackback, which should (hopefully) stop the rest of the trackback spam from getting through. Because the run stops as soon as one module reports spam, little work is wasted: the more expensive DNSBL lookup is performed only if a trackback passes the Wordlist filter, and the even more expensive SURBL checks only if it also passes the DNSBL check.
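The DNSBL and SURBL lookups described above are DNS queries against a blacklist zone: for a DNSBL the octets of the sender's IPv4 address are reversed and prefixed to the zone, for SURBL the domain of each link is prefixed to the zone, and a name that resolves indicates a listing. The sketch below assumes the public zones `bl.spamcop.net` and `multi.surbl.org`; whether Services_Trackback uses exactly these defaults is an assumption, and all function names here are hypothetical rather than the package's code.

```php
<?php
// Hypothetical sketch of DNS blacklist query construction (not Services_Trackback code).
const SPAMCOP_ZONE = 'bl.spamcop.net';
const SURBL_ZONE = 'multi.surbl.org';

/** DNSBL query name for an IPv4 address: 1.2.3.4 -> 4.3.2.1.<zone> */
function dnsblQueryName(string $ipv4, string $zone = SPAMCOP_ZONE): string
{
    if (filter_var($ipv4, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) === false) {
        throw new InvalidArgumentException("Not an IPv4 address: $ipv4");
    }
    return implode('.', array_reverse(explode('.', $ipv4))) . '.' . $zone;
}

/** Lower-cased, de-duplicated host names of all http(s) links in a text. */
function linkHosts(string $text): array
{
    preg_match_all('~https?://[^\s"\'<>]+~i', $text, $matches);
    $hosts = [];
    foreach ($matches[0] as $url) {
        $host = parse_url($url, PHP_URL_HOST);
        if (is_string($host) && $host !== '') {
            $hosts[] = strtolower($host);
        }
    }
    return array_values(array_unique($hosts));
}

/** SURBL query name for a link host: www.example.com -> example.com.<zone> */
function surblQueryName(string $host, string $zone = SURBL_ZONE): string
{
    $labels = explode('.', rtrim(strtolower($host), '.'));
    // Naive reduction to the last two labels; real clients consult a list
    // of multi-part TLDs such as co.uk and also handle IP-address links.
    return implode('.', array_slice($labels, -2)) . '.' . $zone;
}

/** Live lookup (needs network): listed if the name has an A record.
 *  A real client would also inspect the returned 127.0.0.x code. */
function isListed(string $queryName): bool
{
    return checkdnsrr($queryName, 'A');
}

// dnsblQueryName('192.0.2.10') returns '10.2.0.192.bl.spamcop.net'
// surblQueryName('www.example.com') returns 'example.com.multi.surbl.org'
```

Only the name construction is pure; `isListed()` performs a real DNS query, which is exactly why these checks are more expensive than the in-memory word list and sit later in the chain.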

I'm currently very confident that these checks will keep most trackback spam away from PEARWeb. Nevertheless, I'm interested in implementing other (maybe cheaper) spam protection methods. So, does anyone out there have feedback or ideas on Services_Trackback, its spam protection, or related topics?
