schlitt.info - php, photography and private stuff
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:Author: Tobias Schlitt
:Date: Wed, 19 Nov 2008 23:29:46 +0100
:Revision: 1
:Copyright: CC by-nc-sa

==========================
Thoughts on trackback spam
==========================

It's been a long while since I worked on my PEAR package
`Services_Trackback`__, mainly because I was much too busy with work and
university. Nevertheless, I have given some thought to how to solve the
problem of so-called trackback spam. In the email world, people have been
searching for a solution to spam ever since email was invented, and so far no
satisfactory solution has been found (AFAIK; please correct me if I'm wrong, I
would be thrilled). Approaches there include complicated techniques like
`heuristic algorithms`__ and simple ones like `grey-listing`__, as well as
`sender identification`__ (which is useless so far, since no single standard
exists and almost no program supports it).

.. __: http://pear.php.net/package/Services_Trackback
.. __: http://spamassassin.apache.org/
.. __: http://www.greylisting.org/
.. __: http://www.microsoft.com/mscorp/safety/technologies/senderid/default.mspx

Heuristics might also be feasible for trackbacks, but they will not give
satisfactory results there either.
Sadly, spammers are much more creative than computer programs. The
grey-listing approach is completely infeasible, since trackbacks are sent by
web applications, most of which cannot retry a request after a certain amount
of time. The idea of identifying the sender, on the other hand, looks
promising to me. Assuming the idea can work, there are two main questions to
answer:

1. How can the sender of a trackback be identified?
2. Does the trackback standard have to be changed to support the
   identification, and if so, how?

For question #1 there is a simple answer (IMHO): `PGP`__/`GPG`__ (referred to
below as GPG, for simplicity). The infrastructure for signing data with GPG is
already in place and offers quite a lot of benefits regarding sender
authentication and trust relationships. The major benefits of using this
technique to identify the sender of a trackback are:

.. __: http://www.pgpi.org/
.. __: http://www.gnupg.org/

- GPG is usable on any platform, and any kind of web language can interact
  with it (either through an extension or simply by calling it on the shell).
- GPG supports signing data in plain text.
- The GPG infrastructure is widespread among technically interested people.
- GPG allows trackback-enabled applications to build policies like "trust
  senders which have a key", "trust senders which have a key signed by the
  recipient" or "trust senders which have a key signed by XYZ", which gives a
  high degree of flexibility.

I think GPG signing would generally be a good choice for identifying the
sender of a trackback sufficiently well.

Question #2 is a bit more difficult to answer. While the `trackback
standard`__ is crappy and not really well thought out, it contains all the
necessary (even if not always all the desirable) information for its purpose.
More importantly, it is extremely widespread in the weblog scene and has
already been adopted by a lot of other web sites.
Changing this standard would result in almost the same chaos as the attempt to
change the email standard for sender identification.

.. __: http://www.sixapart.com/pronet/docs/trackback_spec

But since the trackback standard relies on a simple HTTP POST submission of
the data, it can simply be extended with a "signature" field: a receiving
application either uses the signature field (if it is already able to handle
signatures) or automatically ignores it, since it does not even know about it.
The trackback standard also allows a free-form error message in the response,
so sites which do not yet sign their trackbacks can easily be informed of this
shortcoming. During a transition period, unsigned trackbacks can still simply
require moderation while signed trackbacks are accepted automatically, or
(more suitable for large sites) unsigned trackbacks can be rejected with an
error stating that only signed trackbacks are accepted.

I don't think I have thought about every single case, and I'm sure this idea
has its pitfalls, but in general I'm confident it can be a way to successfully
fight trackback spam, because:

- No spammer will want to reveal their real identity to the person being
  spammed.
- Even if a spammer is that stupid, trust relationships can be used to make
  spamming impossible.

Before I start wildly hacking and implementing this easy solution for my
trackback class and the blog software I'm using, I'd like to hear some
opinions from you out there. What do you think about the approach? Is there
already something similar that I may have missed? What problems do you see?

**Curious for feedback! :)**

..
   Local Variables:
   mode: rst
   fill-column: 79
   End:
   vim: et syn=rst tw=79

Trackbacks
==========

- Trackback spam on Mon, 13 Feb 2006 05:45:41 +0100 in blogs for industry

  I haven't updated the war on spam in a while...overall, the blockers have
  been working very well, but there's been an increase in the Google ad-based
  spam.
  I rotated the blocked trackbacks logs...and there were two blocks within the
  first 5 minutes. ...

Comments
========

- Aaron Wormus at Tue, 07 Feb 2006 08:02:02 +0100

  It's a good idea but too complicated to be practical. To be practical you
  would need to use your current GPG key, so that you can take advantage of
  the ring of trust that is already established. Then what happens if "some
  guy" wants to send a trackback to your website? I may trust him enough to
  allow him to send me a trackback, but I don't trust him enough to sign his
  key. Why won't a domain whitelist approach work? Trackbacks are dead! Long
  live trackbacks!

- Toby at Tue, 07 Feb 2006 08:33:26 +0100

  You don't really need to use your own key; creating an extra key just for
  trackback purposes won't be that much work. Using this key, you could even
  trust people you normally would not trust with your email key. Besides that,
  the mere availability of a key that is trusted by some people should do the
  trick. The time taken to sign a message (if you include, e.g., your own IP
  or, even better, the URL the post is sent to in the signed text) should be
  sufficient to scare every spam host to death (imagine how long it takes to
  calculate the signature for each trackback to send). Or am I wrong here?

- Oscar Merida at Tue, 07 Feb 2006 16:27:51 +0100

  My first attempt at dealing with trackback spam was using domain blacklists,
  which quickly became unmaintainable as the domain names changed fairly
  quickly. A whitelist would definitely be more maintainable, and you don't
  have to deal with the GPG part. Of course, the real value of trackbacks for
  me is in finding new, unexpected sites linking back to my site. I've been
  using RBL and SURBL to check the IP address and content of trackback posts
  and that's been the most effective at blocking the majority of trackback
  spam on my blog.

- fa at Tue, 07 Feb 2006 17:45:47 +0100

  At first glance it seems like a good idea, but there are several topics that
  I either haven't really figured out or think do not apply:

  - Trackback spammers normally do not auth at all, so how do you separate
    good trackbacks from bad trackbacks the first time? I'd say you can't, so
    you have exactly the same work if person x from blog y trackbacks you for
    the first time.
  - Security: You have to put your *private* key onto your webserver.
    Hopefully out of docroot, but I wouldn't bet that all do this. What if the
    server gets compromised (a lot more probable than my home box with dynamic
    IP behind my firewall)?
  - Amount of work: you have to make a new GPG key, organize it, sign people,
    etc. Most people won't do this for their more important primary GPG key.
  - Administration of group blogs - does only the admin/main blogger whitelist
    the keys? What if you don't trust him enough to have *your* private key on
    *his* server?

  OK, one pro I can't see yet: If you do several blogs, you have only one key.
  But there are not too many people.

  To sum it up, I'd say this has too many drawbacks, GPG is just not right for
  this. But I may be a bit paranoid...

  *reposted because of strange things happening*