It's been a long while since I worked on my PEAR package Services_Trackback, mainly because I was much too busy with work and university. Nevertheless, I have been thinking about how to solve the problem of so-called trackback spam. In the email world, people have been searching for a solution to spam ever since email was invented, and so far no satisfactory one has been found (AFAIK; please correct me if I'm wrong, I would be thrilled). Approaches there include complicated techniques like heuristic algorithms and simple ones like grey-listing, as well as sender identification (which is useless for now, since no single standard exists and almost no program supports it).
Heuristics might also be feasible for trackbacks, but they will not give satisfactory results there either. Sadly, spammers are much more creative than computer programs. The grey-listing approach is completely infeasible, since trackbacks are sent by web applications, which mostly don't have the ability to retry a request after a certain amount of time. The idea of identifying the sender, on the other hand, looks promising to me. Assuming that the idea can work, there are 2 main questions to answer:
How can a sender of a trackback be identified?
Does the trackback standard have to be changed to support the identification, and if so, how?
For question #1 there is a simple answer (IMHO): PGP/GPG (referred to as GPG from here on, for simplicity). The infrastructure for signing data with GPG is already in place and has quite a lot of benefits regarding authentication of a sender and trust relationships. The major benefits of using this technique to identify the sender of a trackback are:
GPG is usable on any platform, and any kind of web language is able to interact with it (either through an extension or simply by calling it on the shell).
GPG provides the signing of data in plain text.
The GPG infrastructure is widespread among technically minded people.
GPG will allow trackback-enabled applications to build policies like "trust senders who have a key", "trust senders who have a key signed by the recipient" or "trust senders who have a key signed by XYZ", which gives a great deal of flexibility.
Overall, I think GPG signing would be a good choice for identifying the sender of a trackback reliably enough.
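To make the three trust levels a bit more concrete, here is a minimal sketch in Python. It assumes the signature itself has already been verified and only decides whether the sender's key is trusted. `TrustPolicy`, `SenderKey` and `is_trusted` are names invented for this illustration and not part of GPG or Services_Trackback.

```python
from dataclasses import dataclass, field
from enum import Enum


class TrustPolicy(Enum):
    ANY_KEY = "any_key"                  # "trust senders who have a key"
    SIGNED_BY_RECIPIENT = "recipient"    # "... a key signed by the recipient"
    SIGNED_BY_INTRODUCER = "introducer"  # "... a key signed by XYZ"


@dataclass
class SenderKey:
    """Hypothetical summary of a sender's public key (not a real GPG structure)."""
    fingerprint: str
    # Fingerprints of the keys that have signed this key.
    signed_by: set = field(default_factory=set)


def is_trusted(key, policy, recipient_fpr, introducers=frozenset()):
    """Return True if an already-verified sender key satisfies the policy."""
    if key is None:
        # No key at all: the sender cannot be identified.
        return False
    if policy is TrustPolicy.ANY_KEY:
        return True
    if policy is TrustPolicy.SIGNED_BY_RECIPIENT:
        return recipient_fpr in key.signed_by
    if policy is TrustPolicy.SIGNED_BY_INTRODUCER:
        # Trusted if at least one accepted introducer ("XYZ") signed the key.
        return bool(key.signed_by & set(introducers))
    return False
```

A receiving blog would pick one policy per site; a stricter policy only narrows the set of accepted senders and needs no change on the sending side.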
Question #2 is a bit more difficult to answer. While the trackback standard is crappy and not really well thought out, it contains all necessary (even if not always all desirable) information for its purpose. More importantly, it is extremely widespread in the weblog scene and has already been adopted by a lot of other web sites. Changing this standard would result in almost the same chaos as trying to change the email standard for sender identification.
But since the trackback standard is based on a simple HTTP POST submission of the data, it can simply be extended by a "signature" field: a receiving application can either use the signature field (if it already knows how to handle signatures) or will automatically ignore it, since it does not even know about it. The trackback standard also allows a free-form error message in the response, so sites which are not yet able to sign their trackbacks can easily be informed of what is missing. For a transition period, unsigned trackbacks can simply still require moderation, while signed trackbacks can be accepted automatically. Alternatively (more suitable for large sites), unsigned trackbacks can be rejected with an error saying that only signed trackbacks are accepted.
I don't think I have thought about every single case, and I'm sure there will be pitfalls with this idea, but in general I'm sure it can be a way to go and to successfully fight trackback spam, because:
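A rough sketch of how this could look on both ends, in Python. The four standard ping fields and the XML response with `<error>` and `<message>` come from the trackback specification. Everything else is an assumption of this sketch: the exact text that gets signed (`canonical_text`, which also binds the target URL), the helper names, and the `sign`/`verify` callables, which stand in for whatever wraps GPG (for example a shell call to `gpg --armor --detach-sign` and `gpg --verify`).

```python
from urllib.parse import urlencode, parse_qs
from xml.sax.saxutils import escape

# Fields defined by the trackback specification.
STANDARD_FIELDS = ("title", "excerpt", "url", "blog_name")


def canonical_text(fields, target_url):
    """Text to be signed (format is an assumption, not part of any standard)."""
    lines = ["target=" + target_url]
    lines += [f"{name}={fields.get(name, '')}" for name in STANDARD_FIELDS]
    return "\n".join(lines)


def build_ping_body(fields, target_url, sign):
    """Sender side: standard fields plus the proposed 'signature' field."""
    body = {name: fields[name] for name in STANDARD_FIELDS if name in fields}
    body["signature"] = sign(canonical_text(fields, target_url))
    return urlencode(body)


def trackback_response(error_message=None):
    """XML response as described in the trackback specification."""
    head = '<?xml version="1.0" encoding="utf-8"?>\n<response>\n'
    if error_message is None:
        return head + "<error>0</error>\n</response>"
    return (head + "<error>1</error>\n<message>"
            + escape(error_message) + "</message>\n</response>")


def handle_ping(raw_body, target_url, verify, require_signature=False):
    """Receiver side: return (decision, xml) with decision in
    'accept', 'moderate' or 'reject'."""
    parsed = {k: v[0] for k, v in parse_qs(raw_body).items()}
    signature = parsed.get("signature")
    if signature:
        if verify(canonical_text(parsed, target_url), signature):
            return "accept", trackback_response()
        return "reject", trackback_response("Invalid trackback signature.")
    if require_signature:
        return "reject", trackback_response("Only signed trackbacks are accepted.")
    # Transition period: unsigned pings are queued for moderation.
    return "moderate", trackback_response()
```

A receiver that knows nothing about signatures never looks at the extra field, so the same ping body works against existing software. Note that a real implementation would need a more careful canonical form, since field values containing newlines make this one ambiguous.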
No spammer will want to reveal their real identity to the people they spam.
Even if a spammer is that stupid, spamming can be made impossible by using the trust relationships.
Before I start wildly hacking and implementing this easy solution for my trackback class and the blog software I'm using, I'd like to hear some opinions from you out there. What do you think about the approach? Is there already something similar that I may have missed? What problems do you see?
Curious for feedback! :)
It's a good idea but too complicated to be practical.
To be practical you would need to use your current GPG key, so that you can take advantage of the ring of trust that is already established. Then what happens if "some guy" wants to send a trackback to your website? I may trust him enough to allow him to send me a trackback, but I don't trust him enough to sign his key.
Why won't a domain whitelist approach work?
Trackbacks are dead! Long live trackbacks!
You don't really need to use your own key; establishing a separate key just for trackback purposes won't be that much work. Using this key, you could even trust people you normally would not trust with your email key.
Besides that, the mere availability of a key that is trusted by some people should do the trick. The time taken to sign a message (if you e.g. include your own IP or, even better, the URL the post is sent to, in the signed text) should be sufficient to scare every spam host to death (imagine how long it takes to calculate the signature for each trackback to send).
Or am I wrong here?
My first attempt at dealing with trackback spam was using a domain blacklist, which quickly became unmaintainable as the domain names changed fairly quickly.
A whitelist would definitely be more maintainable, and you don't have to deal with the GPG part. Of course, the real value for me of trackback is in finding new, unexpected sites linking back to my site.
I've been using RBL and SURBL to check the IP address and content of trackback posts and that's been the most effective at blocking the majority of trackback spam on my blog.
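For readers who have not used these lists: both are queried over DNS. The sketch below (Python) only shows how the lookup names are usually built. The function names are made up for this example, the zone names are left to the caller, and using the full host name for the content check is a simplification (lists like SURBL expect the registered domain).

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def dnsbl_query_name(ip, zone):
    """IP-based list: reverse the IPv4 octets and append the list's zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for invalid input
    return ".".join(reversed(str(addr).split("."))) + "." + zone


def uribl_query_name(url, zone):
    """Content-based list: host name of a link found in the trackback."""
    host = urlsplit(url).hostname or ""
    return host + "." + zone


def is_listed(query_name):
    """A name that resolves means 'listed'; a failed lookup means 'not listed'."""
    try:
        socket.gethostbyname(query_name)
        return True
    except socket.gaierror:
        return False
```

For a ping from 192.0.2.1 and a list zone `dnsbl.example` (a placeholder), the name to look up would be `1.2.0.192.dnsbl.example`.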
At first glance it seems like a good idea, but there are several topics that I either haven't really figured out or think do not apply:
- Trackback spammers normally do not auth at all, so how do you separate good trackbacks from bad trackbacks the first time? I'd say you can't, so you have exactly the same work if person x from blog y trackbacks you for the first time.
- Security: You have to put your _private_ key onto your webserver. Hopefully out of docroot, but I wouldn't bet that everyone does this. What if the server gets compromised (a lot more probable than my home box with dynamic IP behind my firewall)?
- Amount of work: you have to make a new gpg key, organize it, sign people's keys, etc. Most people won't even do this for their more important primary gpg key.
- Administration of group blogs: does only the admin/main blogger whitelist the keys? What if you don't trust him enough to have _your_ private key on _his_ server?
OK, one pro I can see: if you run several blogs, you have only one key. But there are not too many people who do that.
To sum it up, I'd say this has too many drawbacks, gpg is just not right for this. But I may be a bit paranoid...
*reposted because of strange things happening*
Trackback spam
I haven't updated the war on spam in a while...overall, the blockers have been working very well, but there's been an increase in the Google ad-based spam. I rotated the blocked trackbacks logs...and there were two blocks within the first 5 minutes.
...