Thoughts on trackback spam

It's been a long while since I worked on my PEAR package Services_Trackback, mainly because I was much too busy with work and university. Nevertheless, I have been thinking about how to solve the problem of so-called trackback spam. In the email world, people have been searching for a solution to spam since email was invented, and so far no satisfactory one has been found (AFAIK; please correct me if I'm wrong, I would be thrilled). Approaches there include complicated techniques like heuristic filtering and simple ones like greylisting, as well as sender identification (which is of little use so far, since no single standard exists and almost no program supports it).

Heuristics might also be feasible for trackbacks, but they would not give satisfactory results there either. Sadly, spammers are much more creative than computer programs. Greylisting is not feasible at all, since trackbacks are sent by web applications, which mostly cannot retry a request after a certain amount of time. The idea of identifying the sender, however, looks promising to me. Assuming that the idea works, there are two main questions to answer:

  1. How can a sender of a trackback be identified?

  2. Does the trackback standard have to be changed to support the identification, and if so, how?

For question #1 there is a simple answer (IMHO): PGP/GPG (referred to as GPG from here on, for simplicity). The infrastructure for signing data with GPG is already in place and has quite a lot of benefits regarding authentication of a sender and trust relationships. The major benefits of using this technique to identify the sender of a trackback are:

  • GPG is usable on any platform, and any kind of web language is able to interact with it (either through an extension or simply by calling it on the shell).

  • GPG supports signing data in plain text.

  • The GPG infrastructure is already widespread among technically minded people.

  • GPG will allow trackback-enabled applications to build policies like "trust senders that have a key", "trust senders that have a key signed by the recipient" or "trust senders that have a key signed by XYZ", which gives a lot of flexibility.

All in all, I think GPG signing would be a good choice for identifying the sender of a trackback sufficiently well.
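To make the three trust structures from the list above a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the names `TrustPolicy`, `SenderKey` and `is_trusted` are hypothetical, and the sketch presumes that the signature itself has already been verified and that the certifications on the sender's key have already been looked up (for example from the local GPG keyring).

```python
from dataclasses import dataclass
from enum import Enum


class TrustPolicy(Enum):
    """Hypothetical policy names, one per trust structure in the list above."""
    ANY_KEY = "any_key"                  # trust senders that have a key
    SIGNED_BY_RECIPIENT = "recipient"    # key must be signed by the recipient
    SIGNED_BY_INTRODUCER = "introducer"  # key must be signed by "XYZ"


@dataclass(frozen=True)
class SenderKey:
    """Assumed result of an earlier, successful signature check."""
    fingerprint: str
    signed_by: frozenset = frozenset()   # fingerprints that certified this key


def is_trusted(key, policy, recipient_fpr, introducers=frozenset()):
    """Decide whether a verified sender key satisfies the chosen policy."""
    if key is None:
        # No valid signature at all: never trusted automatically.
        return False
    if policy is TrustPolicy.ANY_KEY:
        return True
    if policy is TrustPolicy.SIGNED_BY_RECIPIENT:
        return recipient_fpr in key.signed_by
    # SIGNED_BY_INTRODUCER: at least one accepted introducer certified the key.
    return bool(key.signed_by & introducers)
```

A small blog could start with `ANY_KEY` and tighten the policy later without touching the rest of the trackback handling, which is the flexibility I have in mind.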

Question #2 is a bit more difficult to answer. While the trackback standard is crappy and not really well thought out, it contains all the necessary (even if not all the desirable) information for its purpose. More importantly, it is extremely widespread in the weblog scene and has already been adopted by a lot of other web sites. Changing this standard would result in almost the same chaos as the attempts to change the email standard for sender identification.

But since the trackback standard builds on a simple HTTP POST submission of the data, it can simply be extended with a "signature" field: a receiving application can use the signature field (if it is already able to handle signatures) or will automatically ignore it, since it does not even know about it. The trackback standard also allows a free-form error message in the response, so sites which do not yet sign their trackbacks can easily be informed of what is missing. For a transition period, unsigned trackbacks can simply require moderation while signed trackbacks are accepted automatically, or (more suitable for large sites) unsigned trackbacks can be rejected with an error stating that only signed trackbacks are accepted.
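A rough sketch of both sides in Python might look like the following. The trackback field names (`url`, `title`, `excerpt`, `blog_name`) and the XML response with `<error>` and `<message>` come from the existing trackback standard; everything else is an assumption of mine: the helper names, the idea of signing a sorted, URL-encoded form of the fields, and the `sign`/`verify` callables, which stand in for whatever produces and checks the GPG signature (for example a call to `gpg --armor --detach-sign` on the shell). Treating a signature that fails verification like a missing one is also just one possible choice.

```python
from urllib.parse import parse_qs, urlencode
from xml.sax.saxutils import escape


def _canonical(fields):
    # Assumption: both sides sign and verify the sorted, URL-encoded fields.
    return urlencode(sorted(fields.items()))


def build_trackback_body(url, title, excerpt, blog_name, sign):
    """Sender side: standard trackback fields plus the extra "signature" field."""
    fields = {"url": url, "title": title, "excerpt": excerpt,
              "blog_name": blog_name}
    fields["signature"] = sign(_canonical(fields))
    return urlencode(fields)


def response_xml(error, message=None):
    """Trackback response: error 0 means success, 1 carries a free-form message."""
    body = "<error>%d</error>" % error
    if message is not None:
        body += "<message>%s</message>" % escape(message)
    return ('<?xml version="1.0" encoding="utf-8"?>'
            "<response>%s</response>" % body)


def handle_trackback(body, verify, require_signature=False):
    """Receiver side: returns a (decision, response XML) pair."""
    params = {k: v[0] for k, v in parse_qs(body, keep_blank_values=True).items()}
    signature = params.pop("signature", None)
    if signature is not None and verify(_canonical(params), signature):
        return "accepted", response_xml(0)
    # Unsigned, or the signature did not verify (treated the same way here).
    if require_signature:
        return "rejected", response_xml(1, "Only signed trackbacks are accepted.")
    return "moderation", response_xml(0)
```

A receiver that knows nothing about signatures would just read the four standard fields from the same POST body and never look at `signature`, so old and new applications can coexist during the transition.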

I don't think I have thought about every single case, and I'm sure there will be pitfalls with this idea, but in general I'm confident it can be a way to successfully fight trackback spam, because:

  • No spammer will want to reveal their real identity to the person being spammed.

  • Even if a spammer is that stupid, the trust relationships can be used to make spamming impossible.

Before I start wildly hacking and implementing this easy solution for my trackback class and the blog software I'm using, I'd like to hear some opinions from you out there. What do you think about the approach? Is there already something similar that I may have missed? What problems do you see?

Curious for feedback! :)
