Entries tagged as services_trackback

Friday, March 3. 2006

Services_Trackback and Akismet.com spam checks

A week ago I released my PEAR package Services_Trackback with a new feature: support for checking trackbacks against the Akismet.com web service. Akismet is provided by wordpress.com and is the first attempt to centralize efforts against trackback and comment spam. Services_Trackback now supports checking trackbacks against Akismet, and so does PEARWeb. Besides that, PEAR developers can report trackbacks they received as spam to Akismet (in addition to deleting them from the database) and thereby help to improve the Akismet service. So, let's see how this solution turns out...

Services_Trackback has now reached beta stage. I think the class is more than feature complete now (maybe some more spam checking techniques will follow) and the API turned out to be quite OK (considering that I want to keep it PHP 4 compatible).

Saturday, February 11. 2006

Trackbacks for eZ Publish

My friend Lukasz has used my PEAR::Services_Trackback package to create an eZ publish trackback extension. This lets you integrate trackback capabilities very easily into your eZ publish based weblog or your whole eZ publish based website. Read more here.

Bugfix releases for Net_FTP and Services Trackback

My recent releases fix a lot of tiny bugs in my PEAR packages Services_Trackback and Net_FTP.

Trackbacks on PEARWeb and PEAR::Services_Trackback

I finally found some time and improved the trackback handling on PEARWeb:
Only trackbacks that pass these filters will be added to PEARWeb (and still need manual approval by package maintainers). I estimate that we now receive only about 10% of the spam we used to. Unapproved trackbacks are now deleted from PEARWeb automatically after 14 days, so maintainers who don't care about trackbacks can simply ignore them. Still, a lot of maintainers seem to care about trackbacks: by now we have more than 250 valid ones, which enhance the packages' documentation with additional release information, use cases, examples and hints. I think this is a valid argument for having trackbacks for our packages. Any further ideas on how to fight trackback spam?

Monday, February 6. 2006

Thoughts on trackback spam

It's been a long while since I worked on my PEAR package Services_Trackback, mainly because I was much too busy with work and university. Nevertheless, I have been thinking about how to solve the problem of so-called trackback spam. In email, people have been searching for a solution to spam almost since email was invented, and so far no satisfactory solution has been found (AFAIK; please correct me if I'm wrong, I would be thrilled). Approaches there include complicated techniques like heuristic algorithms and easy ones like grey-listing, as well as sender identification (which is useless so far, since no single standard exists and almost no program supports it).

Heuristics might also be feasible for trackbacks, but will not give satisfactory results there either. Sadly, spammers are much more creative than computer programs. The grey-listing approach is completely unfeasible, since trackbacks are sent by web applications, which mostly don't have the ability to retry a request after a certain amount of time. The idea of identifying the sender, however, looks promising to me. Assuming that the idea can work, there are 2 main questions to answer:
For question #1 there is a simple answer (IMHO): PGP/GPG (referred to as GPG from here on, for simplicity). The infrastructure for signing data using GPG is already in place and has quite a lot of benefits regarding sender authentication and trust relationships. The major benefits of using this technique to identify the sender of a trackback are:
I think GPG signing would generally be a good choice for identifying the sender of a trackback sufficiently. Question #2 is a bit more difficult to answer. While the trackback standard is crappy and not really well thought out, it contains all necessary (even if not always all desirable) information for its purpose. More importantly, it is extremely widespread in the weblog scene and has already been adopted by a lot of other web sites. Changing this standard would result in almost the same chaos as the attempt to change the email standard for sender identification. But since the trackback standard relies on simple HTTP POST submission of the data, it can simply be extended by a "signature" field: a receiving application can use the signature field (if it is already able to handle signatures) or will automatically ignore it, since it does not even know about it. The trackback standard also allows a free-form error message in the response, so sites which do not yet sign their trackbacks can easily be informed of what is missing. For a transition period, unsigned trackbacks can still simply require moderation while signed trackbacks are accepted automatically; alternatively (more suitable for large sites), unsigned trackbacks can be rejected with an error stating that only signed trackbacks are accepted. I don't think I have thought about every single case, and I'm sure pitfalls will show up with this idea, but in general I'm sure it can be a way to successfully fight trackback spam, because:
Before I start wildly hacking and implementing this easy solution for my trackback class and the blog software I'm using, I'd like to hear some opinions from you out there. What do you think about the approach? Is there already something similar that I may have missed? What problems do you see? Curious for feedback! :)
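To make the "signature" field idea a bit more concrete, here is a minimal sketch in Python. It is not part of Services_Trackback and not an existing standard: the function names, the canonical serialization and the three outcome strings are all assumptions made for illustration. The `toy_sign` and `toy_verify` helpers merely stand in for real GPG signing and verification and are not secure. Rejecting a ping with an invalid signature is also an assumption; the text above only covers signed and unsigned pings.

```python
import hashlib
from urllib.parse import parse_qs, urlencode


def canonical(fields: dict) -> str:
    """Serialize fields deterministically so both sides sign the same text."""
    return "\n".join(f"{key}={fields[key]}" for key in sorted(fields))


def build_ping_body(fields: dict, sign) -> str:
    """Form-encode the usual trackback fields plus the extra 'signature' field."""
    return urlencode({**fields, "signature": sign(canonical(fields))})


def receive_ping(body: str, verify, require_signature: bool = False) -> str:
    """Return 'accept', 'moderate' or 'reject' for a form-encoded ping body."""
    parsed = parse_qs(body, keep_blank_values=True)
    params = {key: values[0] for key, values in parsed.items()}
    signature = params.pop("signature", None)
    if signature is None:
        # Unsigned ping: moderate during a transition period, or refuse outright.
        return "reject" if require_signature else "moderate"
    # Signed ping: accept automatically only if the signature matches the fields.
    return "accept" if verify(canonical(params), signature) else "reject"


# Toy stand-ins for GPG signing/verification: NOT secure, illustration only.
def toy_sign(text: str) -> str:
    return hashlib.sha256(("demo-key:" + text).encode("utf-8")).hexdigest()


def toy_verify(text: str, signature: str) -> bool:
    return toy_sign(text) == signature
```

A receiver that does not know about signatures never looks at the extra form field, which is what keeps the extension backwards compatible. A receiver that returns "reject" could put the reason into the free-form error message of the trackback response.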
Posted by Tobias Schlitt in Community related, Geek, PEAR at 21:19 | Comments (4) | Trackback (1)
Defined tags for this entry: akismet, community related, geek, pear, services_trackback, spam, trackback
Friday, June 24. 2005

Services_Trackback - Thoughts on trackback spam

A few weeks ago I announced the release of Services_Trackback 0.5.0, which has a new module system for integrating spam protection into your trackback mechanisms. The simplest filter (the bad word list) worked quite well at first, but as usual it did not take long for the spammers to work around it by using entity encoding. Of course, countering that from the anti-spam side is very simple, too: just decode the entities before running the bad word check. But that's not really the point, because the spammers will not need long to get around this, too. So, basically, what I'm currently thinking about is how to build a (to some degree) reliable spam protection.

The great model for such a system could of course be SpamAssassin. The question is whether to re-implement a similar system (rule based, regex based, ...) or simply try to interface with SpamAssassin itself. I talked to several people here at LinuxTag to get their opinion on this, and the consensus was to keep the module system as is and try to write a new module interfacing with SpamAssassin. That's what I will try to do next.

Besides that, I shared some general thoughts on spam protection and tried to get some input on what methods may be sensible. Services_Trackback currently supports 4 spam modules, which are:
While the first 2 are pretty simple but somewhat effective, the remaining ones are more resource-hungry and complex. The DNSBL is of course effective when spammers come through a dial-up connection, since most of those IP ranges are blocked by DNSBLs (no one would really run a production web server over a dial-up connection, and trackbacks usually come from production websites). On the other hand, this method is quite ineffective when someone spams from a server with a static IP, since DNSBLs only list servers with open SMTP relays and that's most likely not the case on such servers. The 3rd method (SURBL) is in fact the most effective one, since it extracts the URLs from a trackback and checks their domain names against a DNS server. But this effectiveness is paid for with even more resource consumption, since the URLs have to be extracted and each has to be checked through a DNS lookup.

Please read the extended entry to get an impression of my thoughts and comment on them. I would also be happy to receive some more ideas on that topic!

Continue reading "Services_Trackback - Thoughts on trackback spam"

Monday, May 30. 2005

Fighting trackback spam on PEARWeb

Trackback spammers only needed a couple of weeks to discover PEARWeb's new trackback feature for their purposes. Of course I had to do something about that, which sadly took very long because I've been much too busy. Last week I finally managed to release Services_Trackback 0.5.0 and migrated the PEARWeb code to the new version. So far, trackback spam seems to have stopped. Usually there was a huge number of new spam trackbacks on PEARWeb per day; since last Tuesday there has not been even 1. Since the new spam check features in Services_Trackback bundle some of the most common methods against trackback spam, the process PEARWeb performs to check for spam is very simple:
The first 3 lines of code create 3 spam check modules and add them to the trackback that was just received. Services_Trackback supports adding each check with a different priority to select the order of their execution; I don't do that here, so they are executed in the order they were added. All checks are used with their default configuration. The checkSpam() method runs each of the added spam checks sequentially, stops as soon as one of them reports spam and returns true (the trackback is considered spam). If none of the modules indicates spam, checkSpam() returns false.

What this means exactly is that the trackback is first scanned using the integrated "bad word list", which should filter almost 80% of the trackback spam. Next, the host sending the trackback is checked against the Spamcop DNS blacklist, which should clean up another 5-10% of the spam trackbacks. Last, a SURBL check is performed on the links contained in the trackback. This should (hopefully) keep the rest of the trackback spam from coming through. Since the spam check stops as soon as one module indicates spam, the waste of resources is pretty low (only if a trackback passes the Wordlist filter is the more resource-expensive DNSBL check performed, and only if this is passed are the even more expensive SURBL checks performed). I'm currently very confident that those checks will keep most trackback spam away from PEARWeb. Nevertheless, I'm interested in implementing (maybe cheaper) spam protection methods. So, does anyone out there have feedback/ideas on Services_Trackback, its spam protection or similar?

Tuesday, May 24. 2005

Finally: Services_Trackback 0.5.0

I finally managed to upload the next release of Services_Trackback (which is a generic class for sending and receiving trackbacks). The most important new feature of this version is integrated spam checking. Services_Trackback now implements a flexible API to add spam detection modules to a trackback using
Spam checks in Services_Trackback are simple classes which implement the API of Services_Trackback_SpamCheck (an abstract class). This allows you to define custom spam checks and use them in combination with the predefined ones. The following example creates 3 (predefined) spam checks and runs them in the order Wordlist, SURBL, DNSBL (priority). If one spam check detects spam, the process stops and $trackback->checkSpam() returns true; otherwise false is returned:
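As a rough illustration of this priority-ordered, stop-at-first-hit behaviour, here is a language-neutral sketch in Python. It is explicitly not the package's PHP API: the `Trackback` class, the `check_spam` function and all helper names are hypothetical, and the blacklist zones are plain parameters. The DNS lookup is injected as a function so the sketch can be run without network access.

```python
import re
import socket
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Trackback:
    """Minimal stand-in for a received trackback (hypothetical fields)."""
    title: str
    excerpt: str
    url: str
    host: str  # IPv4 address of the sending host


Check = Callable[[Trackback], bool]  # returns True if the trackback looks like spam


def wordlist_check(words: List[str]) -> Check:
    """Cheapest check: any listed word in title or excerpt marks the trackback as spam."""
    def check(tb: Trackback) -> bool:
        text = f"{tb.title} {tb.excerpt}".lower()
        return any(word.lower() in text for word in words)
    return check


def dnsbl_name(ip: str, zone: str) -> str:
    """DNSBL convention: reverse the IPv4 octets and append the blacklist zone."""
    return ".".join(reversed(ip.split("."))) + "." + zone


def dnsbl_check(zone: str, is_listed: Callable[[str], bool]) -> Check:
    """Look up the sending host in a DNS blacklist zone."""
    return lambda tb: is_listed(dnsbl_name(tb.host, zone))


def surbl_check(zone: str, is_listed: Callable[[str], bool]) -> Check:
    """Look up every host name found in the trackback's URLs in a URI blacklist zone.

    Simplification: real SURBL clients reduce hosts to their registered domain first.
    """
    def check(tb: Trackback) -> bool:
        hosts = re.findall(r"https?://([A-Za-z0-9.-]+)", f"{tb.url} {tb.excerpt}")
        return any(is_listed(f"{host.lower()}.{zone}") for host in hosts)
    return check


def dns_is_listed(name: str) -> bool:
    """Treat a name as listed if it resolves; this one needs network access."""
    try:
        socket.gethostbyname(name)
        return True
    except OSError:
        return False


def check_spam(tb: Trackback, checks: List[Tuple[int, Check]]) -> bool:
    """Run checks by ascending priority and stop at the first one that reports spam."""
    for _, check in sorted(checks, key=lambda item: item[0]):
        if check(tb):
            return True
    return False
```

Because the sort is stable, checks that share a priority run in the order they were added. Giving the cheap word list the lowest priority number means the DNS-based checks only run for trackbacks that pass it.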
The following (built-in) spam checks are available so far:
To install and try Services_Trackback simply do a
The current PEAR package ships with a package.xml and a package2.xml, which allows you to use the amazing new features of PEAR 1.4. If you installed the package using PEAR 1.4, you can install the packages needed for the autodiscovery feature (automatically discovering the trackback URI of a blog entry) by typing
If you want to use the DNSBL/SURBL spam check modules, you will have to install the necessary features with
I ask everyone out there (especially the developers of well-known weblog applications like Serendipity) for feedback regarding the package, its facilities and its API. Please comment on this blog entry! My wish would be to have Services_Trackback adopted by those applications to create a single point of development for the trackback feature (which would of course benefit all users). For a complete list of features and a list of interesting links, please refer to the extended version of this entry.

Continue reading "Finally: Services_Trackback 0.5.0"

Friday, March 11. 2005

Services_Trackback 0.4.0 released.

Just a quick note, although I'm currently on vacation. My proposal for Services_Trackback was accepted by the PEAR community yesterday. Therefore, I have just created the package on PEARWeb and uploaded the first official release.

Wednesday, March 2. 2005

Services_Trackback - let your site get tracked back.

Today I called for votes in the PEAR proposal system for my new package Services_Trackback. If you are reading this and don't know what a trackback is, please read the MT introduction to trackbacks or the technical specs. The idea for Services_Trackback was born when I wanted to implement trackbacks for PEARWeb (which is quite an uncommon idea, since trackbacks normally occur only in weblogs). The idea is that if someone posts a blog entry about a PEAR package, he will usually link to the package(s) he's writing about. This link should (usually) result in an attempt to send a trackback to the given URL. Having those trackbacks registered on PEARWeb provides a convenient kind of "link list" for every package, where interesting blog entries are linked. In the past 4 weeks this has proven itself to some degree, since we currently have more than 40 approved trackbacks on PEARWeb, covering a range of different packages.
Interesting articles like "How to set up your own PEAR 1.4 channel" or "Gentoo PHP Development" have been captured and provide up-to-date documentation for the specific packages. So, back to Services_Trackback. Since the trackback API is pretty simple in general, one would normally say that it's not worth a package. But most trackback implementations in today's weblog systems have 3 big problems:
Services_Trackback tries to solve these problems and to centralize implementations of trackbacks, offering a flexible and clear API, a central place to report bugs and improve functionality, and a huge set of features. The current version of Services_Trackback supports:
My current ideas for further feature extension are:
This list can of course be extended by users' feature requests. Services_Trackback is currently proposed to become a PEAR package and will hopefully be accepted by March 10th. More information and links to documentation and downloads are publicly available in PEPr. If you're a registered PEAR developer, please vote for the package here.
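For readers who have not looked at the trackback specification referenced in this entry: a ping is a form-encoded HTTP POST with the fields url, title, excerpt and blog_name, and the reply is a small XML document whose error element is 0 on success. The following Python sketch shows only that wire format. It is not how Services_Trackback is implemented, and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Tuple
from urllib.parse import urlencode
from urllib.request import Request


def build_ping(ping_url: str, url: str, title: str = "",
               excerpt: str = "", blog_name: str = "") -> Request:
    """Build (but do not send) the HTTP POST request for a trackback ping."""
    fields = {"url": url, "title": title, "excerpt": excerpt, "blog_name": blog_name}
    return Request(
        ping_url,
        data=urlencode(fields).encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded; charset=utf-8"},
        method="POST",
    )


def parse_response(xml_bytes: bytes) -> Tuple[bool, str]:
    """Return (success, message) for a trackback response document."""
    root = ET.fromstring(xml_bytes)
    # A missing <error> element is treated as a failure here (an assumption).
    success = root.findtext("error", default="1").strip() == "0"
    message = root.findtext("message", default="").strip()
    return success, message
```

The request could be sent with `urllib.request.urlopen(request).read()` and the returned bytes passed to `parse_response`.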