Entries tagged as spam
Thursday, April 3. 2008
Fighting "personal spam"?

I've been to the Dortmund post office quite often lately, mostly because I'm never at home when the postman wants to deliver my packages. The largest German postal service provider, "Deutsche Post AG", also owns a bank, the "Postbank". While they did not bother me with any of that stuff earlier, their advertising for postal and non-postal products is getting more and more annoying. But let me start at the beginning...

While queuing inside the office to get to a counter, you need to stand between shelves that contain other stuff they sell at the post office: stationery, cellphones, home phones and much more. Since you usually queue for between 10 and 30 minutes, you get enough time to read all those nice advertising slogans. Right before you get to a counter, there is a large LCD screen that constantly shows a mixture of recent news and more and more advertising: banking stuff, cellphones, postal services and so on. When you finally make it to the counter, the staffer is usually unfriendly and not the fastest one. However, we have been used to this for ages and it's not the point of this article.

When you are finally done and happy to hold the latest DVD from Amazon in your hands, the staffer suddenly gets friendly: "Do you already have an account at Postbank?". A friendly "No thanks" does not work: "I just wanted to make sure you noticed...". "No, thank you!". "But it's about your future! Do you know you can save...". "I am sorry, but I already have a bank account and I do not want to change!".
After having this situation for the third time within a week, I now tend to simply lie to those people: I tell them I still work for my old employer, which was also a bank. This seems to work perfectly fine, and they even reply with "Oh, sorry for disturbing you" and let you go. However, I wonder if there isn't any legal remedy I have against such spam? It is illegal to send me emails about drugs I'm not interested in, but not to spam me in person about banking services I don't want?

Wednesday, March 12. 2008
Using Gmail's spam

Since I maintain my own server for web, mail and some other services, I do not use my Gmail account much. I originally created the account just out of curiosity about their UI, and now use it to log into other Google services and occasionally when I need a different account than one of my main ones. What I like about Gmail is that it seems to have quite a good spam filter. In the past half year about 10 spams got through to my inbox, while more than 900 were filtered into the spam folder (in the past 30 days, if you believe Gmail).

So, what to do with those filtered spams? Deleting them right away is a bummer, because some cute mail marketers might have wasted hours hacking web spiders, mailing scripts or Windoze trojans for bot networks. Therefore I decided to make some more use of this nice large collection by training the Bayes filter of my Spamassassin with it. ;) If you would like to do the same, you just need a .fetchmailrc configured for Gmail and a small shell script that fetches the spam from Gmail and makes Spamassassin learn it. The following are my settings and the script, which you could use as a starting point:
This .fetchmailrc (note that I left my Gmail address in there intentionally, I want more spam for training! ;) configures Gmail IMAP access through SSL. You need to use IMAP, since POP does not know about folders on the remote host and would therefore fetch mails from your inbox instead of the spam folder.
This bash script can be run via cron to fetch the spam mails from Gmail and inject them into the Bayes filter for learning. Note that I use a global Bayes database for my whole server and that this database must be writable by the user who executes the cron job. fetchmail calls the sa-learn command instead of a real MDA (thanks to this Spamassassin wiki page for the hint). Note that the user executing this script also needs write access to the directory containing the Bayes database files: the learning process creates a journal file there, which is then synced into the database. To avoid a new journal being created and synced after each mail, the --no-sync switch is used. I expect the synchronization re-calculates the probabilities in the database. The final sa-learn --sync call completes the learning.

Looks like the spam in my Gmail account got a nice new employment. ;)

P.S.: I hope that the Spamassassin Bayes filter does not use the To/CC/... headers for learning. Otherwise all my Gmail ham might be classified as spam soonish, too. ;)

Thursday, February 28. 2008
Major update for spamdyke on Gentoo howto

I recently migrated my server to a new machine and a new provider. After helping Kore install spamdyke on his machine today, too, I seized the chance to update my "Spam Filtering with Spamdyke in front of Qmail" howto on the wiki. The howto describes how to install the most recent version of spamdyke on a Gentoo system, explains the most important configuration options and gives some practical hints for such setups. You can ask Kore: it only takes about 10 minutes to do so ;) and saves you a whole lot of spam. Comments and additions are very welcome.

Wednesday, October 10. 2007
Howto: Spamdyke and Qmail on Gentoo

After my server was close to wasting all its CPU time on checking email messages for potential spam using Spamassassin, I decided that it was time to investigate.
My friend Arne, who helped me a lot with Qmail problems earlier, recommended installing spamdyke, an SMTP spam filter that is placed in front of Qmail and does not require special patches for the MTA itself. Spamdyke can filter mail by blacklisting, whitelisting, greylisting and several other options. Thanks to Arne for this great tip! Spamdyke is now up and running on my machine and my load is constantly below 0.40, while the amount of spam received seems to be drastically reduced. Since I did not find much information about spamdyke on Gentoo, I wrote down my experiences as a little howto in the Gentoo wiki. Maybe someone finds it helpful. Any feedback welcome!

Friday, February 16. 2007
Comments back online

My friend Kristian from eZ Systems noted this morning that comments on my blog were disabled. That was actually an accident: I shut down comments some days ago to stop a spam wave and forgot to switch them on again. I have now re-enabled comments and trackbacks. If you ever want to comment on one of my posts and find comments disabled, don't hesitate to send me a mail about it!

Tuesday, April 4. 2006
ICQ spam sucks

I have been using ICQ as an instant messaging service since 1996. While I was satisfied with it all that time, I'm now getting into real trouble with ICQ spam. Blocking unwanted messages from users who are not on your buddy list is not a problem (at least in Gaim, although this tool seems to forget the privacy settings now and then). But lately I have been receiving constant masses of authorization request spam. It's a real annoyance to get about 50-200 of these messages per day and to reject all of them manually. I would be really close to leaving the ICQ network if there weren't so many of my non-geek friends on ICQ. Does anyone have a solution here? Otherwise I would have to consider writing a Gaim plugin of my own to specify UINs which should be permanently blocked, no matter what message they send.
Although I did not find anything like that on the web: Is there anyone out there who already has it? Or someone who could write it with a flick of the finger?

Saturday, February 11. 2006
Trackbacks on PEARWeb and PEAR::Services_Trackback

I finally found some time and improved the trackback handling on PEARWeb:
Only trackbacks that pass these filters will be added to PEARWeb (and still need manual approval by package maintainers). Compared to the past, I estimate we now receive only 10% of the spam. Unapproved trackbacks are now deleted from PEARWeb automatically after 14 days, so maintainers who don't care about trackbacks can simply ignore them. Still, a lot of maintainers seem to care about trackbacks: By now we have more than 250 valid ones, which enhance the packages' documentation with additional release information, use cases, examples and hints. I think this is a valid argument for having trackbacks for our packages. Any further ideas on how to fight trackback spam?

Monday, February 6. 2006
Thoughts on trackback spam

It's been a long while since I worked on my PEAR package Services_Trackback, mainly because I was much too busy with work and university. Nevertheless, I have given some thought to how to solve the problem of so-called trackback spam. In email environments, people have been searching for a solution to spam since email was invented, and so far no satisfactory solution has been found (AFAIK; please correct me if I'm wrong, I would be thrilled). Approaches there include complicated techniques like heuristic algorithms and easy ones like grey-listing, as well as sender identification (which is useless so far, since no single standard exists and almost no program supports it).

The heuristic approach might also be feasible for trackbacks, but will not give satisfactory results there either. Sadly, spammers are much more creative than computer programs. The grey-listing approach is completely unfeasible, since trackbacks are sent by web applications, which mostly don't have the ability to retry a request after a certain amount of time. The idea of identifying the sender, however, looks promising to me. Taking for granted that the idea should work, there are 2 main questions to answer:
For question #1 there is a simple answer (IMHO): PGP/GPG (referred to as GPG from here on, for simplicity). The infrastructure for signing data using GPG is already in place and has quite a lot of benefits regarding authentication of a sender and trust relationships. The major benefits of using this technique to identify the sender of a trackback are:
I think GPG signing would generally be a good choice for identifying the sender of a trackback sufficiently.

Question #2 is a bit more difficult to answer. While the trackback standard is crappy and not really well thought out, it contains all necessary (even if not always all desirable) information for its purpose. More importantly, it is extremely widespread in the weblog scene and has already been adopted by a lot of other web sites. Changing this standard would result in almost the same chaos as the attempt to change the email standard for sender identification. But since the trackback standard relies on a simple HTTP POST submission of the data, the standard can simply be extended by a "signature" field: A receiving application can use the signature field (if it is already able to handle signatures) or will automatically ignore it, since it does not even know about it. The trackback standard also allows a free-form error message in the response, so sites which do not yet sign their trackbacks can easily be informed of what is missing. For a transition period, unsigned trackbacks can still simply require moderation, while signed trackbacks can be accepted automatically, or (more suitable for large sites) unsigned trackbacks can be rejected with an error stating that only signed trackbacks are accepted.

I don't think I have thought about every single case, and I'm sure pitfalls will occur with this idea, but in general I'm sure it can be a way to successfully fight trackback spam, because:
Before I start wildly hacking and implementing this easy solution for my trackback class and the blog software I'm using, I'd like to hear some opinions from you out there. What do you think about the approach? Is there already something similar that I may have missed? What problems do you see? Curious for feedback! :)
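To make the "signature" field idea a bit more tangible, here is a rough sketch in Python. It is not part of Services_Trackback and not taken from the trackback specification: the function names, the canonical field ordering, the three outcomes and the choice to reject pings with a bad signature are all assumptions made for illustration. The signer and verifier are passed in as plain callables; a real setup would plug in GPG there, while the demo pair below uses an HMAC with a shared secret purely as a stand-in so that the example runs without a keyring.

```python
import hashlib
import hmac
from typing import Callable, Dict, Optional
from urllib.parse import parse_qsl, urlencode

# Hypothetical callable types: a real deployment would wrap GPG here.
Signer = Callable[[bytes], str]
Verifier = Callable[[bytes, str], bool]


def canonical_payload(fields: Dict[str, str]) -> bytes:
    """Serialize the fields in sorted order so both sides sign the same bytes."""
    return urlencode(sorted(fields.items())).encode("utf-8")


def build_trackback_post(fields: Dict[str, str],
                         sign: Optional[Signer] = None) -> str:
    """Form-encode a ping; add a 'signature' field only if a signer is given."""
    body = dict(fields)
    if sign is not None:
        body["signature"] = sign(canonical_payload(fields))
    return urlencode(body)


def handle_trackback(post_body: str, verify: Verifier,
                     require_signature: bool = False) -> str:
    """Return 'accept', 'moderate' or 'reject' for an incoming ping."""
    fields = dict(parse_qsl(post_body, keep_blank_values=True))
    signature = fields.pop("signature", None)
    if signature is None:
        # Transition period: unsigned pings go to moderation unless the
        # site insists on signatures.
        return "reject" if require_signature else "moderate"
    # Assumption: a present but invalid signature is treated as spam.
    return "accept" if verify(canonical_payload(fields), signature) else "reject"


# Demo stand-in only: HMAC with a shared secret instead of a GPG key pair.
SECRET = b"demo-only"


def demo_sign(payload: bytes) -> str:
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()


def demo_verify(payload: bytes, signature: str) -> bool:
    return hmac.compare_digest(demo_sign(payload), signature)
```

A receiver that knows nothing about signatures would simply never read the extra form field, which is the backwards compatibility argument made above; a "reject" outcome would be reported through the free-form error message of the trackback response.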
Posted by Tobias Schlitt in Community related, Geek, PEAR at 21:19 | Comments (4) | Trackback (1)
Defined tags for this entry: akismet, community related, geek, pear, services_trackback, spam, trackback
Thursday, August 4. 2005
Open Source Spam?

Chregu and I recently seem to have received our first "open source spam" (never knew such a thing existed...) at Planet-PHP. The following email landed in our mailbox (names obfuscated):
Funnily, it had an example of the usage of this "framework" attached... As a PNG! ;) Seems to be really useful, doesn't it? ;)

Friday, June 24. 2005
Services_Trackback - Thoughts on trackback spam

A few weeks ago I announced the release of Services_Trackback 0.5.0, which has a new module system for integrating spam protection into your trackback mechanisms. The simplest filter (the bad word list) worked quite well at first, but as usual it did not take long for the spammers to work around it by using entity encoding. Of course, getting around that from the anti-spam point of view is very simple, too: just decode that stuff before running the bad word check. But that's not really the point, because the spam faction will not need long to work around this, too.

So, basically what I'm currently thinking about is how to build a (to some degree) reliable spam protection. The great archetype for such a system could of course be Spamassassin. The question is whether to re-implement a similar system (rule based, regex based,...) or simply try to interface with Spamassassin itself. I talked to several people here at Linuxtag to get their opinion on such stuff, and the consensus was to keep the module system as is and try to write a new module interfacing with Spamassassin. That's what I will try to do in the near future. Besides that, I shared some general thoughts on spam protection and tried to get some input on what methods may be sensible. Services_Trackback currently supports 4 spam modules, which are:
While the first 2 are pretty simple but somewhat effective, the remaining ones are more resource-greedy and complex. The DNSBL is of course effective when spammers come through a dial-up connection, since most of those IP ranges are blocked through DNSBLs (no one would really run a production webserver through a dial-up connection, and trackbacks usually come from production websites). On the other hand, this method is quite ineffective when someone spams through a server with a static IP, since DNSBLs only list servers with open SMTP relays and that's most likely not the case on such servers. The 3rd method (SURBL) is in fact the most effective one, since it extracts the URLs from a trackback and checks their domain names against a DNS server. But the effectiveness is paid for with even more resource consumption, since the URLs have to be extracted and each has to be checked through a DNS lookup.

Please read the extended entry to get an impression of my thoughts and comment on them. I would also be happy to receive some more ideas on that topic!

Monday, May 30. 2005
Fighting trackback spam on PEARWeb

Trackback spammers only needed a couple of weeks to discover PEARWeb's new trackback feature for their purposes. Of course I had to do something against that, which sadly took very long because I've been much too busy. Last week I finally managed to release Services_Trackback 0.5.0 and migrated the PEARWeb code to the new version. So far, trackback spam seems to be stopped. Usually there was a huge number of new spam trackbacks on PEARWeb per day; since last Tuesday there has not been even one. Since the new spam check features in Services_Trackback bundle some of the most common methods against trackback spam, the process PEARWeb performs to check for spam is very simple:
The first 3 lines of code create 3 spam check modules and add them to the trackback recently received. Services_Trackback supports adding checks with different priorities to select the order of their execution; I don't do that here, so they get executed in the order they were added. All checks are used with their default configuration. The checkSpam() method runs each of the added spam checks sequentially, stops if one of them reports spam and returns true (the trackback is considered spam). If none of the modules indicates spam, checkSpam() returns false.

What this means exactly is that first the trackback is scanned using the integrated "bad word list", which should filter almost 80% of the trackback spam. Next, the host sending the trackback is checked against the Spamcop DNS blacklist, which should clean up another 5-10% of the spam trackbacks. Last, a SURBL check is performed on the links contained in the trackback. This should (hopefully) keep the rest of the trackback spam from coming through. Since the spam check stops as soon as one module indicates spam, the waste of resources is pretty low (only if a trackback passes the wordlist filter is the more resource-expensive DNSBL check performed, and only if this is passed are the even more expensive SURBL checks performed).

I'm currently very confident that those checks will keep most trackback spam away from PEARWeb. Nevertheless, I'm interested in implementing (maybe cheaper) spam protection methods. So, does anyone out there have any feedback/ideas on Services_Trackback, its spam protection or similar?
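For readers who want to play with the idea outside of PEAR, the cheap-first, stop-at-first-hit chain described above can be sketched roughly as follows. This is Python, not the PHP of Services_Trackback, and none of the names below come from that package: the function names, the dictionary keys of the trackback and the default zones (bl.spamcop.net and multi.surbl.org) are assumptions for illustration. The wordlist check decodes HTML entities before matching, and the DNS lookup is passed in as a callable so the chain can be tried without network access. Real SURBL lookups also reduce a host to its registered domain first, which this sketch skips.

```python
import html
import re
import socket
from typing import Callable, Dict, Iterable, List

# Hypothetical resolver type: returns True if the name resolves, i.e. is listed.
Resolver = Callable[[str], bool]


def dns_resolves(name: str) -> bool:
    """Real-network resolver: a listed name resolves, an unlisted one fails."""
    try:
        socket.gethostbyname(name)
        return True
    except OSError:
        return False


def wordlist_check(text: str, bad_words: Iterable[str]) -> bool:
    """Cheapest check: decode HTML entities first, then look for bad words."""
    decoded = html.unescape(text).lower()
    return any(word.lower() in decoded for word in bad_words)


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL lookup name: reversed IPv4 octets plus the list's zone."""
    return ".".join(reversed(ip.split("."))) + "." + zone


def extract_hosts(text: str) -> List[str]:
    """Pull the host part out of every http(s) URL found in the text."""
    return re.findall(r"https?://([A-Za-z0-9.-]+)", text)


def check_spam(trackback: Dict[str, str], bad_words: Iterable[str],
               resolves: Resolver = dns_resolves,
               dnsbl_zone: str = "bl.spamcop.net",
               surbl_zone: str = "multi.surbl.org") -> bool:
    """Run the checks in order of cost and stop at the first one reporting spam."""
    text = " ".join([trackback["title"], trackback["excerpt"], trackback["url"]])
    checks = [
        lambda: wordlist_check(text, bad_words),
        lambda: resolves(dnsbl_query_name(trackback["host"], dnsbl_zone)),
        lambda: any(resolves(host + "." + surbl_zone)
                    for host in extract_hosts(text)),
    ]
    # any() over a generator stops evaluating as soon as one check is True.
    return any(check() for check in checks)
```

Passing a fake resolver (for example one that looks names up in a set) makes it easy to see that no DNS query is issued at all once the wordlist has already flagged a trackback.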





