schlitt.info - php, photography and private stuff
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:Author: Tobias Schlitt
:Date: Wed, 19 Nov 2008 23:29:46 +0100
:Revision: 1
:Copyright: CC by-nc-sa

==================
Using Gmail's spam
==================

:Description:
   Since I maintain my own server for web, mail and some other services, I
   do not use my Gmail account much. I originally created the account out of
   curiosity about its UI, and now use it to log into other Google services
   and occasionally when I need an account other than my main ones.

   What I like about Gmail is that it seems to have a quite good spam filter.
   In the past half year, only about 10 spam mails got through to my inbox,
   while more than 900 were filtered into the spam folder in the past 30 days
   alone (if you believe Gmail).

Since I maintain my own server for web, mail and some other services, I do
not use my `Gmail`__ `account`__ much. I originally created `the account`__
out of curiosity about its UI, and now use it to log into other `Google`__
services and occasionally when I need an account other than my main ones.

What I like about Gmail is that it seems to have a quite good spam filter. In
the past half year, only about 10 spam mails got through to my inbox, while
more than 900 were filtered into the spam folder in the past 30 days alone (if
you believe Gmail).

.. __: http://gmail.com
.. __: mailto:schlitt@gmail.com
.. __: mailto:schlitt@gmail.com
.. __: http://google.com

So, what to do with those filtered spams? Deleting them right away would be a
bummer, because some cute mail marketers might have wasted hours hacking web
spiders, mailing scripts or Windoze trojans for bot networks. Therefore I
decided to make more use of this nice large collection by training the Bayes
filter of my SpamAssassin with it. ;)

If you would like to do the same, you just need a ``.fetchmailrc`` configured
for Gmail and a small shell script that fetches the spam from Gmail and makes
SpamAssassin learn it.
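
Before wiring this up, it may be worth checking the two prerequisites the
setup depends on: both *fetchmail* and *sa-learn* must be installed, and the
user running the job needs write access to the directory holding the Bayes
database. The following is only a minimal sketch of such a check and an
illustrative addition, not part of the setup described here. The function
name ``preflight_check`` is hypothetical, and you have to pass in the Bayes
directory of your own installation, since its location depends on your
SpamAssassin configuration::

    #!/bin/bash
    # Hypothetical preflight check (illustrative sketch only).
    # Argument: the directory containing your Bayes database files;
    # its location depends on your own SpamAssassin configuration.
    preflight_check() {
        local bayes_dir="$1"
        local status=0
        local tool

        # Both programs are invoked by the cron script.
        for tool in fetchmail sa-learn; do
            if ! command -v "$tool" >/dev/null 2>&1; then
                echo "missing: $tool"
                status=1
            fi
        done

        # sa-learn writes a journal file next to the database, so the
        # directory itself must be writable for the cron user.
        if [ ! -d "$bayes_dir" ] || [ ! -w "$bayes_dir" ]; then
            echo "not a writable directory: $bayes_dir"
            status=1
        fi

        return "$status"
    }

Calling it with your Bayes directory prints nothing and returns 0 when
everything is in place; otherwise it prints one line per problem and
returns 1.
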
The following are my settings and the script, which you could use as a
starting point::

    # keep: leave the fetched mails on the Gmail server
    poll imap.gmail.com with proto IMAP
        user 'schlitt@gmail.com' there
        with password 'very very secret'
        options keep ssl sslfingerprint '2E:52:DE:98:7F:07:A3:CB:43:9E:7B:77:51:60:0E:07'

This ``.fetchmailrc`` (note that I left my Gmail address in there
intentionally, I want more spam for training! ;) configures Gmail IMAP access
through SSL. You need to use IMAP, since POP does not know about folders on
the remote host and would therefore fetch mails from your inbox instead of
the spam folder.

::

    #!/bin/bash
    # -a: fetch all mails, not only unseen ones; -n: do not rewrite
    # headers; -s: silent. Each mail is piped into sa-learn as the MDA.
    /usr/bin/fetchmail -a -n -s \
        --folder '[Gmail]/Spam' \
        -m '/usr/bin/sa-learn -C /etc/spamassassin --no-sync --spam' \
      | awk '/Learned tokens from 1 message/ { learned++ }
             /1 message\(s\) examined/ { all++ }
             END { print "Learned " (learned+0) " from " (all+0) " messages. Thanks Google! ;)" }'
    # Merge the journal into the Bayes database once, after all mails.
    /usr/bin/sa-learn --sync

This bash script can be run via cron to fetch the spam mails from Gmail and
feed them into the Bayes filter for learning. *fetchmail* calls the
*sa-learn* command instead of a real MDA (thanks to `this SpamAssassin wiki
page`__ for the hint).

Note that I use a global Bayes database for my whole server, so this database
must be writable for the user who executes the cron job. That user also needs
write access to the directory containing the Bayes database files: the
learning process creates a journal file there, which is then synced into the
database. To avoid that the journal is synced after every single mail, the
*--no-sync* switch is used. I expect the synchronization re-calculates the
probabilities in the database. The final *sa-learn --sync* call completes the
learning.

.. __: http://wiki.apache.org/spamassassin/RemoteImapFolder

Looks like the spam in `my Gmail account`__ got a nice new job. ;)

.. __: mailto:schlitt@gmail.com

P.S.: I hope that the SpamAssassin Bayes filter does not use the To/CC/...
headers for learning. Otherwise all my Gmail ham might soon be classified as
spam, too. ;)

..
   Local Variables:
   mode: rst
   fill-column: 79
   End:
   vim: et syn=rst tw=79

Trackbacks
==========

Comments
========

- d naras at Wed, 18 Jun 2008 22:34:56 +0200

  Hello, how are you all? Greetings from the incomparable one.

- Bob at Thu, 24 Sep 2009 01:30:56 +0200

  Gmail's spam filter is one of the best and has been for years. Then again,
  when you have the volume they have, learning must be much easier! Similar
  to Messagelabs.

- Joolee at Mon, 21 Dec 2009 21:02:17 +0100

  As GMail changed their name and SSL certificates, you should change the
  following::

      #!/bin/bash
      /usr/bin/fetchmail -a -n -s \
          --folder '[Gmail]/Spam' \...

  to::

      #!/bin/bash
      /usr/bin/fetchmail -a -n -s \
          --folder '[Google Mail]/Spam' \...

  and::

      options keep ssl sslfingerprint '2E:52:DE:98:7F:07:A3:CB:43:9E:7B:77:51:60:0E:07'

  to::

      options keep ssl sslfingerprint '35:D1:0A:42:F3:FE:61:4E:CD:0C:02:05:D1:CC:D9:52'