schlitt.info - php, photography and private stuff
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:Author: Tobias Schlitt
:Date: Wed, 19 Nov 2008 23:29:46 +0100
:Revision: 1
:Copyright: CC by-nc-sa

==================
Using Gmail's spam
==================

Since I maintain my own server for web, mail and some other services, I do not
use my `Gmail`__ `account`__ much. I originally created `the account`__ just
out of curiosity about their UI and now use it to log into other `Google`__
services and occasionally when I need a different account than one of my main
ones. What I like about Gmail is that it seems to have quite a good spam
filter. In the past half year about 10 spam mails got through to my inbox,
while more than 900 were filtered into the spam folder (in the past 30 days,
if you believe Gmail).

.. __: http://gmail.com
.. __: mailto:schlitt@gmail.com
.. __: mailto:schlitt@gmail.com
.. __: http://google.com

So, what to do with all that filtered spam? Deleting it right away would be a
bummer, because some cute mail marketers might have wasted hours hacking web
spiders, mailing scripts or Windoze trojans for bot networks. Therefore I
decided to put this nice large collection to use by training the Bayes filter
of my Spamassassin with it. ;) If you would like to do the same, you just need
a .fetchmailrc configured for Gmail and a small shell script that fetches the
spam from Gmail and makes Spamassassin learn it.
The following are my settings and the script, which you could use as a
starting point:

::

    poll imap.gmail.com with proto IMAP
        user 'schlitt@gmail.com' there with password 'very very secret'
        options keep ssl
        sslfingerprint '2E:52:DE:98:7F:07:A3:CB:43:9E:7B:77:51:60:0E:07'

This .fetchmailrc (note that I left my Gmail address in there intentionally, I
want more spam for training! ;) configures Gmail IMAP access through SSL. You
need to use IMAP, since POP does not know about folders on the remote host and
would therefore fetch mails from your inbox instead of the spam folder.

::

    #!/bin/bash

    # Fetch all mails from the spam folder; fetchmail hands each mail to
    # sa-learn instead of a real MDA. awk counts learned vs. examined mails.
    /usr/bin/fetchmail -a -n -s \
        --folder '[Gmail]/Spam' \
        -m '/usr/bin/sa-learn -C /etc/spamassassin --no-sync --spam' \
        | awk 'BEGIN { learned = 0; all = 0; }
               /Learned tokens from 1 message/ { learned++; }
               /1 message\(s\) examined/ { all++; }
               END { print "Learned " learned " from " all " messages. Thanks Google! ;)"; }'

    # Sync the journal written during learning into the Bayes database.
    /usr/bin/sa-learn --sync

This bash script can be run via cron to fetch the spam mails from Gmail and
feed them into the Bayes filter for learning. Note that I use a global Bayes
database for my whole server and that this database must be writable by the
user who executes the cron job. *fetchmail* calls the *sa-learn* command
instead of a real MDA (thanks to `this Spamassassin wiki page`__ for the hint).

Note that the user executing this script also needs write access to the
directory containing the Bayes database files: the learning process creates a
journal file there, which is then synced into the database. To avoid a new
journal being created and synced for every single mail, the *--no-sync* switch
is used. I expect that the synchronization recalculates the probabilities in
the database. The final *sa-learn --sync* call completes the learning.

.. __: http://wiki.apache.org/spamassassin/RemoteImapFolder

Looks like the spam in `my Gmail account`__ has found a nice new job. ;)

.. __: mailto:schlitt@gmail.com

P.S.: I hope that the Spamassassin Bayes filter does not use the To/CC/...
headers for learning. Otherwise all my Gmail ham might soon be classified as
spam, too. ;)

..
   Local Variables:
   mode: rst
   fill-column: 79
   End:
   vim: et syn=rst tw=79

Comments
========

- Bob at Thu, 24 Sep 2009 01:30:56 +0200

  Gmail's spam filter is one of the best and has been for years. Then again,
  when you have the volume they have, learning must be much easier! Similar to
  Messagelabs.

- Joolee at Mon, 21 Dec 2009 21:02:17 +0100

  As Gmail changed their name and SSL certificates, you should change the
  following::

      #!/bin/bash
      /usr/bin/fetchmail -a -n -s \
          --folder '[Gmail]/Spam' \...

  to::

      #!/bin/bash
      /usr/bin/fetchmail -a -n -s \
          --folder '[Google Mail]/Spam' \...

  and::

      options keep ssl
      sslfingerprint '2E:52:DE:98:7F:07:A3:CB:43:9E:7B:77:51:60:0E:07'

  to::

      options keep ssl
      sslfingerprint '35:D1:0A:42:F3:FE:61:4E:CD:0C:02:05:D1:CC:D9:52'