Wednesday, March 12. 2008
Since I maintain my own server for web, mail and some other services, I do not use my Gmail account much. I originally created the account just out of curiosity about their UI, and now use it to log into other Google services and occasionally when I need an account other than one of my main ones. What I like about Gmail is that it seems to have quite a good spam filter. In the past half year about 10 spam mails got through to my inbox, while more than 900 were filtered into the spam folder (in the past 30 days, if you believe Gmail).
So, what to do with all that filtered spam? Deleting it right away would be a shame, because some cute mail marketers might have wasted hours hacking web spiders, mailing scripts or Windoze trojans for bot networks. Therefore I decided to put this nice large collection to use by training the Bayes filter of my Spamassassin with it. ;)
If you would like to do the same, you just need a .fetchmailrc configured for Gmail and a small shell script that fetches the spam from Gmail and makes Spamassassin learn it. The following are my settings and the script, which you can use as a starting point:
poll imap.gmail.com with
proto IMAP
user 'schlitt@gmail.com' there with password 'very very secret'
options keep ssl sslfingerprint '2E:52:DE:98:7F:07:A3:CB:43:9E:7B:77:51:60:0E:07'
This .fetchmailrc configures the Gmail IMAP access through SSL. (Note that I left my Gmail address in there intentionally, I want more spam for training! ;) You need to use IMAP, since POP does not know about folders on the remote host and would therefore fetch mails from your inbox instead of the spam folder.
#!/bin/bash
# Fetch all mails from the Gmail spam folder and hand each one to sa-learn.
/usr/bin/fetchmail -a -n -s \
--folder '[Gmail]/Spam' \
-m '/usr/bin/sa-learn -C /etc/spamassassin --no-sync --spam' \
| awk '/Learned tokens from 1 message/ { learned++; } /1 message\(s\) examined/ { all++; }\
END { print "Learned " (learned+0) " from " (all+0) " messages. Thanks Google! ;)"; }'
# Sync the journal into the Bayes database once, after all mails are processed.
/usr/bin/sa-learn --sync
This bash script can be run via cron to fetch the spam mails from Gmail and feed them into the Bayes filter for learning. fetchmail calls the sa-learn command instead of a real MDA (thanks to this Spamassassin wiki page for the hint). Note that I use a global Bayes database for my whole server, so the user who executes the cron job needs write access both to the database files and to the directory containing them: the learning process creates a journal file there, which is then synced into the database. To avoid that the journal is synced after every single mail, the --no-sync switch is used. I expect that the synchronization recalculates the probabilities in the database. The final sa-learn --sync call completes the learning.
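If you want to check the counting part of the script without talking to Gmail at all, the awk filter can be tried on its own. This is only a sketch: the function name count_learned is made up for illustration, and the two sample lines assume that sa-learn prints its summary in the form "Learned tokens from N message(s) (M message(s) examined)".

```shell
#!/bin/bash
# count_learned: hypothetical helper, not part of Spamassassin.
# Reads sa-learn summary lines on stdin and reports how many mails were
# newly learned and how many were examined in total.
count_learned() {
    awk '
        /Learned tokens from 1 message/ { learned++ }
        /1 message\(s\) examined/       { all++ }
        END { printf "Learned %d from %d messages.\n", learned, all }
    '
}

# Assumed sample output: one new spam mail and one that was already known.
printf '%s\n' \
    'Learned tokens from 1 message(s) (1 message(s) examined)' \
    'Learned tokens from 0 message(s) (1 message(s) examined)' \
    | count_learned
```

The parentheses in "message(s)" have to be escaped in the second pattern, otherwise awk treats them as a regex group and the line never matches.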
Looks like the spam in my Gmail account got a nice new employment. ;)
P.S.: I hope that the Spamassassin Bayes filter does not use the To/CC/... headers for learning. Otherwise all my Gmail ham might soon be classified as spam, too. ;)
Thursday, February 28. 2008
I recently migrated my server to a new machine and a new provider. After helping Kore install spamdyke on his machine today, too, I seized the chance to update my "Spam Filtering with Spamdyke in front of Qmail" howto on the wiki. The howto describes how to install the most recent version of spamdyke on a Gentoo system, explains the most important configuration options and gives some practical hints for such setups. You can ask Kore: it only takes about 10 minutes ;) and saves you a huge lot of spam. Comments and additions are very welcome.
Friday, October 26. 2007
The yearly International PHP Conference in Frankfurt (or, as I usually say, the family meeting) is approaching rapidly and I'd like to invite you to join me in my full-day workshop "Hands on eZ Components". The session will take place on the first workshop day, which is Sunday the 4th of November, and will provide you with 6 hours of bundled eZ Components knowledge.

At the beginning of this workshop I will give you a general overview of eZ Components, show you the most important concepts and illustrate our architecture and design decisions. After that, we will start digging into code and you will see how different components work in practice. I will explain working code to you using a practical example application, and you are invited to make me change it, possibly exchanging or adding a feature, so that I can show you a different component. Some of the most interesting components, like Mail, Template and Graph, will be shown in detail to give you a good impression of what eZ Components can do for you and how you can make effective use of them. Besides that, I will try to give you some insider tips and tricks for your daily development and will possibly touch on some OO design concepts and patterns for explanation.
In addition to these learning aspects of my workshop, it should also give you the possibility to provide us with feedback on what you are missing in eZ Components, what you dislike and what you like about the library. Get into discussion with me and potentially other eZ Components development team members (like Derick and Kore), who will also be at IPC. So, seize the chance and tell us what you think about our work!
And if you don't have a ticket for IPC yet, get moving and register now!
Wednesday, October 10. 2007
After my server was close to wasting all its CPU time on checking email messages for potential spam using Spamassassin, I decided that it was time to investigate. My friend Arne, who helped me a lot with Qmail problems earlier, recommended installing spamdyke, an SMTP spam filter that is placed in front of Qmail and does not require special patches for the MTA itself. Spamdyke can filter mail by blacklisting, whitelisting, greylisting and several other options.
Thanks to Arne for this great tip! Spamdyke is up and running now on my machine and my load is now constantly below 0.40, while the amount of spam received seems to be reduced drastically. Since I did not find much information about spamdyke on Gentoo, I wrote down my experiences as a little howto in the Gentoo wiki. Maybe someone finds it helpful. Any feedback welcome!
Monday, May 7. 2007
After the usual development cycle of eZ Components we just released version 2007.1 beta 1. This release (which will go stable in a few weeks) includes lots of new features and many bug fixes.
The major feature highlights for this release are
- radar charts and PDO data sets for the Graph component
- a dialog system and new argument handling for ConsoleTools
- support for MS SQL Server in the Database component
- SSL/TLS support for IMAP, POP3 and SMTP connections in the Mail component
- named parameter support and dynamic location support for Template
- "delayed initialization" for all suitable components
and a lot more. Besides that, eZ Components got 2 new components:
- Authentication, which deals with authenticating users against backends such as database, LDAP or TypeKey (many more backends to come in the next releases)
- Workflow, which provides the core functionality of an activity-based workflow system including the definition and execution of workflow specifications.
The latter was developed as part of the diploma thesis of Sebastian Bergmann and gives you an ingenious tool to enable workflows in your application. As usual, the upcoming weeks will be used to complete the documentation with the new features, while examples and unit tests are already available now. You can try out 2007.1 beta 1 by downloading the full package from our website or by installing the components you like through our PEAR channel with $ pear upgrade ezc/eZComponents.
Any kind of feedback is welcome here or through our development mailing list. Thanks for testing!
Monday, September 18. 2006
While working with the eZ Mail component to send some emails comfortably, I noticed an issue with my Qmail installation. On some of the email accounts that received the emails, the headers were broken and had doubled line break characters, so the complete email was broken. That (weirdly) happened only with some servers (e.g. Gmail), while my own server handled the emails gracefully.
The problem here was not our Mail component, which (correctly, following the PHP manual) used "\r\n", but my Qmail installation. The sendmail wrapper delivered with Qmail seems to have an issue here and requires you to use just "\n". However, our Mail component allows you to change the characters used for line breaks and you can simply do:
<?php
ezcMailTools::setLineBreak( "\n" );
?>
on the component's tool class before building your mail and everything will work fine.
We also added a note to our manual, so that nobody else will fall into this pit.
Tuesday, May 9. 2006
Thanks to Jakob I found a patch (look at the very end of the page) which enables Beagle to index Thunderbird emails. I created a Gentoo ebuild for my own use to support this and made a TAR ball of it, so you can use it, too. To get it running, follow these steps:
- Create a portage overlay, if you don't have one, yet (mostly /usr/local/portage).
- Add the overlay to your make.conf (PORTDIR_OVERLAY=/usr/local/portage).
- Download this TAR ball and extract it into your overlay.
- emerge =app-misc/beagle-0.2.6
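The first two steps above can be sketched as a small shell helper. This is just a sketch under assumptions: the function name add_overlay is made up, and both paths are passed in as parameters so you can try it in a scratch directory before pointing it at /usr/local/portage and your real make.conf.

```shell
#!/bin/bash
# add_overlay: hypothetical helper, not part of Portage.
# Creates the overlay directory and registers it in the given make.conf,
# unless that file already contains a PORTDIR_OVERLAY line.
add_overlay() {
    local overlay="$1" conf="$2"
    mkdir -p "$overlay"
    if ! grep -q '^PORTDIR_OVERLAY=' "$conf" 2>/dev/null; then
        echo "PORTDIR_OVERLAY=\"$overlay\"" >> "$conf"
    fi
}
```

Called as root with the overlay path and the location of your make.conf, it covers the directory and the configuration entry; extracting the downloaded TAR ball into the overlay and the emerge step then follow as listed. If your make.conf already has a PORTDIR_OVERLAY line pointing somewhere else, the helper leaves it alone and you need to extend that line by hand.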
Beagle now automatically indexes your TB mails, if you let it index your home dir. If you're using IMAP, you need to enable the download of messages for offline use. Have fun!
Monday, September 20. 2004
Those Gmail invitations are really becoming a kind of spam. If anyone needs one, feel free to mail me at schlitt at gmail dot com to ask for it... There are still 5 of 6 left.
Tuesday, August 31. 2004
Yeah, true, to begin with, I'm back from vacation and holidays. Although I have another month of spare time, I returned to Frankfurt yesterday. Surely I will spend some more time at my parents' home during the upcoming month for some more relaxing and meeting all of my old friends. These days I have to take care of some issues here in Frankfurt, like moving to Darmstadt and maybe a freelance job for my study time.
Anyway, since I'm back for a few days, I'll now find time to take a deeper look at my projects, the mailing lists (*shiver*, there are still about 1300 mails left) and other issues left over. A priority project will be to finally get my website reworked.
So, I will still not be as active as e.g. in June, but much more active than in the last 4 weeks. :)
Wednesday, June 16. 2004
Finally I got a GMail account for testing. :) Thanks to David Costa who invited me! From now on you can reach me at [schlitt@gmail.com]. Please send me some rant or so, if you have time, so I can test their advertising stuff! :)
Gmail itself is relatively cool. Besides the 1 GB of hard disk space they promise you (calculated on the assumption that nobody will ever use it, IMHO), it has a nice GUI to work with. Pretty much JavaScript/DHTML, but providing useful stuff (like blending infos in and out). A neat little feature is marking mails with a star. Those mails are not only displayed in their specific folder, but also in a special star folder. That's cool for important mails, to have them all in one place.
For those who still have no GMail, you can see a screenshot here:
Well, I think I will do some testing on GMail when I find time, and I hope people help me by sending some rant to it. Please do not send real mail there, since I might not look at it for days if time is scarce.
Somewhere I read about a script to back up data to GMail. That'd be cool. :) Otherwise it will become my normal spam address for registering with bogus data... ;)
Monday, April 19. 2004
True, it's done.
I completely switched from local mail storage using POP3 delivery to remote storage using IMAP4. I w |