Wednesday, March 12. 2008
Since I maintain my own server for web, mail and some other services, I do not use my Gmail account much. I originally created the account just out of curiosity about their UI, and now use it to log into other Google services and occasionally when I need an account other than one of my main ones. What I like about Gmail is that it seems to have quite a good spam filter. In the past half year about 10 spam mails got through to my inbox, while more than 900 were filtered into the spam folder (in the past 30 days, if you believe Gmail).
So, what to do with those filtered spam mails? Just deleting them right away would be a shame, because some cute mail marketers might have wasted hours hacking web spiders, mailing scripts or Windoze trojans for bot networks. Therefore I decided to put this nice large collection to better use by training the Bayes filter of my SpamAssassin with it. ;)
If you would like to do the same, you just need a .fetchmailrc configured for Gmail and a small shell script that fetches the spam from Gmail and makes SpamAssassin learn it. The following are my settings and the script, which you can use as a starting point:
poll imap.gmail.com with
proto IMAP
user 'schlitt@gmail.com' there with password 'very very secret'
options keep ssl sslfingerprint '2E:52:DE:98:7F:07:A3:CB:43:9E:7B:77:51:60:0E:07'
This .fetchmailrc (note that I left my Gmail address in there intentionally, I want more spam for training! ;) configures Gmail IMAP access through SSL. You need to use IMAP, since POP does not know about folders on the remote host and would therefore fetch mails from your inbox instead of the spam folder.
#!/bin/bash
/usr/bin/fetchmail -a -n -s \
--folder '[Gmail]/Spam' \
-m '/usr/bin/sa-learn -C /etc/spamassassin --no-sync --spam' \
| awk '/Learned tokens from 1 message/ { learned++; } /1 message\(s\) examined/ { all++; }\
END { print "Learned " learned " from " all " messages. Thanks Google! ;)"; }'
/usr/bin/sa-learn --sync
This bash script can be run via cron to fetch the spam mails from Gmail and feed them into the Bayes filter for learning. fetchmail calls the sa-learn command instead of a real MDA (thanks to this SpamAssassin wiki page for the hint). Note that I use a global Bayes database for my whole server, and that the user who executes the cron job needs write access both to this database and to the directory containing the database files, because the learning process creates a journal file there, which is then synced into the database. To avoid syncing the journal into the database after every single mail, the --no-sync switch is used. I expect the synchronization to re-calculate the probabilities in the database. The final sa-learn --sync call completes the learning.
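The counting step can be tried out without touching a real mailbox. The following is only a minimal sketch: the function name summarize_learning is made up for illustration, and the two sample lines are hand-written under the assumption that sa-learn reports each mail as "Learned tokens from N message(s) (1 message(s) examined)".

```shell
#!/bin/bash
# Summarize sa-learn output read from stdin.
# Assumption: every processed mail produces one line such as
#   "Learned tokens from 1 message(s) (1 message(s) examined)"
# or "Learned tokens from 0 message(s) ..." if nothing new was learned.
summarize_learning() {
    awk '
        /Learned tokens from 1 message/ { learned++ }   # mail was learned
        /1 message\(s\) examined/       { all++ }       # mail was looked at
        END { printf "Learned %d from %d messages.\n", learned, all }
    '
}

# Demo with hand-written sample lines instead of real sa-learn output.
printf '%s\n' \
    'Learned tokens from 1 message(s) (1 message(s) examined)' \
    'Learned tokens from 0 message(s) (1 message(s) examined)' \
    | summarize_learning
```

The parentheses in "message(s)" have to be escaped in the awk pattern, otherwise awk treats them as a group and the pattern never matches the literal text.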
Looks like the spam in my Gmail account has found a nice new job. ;)
P.S.: I hope that the SpamAssassin Bayes filter does not use the To/CC/... headers for learning. Otherwise all my Gmail ham might be classified as spam soonish, too. ;)
Thursday, November 22. 2007
I used the boring time of the past days (I've had a really bad cold) to migrate my photo gallery to Flickr. So far I had hosted the photos on my own, using Gallery. While Gallery is a great tool that allowed me to comfortably manage and present my images, it has one great flaw: I host it myself. That means a) that serving images from my server costs quite a bit of performance, since PHP and even a database are involved, and b) that I need to take care of updates myself. The first issue is a minor one, since my server is powerful enough for the job and not that many people are interested. The second one is much more annoying, since I need to keep track of security updates all the time and have to install an update ASAP when one is released.
Long story short, I had been playing with the idea of migrating to Flickr for quite a while, and the only thing holding me back was the migration effort. Luckily a little Gallery module called Gallery2Flickr exists, which eases the job quite a bit. So I seized the chance and migrated all existing albums, and you can now find my image collection on Flickr. Sorry for breaking all the external photo links this way. You can still reach my gallery at http://photos.schlitt.info and, the long way, through http://schlitt.info/applications/gallery.
Wednesday, October 10. 2007
After my server came close to spending all its CPU time on checking email messages for potential spam with SpamAssassin, I decided that it was time to investigate. My friend Arne, who had helped me a lot with Qmail problems earlier, recommended installing spamdyke, an SMTP spam filter that is placed in front of Qmail and does not require special patches for the MTA itself. Spamdyke can filter mail by blacklisting, whitelisting, greylisting and several other options.
Thanks to Arne for this great tip! Spamdyke is up and running on my machine now and my load is constantly below 0.40, while the amount of spam received seems to be reduced drastically. Since I did not find much information about Spamdyke on Gentoo, I wrote down my experiences as a little howto in the Gentoo wiki. Maybe someone finds it helpful. Any feedback is welcome!
Monday, August 27. 2007
When I returned from vacation, which already wasn't as relaxing as expected due to private issues, I came straight back to a huge mess, which you might already have noticed from the downtime of the phpunit.de website.
The first thing I noticed after booting my notebook was that there did not seem to be any new emails in my private inbox. While this would usually be quite convenient after a vacation, it made me kind of afraid: where had the expected huge email load gone? SSHing into my private little server showed some interesting facts. First off, the load indicator was at about 2500 (as Sebastian already showed in his article), a cool value in a way, if it is not your own server. The second fact was that the /var partition had run full 4 days earlier.
The latter issue was easily fixable: MySQL wrote binary logs of every change happening to a database. Don't ask me why; I did not switch this feature on (at least I'm not aware that I did), so it's possibly a default setting on Gentoo. Deleting these logs freed 6 GB of space on /var and made Lighttpd start up again to serve at least some of the websites. Sadly it did not serve phpunit.de, but more on that later.
Digging into my Qmail installation, I noticed that about 5000 mails were stuck in the local queue, not being delivered (qmHandle helped a lot here). A closer look told me that these were at least some of the messages I had expected in my inbox, which calmed me down a bit. Restarting Qmail and forcing it to flush its queues let my Thunderbird start downloading mails, so I left it that way for a while. When I returned an hour later, my inbox showed over 24000 emails. Wow! I had expected a lot, but that was a bit too much.
Guess what, each email was there at least twice, while most mails existed 7-10 times. I guess this was the effect of me trying out Qmailadmin's auto-reply feature right before vacation. This test did not work as expected, so I switched the flag off again, being sure it would not do any damage. Still, even after delivering a mess of 24000 mails to me, Qmail had 5000 mails stuck... Those turned out to be mostly spam and bounces, so I decided to delete them from the queue directly, and about 200 mails were left over from the 4 days with a full partition. Sounds Windozish, but a final reboot helped to flush those, too. I guess there was some stuck Qmail process which kept the MTA from delivering any further, like a lock.
Sorting out all the bullsh** from my inbox took about the rest of Saturday, so I did not find time to take care of the web server. On Sunday I spontaneously decided to join the geeks at FrOSCon, which was the right decision. Only now do I notice that I missed the chance to shake the hand of Henry Bergius, who was there to give a talk in the PHP room we organized, because I was too busy discussing stuff with other people. Sorry for that, Henry, I hope we will meet again soonish!
Anyway, during one of the sessions Sebastian and I took the chance to check through Lighttpd, which still did not serve Sebastian's site on PHPUnit. I have to admit, this was actually my fault: Shortly before vacation I updated the world of the server, because of some security issues I had heard of. Since times were heavily busy, I forgot to follow up on the update, like restarting processes and running the post-update tasks indicated by emerge. That way, Trac for phpunit.de was updated, but the necessary Python updates were incomplete, since I did not run python-updater for the newly installed Python versions. Doing so finally fixed phpunit.de.
So, what did I learn from this mess? I guess 3 things: 1. If you change a running system, do it right and follow through on the update. 2. Monitor your server more closely, so that you see issues like full partitions before they get serious. 3. Vacation auto-reply is in general a bad idea! ;)
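Lesson 2 can be covered with very little effort. What follows is a minimal sketch, not the monitoring actually used on this server: the function name find_full_partitions and the 90 percent threshold are assumptions for illustration. It reads "df -P" style output from stdin and prints every mount point at or above the threshold. Mount points containing spaces are not handled.

```shell
#!/bin/bash
# Print every mount point whose usage is at or above a threshold (percent).
# Reads "df -P" style output from stdin:
#   Filesystem 1024-blocks Used Available Capacity Mounted on
find_full_partitions() {
    local threshold="$1"
    awk -v limit="$threshold" '
        NR > 1 {                     # skip the header line
            use = $5
            sub(/%$/, "", use)       # "93%" -> "93"
            if (use + 0 >= limit + 0)
                print $6 " is at " use "%"
        }
    '
}

# Possible use from cron, which mails any output to the owner of the job:
#   df -P | find_full_partitions 90
```

Because the function prints nothing while all partitions are below the threshold, cron stays silent until there is something to worry about.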
Anyway, the server is running smoothly again and Sebastian was smiling all over his face that phpunit.de is up and running again, while he looked over my shoulder and corrected this article.
So, after that messy weekend and the borked vacation before, it's time for me to get into eZ Components development again. I did not do much in that field in the past 2 months, but the following weeks are still university-free, so you can expect some progress. Today I already caught up with about 400 mails in my eZ mail account, so development can start right away tomorrow.
Sorry to everyone who missed phpunit.de last week! Since there is still the slight possibility that some mails were bounced or eaten, please email me again if I have not replied to you yet! Thanks for your understanding!
Friday, July 28. 2006
Kore pointed me to a tool called BitlBee. Imagine that I am currently connected to 4 IRC networks and use Gaim for Jabber and ICQ. While Gaim is quite satisfying, it bothers me that I have 2 communication clients running. Besides that, it bothers me that I have 1 tab per conversation open in Gaim. My server already runs a bouncer that keeps me connected to my networks while my client is offline.
OK, now I have BitlBee running. BitlBee is a tiny daemon that converts your instant messenger connections to IRC connections. Actually you are then connecting to a (local) IRC server, which is capable of handling your instant messenger connections. You then have a single IRC channel on that server that shows all of your available instant messenger buddies as channel members. If you highlight 1 of them, he will receive the message. If you don't highlight anybody, you can send commands to the server.
That allows me to have all of my messaging buddies in 1 interface and to use the commands I'm already used to for any instant conversation. Fabulous!
Wednesday, March 22. 2006
No, not myself, but my server. Until now I have owned a 1und1 Root Server L, which I bought more than 2 years ago. Now that 1und1 offers a new generation of servers, I decided to switch to a 64-bit machine. My old server had a Celeron processor, 256 MB RAM and a 20 GB HD, and could not really handle my spam protection anymore. The new one is an Athlon 64 3000+ with 1 GB RAM and two 80 GB SATA disks, which I run as a software RAID.
While migrating all of my stuff (28 domains, which I host for myself and friends) to the new server, I decided to completely switch the platform I'm running. While I still claimed a few weeks ago that Debian is my favorite system for servers, I now have Gentoo running. I know, this is a system most people would never use for anything but desktops, but for one major reason it's better for me than Debian in this place: I'm more familiar with it! While I used Debian constantly for more than 2 years on all of my machines, I got much too used to the Gentoo style of doing things in the past months and always tried stuff like "$ eix spamassassin" or "$ emerge -pv apache". Gentoo is simply cleaner and I have a much better overview of what my system has installed and what should be running.
When switching the system itself, I decided to go for other server software, too. While Postfix is a cool MTA, it's still hard to configure (naturally easier than Sendmail, but still hard) and it took me 2 weeks to figure out how stuff has to work. Since there is a very nice Gentoo howto for Qmail and it looked like Qmail is capable of everything I basically need for my personal playground, I went this way. Believe it or not, setting up my whole mail stuff (including virtual domains and accounts, spam and virus protection, mailing lists, ...) took me 2 man-days. Qmail, qmail-scanner, vpopmail, ezmlm, qmailadmin and maildrop give you a fantastically clean interface for realizing even complex architectures easily.
Besides my migration from Postfix to Qmail, I started (influenced by Kore) to use Lighttpd as my web server. Lighttpd is developed by Jan Kneschke and is a lightweight, secure and easy-to-configure web server. Most conveniently, it uses the FastCGI interface to talk to PHP, which is almost as fast as using Apache with mod_php, but gives you a whole lot of flexibility. For instance, I run 2 versions of PHP (4.4 and 5.1) in parallel inside 1 server and can define on a host or filename basis which version to use.
The migration is now almost complete and I'm very satisfied with the results. So long, thanks Qmail and Lighttpd! See some more info on my setup in the extended body of this entry.
Continue reading "Moving"
Monday, April 19. 2004
True, it's done.
I completely switched from local mail storage with POP3 delivery to remote storage using IMAP4. I had wanted to do that for nearly 2 years, but always feared the effort. During these 2 years, though, I slowly got nearer and nearer to IMAP: first the switch from Win32 to Linux, then the switch from Evolution to Thunderbird. In the end I just had to upload my mbox files to my server and most of the work was done.
After cleaning up my mailbox (which was the greatest effort) I uploaded around 500 MB of mails. The cleanup was worth it: I purged about 50000 mails.
Now I can access my whole mail storage from all over the world. I use uw-imapd (as shipped by Debian) for storage and fetching, and procmail (with Postfix) for automatic sorting into folders.
Pretty nice! :)
Thursday, April 8. 2004
Like every good geek I know that backups are pretty necessary, but I never made them. ;) True. The last backup of my local data (mails, ...) is about 10 months old and my production server configuration has never even been backed up.
So, because of the recent problems with my workstations, I decided that the risk is becoming too high and a backup solution is needed. Since none of my machines has much disk space and backups on the same machine do not really make sense, I defined the following requirements:
- Backups have to be kept in several stages (4 backups from the last 4 weeks).
- Backups have to go onto a remote machine.
- It must be possible to back up individual directories.
- Disk usage should be kept as low as possible.
- The solution has to be universal enough to run on my server and workstation.
This little project gave me the possibility to get a bit deeper into bash scripting. And, do not get me wrong, I found out, bash is a pretty nifty language. Nice. :)
The solution I created is divided into 2 components (shell scripts), of which 1 has to run on the machine to be backed up (let's call it the client here), while the other one has to reside on the machine the backup goes to (in this spirit, the server). The latter is called by the former, which is triggered by cron.
Main parts of the scripts are 2 basic Unix/Linux tools:
- rsync
- cp
I'll give just a short overview of what the scripts do here; find the complete listings in the extended entry.
Cron triggers a local script on the client, which first connects to the server and rotates the backup directories to preserve the most recently created backup stage. After that, it calls rsync for each directory configured to be backed up, which is pretty easy.
The rotation is done in a more nifty way. The most recent backup (backup.0/) contains the real files of the backup. When it gets rotated, it is copied to another directory (backup.1/), not as a real copy, but by letting cp create hard links (cp -l). As a result, no additional disk space is allocated for the file contents. When rsync then overwrites one of the linked files in the backup.0/ dir, it unlinks it first, so the copy in backup.1/ keeps the old content.
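The rotation idea can be sketched in a few lines. This is not the script from the extended entry, just a minimal illustration under stated assumptions: the function name rotate_backups, the paths and the host name in the usage comment are made up, GNU cp is assumed for the -l option, and all stages have to live on the same filesystem so that hard links work.

```shell
#!/bin/bash
# Rotate the backup stages below directory $1, keeping $2 stages
# (backup.0 .. backup.<$2 - 1>). $2 should be at least 2.
rotate_backups() {
    local root="$1"
    local keep="$2"
    local i=$((keep - 2))

    # Drop the oldest stage, then shift the remaining ones up by one.
    rm -rf "$root/backup.$((keep - 1))"
    while [ "$i" -ge 1 ]; do
        if [ -d "$root/backup.$i" ]; then
            mv "$root/backup.$i" "$root/backup.$((i + 1))"
        fi
        i=$((i - 1))
    done

    # Hard-link copy of the newest stage: a new directory tree whose files
    # share their data with backup.0, so no file content is duplicated.
    if [ -d "$root/backup.0" ]; then
        cp -al "$root/backup.0" "$root/backup.1"
    fi
}

# Hypothetical usage: rotate on the server, then sync from the client.
#   rotate_backups /srv/backups/myhost 4
#   rsync -a --delete /home/ backupserver:/srv/backups/myhost/backup.0/home/
```

Since rsync by default writes a changed file to a temporary name and renames it into place, the link in backup.1/ keeps pointing at the old data, and only changed files cost extra disk space.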
I don't know if there are better solutions available, but this one is pretty straightforward for me and does everything I want in an optimal way. Feel free to leave your comments/enhancements/criticism on this entry.
Continue reading "Backup needed"