Treading softly sucks.
I guess this one is mostly my fault, really.
A few days ago I decided that rigorous enforcement of SpamAssassin on all of my server users was a bit drastic, so I retired the configuration and retreated to the hazy reality of receiving close to 40 junk emails per hour.
Predictably, this grew dull very quickly, at which point I reached for a former mainstay of my mail sanity: procmail.
Past experience with procmail has taught me that it's really easy to lose a lot of mail very quickly, particularly if something in the configuration file is so hideous that procmail objects in such a foul fashion that the underlying MTA bounces the mail. So after fighting to transform my nice, easy-to-understand exim filter rules into hideous lists of symbols and regular expressions, I was ready to whip through a test run to see how it all went.
Common sense at this point caused me to put locking in, since retrieving 600-700 messages, each eating two processes for procmail and one process for SpamAssassin per delivery and pausing for 5-10 seconds whilst various RBLs are consulted, is a lot to have running all at once. Hence the head filter read:
# spam checking with spamassassin - filter, so it adds headers
:0fw: spamassassin.lock
* < 262144
|/usr/bin/spamassassin
Sensible, no? Well, it depends on how you look at it. Running fetchmail with a batch limit of 50 messages just to test the water proved that delivering 50 messages takes a hell of a long time if each message eats up to 10 seconds just to check whether it is spam or not. Nevertheless, it'll take a few hours to clear down 600-700 messages at up to 10 seconds per message, and the locking is in place, so I was confident it would pound away of its own accord.
What I didn't think of is that exim just might think that, since procmail has been spawned and is just hanging around waiting for its turn to run, it is taking just a wee while to get its job done. And it's eating up around 2,100 processes in the, err, process.
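For the curious, the figures above fall out of some back-of-the-envelope arithmetic. This is only a sketch using the upper end of the numbers quoted in this post (700 messages, three processes per delivery, 10 seconds per message). The variable names are mine, nothing here was measured, and counting all three processes for every queued message is a simplification that gives a rough ceiling rather than an exact count.

```python
# Rough worst-case numbers taken from the post; none of this is measured.
messages = 700            # upper end of the 600-700 message backlog
procs_per_delivery = 3    # two procmail processes plus one SpamAssassin
seconds_per_message = 10  # worst-case pause while the RBLs are consulted

# Deliveries are all spawned up front; the lock only serialises the
# SpamAssassin step, so waiting deliveries still occupy the process table.
# Counting three processes for every message is a simplified ceiling.
peak_processes = messages * procs_per_delivery

# With the lock in place, messages are scanned one at a time.
drain_seconds = messages * seconds_per_message

print(peak_processes)                  # 2100
print(round(drain_seconds / 3600, 1))  # 1.9 (hours)
```

So the lock keeps the scanning to one message at a time, but the backlog still needs the better part of two hours to drain, which is a long time for an MTA to sit waiting on a delivery.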
So after a while exim said "that's just taking too damned long. I'll bounce the message. *punt*".
91 punts later I finally notice that the process table is shrinking too quickly for these to be properly delivered, and kill exim, procmail and any SpamAssassin processes left running.
Assessing the damage, I now have 15 mailing lists to check I don't get booted off of in the next fortnight (or whenever the bounce processor/list admin gets around to checking bounces).
There's no moral to this story. Just pity. From now on I will instruct fetchmail to deliver using procmail and ignore exim entirely for local fetching.
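In fetchmail terms that means setting the mda option, which hands each fetched message to a local delivery agent instead of the SMTP listener. A minimal sketch, assuming a POP3 account: the host name, user names and password are placeholders, the procmail path may differ on your system, and procmail -d needs to be allowed to deliver to the local user named.

```
# ~/.fetchmailrc sketch: host, user names and password are placeholders
poll mail.example.org protocol pop3
    user "remoteuser" password "secret" is "localuser" here
    # hand each message straight to procmail, bypassing the local MTA (exim)
    mda "/usr/bin/procmail -d %T"
```

With this in place the MTA never sees the fetched mail, so it has nothing to time out on and nothing to bounce. The cost is that a broken procmail setup becomes fetchmail's problem instead.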
