*BSD News Article 84638

Path: euryale.cc.adfa.oz.au!newshost.carno.net.au!harbinger.cc.monash.edu.au!news.mel.connect.com.au!munnari.OZ.AU!news.ecn.uoknor.edu!solace!dataphone!www.nntp.primenet.com!nntp.primenet.com!enews.sgi.com!news.sgi.com!newshub.sdsu.edu!news1.best.com!nntp1.best.com!usenet
From: dillon@flea.best.net (Matt Dillon)
Newsgroups: comp.mail.sendmail,comp.mail.smail,comp.unix.bsd.freebsd.misc
Subject: Re: Sendmail vs. Smail...
Date: 10 Dec 1996 06:33:53 GMT
Organization: BEST Internet Communications, Inc.
Lines: 115
Message-ID: <58j08h$1mn@nntp1.best.com>
References: <57tf61$gq7@raven.eva.net> <58be63$eu@stdismas.bogon.com> <58i45b$apn@nntp1.best.com> <58i81f$ajf@crystal.WonderWorks.COM>
NNTP-Posting-Host: flea.best.net
Xref: euryale.cc.adfa.oz.au comp.mail.sendmail:34974 comp.mail.smail:2672 comp.unix.bsd.freebsd.misc:32328

:In article <58i81f$ajf@crystal.WonderWorks.COM>,
:Kyle Jones <kyle_jones@wonderworks.com> wrote:
:>Matt Dillon <dillon@flea.best.net> wrote:
:> > [...]
:> >     * Turn ON ForkEachJob ... you don't really have a choice.
:> >       If you don't, the sendmails running the queue can build
:> >       to five times their normal RSS and effectively run the
:> >       machine out of memory.  Unfortunately, turning
:> >       ForkEachJob on also blows the connection cache... oh
:> >       well.  Maybe a later version of sendmail will allow one
:> >       to specify how many jobs per fork one can run :-).
:>
:>You can combat this growth with smaller queues.  Take fifteen
:>thousand queued messages and spread them over 150 directories and
:>queue runs aren't so bad.  The sendmail process runs out of jobs
:>before it gets really fat.  Limiting queue size is a good idea
:>anyway because of the linear directory searches combined with
:>directory update locking that can keep open() and unlink()
:>blocked for a long time.
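Kyle's suggestion amounts to keeping every directory small so the kernel's linear directory scans stay cheap. A minimal Python sketch of one way to spread queue files, assuming a hash of the queue ID picks one of 150 subdirectories; `queue_dir_for`, `NUM_QUEUE_DIRS`, and the `qNNN` directory names are hypothetical, since sendmail 8.8 itself does not hash queue files this way:

```python
import hashlib
import os
from collections import Counter

# Assumption: 150 queue directories, as in the example above.
NUM_QUEUE_DIRS = 150


def queue_dir_for(queue_id: str, base: str = "/var/spool/mqueue") -> str:
    """Return the (hypothetical) subdirectory that should hold queue_id.

    Hashing the ID spreads files evenly, so each directory stays small
    and the linear scans done on open()/unlink() stay short.
    """
    digest = hashlib.sha256(queue_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % NUM_QUEUE_DIRS
    return os.path.join(base, f"q{bucket:03d}")


if __name__ == "__main__":
    # 15,000 made-up queue IDs: count how many land in each directory.
    counts = Counter(queue_dir_for(f"id{i}") for i in range(15000))
    print(len(counts), "directories, largest holds", max(counts.values()))
```

With 15,000 IDs this puts roughly 100 files in each directory instead of 15,000 in one, and each queue runner could then be pointed at its own subdirectory.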

    I've found that limiting the queue size causes some pieces
    of mail addressed to perfectly valid destinations to sit there for hours
    while others breeze through in minutes.  I've pretty much
    given up on it as a means to limit queue runs.

    I've used the multiple-queue approach before too, but threw
    it away when MinQueueAge came out... you can get nearly
    the same effect simply by increasing MinQueueAge, or by running
    X sendmails with one MinQueueAge value and Y sendmails with
    another.  The only reason I was using the multiple-queue approach
    at all was a series of catastrophic cascade failures caused by
    the kernel's linear searches of the spool directory on every
    file create/remove.  MinQueueAge and the .hoststat stuff pretty
    much fixed it, and the problem went away entirely when we switched
    to FreeBSD.
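The MinQueueAge rule can be modeled as a filter on the time of each job's last delivery attempt. This Python sketch assumes that timestamp is available per job; `QueueJob` and `eligible_jobs` are illustrative names, not sendmail internals. Running two sets of queue runners with different ages, as described above, just means calling the filter with two thresholds:

```python
from dataclasses import dataclass


@dataclass
class QueueJob:
    queue_id: str
    last_attempt: float  # time of the last delivery attempt, in seconds


def eligible_jobs(jobs, now, min_queue_age):
    """Return the jobs a queue run would process under a MinQueueAge-style rule.

    A job is skipped if it was last attempted fewer than min_queue_age
    seconds ago; everything else is eligible.
    """
    return [j for j in jobs if now - j.last_attempt >= min_queue_age]


if __name__ == "__main__":
    now = 100_000.0
    jobs = [
        QueueJob("fresh", now - 60),    # tried a minute ago
        QueueJob("recent", now - 900),  # tried 15 minutes ago
        QueueJob("stale", now - 7200),  # tried two hours ago
    ]
    # Hypothetical split: "X" runners with a 10-minute age, "Y" with an hour.
    print([j.queue_id for j in eligible_jobs(jobs, now, 600)])   # ['recent', 'stale']
    print([j.queue_id for j in eligible_jobs(jobs, now, 3600)])  # ['stale']
```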

    The huge queues also created problems for us when we were
    still using -q5m... no, -q15m... no, -q30m... no :-)  It never
    worked.  Basically this sort of cascade failure occurs
    when the directory gets large enough that -qXXXm
    winds up starting more sendmails than the system can deal
    with, all due to the large queue, and file create/remove starts
    to clog the directory (processes sitting on filesystem
    locks trying to update the directory).  God, what a mess
    that was.
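A rough way to see why lengthening the -q interval never fixed the cascade: if one pass over the queue takes longer than the interval, a new runner starts before the old one finishes. The Python sketch below uses a made-up per-job cost of 0.5 seconds (an assumption, not a measurement), and `overlapping_queue_runs` is a hypothetical helper:

```python
import math


def overlapping_queue_runs(queue_size, seconds_per_job, interval_s):
    """Lower-bound estimate of queue runs alive at once with `sendmail -qXXm`.

    A new runner is started every interval_s seconds. If one full pass
    over the queue takes longer than that, runs start to overlap.
    seconds_per_job is a made-up average cost, not a measured figure.
    """
    pass_time = queue_size * seconds_per_job
    return max(1, math.ceil(pass_time / interval_s))


if __name__ == "__main__":
    # Assumed 0.5 s per queued job, 30,000 jobs in the queue.
    for minutes in (5, 15, 30):
        print(f"-q{minutes}m:", overlapping_queue_runs(30000, 0.5, minutes * 60))
```

With 30,000 queued jobs this gives 50 overlapping runs at -q5m, 17 at -q15m, and 9 at -q30m. The real situation is worse than this lower bound: every extra runner adds contention on the directory locks, which lengthens each pass and feeds the overlap.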

    The current system gets some pretty spectacular tests... 
    whenever the network to a particular huge... actually very
    huge but unnamed provider barfs (heh), our mail queues shoot
    up at a rate of 3000 messages an hour.  The pagers start going 
    off when it hits 10,000 messages, and I start praying when 
    it passes 30,000.  It taught me a very important lesson: 
    NEVER, NEVER mount /var/spool/mqueue as its own partition...
    you not only can't rename it, you can't clear the blocks
    allocated to the directory either!

    Oh, another good reason to run with ForkEachJob turned on...
    it allows you to kill sendmail 'nicely'... you simply kill
    the daemon and the children of the queue-running maintenance
    program...  the children of the children are the ones doing 
    the actual queue processing, and you let them finish up 
    their current queue file and exit normally.  Poof, you've 
    brought down your mail system without a single repeated
    message!  Nice!
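The shutdown trick works because each job lives in its own process. A minimal Python model of that fork-per-job structure (Unix only, because it uses `os.fork`) is below; the function names and the SIGTERM-sets-a-flag design are assumptions for illustration, not sendmail's actual code. A stop request only prevents the next fork, so the job already running is always allowed to finish, and any memory a job uses is returned when its child exits:

```python
import os
import signal

_stop_requested = False


def _on_sigterm(signum, frame):
    # Only set a flag: a job already running in a child is never cut off.
    global _stop_requested
    _stop_requested = True


def install_graceful_stop():
    """Make SIGTERM mean 'stop dispatching new jobs' for this process."""
    signal.signal(signal.SIGTERM, _on_sigterm)


def stop_requested():
    return _stop_requested


def run_queue_fork_each_job(jobs, deliver, should_stop=stop_requested):
    """Deliver each queued job in its own child process, one at a time.

    Memory a job uses dies with its child, so the parent never grows,
    and a stop request only prevents the next fork.
    Returns the jobs whose child exited successfully.
    """
    completed = []
    for job in jobs:
        if should_stop():
            break  # nothing is abandoned halfway: we simply fork no more
        pid = os.fork()
        if pid == 0:
            # Child: deliver exactly one job. Catch everything so the child
            # can never fall through into the parent's loop.
            try:
                deliver(job)
            except BaseException:
                os._exit(1)
            os._exit(0)
        _, status = os.waitpid(pid, 0)
        if os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0:
            completed.append(job)
    return completed
```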

:> >       There's a story here: We once received a mail bomb where
:> >       the bomber sent the entire message body as a header.
:> >       There were only about 50 of these messages in the mail
:> >       queue, but they caused the sendmails running the queue
:> >       to grow to about 8 MBytes RSS.  The machine, with 128 MB
:> >       of RAM, started to swap!  Holy cow!
:>
:>Groovy.  sendmail used to crash when faced with such headers,
:>freeing memory in the process.  Sometimes sendmail bugs are you
:>friends. :)

    Tell me about it!  There was one spam that had a badly munged
    address that caused sendmail to:

	* make connection to destination
	* send the email message to the destination
	* then crash before it could remove the queue file
	* repeat ...

    OUCH!  The real clincher:  It didn't update the queue file
    either, so MinQueueAge had no effect on the retries.  
    I call it the auto-remote-spamming tool.  I've got the
    blown-up address saved away in case I ever need to use
    it on someone ;-)

:>
:> >       I've also tried to turn off ForkEachJob with the latest
:> >       8.8.4 release...  it doesn't work.  The sendmails still
:> >       build up to around a 3 MB RSS and kill the machine.
:>
:>You might be able to get by with fewer queue run processes if you
:>run some of them with drastically smaller connection timeout
:>values.  If a host doesn't respond in 15 seconds, it probably
:>isn't going to respond at all.  Instead of (typically) waiting 60
:>seconds, give up and move on.  A complete pass of the queue takes
:>much less time, and responsive hosts are rewarded by getting
:>their mail delivered sooner.  The sluggish hosts will be serviced
:>eventually by the queue runs that use the RFC 1123 minimum
:>timeouts.  You should not let the fast queue runners write into
:>the persistent host status cache, or slow hosts will never get
:>their mail.
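Kyle's two-tier idea can be simulated with a shared host-status cache. In this Python sketch the 15-second fast timeout comes from the text above, while the 300-second slow timeout, the `Host` latencies, and the helper names are assumptions. The second call shows what happens when the fast runners are allowed to write to the cache:

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    connect_latency: float  # seconds until the host accepts a connection


def queue_pass(hosts, connect_timeout, host_status, record_status):
    """One simulated queue run against a shared host-status cache.

    Hosts already cached as "down" are skipped without a connection
    attempt. Only runners with record_status=True update the cache.
    Returns (delivered, deferred) lists of host names.
    """
    delivered, deferred = [], []
    for h in hosts:
        if host_status.get(h.name) == "down":
            deferred.append(h.name)  # cached as down: don't even try
            continue
        ok = h.connect_latency <= connect_timeout
        (delivered if ok else deferred).append(h.name)
        if record_status:
            host_status[h.name] = "up" if ok else "down"
    return delivered, deferred


def two_tier_delivery(hosts, fast_timeout, slow_timeout, fast_writes_status):
    """Fast pass with a short timeout, then a patient pass over the leftovers."""
    status = {}
    delivered, deferred = queue_pass(hosts, fast_timeout, status, fast_writes_status)
    leftovers = [h for h in hosts if h.name in deferred]
    late, still_queued = queue_pass(leftovers, slow_timeout, status, True)
    return delivered + late, still_queued


if __name__ == "__main__":
    hosts = [Host("quick", 2.0), Host("sluggish", 40.0), Host("dead", float("inf"))]
    print(two_tier_delivery(hosts, 15, 300, False))  # (['quick', 'sluggish'], ['dead'])
    print(two_tier_delivery(hosts, 15, 300, True))   # (['quick'], ['sluggish', 'dead'])
```

In this model cache entries never expire, so a host marked down by the fast pass is skipped for good; real persistent host-status entries age out after a while, which softens the effect but does not remove it.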

    Ah, an interesting idea.  I may experiment with this some.
    .hoststat seems to have made the DNS delays an order of magnitude
    less invasive, and sendmail 8.8.x has some nifty options for
    connect and initial-connect timeouts (Timeout.connect and
    Timeout.iconnect).

					-Matt