*BSD News Article 81037


Return to BSD News archive

Path: euryale.cc.adfa.oz.au!newshost.carno.net.au!harbinger.cc.monash.edu.au!munnari.OZ.AU!news.Hawaii.Edu!news.uoregon.edu!hunter.premier.net!www.nntp.primenet.com!nntp.primenet.com!news.sprintlink.net!news-peer.sprintlink.net!uunet!in1.uu.net!twwells!twwells!not-for-mail
From: bill@twwells.com (T. William Wells)
Newsgroups: comp.unix.bsd.freebsd.misc
Subject: Re: FreeBSD as news-server??
Date: 17 Oct 1996 15:48:45 -0400
Organization: None, Mt. Laurel, NJ
Lines: 114
Message-ID: <5462it$r37@twwells.com>
References: <537ddl$3cc@amd40.wecs.org> <543urf$ar3@flash.noc.best.net> <544bat$41o@twwells.com> <544nas$b5h@flash.noc.best.net>
NNTP-Posting-Host: twwells.com

In article <544nas$b5h@flash.noc.best.net>,
Matthew Dillon <dillon@best.com> wrote:
: :>One other thing: you simply cannot run streaming and nonstreaming
: :>feeds into the same server. Or, you can, but the nonstreaming
: :>feeds will get so far behind as to be pointless. Even with fast
: :>disks, this will be true....
:
:      Well, the article writing overhead *could* be decoupled relatively
:      easily from INND.  It would be a 'one-hour hack' in programming
:      terms.  You just pipe the data to another process and go on to the
:      next article.

Even if you do this, I think the nonstreaming feeds will
still get crunched. Innd processes the stuff from each stream all
at once, so this introduces latency in all the other feeds.  This
is a *real* problem for nonstreaming feeds because any latency
above the network latency directly slows the nonstreaming feed.
(This is, in fact, the problem that streaming was invented to
solve....)

: :>Alas, this is only true if your feeds are all so close to "real
: :>time" that things remain in the cache. Otherwise, caching doesn't
: :>do anything for you. (In my system, I solve this problem with a
: :>message id daemon, which eliminates most redundant history
: :>lookups.)
:
:     You can cache a *lot* of history file.  Sure, the cache will not be
:     as optimal, but it will still be there, and it will be a disk read
:     rather than a file create.

Well maybe. There's an awful lot of disk activity going on on a
news server and most of it isn't in the history file.  This one, I
suspect, isn't going to be answered without tools that can
examine the buffering directly.

: :>:     There are a thousand things that cause create references
: :>:     to unlinked history files... literally!
: :>
: :>I don't have them. Ever. Maybe it's just luck. :-)
:
:     I didn't have such problems either until my active nnrpd's went
:     over 100.  As with many other things, it isn't a problem until
:     your statistical sample is large enough and then something goes
:     slightly wrong.  Boom!

Well, like I said, even if I do get overflows from expire, the
system recovers gracefully. Of course, the right thing to do is
to make nnrpds periodically close the history file. That's a
"good thing" for several reasons....

:     It is near the beginning of the directory.  Hey!  This should be easy
:     to prove!  I'll write a little program that scans the directory and
:     tells me what slot .overview is in. hold on....

Hm. Looks like FreeBSD "does the right thing" in a rename and
reuses the file's slot. Smarter than I thought! :-) However, for
those stuck with link/unlink to simulate rename, I'd expect the
overview file to move up a few blocks, if not all the way to the
average end of the directory.

: :>Yes it does. Because if innd can't buffer it, you get entries lost
: :>into the batch file. Unless you go to pains to ensure that those
: :>entries get processed, you end up with nnrpds wasting time
: :>recreating those entries.
:
:     Huh?  I have no idea what you are talking about here.  nnrpd does not
:     go around creating .overview entries.   It's asynchronous, and it has
:     no effect whatsoever on innd unless it gets behind.  I have NEVER seen
:     overchan get behind... ever... the system could be dying and overchan
:     still wouldn't get behind.

Ok, here's what happens. If overchan gets behind, innd starts
creating a batch file for it. That goes in your out.going
directory. This *does* happen and did happen for me until I moved
the overview to a separate disk. This batch file doesn't ever get
processed. Thus some entries are lost from the overview.  This is
not a catastrophe: nnrpd considers the spool to be the master; if
there is a file in the directory which doesn't have an overview
entry, it creates one on the fly. This entry is *not* written to
the overview file, it's purely internal to the nnrpd. The "wasting
time" I was referring to is the time to open the articles with
missing entries and read them for overview data.

: :>Alas not, because overchan is asynchronous. By the time it's
: :>ready to fiddle with the overview file, that directory stuff is
: :>likely to be long gone.
:
:     This is not true at all.  A 4K buffer is equivalent to less than
:     a hundred articles.  It's still cached.  We aren't talking about hour
:     delays here, or even 5 minute delays.  We are talking about 30 seconds
:     of delay here.

Well, I can't show you directory statistics anymore (because of my
directory structure changes) but when your popular directories
are hundreds of k's and you have a lot of nnrpds floating around
reading from the disk, the cache turnover is pretty damned fast.
This is another one of those where it would be nice to instrument
the cache....

: :>Irrelevant because, even if FreeBSD doesn't copy or write the
: :>data, it _does_ allocate swap space. Get a bunch of these all at
: :>once and your server will refuse to fork. There are certain news
: :>clients which have a bad habit of making large numbers of nntp
: :>connections all at once. This makes random things fail on the
: :>server.
:
:     No, FreeBSD does not allocate swap space.  Lookee here, program #2:

OK, then *you* tell *me* what EAGAIN from fork means. :-) When I
checked the kernel code, it looked like nothing short of a swap
shortage would cause it. (Well, running out of slots for child
processes could, too, but I don't think that's the case here.
Even at peak times, I've still got about 50% leeway in my process
slots before I hit the limit.)