*BSD News Article 80934

Path: euryale.cc.adfa.oz.au!newshost.carno.net.au!harbinger.cc.monash.edu.au!munnari.OZ.AU!news.Hawaii.Edu!news.uoregon.edu!newsgate.cuhk.edu.hk!hpg30a.csc.cuhk.hk!news.cuhk.edu.hk!news.sprintlink.net!news-stk-11.sprintlink.net!www.nntp.primenet.com!nntp.primenet.com!howland.erols.net!news.mathworks.com!uunet!in3.uu.net!twwells!twwells!not-for-mail
From: bill@twwells.com (T. William Wells)
Newsgroups: comp.unix.bsd.freebsd.misc
Subject: Re: FreeBSD as news-server??
Date: 17 Oct 1996 00:05:49 -0400
Organization: None, Mt. Laurel, NJ
Lines: 155
Message-ID: <544bat$41o@twwells.com>
References: <537ddl$3cc@amd40.wecs.org> <53u1ic$61i@flash.noc.best.net> <53ucuj$8qh@twwells.com> <543urf$ar3@flash.noc.best.net>
NNTP-Posting-Host: twwells.com

In article <543urf$ar3@flash.noc.best.net>,
Matthew Dillon <dillon@best.com> wrote:
: :>Also, experience (and my theoretical analysis) shows that multiple
: :>parallel feeds generally work better than streaming.
:
:     Well, I've definitely never had a problem running streaming
:     mode, and I *have* tested it with and without.

As have I. My results are the exact opposite of yours. Sigh!

:     Perhaps the machine you were running it on wasn't tuned for it.
:     I find that you get much better results with larger TCP window
:     sizes... it tends to make the streaming much more efficient.

This is contrary to my expectations, if one is disk bottlenecked.
This, I suspect, is the difference between your system and mine.
The disks I use are pretty generic; I suspect that they're really
not suited to the task.

However, given the growth of feeds, what's true for my system
today is almost certainly going to become true for everyone else
in the not too distant future. Exponential growth does that. :-)

Anyway, the reason this is important is that if the overhead of
writing articles gets too large, it exceeds the ability of the
protocol to overlap it with network latency. Once that starts
happening, the protocol slows down dramatically and things like
increasing the TCP window size will only make things worse.
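
To put rough numbers on the overlap argument, here is a toy model. It
is a sketch only: the function name, the parameters, and the idea of
a fixed number of articles "in flight" are assumptions made for
illustration, not anything taken from INN.

```python
def articles_per_second(rtt, write_time, in_flight):
    """Toy throughput model; all times are in seconds.

    in_flight <= 1 models a lock-step feed: each article pays one
    network round trip and then one disk write, with no overlap.
    in_flight > 1 models streaming: round trips overlap with disk
    writes, so the slower of the two stages sets the pace.
    """
    if in_flight <= 1:
        per_article = rtt + write_time
    else:
        per_article = max(rtt / in_flight, write_time)
    return 1.0 / per_article


if __name__ == "__main__":
    for write_time in (0.005, 0.2):  # hypothetical fast and slow disks
        lock_step = articles_per_second(0.1, write_time, 1)
        streaming = articles_per_second(0.1, write_time, 10)
        print(write_time, round(lock_step, 1), round(streaming, 1))
```

With a 100 ms round trip and 10 articles in flight, a 5 ms write
gives about 9.5 articles per second lock-step against 100 streaming.
With a 200 ms write it is about 3.3 against 5. The model only shows
the streaming advantage shrinking once the disk dominates; it does
not try to capture the further slowdown described above.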

One other thing: you simply cannot run streaming and nonstreaming
feeds into the same server. Or, you can, but the nonstreaming
feeds will get so far behind as to be pointless. Even with fast
disks, this will be true....

: :>What this means is that optimizations regarding the history file
: :>are generally pointless. Keeping the history file in memory cuts
: :>out at most 8K per article of disk activity -- while INN spends
: :>time waiting on that 64K (it's mostly directory stuff, so INN
: :>doesn't get buffer cache benefits for it). Since these two
: :>operations can be done somewhat asynchronously, you don't get
: :>much "win" by minimizing history accesses.
:
:     Perhaps it is pointless with a single feed, but it certainly
:     is NOT pointless if you have multiple redundant feeds.

I have four feeds. My disk statistics really don't reflect your
opinion. Or, put it this way: if you have ten incoming feeds and
they all require a disk hit, that's twenty-five disk hits per
second. This doesn't strain the disks at all....
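
The arithmetic behind that figure, as a sketch. The per-feed rate of
2.5 offers per second is not stated anywhere; it is simply what ten
feeds and twenty-five hits per second imply, so treat it as an
assumption.

```python
def disk_hits_per_second(feeds, offers_per_feed_per_second):
    """Worst case: every offer from every feed costs one disk hit."""
    return feeds * offers_per_feed_per_second


if __name__ == "__main__":
    # 2.5 offers/s per feed is inferred from the figures, not measured.
    print(disk_hits_per_second(10, 2.5))  # prints 25.0
```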

:     History file caching is EXTREMELY important, because it means
:     that 6 out of 7 responses to IHAVE requests will be cached
:     (because the response is 'I've already got the article'), and
:     thus involve *NO* disk activity whatsoever.

Alas, this is only true if your feeds are all so close to "real
time" that things remain in the cache. Otherwise, caching doesn't
do anything for you. (In my system, I solve this problem with a
message id daemon, which eliminates most redundant history
lookups.)
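
For anyone unfamiliar with the idea, a message id cache can be
sketched as below. This is not the daemon I run; the class, the
function names, and the eviction policy are all hypothetical, and
a real one also has to cope with an offered article that never
actually arrives.

```python
from collections import OrderedDict


class RecentMessageIds:
    """Bounded in-memory set of recently offered message IDs."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._ids = OrderedDict()

    def seen_before(self, message_id):
        """Record message_id; return True if it was already recorded."""
        if message_id in self._ids:
            self._ids.move_to_end(message_id)  # keep hot IDs resident
            return True
        self._ids[message_id] = None
        if len(self._ids) > self.capacity:
            self._ids.popitem(last=False)  # evict the oldest entry
        return False


def already_have(message_id, recent, history_lookup):
    """Answer an IHAVE offer; True means the article is refused."""
    if recent.seen_before(message_id):
        return True  # answered from memory, no disk access
    # First sighting: fall through to the (disk-backed) history lookup.
    return history_lookup(message_id)
```

The point is that the second through Nth offers of the same article
never reach the history file, however stale the buffer cache is.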

:     There are a thousand things that can create references
:     to unlinked history files... literally!

I don't have them. Ever. Maybe it's just luck. :-)

:     Without enough room
:     to maneuver, any one of these items can completely destroy
:     your history file if you do not have enough free space on
:     the partition.

I haven't had that experience. The few times that I've had that
partition overflow during the daily history rebuild, expire
caught the overflow and just didn't bother renaming the files.
Thus the history wasn't rebuilt that day but it got rebuilt the
next. Which was good enough.
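
The pattern that makes this safe is "write the new file under another
name, rename only on success". A minimal sketch, not expire's actual
code; the function name and the ".n" suffix are made up for the
example:

```python
import os


def rebuild_file(path, lines):
    """Write lines to a temporary file, then rename it over path.

    Returns True if the new file replaced the old one, or False if
    the write failed (disk full, say) and the old file was kept.
    """
    tmp = path + ".n"  # hypothetical temporary name
    try:
        with open(tmp, "w") as f:
            for line in lines:
                f.write(line + "\n")
            f.flush()
            os.fsync(f.fileno())  # surface write errors before renaming
    except OSError:
        # The write failed: drop the partial file, keep the old one.
        if os.path.exists(tmp):
            os.remove(tmp)
        return False
    os.replace(tmp, path)  # atomic rename on POSIX filesystems
    return True
```

If the partition fills up mid-write, the old file is untouched and
the rebuild simply gets retried on the next run.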

: :>But be sure to put the overview files in a separate directory tree
: :>-- otherwise overchan spends a lot of time directory searching.
:
:     The overview file is normally near the beginning of the directory.

No it is not. Because when you do an expireover, it makes a new
overview file and renames it to the old one. There's no guarantee
that the new file will end up near the beginning of the directory.
In fact, odds are pretty good it won't be.

:     Statistically, it's a wash.

Other people have completely different experiences. The INN
documentation also disagrees.

:     Besides, overchan is an asynchronous
:     process.  It does not really matter if it takes a little extra
:     overhead...

Yes it does. Because if innd can't buffer it, you get entries lost
into the batch file. Unless you go to pains to ensure that those
entries get processed, you end up with nnrpds wasting time
recreating those entries.

:     it's in the noise because the directory in question
:     has *already* been cached by the act of writing out the article
:     file in the first place.  The namei caching works for .overview
:     files as well.

Alas not, because overchan is asynchronous. By the time it's
ready to fiddle with the overview file, that directory stuff is
likely to be long gone.

:     I'm beginning to wonder.... what kind of hardware are you
:     running this stuff on?

I described it in another post....

: :>As I said, I don't think this makes much difference anymore. For
: :>sure, on the system I have, it makes things *much* worse to have
: :>a large data segment for innd.
:
:     Any UNIX that implements vfork() will not care at all, and FreeBSD
:     doesn't care whether you use fork() *or* vfork().  It's a big zero
:     time-wise, even with huge data segments.

Irrelevant because, even if FreeBSD doesn't copy or write the
data, it _does_ allocate swap space. Get a bunch of these all at
once and your server will refuse to fork. There are certain news
clients which have a bad habit of making large numbers of nntp
connections all at once. This makes random things fail on the
server.

:     You are only running a few nnrpd's, it doesn't matter.  But
:     shared-active saves a huge amount of startup processing plus,

Good point.

:     if
:     you have a full active file, on the order of the size of the
:     active file (500K to 1MB usually) per nnrpd process.

Yeah. For a while, I was running a hacked nnrpd that read the
active file for each new newsgroup the user wanted. 300K images.
:-) Only problem was some newsreaders that positively insisted on
looking up several hundred newsgroups all at once, so I
regretfully had to retire that hack.
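
The hack amounted to something like the following, rendered here in
Python rather than nnrpd's C. It is a sketch that assumes the usual
active file layout of "name himark lomark flag" per line; the
function name is hypothetical.

```python
def lookup_group(active_path, group):
    """Scan the active file for one group.

    Returns (himark, lomark, flag) for the group, or None if the
    group is not listed. Nothing is kept in memory between calls.
    """
    with open(active_path) as f:
        for line in f:
            fields = line.split()
            if len(fields) == 4 and fields[0] == group:
                return int(fields[1]), int(fields[2]), fields[3]
    return None
```

Each call is a linear scan of the file, which is why a newsreader
asking about several hundred groups at once killed the idea: the
memory saved is paid back in repeated reads.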

: :>screw you up memory-wise. Basically, it's a bad idea to run
: :>channel feeds. For that matter, I think I'm going to remove the
: :>last of mine (for overview). Then innd will *never* fork -- and
: :>that's one less thing to get in the way of shovelling articles as
: :>fast as possible. :-)
:
:      Hmm.. file batching overview :-) :-) :-)

No particular reason not to. It certainly works for C news and
the nnrpds will deal with records not in the file quite yet.