*BSD News Article 81219



Path: euryale.cc.adfa.oz.au!newshost.carno.net.au!harbinger.cc.monash.edu.au!munnari.OZ.AU!news.Hawaii.Edu!news.uoregon.edu!arclight.uoregon.edu!feed1.news.erols.com!howland.erols.net!cam-news-hub1.bbnplanet.com!uunet!in1.uu.net!twwells!twwells!not-for-mail
From: bill@twwells.com (T. William Wells)
Newsgroups: comp.unix.bsd.freebsd.misc
Subject: Re: FreeBSD as news-server??
Date: 20 Oct 1996 18:27:45 -0400
Organization: None, Mt. Laurel, NJ
Lines: 105
Message-ID: <54e911$24a@twwells.com>
References: <537ddl$3cc@amd40.wecs.org> <5462it$r37@twwells.com> <5467p6$bl4@flash.noc.best.net> <546dd4$bn7@flash.noc.best.net>
NNTP-Posting-Host: twwells.com

In article <546dd4$bn7@flash.noc.best.net>,
Matthew Dillon <dillon@best.com> wrote:
:     (b) Note the dead time.  There is one point 14:50:27 to 14:50:42 in this
:       particular sample where innd is 100% idle for 15 seconds!  (and, no,
:       innd was not swapped out :-)).  (this occurs a lot, but I am not
:       going to post thousands of lines of log files to prove it :-)).

I've noticed exactly this same dead time. However, on my system,
at least, the dead time is mostly illusory. Keep in mind that the
times in the log aren't the actual transaction times but the times
at which innd got control back from select(). My experience is
consistent with the hypothesis that those dead times merely
reflect how long it takes to process any given channel. This
*will* depend on how fast your disks are....
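
To make that concrete, here is a toy version of the kind of
single-process select() dispatch loop involved -- a sketch, *not*
innd's actual code.  innd multiplexes many channels through one
such loop; the toy has only stdin and fakes the disk with a
usleep(), but it shows where the logged timestamp comes from:

/* Toy single-process select() dispatch loop -- a sketch, NOT innd's
 * code.  The timestamp is taken when the handler gets control, so
 * the time spent servicing one channel shows up as an apparent gap
 * before the next log entry, even though the process was never
 * idle. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

static void
service_channel(int fd)
{
    char buf[512];

    if (read(fd, buf, sizeof buf) <= 0)   /* one "article" per line */
        exit(0);
    usleep(200 * 1000);                   /* pretend the disk took 200ms */
}

int
main(void)
{
    for (;;) {
        fd_set rfds;
        struct timeval now;

        FD_ZERO(&rfds);
        FD_SET(STDIN_FILENO, &rfds);
        if (select(STDIN_FILENO + 1, &rfds, NULL, NULL, NULL) <= 0)
            continue;

        gettimeofday(&now, NULL);         /* this is the logged time */
        printf("%ld.%06ld got control\n",
               (long)now.tv_sec, (long)now.tv_usec);
        fflush(stdout);
        service_channel(STDIN_FILENO);    /* everyone else waits here */
    }
}

Feed it a few lines and the successive timestamps are spaced by
the simulated disk time: "dead" in the log, busy in reality.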

None of that is especially relevant to streaming vs. nonstreaming,
though. Look at it this way: if you have no streaming feeds, each
NNTP "step" takes time T. If you add streaming, you don't decrease
*any* T -- but you do increase it for those transactions that
happen to arrive during one of the streaming feeds, whenever the
streaming processing time is greater than the transaction time.

Thus you *will* see an effect on nonstreaming feeds. How much of
one? That depends directly on what fraction of the time is left
over in which to fit nonstreaming transactions of time T. If that
fraction is small, the nonstreaming feed continually gets further
behind. If it's larger, the feed just gets delayed until the
increased duplicates cause the times to balance out again. As the
fraction grows, that delay shrinks until it's lost in the
statistical noise.
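
As a back-of-envelope (every number below is an assumption, not a
measurement from my server): if a nonstreaming (IHAVE) peer offers
R articles a second, each costing time T once innd gets to it, and
the streaming channels leave it a fraction F of the wall clock, it
keeps up only while F/T stays above R:

/* Back-of-envelope for the fraction argument above.  Every number
 * is an assumption -- plug in your own.  T is the cost of one
 * nonstreaming (IHAVE) transaction once innd gets to it, R is the
 * rate at which that peer offers articles, F is the fraction of
 * wall-clock time the streaming channels leave for it. */
#include <stdio.h>

int
main(void)
{
    double T = 0.05;    /* seconds per nonstreaming article (assumed) */
    double R = 8.0;     /* articles/sec the peer offers (assumed)     */
    double F;

    for (F = 0.1; F <= 1.0; F += 0.1) {
        double capacity = F / T;   /* articles/sec it can actually get in */

        printf("F=%.1f  capacity=%5.1f/s  %s\n", F, capacity,
               capacity < R ? "falls further behind" : "merely delayed");
    }
    return 0;
}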

:     * Until you get up to a dozen or more *full* feeds, the only thing
:       that counts are article-creation rates.  That is, the
:       'I already have it' response tends to be cached and therefore
:       ignorable.

As above, you're ignoring that systems with slower disks will
find that the dead time approaches the actual stream processing
time, starving out the nonstreaming feeds. (NB: it may not be
obvious, but this starvation occurs *before* you actually run out
of processing time. That's because it's a latency effect: if the
time spent in streaming were magically edited out of reality,
there would be enough time left over to handle the nonstreaming
feeds.)
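
Roughly -- and again with made-up numbers -- the arithmetic looks
like this: a lock-step peer has only one transaction in flight, so
its throughput is 1/(T + W), where W is how long each command sits
waiting for innd to come back around. Long streaming bursts
inflate W even when plenty of total time is free:

/* Why starvation shows up before the machine runs out of capacity:
 * a lock-step (nonstreaming) peer has only one transaction in
 * flight, so its throughput is 1/(T + W), where W is how long each
 * command waits for innd.  All numbers are assumptions. */
#include <stdio.h>

int
main(void)
{
    double T = 0.05;  /* seconds of real work per nonstreaming article  */
    double U = 0.5;   /* fraction of time innd is inside streaming bursts */
    double B = 2.0;   /* typical length of one burst, in seconds        */

    double W = U * (B / 2.0);       /* rough average wait, random arrival */
    double ideal  = 1.0 / T;        /* rate if the waiting were edited away */
    double actual = 1.0 / (T + W);  /* what the lock-step peer really gets  */

    printf("ideal %.1f art/s, actual %.1f art/s, daemon only %.0f%% busy\n",
           ideal, actual, U * 100.0);
    return 0;
}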

:     * That streaming mode is more efficient (cavet: in the face of
:       non-streaming mode, see below)... for several reasons.  It saves
:       TCP packets, it allows disk and network latencies to overlap, and
:       it allows statistically significant locality of reference to propogate
:       in the face of a large number of incoming feeds.

A foolish efficiency is the hobgoblin of little programmers. With
apologies to Emerson. :-) Anyway, the point is that that's an
irrelevant efficiency. News machines are almost invariably disk
limited. It's nice that streaming mode conserves CPU cycles, but
that's not especially important.
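
For what it's worth, the back-of-envelope behind "disk limited"
(every figure below is an assumption; substitute your own disk and
CPU numbers):

/* The sort of numbers behind "news machines are disk limited".
 * Every figure here is an assumption; substitute your own. */
#include <stdio.h>

int
main(void)
{
    double seeks_per_art  = 3.0;   /* history lookup/write, article write */
    double ms_per_seek    = 10.0;  /* garden-variety SCSI disk, no cache  */
    double cpu_ms_per_art = 2.0;   /* parsing, hashing, protocol overhead */

    double disk_ms = seeks_per_art * ms_per_seek;

    printf("disk %.0f ms/article vs cpu %.0f ms/article\n",
           disk_ms, cpu_ms_per_art);
    printf("disk alone caps you near %.0f articles/second\n",
           1000.0 / disk_ms);
    return 0;
}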

As for the overlap: sure, it's an improvement for streaming feeds --
but for nonstreaming feeds sharing the same daemon, it can be a
disaster. The streaming feeds take so long that, for many cycles,
there is no useful overlap.

And as for the locality of reference, that is plain hogwash. Innd
does not place a heavy strain on the history file. Ten feeds
would mean 25 accesses to the history file per second, which
simply isn't going to be a problem. On a large server, however,
there will be a *lot* of reader activity, and that will result in
the history file being flushed from the cache.

Locality of reference only helps if your feeds are very nearly
"real time". If your cache is 8M (it is, on my system), a history
block will live there until 8M of data have been read. That's not
very long, and if your repeated history references don't happen
within a few seconds of each other, you don't get any locality of
reference.
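
Put numbers on it (the read rate here is an assumption; measure
your own): the residency of a history block is roughly the cache
size divided by the rate at which other data is being pulled
through the cache.

/* The locality window implied by the numbers above.  The read rate
 * is an assumption; measure your own. */
#include <stdio.h>

int
main(void)
{
    double cache_mb  = 8.0;   /* buffer cache size, as in the text        */
    double read_mb_s = 1.0;   /* articles plus readers pushing data through */

    printf("a history block survives roughly %.0f seconds;\n",
           cache_mb / read_mb_s);
    printf("duplicate offers arriving later than that get no cache hit\n");
    return 0;
}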

Streaming mode doesn't help here, either. That's because, unless
you're doing "takethis" (not likely with multiple streaming
feeds), you still send the "check" and the article in separate
transactions, meaning that innd has to go all the way around its
processing loop once for each of them.
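
For concreteness, here's roughly what the client side of one
streamed article looks like under CHECK/TAKETHIS -- just a sketch,
with read_reply() and send_lines() standing in as assumed helpers
rather than anything out of INN, and the pipelining that real
feeders do left out. The offer and the article are still two
separate commands, hence two separate passes through innd's loop:

/* Sketch of the client side of one streamed article under the
 * CHECK/TAKETHIS extension -- an illustration, not code from INN.
 * Assumed helpers, not real library calls:
 *   read_reply()  reads one "NNN ..." status line and returns NNN
 *   send_lines()  dot-stuffs the article text and writes it
 * Real feeders pipeline many CHECKs; it's shown lock-step here so
 * the two separate server round trips are obvious. */
#include <stdio.h>

extern int  read_reply(int fd, char *buf, size_t len);
extern void send_lines(int fd, const char *text);

void
offer_article(int fd, const char *msgid, const char *article)
{
    char buf[512];

    /* Trip 1: the offer.  innd sees this as one channel event. */
    dprintf(fd, "CHECK %s\r\n", msgid);
    if (read_reply(fd, buf, sizeof buf) != 238)   /* 438 have it, 431 later */
        return;

    /* Trip 2: the article itself -- a second, separate pass through
     * innd's processing loop. */
    dprintf(fd, "TAKETHIS %s\r\n", msgid);
    send_lines(fd, article);
    dprintf(fd, ".\r\n");
    read_reply(fd, buf, sizeof buf);              /* 239 accepted, 439 not */
}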

:     * That while non-streaming mode feeds will suffer, I suggest that the
:       dead time is sufficient to handle most lower-latency non-streaming
:       mode feeds.

It didn't help mine. Or a lot of other people's. :-)

:     My frank opinion is that everyone should run streaming-mode feeds.

And a number of people disagree, from the experience of running
streaming vs. multiple parallel feeds. The problem is that innd's
structure doesn't give "fair" allocation to all channels once
streaming enters the picture. The original design was intended for
lock-step protocols like nonstreaming NNTP; streaming breaks the
design assumptions, with a number of unhappy consequences. Multiple
parallel feeds have their own problems, but they are compatible
with innd's design assumptions, so they work better.

:     Most real full-feed hubs use streaming nowadays anyway... it is not as if
:     you will have much of a choice.   In the last 12 months, all but one
:     of my incoming full feeds went from non-streaming to streaming.

In your part of the world, perhaps. In mine, several people have
switched from streaming to multiple parallel feeds because they
simply work better.