*BSD News Article 58315



Path: euryale.cc.adfa.oz.au!newshost.anu.edu.au!harbinger.cc.monash.edu.au!news.rmit.EDU.AU!news.unimelb.EDU.AU!munnari.OZ.AU!spool.mu.edu!howland.reston.ans.net!newsfeed.internetmci.com!in2.uu.net!van-bc!ddsw1!news.mcs.net!not-for-mail
From: les@MCS.COM (Leslie Mikesell)
Newsgroups: comp.unix.bsd.bsdi.misc,comp.unix.advocacy
Subject: Re: multiple httpds vs threads vs ... (was BSDI Vs. NT...)
Date: 27 Dec 1995 23:40:17 -0600
Organization: /usr/lib/news/organi[sz]ation
Lines: 90
Message-ID: <4btak1$6nv@Mercury.mcs.com>
References: <taxfree.3.00C439A1@primenet.com> <4bri4a$q2r@noao.edu> <4brlt5$8un@sungy.germany.sun.com> <4bsaa6$8l8@elf.bsdi.com>
NNTP-Posting-Host: mercury.mcs.com
Xref: euryale.cc.adfa.oz.au comp.unix.bsd.bsdi.misc:1866 comp.unix.advocacy:12708

In article <4bsaa6$8l8@elf.bsdi.com>, Chris Torek <torek@bsdi.com> wrote:

>>But is this still true if you are doing i/o?  That is, don't you
>>have to go from kernel to application address space on every
>>packet anyway?
>
>In typical machines, crossing a protection domain (kernel <-> user)
>is cheaper than changing an address space (userA <-> userB).  That
>is, if you have kernel threads, system calls made from different
>threads within one process can be switched faster than system calls
>from different processes.

But doesn't the kernel take the opportunity to decide if it is
time to run another process at that point anyway?  And in the
case of i/o bound processes isn't this a good thing?

>>Somewhere you've got to sort the packets out and decide which code
>>can run.  If this can be done better than the unix scheduler can
>>do it, why not fix that instead?
>
>This is not quite the right question, since the scheduler is solving
>a more general problem, but does bring up the point that if you
>have threads, you still have to schedule the threads, and this is
>just about the same job as scheduling processes.

Doesn't scheduling i/o bound processes automatically fall into
the same computation as delivering the data?  If you switch
in and out of a thread without doing any i/o, who's going to
know?

>(In many systems,
>thread scheduling actually gets tied in with complicated `ganging'
>schemes designed to avoid livelock and starvation problems, so that
>the final system, with kernel threads, is much slower than the
>original process-only system ever was.  However, we can, for this
>discussion at least, wave our hands and claim that this is merely
>an implementation detail.

I don't see how, unless you find twiddling bits in memory to be
more interesting than moving them where someone can see them.
 
>One can inflict the same sludge on a
>process-based system that shares memory via mapping; its association
>with threads is mainly historical.  Fundamentally, the problem of
>scheduling something to run, whether it is a thread or a process,
>is pretty much the same, no matter where you move it.)

I don't see how this can be separated from the guts of read()/write().
Perhaps not every program is i/o bound, but mine all are.

>I also suggested the use of select() or poll(), and he asks:
>
>>Doesn't that just double your syscalls/second?
>
>This one is a bit complicated.  The short answer is `no'; in fact,
>because poll/select asks about a set of pending requests, the
>overhead actually goes *down* as the load increases.  Note that a
>system call costs about the same as an in-process context switch
>through the kernel, i.e., a kernel thread switch---most of the work
>is crossing the protection domain (e.g., saving and loading a few
>registers, then changing the CPU's access mode from `user' to
>`kernel').  So, in the light load case, yes, it does, but if there
>are N clients on one server, the polling system uses N+1 system
>calls to service them, while the threaded system uses N system
>calls and N switches.  If you get lucky, the switches combine with
>the system calls---they can be done together---and you have N+1
>versus N.  If the scheduler makes bad decisions, the polling system
>tends to outperform the threaded system, which makes up to 2N calls.
>In either case the difference tends to be marginal anyway.
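
(For concreteness, the single-process poll/select arrangement
being described comes out looking roughly like the sketch below.
It is only an illustration: error handling is skipped, the port
number is made up, and a real httpd would parse requests rather
than echo bytes back.  The point is the shape of each pass: one
select() covering every client, then one read()/write() per
descriptor that is actually ready.)

/* Sketch of the one-process select() model: one system call per
   pass tells us about every pending client.  Setup abbreviated. */
#include <sys/types.h>
#include <sys/time.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
    int listenfd, maxfd, fd, n;
    fd_set active, ready;
    char buf[4096];
    struct sockaddr_in sin;

    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&sin, 0, sizeof sin);
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_ANY);
    sin.sin_port = htons(8080);         /* made-up port */
    bind(listenfd, (struct sockaddr *)&sin, sizeof sin);
    listen(listenfd, 5);

    FD_ZERO(&active);
    FD_SET(listenfd, &active);
    maxfd = listenfd;

    for (;;) {
        ready = active;
        /* one call covers the whole set of clients */
        if (select(maxfd + 1, &ready, NULL, NULL, NULL) < 0)
            continue;
        for (fd = 0; fd <= maxfd; fd++) {
            if (!FD_ISSET(fd, &ready))
                continue;
            if (fd == listenfd) {
                int newfd = accept(listenfd, NULL, NULL);
                if (newfd >= 0) {
                    FD_SET(newfd, &active);
                    if (newfd > maxfd)
                        maxfd = newfd;
                }
            } else if ((n = read(fd, buf, sizeof buf)) <= 0) {
                close(fd);              /* client gone */
                FD_CLR(fd, &active);
            } else {
                write(fd, buf, n);      /* echo; a real httpd parses here */
            }
        }
    }
}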

Is there some invisible overhead in having many processes waiting
on read()/write() completion?  That is, doesn't the kernel end up
doing essentially the same work as the select() anyway, except
that you don't have to go back after the wakeup to actually
perform the operation?
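
(The arrangement I have in mind is the usual process-per-client
setup, sketched below with the same made-up port and the same
lack of error handling: each child simply sits blocked in read()
and the kernel wakes exactly the processes that have data.)

/* Sketch of the process-per-connection model: the parent only
   accepts; each child blocks in read() with no select() at all. */
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <netinet/in.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static void
reap(int sig)
{
    /* collect exited children so they don't linger as zombies */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        ;
}

int
main(void)
{
    int listenfd, fd, n;
    char buf[4096];
    struct sockaddr_in sin;

    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&sin, 0, sizeof sin);
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_ANY);
    sin.sin_port = htons(8080);         /* same made-up port */
    bind(listenfd, (struct sockaddr *)&sin, sizeof sin);
    listen(listenfd, 5);
    signal(SIGCHLD, reap);

    for (;;) {
        if ((fd = accept(listenfd, NULL, NULL)) < 0)
            continue;                   /* e.g. EINTR from SIGCHLD */
        if (fork() == 0) {              /* child: one blocked client */
            close(listenfd);
            while ((n = read(fd, buf, sizeof buf)) > 0)
                write(fd, buf, n);      /* blocks in read(); no select */
            _exit(0);
        }
        close(fd);                      /* parent goes back to accept */
    }
}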

>(Incidentally, it sure is a lot easier to sit back and say things
>like: `ideally, this costs about the same as that' or `these should
>be more efficient than those' than it is to go out and measure them
>and try to explain or prove why the measurements do or do not
>conform to the theories.... :-) )

The catch is that measuring i/o bound operations doesn't tell you
much, and measuring without being i/o bound tells you even less.
The measurement you really need is how much additional computation
you can perform while sustaining a steady i/o load.  But in most
cases, who cares?
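
(If you did care, the crude version of that measurement is to spin
a counter for a fixed wall-clock interval, once with the server
load running and once without, and compare the counts.  Something
like the sketch below; the ten-second interval is arbitrary.)

/* Count how much spare computation gets done in a fixed interval.
   Run it idle, then again under the steady i/o load, and compare. */
#include <stdio.h>
#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t done;

static void
alarmed(int sig)
{
    done = 1;
}

int
main(void)
{
    unsigned long count = 0;

    signal(SIGALRM, alarmed);
    alarm(10);                          /* measure for 10 seconds */
    while (!done)
        count++;                        /* stand-in for useful work */
    printf("%lu iterations in 10 seconds\n", count);
    return 0;
}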

Les Mikesell
  les@mcs.com