*BSD News Article 90050

Path: euryale.cc.adfa.oz.au!newshost.carno.net.au!harbinger.cc.monash.edu.au!news.mel.connect.com.au!munnari.OZ.AU!news.ecn.uoknor.edu!feed1.news.erols.com!cpk-news-hub1.bbnplanet.com!su-news-hub1.bbnplanet.com!news.bbnplanet.com!www.nntp.primenet.com!nntp.primenet.com!news1.best.com!idiom.com!nntp2.ba.best.com!not-for-mail
From: dillon@flea.best.net (Matt Dillon)
Newsgroups: comp.programming.threads,comp.unix.bsd.freebsd.misc
Subject: Re: [??] pure kernel vs. dual concurrency implementations
Date: 24 Feb 1997 20:14:20 -0800
Organization: BEST Internet Communications, Inc.
Lines: 38
Message-ID: <5etous$j0l@flea.best.net>
References: <330CE6A4.63B0@cet.co.jp> <5etasa$blt@news.cc.utah.edu>
NNTP-Posting-Host: flea.best.net
Xref: euryale.cc.adfa.oz.au comp.programming.threads:3302 comp.unix.bsd.freebsd.misc:36064

    The kernel overhead for switching lightweight tasks from an
    interrupt, versus a user task doing (for example) a synchronous
    context switch between two threads managed in userland, is
    the difference between 2 microseconds and 1 microsecond on a
    Pentium Pro 200.  I have *tested* this a number of times in
    my OS development research.  If you streamline that one critical
    path, you might as well do it in kernelland rather than userland,
    since in the kernel you have more flexibility when dealing with
    blocking system calls.
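
    For readers unfamiliar with the term, a "synchronous context switch
    between two threads managed in userland" can be sketched roughly as
    below.  This is only an illustration: it uses Python generators as
    stand-ins for user threads, and the names worker() and run() are
    made up for the example.  It is not the mechanism measured above and
    says nothing about the timings.

```python
from collections import deque

def worker(name, steps, log):
    """A userland 'thread': does one step of work, then gives up the CPU."""
    for i in range(steps):
        log.append((name, i))
        yield  # synchronous switch point: control returns to the scheduler

def run(threads):
    """Round-robin userland scheduler; returns the number of switches made."""
    ready = deque(threads)
    switches = 0
    while ready:
        t = ready.popleft()
        try:
            next(t)          # resume the thread until its next yield
            ready.append(t)  # still runnable: put it at the back of the queue
            switches += 1
        except StopIteration:
            pass             # thread finished; drop it from the queue
    return switches

log = []
switches = run([worker("A", 3, log), worker("B", 3, log)])
```

    Note that if a worker made a blocking system call here, the whole
    scheduler would stall with it, which is the flexibility problem
    with doing the switch in userland.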

    What I haven't tested is the best-case context switch overhead
    that occurs when the MMU needs to be given a new VM context.  This
    does NOT apply to thread switching, however, unless you use a
    braindead 'must VM context switch the kernel stack because, gee,
    my fork code overlays the individual kstacks for the threads under
    any given process at the same address' approach... a very stupid
    approach, if you ask me.  I've written several OSes (for turnkey 680x0
    single board computers) that can switch user threads through a supervisor
    interrupt VERY quickly... on the order of 10 uS on a 20 MIPS 680x0 CPU.

    Likewise, having per-task kernel stacks (in a user-stack/
    supervisor-stack/interrupt-stack model) generally performs much
    better than a pure-kernel-single-istack model, mainly because the
    kernel is able to make synchronous context switches from supervisor
    mode without having to pop back to the top-level calling procedure,
    then continue from where it left off on resume.  The ONLY reason
    people ever considered having a unified kstack was in an attempt to
    save memory.  That no longer applies... 4K or 8K per task is not a
    big factor any more, especially if it's swappable.
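
    To put a rough number on the memory cost, here is a back-of-envelope
    calculation.  The task count of 1000 is purely an assumption chosen
    for illustration, and kstack_bytes() is a made-up helper.

```python
def kstack_bytes(tasks, stack_kib):
    """Total memory used by per-task kernel stacks, in bytes."""
    return tasks * stack_kib * 1024

# Assumed workload: 1000 tasks (hypothetical figure, not from the post).
small = kstack_bytes(1000, 4)   # 4K stacks
large = kstack_bytes(1000, 8)   # 8K stacks

small_mib = small / (1024 * 1024)
large_mib = large / (1024 * 1024)
```

    Even with 8K stacks, that assumed workload comes to under 8 MiB in
    total.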

    So, the gist is:  If you get away from the 'per-task kernel stack must
    always start at the same fixed address' methodology, kernel-switched
    user threads are *very* fast... 200,000 switches/sec or better, 1,000,000
    switches/sec in a best-case test.
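
    The switch rates translate into per-switch times by simple
    division.  A small sketch of the arithmetic, with us_per_switch()
    being a name made up for the example:

```python
def us_per_switch(switches_per_sec):
    """Average time available per context switch, in microseconds."""
    return 1_000_000 / switches_per_sec

typical = us_per_switch(200_000)     # the "200,000 switches/sec" figure
best = us_per_switch(1_000_000)      # the best-case figure
```

    So 200,000 switches/sec corresponds to 5 microseconds per switch,
    and the best-case 1,000,000 switches/sec to 1 microsecond per switch.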

						-Matt