*BSD News Article 73927

Path: euryale.cc.adfa.oz.au!newshost.anu.edu.au!harbinger.cc.monash.edu.au!munnari.OZ.AU!spool.mu.edu!usenet.eel.ufl.edu!gatech!news.mathworks.com!uunet!in2.uu.net!news.artisoft.com!usenet
From: Terry Lambert <terry@lambert.org>
Newsgroups: comp.os.linux.networking,comp.unix.bsd.netbsd.misc,comp.unix.bsd.freebsd.misc
Subject: Re: TCP latency
Date: Tue, 16 Jul 1996 12:34:23 -0700
Organization: Me
Lines: 119
Message-ID: <31EBEEBF.5E0B4E7E@lambert.org>
References: <4paedl$4bm@engnews2.Eng.Sun.COM> <31E7C0DD.41C67EA6@dyson.iquest.net> <4s8tcn$jsh@fido.asd.sgi.com> <31E80ACA.167EB0E7@dyson.iquest.net> <4sadde$qsv@linux.cs.Helsinki.FI> <31EA9FBC.41C67EA6@star-gate.com> <DuLzKz.Fsy@kroete2.freinet.de>
NNTP-Posting-Host: hecate.artisoft.com
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Mailer: Mozilla 2.01 (X11; I; Linux 1.1.76 i486)
Xref: euryale.cc.adfa.oz.au comp.os.linux.networking:45449 comp.unix.bsd.netbsd.misc:4081 comp.unix.bsd.freebsd.misc:23760

Erik Corry wrote:
] : > In article <31E80ACA.167EB0E7@dyson.iquest.net>,
] : > John S. Dyson <toor@dyson.iquest.net> wrote:
] : > >I think that was a kind-of cute situation.  We decided NOT
] : > >to special case the syscall that Larry uses for the
] : > >null-syscall case.
] 
] I think what John wrote above can only be interpreted as a complaint
] that Linux has a special case for the null syscall. I certainly
] interpreted it that way, so did Linus, and so did most people
] reading the message. If nobody special-cased the null syscall, why
] bring it up at all.

I interpreted it to mean that Larry picked a poor system call
in John's opinion because of the inherent VFS/VOP implementation
bias in using a zero length write to /dev/null, as opposed to
some other system call which did not measure FS layer overhead
as well.

In addition, since I am significantly better informed on the
BSD FS internals than your average code hack, I understand
the misinterpretation of the statement.

However, I can assure you it was a misinterpretation.


The /dev/null zero length write goes through vfs_syscalls.c,
then vnode_if.c, then spec_vnops.c, then a structurally
bogus lock/unlock pair surrounding a call through the cdev.


While it may be useful to measure VFS interface overhead, which
you would do by zero writing a block device instead of a character
device, and subtracting out the system call overhead:

        case VCHR:
                VOP_UNLOCK(vp, 0, p);
                error = (*cdevsw[major(vp->v_rdev)]->d_write)
                        (vp->v_rdev, uio, ap->a_ioflag);
                vn_lock(vp, LK_EXCLUSIVE | LK_RETRY, p);
                return (error);

        case VBLK:
                if (uio->uio_resid == 0)
                        return (0);
                if (uio->uio_offset < 0)
                        return (EINVAL);
 
and NOT descending into the devswitch (which is going away soon
anyway, along with specfs itself), a null write in this fashion
is far from simply system call overhead, which is what we are
purporting to measure with the test.

In case it isn't obvious, pretesting the uio in the VCHR case,
as in the VBLK would be one way to special case the call, as
John suggested, and pretesting the uio before descending into
the VOP calls through vnode_if.c at all, would be the other.

Both of these would significantly "improve" the BSD "performance"
on this "benchmark".

If this were applied at the system call layer, it's pretty obvious
that the MSDOSFS semantics of "zero length write is set EOF" could
not be supported through the interface.



] It looks from this as if John thinks the reason Larry benchmarks
] the null syscall is that Larry thinks people want to do thousands
] of null syscalls per second. Of course the null syscall isn't
] important, it's just a way of measuring the syscall overhead when
] you make a useful syscall. And that (I hope everyone can agree) is
] an interesting figure.

It *is* an interesting figure.  It just isn't the figure that is
returned by this particular choice of system call in the BSD case,
and thus you can not compare the Linux and BSD values as "system
call overhead", which this test purports to do by the labelling
of its output.


] If John thinks there's a better (historical?) way to test that
] overhead he doesn't say what it is.

Yes, and "shame on John" for this.  He's doing what I usually do,
which is assuming a significant amount of context: in this case, a
knowledge of the BSD FS call path implementation to know whether
that particular call is a good one for measuring what it is
purporting to measure.


My personal suggestion would be something like setgid(), and toggle
back and forth between groups (to avoid optimistic caching, in case
you were wondering).

This could still invoke group validation instead of simple call
overhead, so you should be aware of the implementation on the
system you are testing.

I can't in good conscience suggest getpid(); as has been pointed
out, it is a poor NULL system call, since a correct implementation
would perform user space caching in the library (and implement
cache coherency in the child side of the fork call in the library).

Other calls are subject to skew for reasons other than caching.

Probably the best bet would be an agreed upon "null system call"
kernel entry to cause the system call turn around to be the *only*
thing measured.  You could do this on most modern systems (even down
to SVR3) if you are willing to write a loadable system call (which
you can do in SVR3 if you are clever at all).


					Regards,
                                        Terry Lambert
                                        terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.