*BSD News Article 73490


Path: euryale.cc.adfa.oz.au!newshost.anu.edu.au!harbinger.cc.monash.edu.au!munnari.OZ.AU!spool.mu.edu!howland.reston.ans.net!nntp.crl.com!news.PBI.net!news.mathworks.com!hunter.premier.net!netnews.worldnet.att.net!cbgw2.att.com!nntphub.cb.lucent.com!news
From: "John S. Dyson" <dyson@inuxs.att.com>
Newsgroups: comp.os.linux.networking,comp.unix.bsd.netbsd.misc,comp.unix.bsd.freebsd.misc
Subject: Re: TCP latency
Date: Fri, 12 Jul 1996 09:44:59 -0500
Organization: Lucent Technologies, Columbus, Ohio
Lines: 87
Message-ID: <31E664EB.167EB0E7@inuxs.att.com>
References: <4paedl$4bm@engnews2.Eng.Sun.COM> <31E106AF.41C67EA6@dyson.iquest.net> <4rvmtf$ven@linux.cs.Helsinki.FI> <31E3D9E2.41C67EA6@dyson.iquest.net> <4s5bl2$qpg@linux.cs.Helsinki.FI>
NNTP-Posting-Host: dyson.inh.lucent.com
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Mailer: Mozilla 2.0 (X11; I; FreeBSD 2.1-STABLE i386)
Xref: euryale.cc.adfa.oz.au comp.os.linux.networking:44963 comp.unix.bsd.netbsd.misc:4004 comp.unix.bsd.freebsd.misc:23380

Linus Torvalds wrote:
> 
> In article <31E3D9E2.41C67EA6@dyson.iquest.net>,
> John S. Dyson <toor@dyson.iquest.net> wrote:
> >
> >One other thing, the numbers show that the DRIVER used on BSD is slower -- the
> >networking code is NOT SHOWN to be slower...  Refer to the numbers...
> 
> No, read the numbers again. Linux was faster on loopback too.
>
Given the same kernel compile options, that has not been shown to be
true.  The difference of 20usecs is well within the range that compile
options alone can account for.

> 
> >Do you know that my localhost results on my P5-166 are 200usecs?
> >That is faster than the Linux measurements that are being espoused as a
> >"record" isn't it???
> 
> Ooohh.. "FreeBSD is faster over loopback, when compared to Linux over
> the wire". Film at 11.
> 
>         linux$ ./lat_tcp linux
>         $Id: lat_tcp.c,v 1.2 1995/03/11 02:25:31 lm Exp $
>         TCP latency using linux: 181 microseconds
> 
> That's on a P166 too.  With a stable kernel.  What were you saying
> again?
> 
Not the same machine :-(.  I see that the percentage here is not as important
as the absolute latency.  Seems like a pretty small difference to
me given a total reimplementation.  I guess a lot of performance problems
are being fixed?  Hmmm...  Looks like the NEW IMPROVED Linux TCP suite
is about the same perf as the BSD code...  Luckily, there is movement
afoot to clean up the BSD networking code, and I wouldn't be too awfully
surprised if it betters Linux.  (Some pieces of it haven't been reworked
in years.)
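
For readers who want to reproduce this kind of number, here is a minimal
sketch of a one-byte ping-pong over loopback, the general technique a TCP
latency benchmark relies on.  It is not lat_tcp itself: the function names
(echo_server, measure_loopback_latency) and the round count are made up for
illustration, and interpreter overhead means the absolute figures will not
be comparable to the C benchmark quoted above.

```python
import socket
import threading
import time


def echo_server(listener, rounds):
    """Accept one connection and echo single bytes back (hypothetical helper)."""
    conn, _ = listener.accept()
    with conn:
        # Disable Nagle so each one-byte reply is sent immediately.
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(rounds):
            data = conn.recv(1)
            if not data:
                break
            conn.sendall(data)


def measure_loopback_latency(rounds=1000):
    """Return the mean round-trip time in microseconds over 127.0.0.1."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    server = threading.Thread(target=echo_server, args=(listener, rounds))
    server.start()

    client = socket.create_connection(("127.0.0.1", port))
    client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

    start = time.perf_counter()
    for _ in range(rounds):
        client.sendall(b"x")  # one byte out ...
        client.recv(1)        # ... and wait for the same byte back
    elapsed = time.perf_counter() - start

    client.close()
    server.join()
    listener.close()
    return elapsed / rounds * 1e6


if __name__ == "__main__":
    print("loopback round trip: %.1f usecs" % measure_loopback_latency())
```

Running it on two different machines shows the same problem as the thread:
the result depends on the hardware as much as on the TCP code.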

> (And if you think you will get 10% better numbers by just changing
> compiler options, I'd suggest you _try_ it first, without spouting it on
> the newsgroups as facts with no backing).
> 
I get big differences from kernel compile options  (I have seen 10% or better
given -O vs. -O2 -fomit-frame-pointer, especially on code that uses lots of
registers.)  You are still not controlling the experiment.  Sigh...  Certain
kinds of operations show big differences.  One note: it is interesting that the
latency differences are the same "20usecs" on both benchmarks...
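
To put the two figures side by side, a small worked calculation: a 20usec
gap expressed as a percentage of the loopback latencies quoted in this
thread (181 and 200 usecs).  It comes out at roughly 10 to 11 percent, the
same order as the compile-option effect described above.  The helper name
relative_gap is invented for this sketch.

```python
def relative_gap(gap_usecs, baseline_usecs):
    """Return a latency gap as a percentage of a baseline latency."""
    return 100.0 * gap_usecs / baseline_usecs


# Baselines are the two loopback latencies quoted in the thread.
for baseline in (181, 200):
    pct = relative_gap(20, baseline)
    print("20 usecs on a %d usec baseline: %.1f%%" % (baseline, pct))
```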

>
> And if you don't like latency numbers, what are your throughput numbers?
> (Btw, check your bcopy() speed first to see if the hardware really _is_
> comparable, see below)
> 
>         linux$ ./bw_tcp linux 50m
>         $Id: bw_tcp.c,v 1.3 1995/06/21 21:02:49 lm Exp $
>         Socket bandwidth using linux: 17.14 MB/sec
> 
I also get about 17-19 MB/sec on localhost on FreeBSD.  The MBUF
code is not really that inefficient in practice.  Again, it is hard to
come to any conclusions given different hardware.

>
> Yes, the machine was idle while doing this. I guess you can just do them
> in parallell, though, to get _some_ idea about the degradation under
> load (admittedly not a lot of sockets, but at least some activity for
> context switches etc):
>
Geesh, do you understand that your example tests only three connections
to the same machine?  You are not showing scalability at all.  (Mostly
you are showing that you aren't busting the cache.)  The scalability
issues on the old Linux context switch didn't come into effect until
about 20 processes, did they?  Herein, you are showing that the localhost
code under very little, if not NO, load runs at the same speed (at least
to me).  But you STILL are not addressing the issue of scalability
(especially to/from multiple TCP/IP addresses.)
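
A rough sketch of what a scalability check could look like: run the same
one-byte ping-pong on N concurrent connections and watch how the mean round
trip changes as N grows past a handful.  Everything here is an assumption
for illustration (the names serve_echo, accept_loop, ping_pong and
latency_under_load, the thread-per-connection design, the connection
counts).  It uses a single loopback address, so it does not cover the
multiple-address case, and Python threading adds its own overhead on top of
the kernel's.

```python
import socket
import threading
import time


def serve_echo(conn):
    """Echo single bytes until the client closes the connection."""
    with conn:
        while True:
            data = conn.recv(1)
            if not data:
                return
            conn.sendall(data)


def accept_loop(listener, count):
    """Accept `count` connections, one echo thread per connection."""
    handlers = []
    for _ in range(count):
        conn, _ = listener.accept()
        handler = threading.Thread(target=serve_echo, args=(conn,))
        handler.start()
        handlers.append(handler)
    for handler in handlers:
        handler.join()


def ping_pong(port, rounds, results, index):
    """Time `rounds` one-byte round trips and store the mean in usecs."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        start = time.perf_counter()
        for _ in range(rounds):
            sock.sendall(b"x")
            sock.recv(1)
        results[index] = (time.perf_counter() - start) / rounds * 1e6


def latency_under_load(connections, rounds=200):
    """Return the mean round trip (usecs) across concurrent connections."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(connections)
    port = listener.getsockname()[1]

    acceptor = threading.Thread(target=accept_loop, args=(listener, connections))
    acceptor.start()

    results = [0.0] * connections
    clients = [
        threading.Thread(target=ping_pong, args=(port, rounds, results, i))
        for i in range(connections)
    ]
    for client in clients:
        client.start()
    for client in clients:
        client.join()

    acceptor.join()
    listener.close()
    return sum(results) / connections


if __name__ == "__main__":
    # Three connections (as in the quoted example) versus larger counts.
    for n in (3, 20, 50):
        print("%3d connections: %.1f usecs" % (n, latency_under_load(n)))
```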

>
> (This machine does memory copies at 43MB/s - don't bother comparing to
> wildly different hardware: it's memcpy() bound.  I get 55MB/s on my
> alpha with the same kernel)
> 
I don't bother comparing OSes unless it is the SAME hardware...  (Actually,
I'll compare the results, but certainly NOT come to any conclusions.)  At
least you are trying to use more information than just NO-LOAD latency
to compare the TCP suites...  You ARE making progress.

John