*BSD News Article 92219



Path: euryale.cc.adfa.oz.au!newshost.carno.net.au!harbinger.cc.monash.edu.au!munnari.OZ.AU!news.Hawaii.Edu!news.caldera.com!enews.sgi.com!news.corp.sgi.com!fido.asd.sgi.com!neteng!lm
From: lm@neteng.engr.sgi.com (Larry McVoy)
Newsgroups: comp.unix.bsd.freebsd.misc,comp.unix.bsd.bsdi.misc,comp.sys.sgi.misc
Subject: Re: no such thing as a "general user community"
Followup-To: comp.unix.bsd.freebsd.misc,comp.unix.bsd.bsdi.misc,comp.sys.sgi.misc
Date: 29 Mar 1997 02:33:24 GMT
Organization: Silicon Graphics Inc., Mountain View, CA
Lines: 93
Message-ID: <5hhv1k$jh9@fido.asd.sgi.com>
References: <331BB7DD.28EC@net5.net> <5g9hjp$api@flea.best.net> <5gmb58$6jd$1@news.clinet.fi> <5gn3ig$83d@flea.best.net> <5goqrq$5ak$1@news.clinet.fi> <5hd29s$e7t@fido.asd.sgi.com> <333C1614.ABD@sgi01.grn.aera.com>
Reply-To: lm@slovax.engr.sgi.com
NNTP-Posting-Host: neteng.engr.sgi.com
X-Newsreader: TIN [version 1.2 PL2]
Xref: euryale.cc.adfa.oz.au comp.unix.bsd.freebsd.misc:37950 comp.unix.bsd.bsdi.misc:6487 comp.sys.sgi.misc:29486


Lee Ward (lee@sgi01.grn.aera.com) wrote:
: > XFS includes striping and mirroring.  Striped XFS file systems move data
: > at 500MB/sec.  They have done so for years.
: > 
: > XFS is integrated with SGI's NFS.  SGI's NFS has an extension, a freely
: > available under the GPL extension, that delivers 85MB/sec read or write
: > rate, over the wire, 1 process, single threaded, no async I/O.  In other
: > words for (whatever) { read(fd, buf, 1<<20); }.
: > 

: These numbers seem pretty wild to me. How many hundreds of thousands of
: dollars worth of hardware are we talking about here? I've been working
: with super-computers on and off for the last few years - many SGI
: machines included. I've never seen an SGI deliver even the above claimed
: NFS rate on a local file system. While it seems that it *could* be done,
: I've just never seen it. I realize this is only anecdotal but it seems,
: to me, as valid as the original unsupported statement.

Anecdotal my butt.  I do this all the time on my lab machine.  It's in the
midst of an install right now, but here are some old notes on performance
from release 1.0 of BDS.  BDS is an SGI extension to NFS for bulk data
movement.  It uses an additional TCP socket per open file when files
are opened with O_DIRECT.  O_DIRECT is an open flag that tells the OS
to go fast by doing I/O straight to and from the user buffer, skipping
the buffer cache; you might think of it as similar to
madvise(SEQUENTIAL...).
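
For anyone who wants to try the single-process read loop described
above, here is a minimal sketch in C.  It is not BDS code and says
nothing about how BDS works internally.  The function name
read_whole_file, the 4096-byte buffer alignment, and the fallback to
buffered reads are assumptions made for illustration; the _GNU_SOURCE
define is what Linux wants before it will expose O_DIRECT, and other
systems may differ.

```c
#define _GNU_SOURCE             /* Linux/glibc: exposes O_DIRECT in <fcntl.h> */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0              /* no such flag here: reads stay buffered */
#endif

#define CHUNK ((size_t)1 << 20) /* 1MB per read(), as in the quoted loop */
#define ALIGN ((size_t)4096)    /* assumed buffer alignment for direct I/O */

/*
 * Read a whole file sequentially, one process, no async I/O.
 * Returns the number of bytes read, or -1 on error.  If used_direct is
 * non-NULL it is set to 1 when O_DIRECT was in effect, 0 otherwise.
 */
long long read_whole_file(const char *path, int *used_direct)
{
    void *buf = NULL;
    int direct = (O_DIRECT != 0);
    long long total = 0;
    int fd;

    assert(CHUNK % ALIGN == 0);         /* request size must stay aligned */
    if (posix_memalign(&buf, ALIGN, CHUNK) != 0)
        return -1;

    fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0 && direct) {             /* maybe O_DIRECT was refused */
        direct = 0;
        fd = open(path, O_RDONLY);
    }
    while (fd >= 0) {
        ssize_t n = read(fd, buf, CHUNK);
        if (n > 0) { total += n; continue; }
        if (n == 0)
            break;                      /* end of file */
        if (errno == EINTR)
            continue;
        if (errno == EINVAL && direct && total == 0) {
            /* Direct reads rejected at read time: retry buffered. */
            close(fd);
            direct = 0;
            fd = open(path, O_RDONLY);
            continue;
        }
        total = -1;                     /* real I/O error */
        break;
    }
    if (fd < 0)
        total = -1;
    else
        close(fd);
    free(buf);
    if (used_direct)
        *used_direct = direct;
    return total;
}
```

Divide the byte count by the elapsed wall-clock time to get a MB/sec
figure; whether O_DIRECT helps at all depends on the file system and
the hardware underneath.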

Example configurations with performance results
-----------------------------------------------

    The following configurations used IBM 2GB drives, BDSpro 1.0, and
    HIPPI.  There were 3 disks per controller (fast&wide 20MB/sec SCIP
    card SCSI controllers), the transfer sizes were exactly the stripe
    width, and the results are in MB/sec.  A MB here is the binary kind,
    i.e., 1024*1024 bytes; all numbers are about 5% too small if you like
    the 10^6 definition of a MB.  Expect much lower results if you use sizes
    smaller than the stripe width, slightly lower results if you use
    transfer sizes that are unaligned with respect to the stripe
    width.  The BDS writes are slower than XFS largely because of the
    synchronous nature of BDS writes.
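
    The arithmetic behind those notes can be sketched in a few lines of
    C.  The helper names below are hypothetical, and treating "aligned"
    as "offset and size are both whole multiples of the stripe width" is
    an interpretation of the notes, not something lmdd or XFS defines.
    The 27-disk, 128k-unit figures come from the configuration reported
    here.

```c
#include <assert.h>
#include <stddef.h>

#define KB 1024UL

/* Stripe width is the stripe unit multiplied by the number of disks. */
unsigned long stripe_width(unsigned long unit_bytes, unsigned disks)
{
    return unit_bytes * disks;
}

/*
 * Nonzero when a transfer covers whole stripes: it starts on a stripe
 * boundary and its size is a positive multiple of the stripe width.
 */
int is_stripe_aligned(unsigned long offset, unsigned long size,
                      unsigned long width)
{
    assert(width > 0);
    return size > 0 && size % width == 0 && offset % width == 0;
}

/* Convert a rate in 1024*1024-byte MB/sec to 10^6-byte MB/sec. */
double to_decimal_mb(double binary_mb_per_sec)
{
    return binary_mb_per_sec * 1048576.0 / 1000000.0;
}
```

    With 27 disks and a 128k stripe unit, stripe_width() gives 3456k,
    and the conversion factor of 1.048576 is where the few-percent
    difference between the two MB definitions comes from.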

    All numbers were measured with lmdd, a part of lmbench.

  Disks  stripe  stripe     XFS (MB/sec)      BDS (MB/sec)
           unit   width    read    write     read    write
     27    128k   3456k      99       64       72       35

And I can do that on a uniprocessor R10K system.  I dunno what our disk
prices are today, I'm sure they are too high in everyone's opinion except
SGI's, but suppose a disk costs $3K.  That's about $90K in disks and
mebbe $40K in system (should be cheaper but you need a HIPPI board and
we charge for those).  Hardly hundreds of thousands of dollars.

The numbers above are from BDS 1.0.  We've been working on it and we've
fixed a few things in 2.0.  We can get to 85MB/sec now, on both reads
and writes.  That's just for one HIPPI connection.  We can do lots of
HIPPI connections.  And it scales.

: Similarly, for a small group taking a small news
: feed, it may be appropriate to use a lower cost PC than any of the SGI
: offerings. Then again, it may not. It seems to me that it would be
: useful to weigh the benefits and costs.

You bet.  If I was at a startup and needed a nameserver, would I buy a $10K
SGI when I could do it with a $1K PC?  Hell, no.  You would be stupid
to do that.  But let's look at that.  I'm a kernel hacker.  There is
nothing in the system utilities or the kernel that I could not rewrite
or bug fix as needed.  Not that I want to, but if I had to I could.
Linux, FreeBSD, they are all just big C programs and I do C for money.

For some people, the $10K is actually a better deal.  They pay the money
and insist that it works.  Our buddy Matt will say that SGI sucks and just
don't work, and sometimes I agree with him, but a lot of customers do get
useful work done with SGI systems.  Matt has larger needs than most and
he wants to get more out of a chunk of hardware than most.  I respect
his ability to clearly know what the hardware should do.  We tend to
make it get within 10 or 20% of that limit.  Matt wants 100%, will be
happy with 95%, starts getting annoyed at 90%, and is furious at 80%.
If all our customers were like Matt, we would fold up the tent.  Most
customers know that most vendors don't hit 100% of what the hardware
can do for all possible applications and they buy a little margin.  To
each their own.  

: What is "real" work anyway?

If you have to ask you don't know.  Real work is not stuff that
works well on an Xterminal.  An amazing number of workstations are
glorified Xterminals.  Real work is rebuilding your kernel in a minute.
Running your datawarehouse.  Serving up a few million web queries.
Real work frequently doesn't fit on a PCI bus or in a $200 motherboard
with flakey parts.
--
---
Larry McVoy     lm@sgi.com     http://reality.sgi.com/lm     (415) 933-1804