*BSD News Article 34589



Xref: sserve comp.os.386bsd.questions:12493 comp.os.386bsd.misc:3285
Path: sserve!newshost.anu.edu.au!harbinger.cc.monash.edu.au!bunyip.cc.uq.oz.au!munnari.oz.au!news.Hawaii.Edu!ames!newsfeed.gsfc.nasa.gov!cesdis1.gsfc.nasa.gov!not-for-mail
From: becker@cesdis.gsfc.nasa.gov (Donald Becker)
Newsgroups: comp.os.386bsd.questions,comp.os.386bsd.misc
Subject: Re: Whats wrong with Linux networking ???
Date: 10 Aug 1994 17:09:07 -0400
Organization: NASA Goddard Space Flight Center -- Greenbelt, Maryland USA
Lines: 51
Message-ID: <32bflj$lig@cesdis1.gsfc.nasa.gov>
References: <Cu107E.Mz3@curia.ucc.ie> <3256t1$rbn@ra.nrl.navy.mil> <327nj0$sfq@sundog.tiac.net> <328fn2$i9p@news.panix.com>
NNTP-Posting-Host: cesdis1.gsfc.nasa.gov
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

In article <328fn2$i9p@news.panix.com>, Wayne Berke <berke@panix.com> wrote:
>bill@bhhome.ci.net (Bill Heiser) writes:
>
>>cmetz@sundance.itd.nrl.navy.mil (Craig Metz) writes:
>
>>>>- NFS was *slooow*
>>>	No arguing this one. Linux's NFS is still in need of serious work.
>
>>The *speed* of LINUX NFS isn't the real problem.  The reason it's so slow
>>is that by default it uses a 1K blocksize.  You can increase the rsize and
>>wsize to 8K, like Sun, and the performance improves dramatically.
>
>Could you explain why this should be the case?  Since 8K blocks will typically
>(eg. on an Ethernet) be fragmented by IP down to ~1K packets, why should these
>bigger blocks be an advantage.  If anything I would suspect that
>reassembly and retransmission costs would make the <MTU packets better.

Yes, the most rational block size for NFS transactions over Ethernet is 1K.
But larger blocks are still an advantage in NFS because of the protocol
semantics and typical implementations.

The NFS protocol assures the client that when the write-RPC returns, the
data block has been committed to persistent storage.  For common
implementations that means the block has been physically queued for writing,
not just put in the buffer cache.  As you can imagine, this results in very
high latency.(1)  This is compounded by the Sun NFS server implementation,
which is tuned for full-page 8K operations, since that is what other Suns
typically request.  Both larger and smaller(2) requests result in major
performance drops.
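
The server-side ordering is roughly the sketch below.  This is purely
illustrative C, not the actual Sun (or any other) server code, and
send_reply() is just a stand-in name for the RPC reply path:

/* Why each NFS write RPC is slow: the data must reach stable
 * storage before the reply goes out, so every request pays the
 * full disk latency.  Illustrative sketch only.                */
#include <unistd.h>

static void send_reply(void) { /* stand-in for the RPC reply path */ }

static void handle_write_rpc(int fd, const char *buf, size_t len)
{
    (void) write(fd, buf, len);   /* hand the block to the filesystem */
    (void) fsync(fd);             /* force it out to the disk ...     */
    send_reply();                 /* ... and only then answer         */
}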

This high latency translates directly to low performance with a
straightforward client implementation, like Linux's, which writes 1K
blocks sequentially: it waits for the response to the first block before
starting the write of the second.  You can get around this with a client
implementation that allows multiple outstanding write requests per
writing thread, at the cost of possible write-order inconsistency.
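
To put rough numbers on it (the 20ms per committed write below is an
assumed figure for illustration, not a measurement):

/* Back-of-envelope throughput model for synchronous NFS writes.
 * The 20 ms per committed WRITE is an assumption, not a measurement. */
#include <stdio.h>

int main(void)
{
    double latency = 0.020;   /* assumed seconds per committed WRITE */

    printf("1K blocks, serial:       %4.0f KB/s\n", 1.0 / latency);
    printf("8K blocks, serial:       %4.0f KB/s\n", 8.0 / latency);
    printf("1K blocks, 4 in flight:  %4.0f KB/s\n", 4.0 * 1.0 / latency);
    return 0;
}

With one request in flight the throughput is simply blocksize/latency,
which is why bumping rsize/wsize to 8K helps so much even though IP
fragments the 8K block back into ~1K packets on the wire.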

(1) NFS performance enhancers for Suns, like Prestoserve, stage the write in
battery-backed memory to reduce this latency.

(2) Writing smaller blocks on a Sun requires that an 8K page be read, the
new data inserted, and the block be written back to disk.  The alternative
of writing the new data directly to disk isn't used.
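
A sub-page write on such a server is effectively a read-modify-write
cycle, something like this sketch (again illustrative only, with error
handling omitted):

/* Read-modify-write forced by a sub-page write on an 8K-page server.
 * Illustrative only; not the real server code.                       */
#include <string.h>
#include <unistd.h>

#define PAGE 8192

static void write_partial(int fd, off_t page_off, size_t off_in_page,
                          const char *data, size_t len)
{
    char page[PAGE];

    (void) pread(fd, page, PAGE, page_off);    /* read the whole 8K page */
    memcpy(page + off_in_page, data, len);     /* insert the new data    */
    (void) pwrite(fd, page, PAGE, page_off);   /* write the page back    */
    (void) fsync(fd);                          /* and commit it to disk  */
}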
	


-- 
Donald Becker					  becker@cesdis.gsfc.nasa.gov
USRA-CESDIS, Center of Excellence in Space Data and Information Sciences.
Code 930.5, Goddard Space Flight Center,  Greenbelt, MD.  20771
301-286-0882	     http://cesdis.gsfc.nasa.gov/pub/people/becker/whoiam.html