*BSD News Article 17111



Newsgroups: comp.os.386bsd.questions
Path: sserve!newshost.anu.edu.au!munnari.oz.au!uunet!psgrain!percy!agora!implode!davidg
From: davidg@implode.rain.com (David Greenman)
Subject: Re: Ethernet [really TCP] performance measurement
Organization: Delta Systems, Portland, OR.
Message-ID: <C8L7A8.1v1@implode.rain.com>
Date: Mon, 14 Jun 1993 01:10:55 GMT
Lines: 152

Garrett Wollman writes:
>You're probably comparing it with a 16-bit Ethernet adapter that has a
>reasonable amount of memory.  Me no got.  The AT&T EN100 is an 8-bit
>adapter (i.e., SLOW SLOW SLOW) with only 16k of memory.  (I am certain
>that I could handle 16k TCP windows if only I had 64k of memory...)

The SMC 8013 is a 16-bit card, but it has just 16k of memory. As for why
you can't handle 16k windows: it's because of either a bug in the malloc
code or a bug in the driver. In the case of the malloc code, the problem
is partially fixed in the patchkit. In the case of the driver, the original
if_wd driver and the original if_ec driver had a bug that would crash the
machine under heavy load. My new driver fixes these problems, and also
delivers full ethernet performance with the 16-bit WD/SMC boards. I'm not
sure why Amancio has the 'low' performance of 'only' 985k/second, but
considering how different our kernel sources are (Amancio uses NetBSD, I'm
using "David's BSD"), this doesn't surprise me that much. Attached is a draft
version of my driver's release notes. BTW, ttcp performance to localhost
on my machine:

ttcp-t: buflen=8192, nbuf=2048, align=16384/+0, port=5001  tcp  -> localhost
ttcp-t: socket
ttcp-t: connect
ttcp-t: 16777216 bytes in 5.40 real seconds = 3034.52 KB/sec +++
ttcp-t: 2048 I/O calls, msec/call = 2.70, calls/sec = 379.32
ttcp-t: 0.0user 3.3sys 0:05real 61% 0i+0d 0maxrss 0+0pf 2536+43csw

The TCP window is set to 8k, and 'implode' is a 486DX2/66. Now this is 20 times
what you reported. 386DX/20's aren't _that_ slow. My little 386SX/25 gets
about 600K/second. You really should have a look at your kernel source.

-DG

---
David Greenman
davidg@implode.rain.com

--------------------------------------------------------------------------

		Release Notes for 'ed' Device Driver
		    David Greenman, 24-May-1993
		------------------------------------


INTRODUCTION
------------
   The 'ed' device driver is a new, high performance device driver supporting
the Western Digital/SMC 80x3 series (including the 'EtherCard PLUS' and
'Elite16') and the 3Com 3c503. All of these ethernet controllers use the DS8390
or 83C690 Network Interface Controller (NIC). The differences between the
boards are in their memory capacity, bus width (8/16 bits), and the special
logic (ASIC) used to configure the shared memory and other things. Every effort
has been made to conform to the manufacturers' specifications for the NIC and
ASIC. This includes both normal operation and error recovery.

PERFORMANCE
-----------

transmit
--------
   The 8390 doesn't provide a mechanism for chained write buffers, so it is
very important for maximum performance to queue the next packet for
transmission as soon as the current one has completed. On boards with 16k or
more of memory, the shared memory is divided in a way that allows enough space
for two full size packets to be buffered for transmission. When sufficient
data is available for transmission, a packet is copied into the shared memory,
the transmission is started, and then an additional packet is copied into the
shared memory (to a different memory area). As soon as the first packet has
completed, transmission of a second packet can then be started immediately -
in less time than the 9.6uS interframe gap. This results in the highest
performance possible from ethernet.
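
   The queueing scheme described above can be sketched as a small simulation.
This is only a toy model written for these notes, not code from the driver:
the class name, the method names, and the use of Python are all assumptions
made for illustration. It shows two transmit areas being used alternately, so
that when the "transmit complete" event arrives, the next packet is already
sitting in shared memory and can be started before anything else is copied.

```python
from collections import deque


class DoubleBufferedTx:
    """Toy model of a board with two transmit areas in shared memory.

    All names here are hypothetical; this is not the driver's code.
    """

    def __init__(self):
        self.slots = [None, None]  # the two packet-sized transmit areas
        self.active = None         # index of the slot currently on the wire
        self.next_tx = 0           # slot that must be transmitted next
        self.next_load = 0         # slot that must be filled next
        self.pending = deque()     # packets still waiting in host memory
        self.wire = []             # packets in the order they were sent

    def _fill_free_slots(self):
        # Copy waiting packets into empty slots, strictly alternating so
        # that packets leave in the order they were queued.
        while self.pending and self.slots[self.next_load] is None:
            self.slots[self.next_load] = self.pending.popleft()
            self.next_load ^= 1

    def _start_if_idle(self):
        # Start the transmitter if it is idle and a packet is buffered.
        if self.active is None and self.slots[self.next_tx] is not None:
            self.active = self.next_tx
            self.next_tx ^= 1

    def send(self, packet):
        self.pending.append(packet)
        self._fill_free_slots()
        self._start_if_idle()

    def tx_done(self):
        # Models the "transmit complete" interrupt.
        if self.active is None:
            return
        self.wire.append(self.slots[self.active])
        self.slots[self.active] = None
        self.active = None
        self._start_if_idle()    # the buffered packet goes out first...
        self._fill_free_slots()  # ...then the freed slot is refilled


tx = DoubleBufferedTx()
for name in ["A", "B", "C"]:
    tx.send(name)
tx.tx_done()
print(tx.wire, tx.slots, tx.active)
```

   The ordering inside tx_done() is the point of the sketch: starting the
already-buffered packet before refilling the freed slot keeps the (slow) copy
into shared memory out of the gap between packets.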

Packets go out on the 'wire' with the following format:

preamble  dest-addr  src-addr   type      data       FCS    inter-frame gap
 64bits    48bits     48bits   16bits   1500bytes   32bits      96bits

   At 10Mbits/sec, each bit is 100nS in duration. All of the above fields,
except for data, are of fixed length. With full sized packets (1500 bytes), the
maximum unidirectional data rate can be calculated as: 6.4uS + 4.8uS + 4.8uS +
1.6uS + 1200uS + 3.2uS + 9.6uS = 1230.4uS/packet = 812.74382 packets/second =
1219115.7 (1190k) bytes/second. With TCP, there is a 40 byte overhead for the
IP and TCP headers, so there are only 1460 bytes of data per packet. This
reduces the maximum data rate to 1186606 bytes/second. With TCP, there will
also be periodic acknowledgments which will reduce this figure somewhat both
because of the additional traffic in the reverse direction and because of the
occasional collisions that this will cause. Despite this, the data rate has
still been consistently measured at 1125000 (~1100k) bytes/second through a TCP
socket. In these tests, the TCP window was set to 16384 bytes. With UDP, there
is less overhead for the headers, and with 1472 bytes of data per packet, a
data rate of 1196358.9 (1168k) bytes/sec is possible. UDP performance hasn't
been precisely measured with this device driver, but rougher tests are
consistent with this figure (measured at around 1135k/second).
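
   The arithmetic above can be reproduced with a few lines of code. This is
just a sketch for checking the figures; the function and constant names are
made up for the example. The field sizes are the ones from the frame layout
given earlier, and "k" is taken to mean 1024 bytes, which is what the figures
in the text imply.

```python
BIT_TIME_NS = 100  # at 10 Mbit/s each bit lasts 100 ns

# Fixed per-frame fields, in bits, from the frame layout in the text.
FIXED_BITS = {
    "preamble": 64,
    "dest-addr": 48,
    "src-addr": 48,
    "type": 16,
    "FCS": 32,
    "inter-frame gap": 96,
}


def frame_time_ns(data_bytes=1500):
    """Time one frame occupies the wire, including the inter-frame gap."""
    bits = sum(FIXED_BITS.values()) + data_bytes * 8
    return bits * BIT_TIME_NS


def max_rate(payload_bytes, data_bytes=1500):
    """Upper bound on payload bytes/second for back-to-back full frames."""
    packets_per_sec = 1e9 / frame_time_ns(data_bytes)
    return packets_per_sec * payload_bytes


print(frame_time_ns() / 1000)     # microseconds per full-sized frame
print(round(max_rate(1500), 1))   # raw ethernet data field
print(round(max_rate(1460), 1))   # TCP: 40 bytes of IP + TCP headers
print(round(max_rate(1472), 1))   # UDP: 28 bytes of IP + UDP headers
print(int(max_rate(1500) / 1024), int(max_rate(1472) / 1024))  # in "k"
```

   These are upper bounds only. As noted above, acknowledgments and collisions
will pull measured TCP figures below the calculated maximum.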

receive
-------
   The 8390 implements a shared memory ring-buffer to store incoming packets.
The 8bit boards (3c503 and 8003) usually have only 8k bytes of shared memory.
This is only enough room for about 4 full size (1500 byte) packets. This can
sometimes be a problem, especially on the original WD8003E and 3c503. This is
because these boards' shared memory access speed is also quite slow compared
to newer boards - typically only about 1MB/second. The additional overhead of
this slow memory access, and the fact that there is only room for 4 full-sized
packets means that the ring-buffer will occasionally overflow. When this
happens, the board must be reset to avoid a lockup problem in early revision
8390's. Resetting the board will cause all of the data in the ring-buffer to
be lost - requiring it to be re-transmitted/received...slowing things even
further. Because of these problems, maximum throughput on boards of this type
is only about 400-600k per second. The 16bit boards (8013 series), however,
have 16k of memory as well as much faster memory access speed. Typical memory
access speed on these boards is about 4MB/second. These boards generally have
no problems keeping up with full ethernet speed. The only problem I've seen
with these boards is related to the (slow) performance of 386BSD's malloc code
when additional mbufs must be added to the pool. This can sometimes increase
the total time to remove a packet enough for a ring-buffer overflow to occur.
This tends to be highly transient, and quite rare on fast machines. I've only
seen this problem when doing tests with large amounts of UDP traffic without
any acknowledgments (uni-directional). Again, this has been very rare.
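
   A rough calculation shows why the memory access speed matters so much.
The sketch below compares the time needed to copy one full-sized packet out
of shared memory with the 1230.4uS that a full-sized frame occupies on the
wire. The function names are invented for the example, "MB" is assumed to
mean 10^6 bytes, and all other per-packet costs (interrupt latency, mbuf
allocation, protocol processing) are ignored, so the real margin is smaller
than shown.

```python
FRAME_TIME_US = 1230.4  # wire time of one full-sized frame (from the text)


def copy_time_us(packet_bytes, mem_bytes_per_sec):
    """Time to copy one packet out of shared memory, in microseconds."""
    return packet_bytes * 1e6 / mem_bytes_per_sec


def keeps_up(packet_bytes, mem_bytes_per_sec):
    """True if the copy alone finishes before the next frame can arrive."""
    return copy_time_us(packet_bytes, mem_bytes_per_sec) < FRAME_TIME_US


# Assumption: 1MB/second and 4MB/second are read as 1e6 and 4e6 bytes/sec.
for label, speed in [("8bit board, ~1MB/s", 1e6), ("16bit board, ~4MB/s", 4e6)]:
    print(label, copy_time_us(1500, speed), keeps_up(1500, speed))
```

   At about 1MB/second the copy alone takes longer than a back-to-back frame
takes to arrive, so a sustained stream of full-sized packets must eventually
fill a small ring-buffer. At about 4MB/second the copy uses less than a third
of the frame time.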

   All of the above tests were done using a 486DX2/66, 486DX/33, and 386DX/40
on an 8-9MHz ISA bus, with Bruce Evans' high speed spl()/interrupt
modifications, a high performance version of in_cksum.c from Bakul Shah,
tcp_sendspace/tcp_recvspace set to 16k, and MCLBYTES set to 2048 (to allow full
MTU sized packets). TCP tests were done with the 'ttcp' performance test
utility, and also with FTP client/server. UDP tests were done with a modified
version of ttcp (to work around a bug in 386BSD's UDP code related to queue
depth), and also with NFS.

KNOWN PROBLEMS
--------------

1) Early revision DS8390B chips have problems. They lock up whenever the
	receive ring-buffer overflows. They also occasionally swap the byte
	order of the length field in the packet ring header (there are several
	different causes, all related to an off-by-one byte alignment),
	resulting in "shared memory corrupt - invalid length NNNNN" messages.
	The board is reset whenever these problems occur, but otherwise there
	is no problem with recovering from these conditions.
2) 16bit boards can conflict with 8bit BIOS or BIOS extensions (like the VGA).
	There is a work-around for this in the driver, however. The problem is
	that the ethernet board stays in 16bit mode, asserting its '16bit' signal
	on the ISA bus. This signal is shared by other devices/ROMs in the same
	128K memory segment as the ethernet card - causing the CPU to read the
	8bit ROMs as if they were 16bit wide. The work-around involves setting
	the host access to the shared memory to 16bits only when the memory is
	actually accessed, and setting it back to 8bit mode at all other times.
	Without this work-around, the machine will hang whenever a reboot is
	attempted.
3) 16bit 3c503 boards seem to have a memory contention problem between the NIC
	and the on-board shared memory that causes periodic fifo-overruns. I'm
	looking for a work-around, but haven't found one yet. When this problem
	occurs, the 3c503 drops the packet. Other than re-transmissions/reduced
	performance, no other effects of this problem have been seen.
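
   The work-around described in item 2 follows a simple pattern: switch the
board to 16bit mode only for the duration of a shared memory access, and
always switch it back afterwards. The sketch below illustrates that pattern
only. FakeBoard, set_width() and shared_memory_access() are hypothetical names
invented for the example; the real driver does this by programming the
board's registers, which is not shown here.

```python
from contextlib import contextmanager


class FakeBoard:
    """Hypothetical stand-in for the board's bus-width control."""

    def __init__(self):
        self.width = 8   # the board idles in 8bit mode
        self.log = []    # history of mode changes, for inspection

    def set_width(self, bits):
        self.width = bits
        self.log.append(bits)


@contextmanager
def shared_memory_access(board):
    """Hold the board in 16bit mode only while shared memory is touched."""
    board.set_width(16)
    try:
        yield board
    finally:
        # Always drop back to 8bit mode, even if the access fails, so that
        # 8bit ROMs in the same memory segment are never misread.
        board.set_width(8)


board = FakeBoard()
with shared_memory_access(board):
    print("during access:", board.width)
print("after access:", board.width)
```

   The try/finally is the part that matters: the board must never be left in
16bit mode, since that is the state that hangs the machine on reboot.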