*BSD News Article 7678

Xref: sserve comp.unix.bsd:7728 comp.benchmarks:2359 comp.arch:28082 comp.arch.storage:680
Newsgroups: comp.unix.bsd,comp.benchmarks,comp.arch,comp.arch.storage
Path: sserve!manuel.anu.edu.au!munnari.oz.au!news.hawaii.edu!ames!saimiri.primate.wisc.edu!zaphod.mps.ohio-state.edu!darwin.sura.net!sgiblab!sgigate!sgi!igor!jbass
From: jbass@igor.tamri.com (John Bass)
Subject: Re: Disk performance issues, was IDE vs SCSI-2 using iozone
Message-ID: <1992Nov10.170022.21624@igor.tamri.com>
Organization: DMS Design
References: <1992Nov7.102940.12338@igor.tamri.com> <36794@cbmvax.commodore.com>
Date: Tue, 10 Nov 92 17:00:22 GMT
Lines: 136

From: jesup@cbmvax.commodore.com (Randell Jesup)
>	You're right about the early SCSI days - for example, scsi got zone
>recording earlier, etc.  The base technology in IDE/SCSI is now the same, so
>it comes down to software and interface issues.  I'll agree that for a single-
>disk system and small transfers, IDE has a slight margin in speed over SCSI,
>due to lower command overhead.  This may change as SCSI-fast becomes more
>universal, especially in large transfers and as drives get into >4MB/s
>sustained rates.

Given the small-transfers qualification, "slight margin in speed" is highly
debatable .... The ability to do read scheduling and write placement
based upon knowing the geometry and current head/spindle position allows for
30%-700% better semi-random small file I/O with the right filesystem design.
The SCSI abstraction hides this important performance information. In addition,
the command overhead differences mean significant request interleaves for
high transfer rate drives; for current fast SCSI drives these are often greater
than 6-12 sector times. In theory, IDE would be 0 with passthru buffering, and
1-3 with buffer flipping (most current designs require 1 or 2).
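
To make "sector times" concrete, here is a rough sketch of how command
overhead turns into a request interleave. The spindle speed, sectors per
track, and overhead values are hypothetical, picked only to show the
arithmetic; they are not measurements of any particular drive.

```python
import math

def sector_time_us(rpm, sectors_per_track):
    """Time for one sector to pass under the head, in microseconds."""
    return 60_000_000 / rpm / sectors_per_track

def sectors_missed(overhead_us, rpm, sectors_per_track):
    """Whole sector times that elapse while the next command is set up."""
    if overhead_us <= 0:
        return 0
    return math.ceil(overhead_us / sector_time_us(rpm, sectors_per_track))

if __name__ == "__main__":
    # Hypothetical drive: 3600 RPM, 60 sectors per track (assumed figures).
    for overhead in (0, 500, 2000):
        missed = sectors_missed(overhead, rpm=3600, sectors_per_track=60)
        print(f"{overhead:5d} us overhead -> {missed} sector times")
```

The higher the media rate (more sectors per track at the same RPM), the
shorter each sector time, so a fixed command overhead costs more sectors.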

While burst transfer rates (the raw drive rate) may be of interest in some
areas, most files/directories are under a few K. Drives and filesystems should
be optimized to handle multiple small requests.



>>First, dumb IDE adapters are WD1003-WHA interface compatible, which means:
>>
>>	1) the transfer between memory and controller/drive are done in
>>	software .... IE tight IN16/WRITE16 or READ16/OUT16 loop. The
>>	IN and OUT instructions run a 286-6mhz bus speeds on all ISA
>>	machines 286 to 486 ... time invariant ... about 900us per 16bits.
>>	This will be refered to as Programmed I/O, or PIO.
>
>	I think you mean 900ns, or you have very slow drives... :-)  This gives
>a max burst rate of around 2MB/s.

Slip of the cortex ... and 2MB/s is about right with a highly optimized
driver interrupt routine plus streamlining of the common interrupt handler.
When I did this for SCO's UNIX 3.2v2 two years ago it made a big improvement.
With standard interrupt handlers and driver coding practices, the number
is closer to half that, and quite noticeable on current high performance
IDE drives with buffering.
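
As a rough check on the figures above, the arithmetic behind "around 2MB/s"
can be sketched as follows. It uses only the ~900ns per 16-bit transfer
quoted in the thread and assumes no loop or interrupt overhead at all, so it
is an upper bound, not a measured rate.

```python
NS_PER_WORD = 900      # one 16-bit IN/OUT on the ISA bus (figure from the thread)
BYTES_PER_WORD = 2
SECTOR_BYTES = 512

def pio_burst_rate_mb_s(ns_per_word=NS_PER_WORD):
    """Peak PIO rate in MB/s (10^6 bytes), ignoring all software overhead."""
    return BYTES_PER_WORD / (ns_per_word * 1e-9) / 1e6

def sector_copy_time_us(ns_per_word=NS_PER_WORD):
    """Time to empty one 512-byte sector buffer, in microseconds."""
    words = SECTOR_BYTES // BYTES_PER_WORD
    return words * ns_per_word / 1000.0

if __name__ == "__main__":
    print(f"burst rate : {pio_burst_rate_mb_s():.2f} MB/s")
    print(f"per sector : {sector_copy_time_us():.1f} us")
```

Any per-sector interrupt latency adds directly to the per-sector copy time,
which is why an unoptimized handler can pull the effective rate well below
this ceiling.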




>	Once per sector?  Don't PC's use the ReadMultiple/WriteMultiple
>commands?  I guess not (which matches what I've heard elsewhere).  Our IDE
>implementations will use read/write multiple to reduce CPU interrupt and
>task-switching overhead on longer reads, and we transfer up to 16 sectors per
>interrupt.  BTW, we have on occasion found bugs in some drives' RWMultiple
>commands, or funny performance results, like slower throughput with larger
>transfers.
>
>	I don't understand your comment about "poor man's disconnect".  While
>you may be waiting for an interrupt, unless you have multiple IDE busses you
>can't use your second IDE drive until the IO on the first is complete.

Yes, Yes ... the interrupt for WD1003/IDE interfaces means the 512 byte sector
buffer is full, and must be emptied. R/W Multiple are used, but it requires
handling a transfer request interrupt for each sector, or busy waiting on
data_request in the command status register ... hence poor man's disconnect
from the processor bus. For WD100? cards, there is a single buffer per
controller ... on the IDE model there is a buffer per drive ... and, from
what I can tell, each can be active, selected by the drive select bit(s).
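
The per-sector handshake described above can be illustrated with a toy
model. This is not register-level WD1003/IDE code: ToyDrive and read_sectors
are hypothetical names, the drive is simulated in memory, and the only
detail borrowed from real hardware is the data-request bit (0x08) in the
status register.

```python
SECTOR_BYTES = 512
DRQ = 0x08  # data-request bit in the status register

class ToyDrive:
    """Hypothetical drive with a single one-sector buffer."""
    def __init__(self, sectors):
        self._pending = list(sectors)  # sectors still on the "media"
        self._buffer = None            # holds at most one sector

    def status(self):
        # The buffer is refilled only after the host has emptied it.
        if self._buffer is None and self._pending:
            self._buffer = self._pending.pop(0)
        return DRQ if self._buffer is not None else 0

    def read_buffer(self):
        data, self._buffer = self._buffer, None
        return data

def read_sectors(drive, count, max_polls=1000):
    """Busy-wait on DRQ once per sector; return (data, service events)."""
    out = bytearray()
    events = 0
    for _ in range(count):
        for _ in range(max_polls):
            if drive.status() & DRQ:
                break
        else:
            raise TimeoutError("drive never raised data_request")
        out += drive.read_buffer()
        events += 1
    return bytes(out), events

if __name__ == "__main__":
    drive = ToyDrive([bytes([i]) * SECTOR_BYTES for i in range(4)])
    data, events = read_sectors(drive, 4)
    print(len(data), "bytes in", events, "service events")
```

In this model the host is free between service events, but it has to come
back once per sector, which is the cost being discussed.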




>	Write-buffering (starting to be commonly available as an option on
>SCSI drives) can help in avoiding slipping revs.  So long as ordering is
>maintained, write-buffering in practice is fine for most uses of desktop
>systems.  Of course, this can apply to either IDE or SCSI, but for IDE you'll
>have to add commands to turn it on, or have it default to on, or use jumpers.

While some speed can be gained by this practice, all ability to handle
error conditions responsibly is generally lost. I am not a big fan of
lookaside bad block handling to a slow microprocessor ... it only
SIGNIFICANTLY reduces throughput when important data gets spared ... leading
to difficult to understand IN-FIELD performance problems. Filesystems
should know and deal with media problems ... even though most UNIX's
either don't or have the drivers do the fix ups.



>As a multi-disk interface, or generalized IO interface (tape drives, CDROM,etc)
>SCSI has a large edge.  Also, IDE can only handle 2 devices.  Even if IDE
>tape drives and CDROMs were available (they're not), you'd rapidly start
>needing multiple IDE interfaces.

Tapes are (or will be) here, and I
expect CDROMs (now partly proprietary & SCSI) to be mostly IDE & SCSI
in the future. IDE is already extending the WD1003 interface, and I expect
additional drive support will follow at some point, although multiple
host adapters are a minor cost issue for many systems.



>>All IDE and SCSI drives have a microprocessor which oversees the bus and
>>drive operation. Generally this is a VERY SLOW 8 bit micro ... 8048, Z80,
>>or 8085 core/class CPU. The IDE bus protocol is MUCH simpler than SCSI-2,
>>which allows IDE drives to be more responsive. Some BIG/FAST/EXPENSIVE
>>SCSI drives are starting to use 16 micro's to get the performance up.
>
>	Right.  This is the reason IDE is slightly faster for small, frequent
>IO's.

But I again contend "slightly faster" is wrong ... the gains can be much more
significant as stated above for "small frequent IO's".




>>Even the fastest 486 PC UNIX systems are filesystem CPU bound to between
>>500KB and 2.5MB/sec ... drive subsystems faster than this are largely
>>useless (a waste of money) ... especially most RAID designs. Doing
>>page flipping (not bcopy) to put the stuff into user space can improve
>>things if aligned properly inside well behaved applications.
>
>	This is a combination of poor interfaces and the OS interface.
>AmigaDos tries to do transfers direct to user buffers where possible,
>especially for large reads, and reserve most of it's buffer space for
>small (<1 block) reads and filesystem structures (file headers, etc).  This
>can provide very low-overhead IO for the common cases.  For example, with

What I said was enough ... the UNIX interfaces are more general by DESIGN
and a simplistic OS can surely take additional short cuts beyond page
flipping ... AND SOME UNIX's do, and in some cases more should. Other than
pushing your product, I don't see the utility in knocking UNIX and pumping
AmigaDos. But thanks for your posting anyway.


As a side note, I saw a reference a while back to the IDE standard in
progress ...  How can I get a copy?


John Bass, Sr. Engineer, DMS Design			      (415) 615-6706
UNIX Consultant			 Development, Porting, Performance by Design