*BSD News Article 25469

From: uhclem@nemesis.UUCP
Date: 30 Dec 93 15:33 CST
Newsgroups: comp.os.386bsd.development
Subject: DMA & Refresh - How they work
Message-ID: <-21004666@nemesis>
Path: sserve!newshost.anu.edu.au!munnari.oz.au!sgiblab!swrinde!cs.utexas.edu!news.unt.edu!news.oc.com!utacfd.uta.edu!trsvax!trsvax!nemesis!uhclem
Nf-ID: #N:nemesis:-21004666:000:10619
Nf-From: nemesis.UUCP!uhclem    Dec 30 15:33:00 1993
Lines: 206


I'll try one last time... and please consider reading it all before
replying or commenting.  Thanks.

Chip Wizards:  I have simplified a few descriptions for clarity.  I know this.

[JM0]	This statement is incorrect.  On the IBM PC/AT/XT arch. The
[JM0]	ram refresh has priority because the DMA chip, the i8237,
[JM0]	by design gives priority (if programmed) to the lowest DMA
[JM0]	channel request, channel 0.  The programming information
[JM0]	for the i8237 is available.  If you don't have a copy of
[JM0]	it I will be happy to send you a photo copy, please let
[JM0]	me know.
 

Fine.  But what you are forgetting is that this priority deals ONLY
with assignment when the DMA controllers DO NOT have the bus.  When
the CPU has the bus, the DMA channel priority will decide which of two
or more currently-pending DMA requests gets the DMA controller first.
Once that arbitration is done, the winning device has the bus AS LONG
AS IT WANTS, regardless of any other DMA request that comes along,
INCLUDING any that have a higher priority number. 

The 8237 does not allow the bus to be taken away from one DMA channel
so that a different one may use the bus, REGARDLESS of the priority the
new request has.  (This isn't a UNIBUS/MASSBUS DMA which arbitrates priority
on each word, IT'S STUPID, OK?) So if channel 5 grabs and hogs the bus,
channel 0, 1, 2 or any other channel can't do anything.

In non-byte-at-a-time modes, the current owner of the bus MUST release the
bus.  Then the DMA controller rescans the request lines and selects the
one that is pending (DRQ asserted) with the highest priority.  Only when no
other DMA requests are pending does the DMA release the bus back to the CPU.
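
For anyone who wants to poke at this, here is a minimal sketch of the
arbitration rule described above.  It is a toy model, not the 8237 register
interface: the names PRIORITY, next_owner and simulate are made up for
illustration, and it assumes fixed priority with a request arriving while the
CPU has the bus being granted at once.

```python
# Toy model of non-preemptive DMA arbitration (hypothetical names throughout).
PRIORITY = [0, 1, 2, 3, 5, 6, 7]   # highest first; channel 4 is the cascade


def next_owner(pending):
    """Highest-priority channel with DRQ asserted, or None (CPU gets the bus)."""
    for ch in PRIORITY:
        if ch in pending:
            return ch
    return None


def simulate(events):
    """events: (kind, channel) pairs in time order.
    kind is 'drq' (request asserted) or 'release' (owner drops the bus).
    Returns the sequence of bus owners; None means the CPU."""
    pending = set()
    owner = None
    history = [owner]
    for kind, ch in events:
        if kind == 'drq':
            pending.add(ch)
            if owner is None:            # arbitrate only when no channel holds the bus
                owner = next_owner(pending)
                pending.discard(owner)
                history.append(owner)
        elif kind == 'release' and ch == owner:
            owner = next_owner(pending)  # rescan the request lines on release
            pending.discard(owner)
            history.append(owner)
    return history
```

Feed it a channel 5 request followed by requests on 0 and 2, and channel 5
keeps the bus until it lets go; only then do 0 and 2 get served, in that order.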

The priorities on a PC/AT compatible system are:
Pri     Ch Width	What
Highest	0  8	ON OLDER designs used to generate pseudo-refresh cycles.
		On newer designs, DMA channel 0 is usually available for
		on-bus controllers.  (DRQ0/ACK0 aren't assigned locations
		on the ISA bus.)  On the newer designs,  refresh is managed
		by the RAM controller chipset, not the DMA.  This is true of
		ALL EISA systems, and many 386/486 ISA designs.

2nd	1  8	IBM suggested putting SNA adapters on this channel.
		Open for general non-hoggy uses.  Some Ethernet works ok here
		since the transfers are of a predetermined size.   The
		trick is to use something that asks for the bus less-often
		than the floppy disk controller.

3rd	2  8	Floppy Disk Controller.  Must be able to get transfers
		in at 15.7 usec intervals (7.85 usec for 2.88 Meg floppies)
		at ALL times.  Channel 1 MUST avoid non-byte-at-a-time
		transfers that exceed this guideline.

4th	3  8	Open

*	4  16	Cascade channel, ties controller 0 (0-3) to controller 1
		(4-7) (although a few megacell designs make DMA 4
		available for on-board use).

5th*	5  16	Open, usually used for SCSI controllers

6th*	6  16	Open

Lowest*	7  16	Open

* If DMA channel 4 is available, it would be 5th in priority and all
remaining channels (5-7) on controller 1 move down one notch.
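
The ordering in the table falls out of the cascade.  A rough sketch, assuming
fixed priority inside each controller (lowest channel first) and that
everything on controller 0 arrives through the channel 4 input of
controller 1.  The function name and the channel4_usable flag are
hypothetical, for illustration only.

```python
def effective_priority(channel4_usable=False):
    """Flatten two cascaded 4-channel controllers into one priority list,
    highest priority first."""
    low = [0, 1, 2, 3]     # controller 0, the 8-bit channels
    high = [4, 5, 6, 7]    # controller 1, the 16-bit channels
    order = []
    for ch in high:
        if ch == 4:
            order.extend(low)       # channels 0-3 sit in channel 4's slot
            if channel4_usable:     # the footnote case: channel 4 comes 5th
                order.append(4)
        else:
            order.append(ch)
    return order
```

Called with no arguments this gives 0, 1, 2, 3, 5, 6, 7.  With
channel4_usable=True, channel 4 lands fifth and 5-7 each move down a notch.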

Simply put, ANY driver/controller that doesn't use DMA byte mode (only one
byte per bus acquisition), can grab the bus, sit on it, and completely
starve RAM refresh in certain designs.  That is why devices that do
multibyte transfers are supposed to release the bus at reasonable intervals
to avoid this, OR transfer data to/from RAM in a contiguous order
throughout the time the bus is held.  Doing this may still screw-up
lower-priority devices, such as the FDC.

Face it, IBM should have put the FDC on channel 1 and used an FDC with
a multi-byte buffer, and this would have eliminated most of the problems.

So putting a device that is going to hog the bus without actually
transferring data on channel 6 (lower priority than channel 2) will still
botch channel 2 operations and could starve refresh.


Most people don't know (or care) that a REAL refresh cycle is simply a
read or write cycle that didn't get finished.  In a refresh cycle,
only one of the address strobes is sent to the RAM, whereas in a read or
write, Row and Column are both strobed.  By having the CPU or a DMA device
read or write data sequentially through 256 contiguous addresses, it
has effectively performed a refresh.   It is possible (but hard to code)
to have a cacheless computer that by careful placement of opcodes and data
would be able to function reliably with the refresh circuitry disconnected.
The fetches and read/writes of data to the right locations at the
maximum refresh interval or faster would do the job.  On the PC, simply
filling 256 bytes with NOPs followed by a JMP to the start of the NOPs
would substitute for refresh just fine, as long as no cache was present.
(The sequence is too long for the pipeline, so it doesn't come into play.)
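
A quick way to convince yourself of the "256 contiguous addresses" claim.
This is only a sketch: it ASSUMES the low 8 address bits select the row, which
depends on how a given board wires its address multiplexers, and the function
names are invented for the example.

```python
ROWS = 256   # rows ("lines") to refresh, per the figures in this article


def row_of(address):
    # Assumption: the low 8 address bits pick the row.
    return address % ROWS


def rows_refreshed(addresses):
    """Set of rows strobed by a sequence of ordinary read/write accesses."""
    return {row_of(a) for a in addresses}


def covers_all_rows(addresses):
    """True if the accesses touched every row at least once."""
    return len(rows_refreshed(addresses)) == ROWS
```

Any run of 256 contiguous addresses covers every row no matter where it
starts, while a stride-2 walk over the same span only ever hits half of them.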

(FYI, all that assumes the RAM subsystem was designed correctly.  With
 Expanded RAM and some other banking systems, the RAM controller may
 neglect to allow DMA cycles that don't decode as an access to a given
 range of RAM to strobe all of RAM like a refresh cycle would.  I haven't
 seen anybody do anything this dumb with extended memory, only expanded.)

So, once the SCSI adapter has the bus, as long as it is writing/reading
sequentially, with no pauses greater than 15usec, the action of the
data transfer will substitute for the refresh that would have occurred,
it hits the address that refresh would have touched next (1 in 256 chance).

What you CAN'T DO is use a slow transfer device, such as a floppy or
device attached to the floppy controller and run it in any DMA mode
other than byte-at-a-time.  Depending on the data rate, a byte may be
transferred only every 16usec, which is at the absolute limit of refresh**
for most RAM designs.  At the same time, if a multisector transfer
is attempted on the floppy controller (from floppy OR TAPE), there will be
a millisecond or two of delay between the end of the data from one sector and
the start of data from the next.  If the DMA held the bus throughout this
time, refresh WILL starve.
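
The numbers are easy to check.  A small sketch using nominal floppy data
rates (500 kbit/s, and 1 Mbit/s for the faster drives); those rates and the
function names are my assumptions for the example, and the 15.7 usec figure
quoted earlier is slightly tighter than the nominal result.

```python
def byte_interval_us(bits_per_second):
    """Microseconds between bytes at a given serial data rate."""
    return 8_000_000 / bits_per_second


def refresh_slots_missed(hold_us, slot_us=16):
    """Whole refresh slots that pass while the bus is held for hold_us."""
    return int(hold_us // slot_us)
```

At 500 kbit/s a byte shows up every 16 usec, and a single 1000 usec gap
between sectors with the bus held covers 62 refresh slots.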

Yes, byte-at-a-time is a terrible waste of bus resource, taking four
4MHz (or slower) clocks minimum to acquire and release the bus, plus the
actual byte transfer, another 3 clocks or more.  BUT on the other hand, it
doesn't grab the bus until it has something to transfer, so if the sector or
block gets cut short for some reason, the entire system doesn't hang and
starve.
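
To put a number on the waste, here is the arithmetic as a sketch.  The clock
counts are the minimums given above; the function names and the choice of a
16 usec byte interval in the usage note are just for illustration.

```python
def byte_mode_cost_us(clock_mhz=4.0, acquire_release_clocks=4, transfer_clocks=3):
    """Bus time consumed per byte in byte-at-a-time mode, in microseconds."""
    return (acquire_release_clocks + transfer_clocks) / clock_mhz


def bus_fraction(byte_interval_us, cost_us):
    """Fraction of bus time used when one byte moves every byte_interval_us."""
    return cost_us / byte_interval_us
```

With the defaults that is 1.75 usec per byte, or roughly 11% of the bus for a
device delivering one byte every 16 usec.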

[JM0]: >	my information says that you can skip a few "RAM refresh"
[JM0]: >	cycles... what are you saying?

**
No RAM manufacturer I have references to (Hitachi, TI, NMB, Micron) EVER
says it is ok to skip refresh periods.  You must strobe every line once
every 4,096usec.  There are 256 lines.  If you elect to do these at
regular intervals, then one every 16usec is acceptable.    If you consistently
skip a line, that line can fade throughout RAM, depending on how RAM is
designed.  The vendor probably has about 10% of play in the design, but
that is there for manufacturing tolerances, not for us to take advantage of.

Now, most RAM designs allow you to go almost up to the 4,096usec limit
and then strobe all 256 lines back-to-back.  The RAM is now refreshed for
the next 4,096usec and everybody is happy.  Most system builders (and
this goes back to the Z80/8080 days) elect to insert single refresh cycles
at more regular intervals, rather than trying to do it all at one point, which
can cause processing performance to look a bit odd if you hit it just right.
(Every now and then, some instruction will appear to take 1024 clocks longer
 than it should.)
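
The two schemes side by side, as plain arithmetic.  The 4 clocks per refresh
cycle is an assumption I picked because it reproduces the 1024 figure; real
parts and chipsets vary.

```python
REFRESH_PERIOD_US = 4096   # every line must be strobed within this window
ROWS = 256                 # number of lines


def distributed_interval_us():
    """Spacing between single refresh cycles when they are spread evenly."""
    return REFRESH_PERIOD_US / ROWS


def burst_clocks(clocks_per_refresh=4):
    """Clocks stolen in one lump if all rows are refreshed back-to-back.
    clocks_per_refresh=4 is an assumed value, not taken from a data sheet."""
    return ROWS * clocks_per_refresh
```

Evenly spread, that is one cycle every 16 usec; done as a burst, it is one
1024-clock stall per 4,096 usec window.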

The Z80 CPU did a refresh cycle after every opcode fetch, regardless of
whether it was time for one.  You can never do too many.

What you CAN'T do (and I'm sure someone is thinking of this) is stretch
the time span out.  If your refresh is based on inserting a cycle every
16 usec and the bus is tied-up for some reason, it can't catch up.
If the DMA is really generating the address cycles, then it won't miss
any addresses, but it may skip one or more refresh time-slots, pushing all
remaining refresh addresses further down in time, thus exceeding their
4,096usec limit.  There is no "catch-up" mechanism that quickly inserts
the late refresh address cycles.  Besides, the refreshes would still be late.
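
Here is a small simulation of that slippage.  It is a sketch under stated
assumptions: one refresh slot every 16 usec, an address counter that only
advances when a refresh actually happens, and a single stretch during which
the bus is unavailable.  The function name and parameters are hypothetical.

```python
def worst_row_gap_us(hold_start_us, hold_len_us, slot_us=16, rows=256, cycles=3):
    """Longest time any row waits between two refreshes when slots that fall
    inside the bus-hold window are simply lost (no catch-up)."""
    last = {}     # row -> time of its most recent refresh
    worst = 0
    row = 0
    for k in range(rows * cycles):
        t = k * slot_us
        if hold_start_us <= t < hold_start_us + hold_len_us:
            continue      # slot lost; the address counter does not advance
        if row in last:
            worst = max(worst, t - last[row])
        last[row] = t
        row = (row + 1) % rows
    return worst
```

With no hold every row is revisited exactly on the 4,096 usec limit.  Hold the
bus for just 160 usec and ten slots are lost, so the rows refreshed before the
hold are next visited 160 usec late.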

Now there are some types of DRAM that do internal refreshing, so a lack of
refreshes from the outside world is unimportant.  This is still pretty
new stuff and appears only in very fast RAM designs.


Finally, there still seems to be confusion about how bus-mastering adapters
work.  One of the host DMA channels (on one of the 8237s) is used to
gain the bus.  When the adapter wants the bus, it asserts DRQ for the
channel like any other device.  When the DMA decides to grant the bus
to that channel, normally it pulls -ACK low and begins generating
IO Read/RAM Write or IO Write/RAM Read cycles.  This is known as
"flyby DMA", because a Read and Write cycle effectively occur in a single
machine cycle.  The adapter regulates the speed of the xfer by
requesting wait states be inserted by the DMA on a byte-by-byte basis.

What happens with an adapter that wants to be bus master is the DMA
sees the DRQ, acquires the bus from the CPU, pulls -ACK low for
that channel, and then the DMA goes tri-state (no input or output) on all
the data/address and control lines with the exception of the -ACK  signal
for that channel.

This tells the adapter that it is "on the air" and as long as it holds
the DRQ line high, it has the bus.  The 8237 is effectively "out of the loop"
and generates no data cycles of any type.  The host-master adapter may
perform any number of transfers in any direction, at any speed and any
length because the 8237 is not involved, nor does it monitor the cycles in
any way.  All the 8237 does is wait for DRQ to drop, which releases the
bus back to the 8237 and the 8237 then retests the DRQ lines for all
channels looking for anything else that needs service.  If there are none,
the CPU gets the bus back.
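
The handshake above boils down to a very small state machine.  The sketch
below is a toy model with made-up names (BusMasterChannel and its methods),
covering one channel only and ignoring the rescan of other channels; it is
meant to show who drives the bus at each step, nothing more.

```python
class BusMasterChannel:
    """Toy model of one 8237 channel handed over to a bus-mastering adapter."""

    def __init__(self):
        self.drq = False           # adapter's request line
        self.dack_low = False      # True while -ACK is asserted for the channel
        self.owner = "cpu"         # who currently drives the bus
        self.cycles_by_8237 = 0    # stays 0: the 8237 generates no data cycles

    def adapter_request(self):
        self.drq = True

    def adapter_release(self):
        self.drq = False

    def controller_step(self):
        """One pass of the 8237 looking at DRQ."""
        if self.drq and self.owner == "cpu":
            self.dack_low = True       # grant: pull -ACK low, then tri-state
            self.owner = "adapter"
        elif not self.drq and self.owner == "adapter":
            self.dack_low = False      # DRQ dropped: bus comes back
            self.owner = "cpu"         # nothing else pending in this model

    def adapter_transfer(self, nbytes):
        """The adapter runs its own cycles; the 8237 neither drives nor counts them."""
        if self.owner != "adapter":
            raise RuntimeError("adapter does not own the bus")
        return nbytes
```

Note that nothing in controller_step can take the bus away while DRQ is still
held, which is the whole point: the adapter alone decides when it is done.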


Quite frankly, I don't know why anyone is still going on about all this.
Read the manuals and the schematics, maybe even write AND distribute a
production-level driver or two, then we can all talk about system
architecture with some authority.   :-)


Frank Durda IV <uhclem@nemesis.lonestar.org>|"How do I know?  First SCSI driver
...utacfd!nemesis!uhclem (nearest internet) | for *NIX (1984), also serial, 
...letni!rwsys!nemesis!uhclem	            | video (computer and NTSC), tape,
...decvax!microsoft!trsvax!nemesis!uhclem   | CD-ROM, audio and a few other
					      drivers are under my belt."