*BSD News Article 19374

Xref: sserve comp.os.os2.programmer:13285 comp.os.linux:52506 comp.os.mach:3153 comp.os.minix:22539 comp.periphs:4083 comp.unix.bsd:12385 comp.unix.pc-clone.32bit:4021 comp.os.386bsd.development:1025
Newsgroups: comp.os.os2.programmer,comp.os.linux,comp.os.mach,comp.os.minix,comp.periphs,comp.unix.bsd,comp.unix.pc-clone.32bit,comp.os.386bsd.development
Path: sserve!newshost.anu.edu.au!munnari.oz.au!news.Hawaii.Edu!ames!haven.umd.edu!darwin.sura.net!news-feed-2.peachnet.edu!news-feed-1.peachnet.edu!umn.edu!csus.edu!netcom.com!jmonroy
From: jmonroy@netcom.com (Jesus Monroy Jr)
Subject: More on the DMA timing problem
Message-ID: <jmonroyCBKrE9.76n@netcom.com>
Keywords: DMA FDC timing problem QIC-40/80 
Organization: NETCOM On-line Communication Services (408 241-9760 guest)
Date: Wed, 11 Aug 1993 03:08:33 GMT
Lines: 200


 
 
                More on that #$% DMA problem
                ----------------------------
 
        After a lengthy discussion with Jim Segrave of
        Segrave Software Services, I have some helpful
        suggestions from him.  I am posting this for those of you
        who may be experiencing the same problem as I am.
 
        What follows is a condensed version of the last conversation
        with Jim.
 
 
 
=======================================================================
mail jes@grendel.demon.co.uk
Re: Subject: Slow DMA devices
Date: Tue, 10 Aug 93 1:48:36 GMT
 
>> According to Jesus Monroy Jr:
>> > >> Under what other circumstances would a DMA overrun occur?
>> > >>
>> >         The conditions are reproducible, but not consistently.
>> >         This is what I do:
>> >
>>
>>  [[deleted stuff]]
>>
>> >         6) The pattern on a DMA failure is 2 to 4 reported
>> >            failures in a row. The first failure reports
>> >            0xb4 for the DMA status register, as stated before,
>> >            then the remaining status readings are all 0xb0.
>>
>> I was thinking about this today and suddenly realised something. You
>> are getting DMA overrun from the floppy controller - ie it thinks that
>> a requested transfer did not take place.
>>
        Possible, either it did not start, or it did not complete,
        or one of the bytes did not ACK in time.
 
 
>> The DMA controller is showing
>> TC reached - it thinks all the transfers did take place.
>>
        It thinks so, but TC going high does not mean that the terminal
        count was actually reached.  The data guide states that if an
        error occurs, i.e., it got hit with another request, then it will
        complete prematurely with a TC.  This may be done to keep the
        requester from hanging (waiting for a TC signal, as in Demand
        mode).
 
>> The problem
>> may not be a timing issue, it may be that for some reason either the
>> floppy is making more DMA requests than you have programmed the transfer
>> count for, or you are getting bogus DMA requests, causing the DMA controller
>> to terminate early.
>>
        I believe what you say is possible, but there is no evidence
        for it so far.
 
>> If it were a timing issue where the DMA controller
>> failed to ACK in time - the point I was arguing against - then I would
>> not expect to see TC reached in the DMA controller - eg a status of
>> 0xb0, not 0xb4.
>>
        Negative, I can quote the data guide if you like.
        The high nibble indicates which channels have a request pending.
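
        As a rough illustration of the two status readings discussed
        above, the sketch below decodes an 8237A-style status byte.  It
        assumes the usual layout (low nibble = terminal count reached
        per channel, high nibble = request pending per channel) and that
        the floppy controller sits on DMA channel 2, as on a standard
        PC.  The function names are made up for this example.

```c
#include <stdint.h>

/* Assumed 8237A status register layout:
 *   bits 0-3: channel 0-3 has reached terminal count (TC)
 *   bits 4-7: channel 0-3 has a DMA request pending
 * FDC_DMA_CHANNEL is the conventional PC assignment, not something
 * taken from the thread. */
#define FDC_DMA_CHANNEL 2

/* Returns 1 if `channel` (0-3) shows TC reached in `status`, else 0. */
int dma_tc_reached(uint8_t status, unsigned channel)
{
    return (status >> channel) & 1;
}

/* Returns 1 if `channel` (0-3) shows a pending request in `status`, else 0. */
int dma_request_pending(uint8_t status, unsigned channel)
{
    return (status >> (4 + channel)) & 1;
}
```

        Under these assumptions, 0xb4 decodes as TC reached on channel 2
        with requests pending on channels 0, 1 and 3, while 0xb0 shows
        the same pending requests with no TC flagged.  The 8237A data
        sheet also says the TC bits are cleared by a status read, which
        is worth keeping in mind when comparing successive readings.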
 
>>  [[deleted stuff]]
>>
>> > >> One of the first things any sensible test or diagnostic program does
>> > >> is set the retry limit to one, since when commissioning hardware you
>> > >> do not want to depend on retries to cover a marginal condition. I have
>> > >> never encountered any problems with DMA overrun on PC clone boards
>> > >> during extensive testing - 10e6+ reads from a single floppy for
>> > >> example.
>> > >>
>> >         A sensible test program will know of this problem
>> >         with the DMA overrun. I will remind you it is the
>> >         FD controller that is reporting the overrun.
>>
>> But in my case, since I am commissioning new hardware designs, I wrote the
>> test programs. I can assure you I don't put in any retries - these are
>> bottom level drivers direct to the floppy and DMA controller, not calls
>> to any higher level routines. The first boards I did this on were
>> 8 MHz PC compatibles for POS terminals. The floppy code was developed
>> and tested on Amstrad PCs - 8 MHz 8086s and clone PCs with 4.77 MHz 8088s.
>> Since then I have done similar work on 286 and 486Sx AT compatible hard-
>> ware for petrol station retailers.
>>
        OK, you seem to have an established record.
        The difference may be that on the overseas motherboards
        we (here on the West Coast) are seeing, the manufacturer
        has somehow wired around the disable line, and the
        consequence is what we see, namely the DMA overruns.
 
>> > >> >         IBM BIOS does not correct the error, it only reports it, and
>> > >> >         to the data segment.  It is up to MS-DOS to correct the
>> > >> >         problem, which it does not.  It (MS-DOS) retries 3 times,
>> > >> >         giving us a total of 15 retries.  If you have any doubt as to
>> > >> >         this I will send you a program to confirm this on a MS-DOS
>> > >> >         machine.
>> > >> The retry count is correct and I am aware of it. However - any quality
>> > >> disc diagnostic program will do what I described - test with no retries.
>> > >>
>> >         I disagree.
>>
>> I wrote it, it didn't do retries, it did log every error.
>>
        I believe you, now.
 
>> > >> I would think it's relevant to your suggestion that RAM refresh is
>> > >> causing the overruns - if even a sluggish processor like an 8088 can
>> > >> refresh RAM and service floppy DMA without errors - which it can, I've
>> > >> tested it - then an 80[34]86 running at 16+ MHz can do so as well.
>> > >>
>> >         You are still assuming that this problem does not exist
>> >         on MS-DOS machines. My assumption is that they (MS-DOS) are
>> >         just doing retries.
>>
>> I have a lot of hands-on hardware experience which says it doesn't. I
>> have also developed network interfaces using the DMA controller which
>> I have run on DMA channel 1. If there were a problem with refresh locking
>> the FDC DMA channel out, then adding a network card with a higher DMA
>> priority than the FDC would aggravate the problem. I was concerned that
>> the network I/F might cause just the sort of problems you were suggesting,
>> so I drove the network card with continuous maximal packets on a 12 MHz
>> 286 board, while doing continuous floppy reads. No errors occurred - and
>> retries were not being performed. It really is a non-issue.
>>
        I believe you, but we are seeing different DMA problems.
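
        For reference, the priority argument above can be sketched as
        follows.  This assumes the 8237A is in fixed-priority mode
        (channel 0 highest, channel 3 lowest), with the network card on
        channel 1 and the FDC on channel 2 as described.  The function
        name is hypothetical.

```c
#include <stdint.h>

/* Given a 4-bit mask of pending DMA requests (bit n = channel n),
 * return the channel a fixed-priority arbiter would serve next,
 * or -1 if nothing is pending.  Channel 0 wins over channel 3. */
int dma_next_channel(uint8_t pending_mask)
{
    int ch;

    for (ch = 0; ch < 4; ch++) {
        if (pending_mask & (1u << ch))
            return ch;
    }
    return -1;
}
```

        With both channel 1 and channel 2 requesting (mask 0x06), the
        sketch picks channel 1, so the FDC has to wait its turn.  That
        is the situation Jim's network card test was meant to stress.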
 
>>  [[deleted stuff]]
>>
>> There's no point in beating this to death. At any rate, as I pointed out
>> above, the symptoms actually appear to be too many DMA transfers, not
>> missed ones.
>>
        This is a possibility. So, in this case I will start a new
        round of beta testing.  Do you have any suggestions on how
        best to proceed?
 
 
>> >         ditto.
>> >
>> >         My main concern with your comments is that you have said
>> >         nothing that, to date, makes me believe you.
>> >         Understand I am not calling you a liar; to the contrary,
>> >         I believe that you believe you are correct.
>> >
>> >         I checked my premise with three different engineers,
>> >         they all agreed that the scenario I described is correct.
>>
>> The scenario is possible under some conditions and eminently plausible, but
>> the timings involved on PC hardware at low transfer rates mean that it
>> will not happen on these machines unless:
>> 1) some DMA device is seizing the bus for extended burst mode transfers
>> or
>>
        The RAM refresh is a burst mode operation.
 
>> 2) someone is executing a lengthy instruction with the lock prefix set
>>    (off the top of my head, I can't remember if you can lock an entire
>>     rep movs or similar block transfer/block IO instruction). At any
>>     rate, I would be surprised if anyone ever did this in a single
>>     processor environment.
>>
        THIS MAY BE IT, the bus LOCK happens automatically
        on all string copies as of 80286.  However, I did take
        control of the machine with an "splxxx()", which may not
        have been enough.
 
>> or
>> 3) some instruction is pulling the processor RDY line false to add
>>    wait states. No reasonable hardware (even a generous description of
>>    the PC as reasonable hardware) is going to pull 8 or more usec of
>>    wait states.
>>
        This may be a special case for cheap boards.
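
        To put the 8 usec figure above in perspective, the time between
        data bytes follows directly from the data rate: 8 bits divided
        by the bit rate.  A minimal sketch, with a hypothetical helper
        name; the rates mentioned in the note are common floppy
        interface rates assumed here, not figures from this thread.

```c
/* Time between successive data bytes, in microseconds, for a serial
 * data rate given in kbit/s.  One byte is 8 bits, and 1 kbit/s means
 * 1000 usec per bit, so the interval is 8 * 1000 / rate. */
double byte_interval_us(unsigned kbits_per_sec)
{
    return 8.0 * 1000.0 / (double)kbits_per_sec;
}
```

        At 500 kbit/s this gives 16 usec per byte and at 250 kbit/s
        32 usec, so a stall of 8 usec or more is a sizeable fraction of
        the byte time at the higher rate.  How much latency a given
        controller tolerates before it flags an overrun depends on the
        part.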
 
>> >         More evidence in my favor is that when release 0.0 came out
>> >         a large percentage of the people experienced problems with
>> >         DMA overruns.  All reported changing disks, drives, CPUs,
>> >         motherboards, etc.
>>
>> I'm running Linux, so I have no real experience with the BSD release.
>> Nonetheless, I'd have to work a bit to cause DMA overruns on a PC unless
>> I also was running a DMA hard disc or networking card and I decided to
>> do some sort of burst transfers.
>>
        Another good point.
 
___________________________________________________________________________
Jesus Monroy Jr                                          jmonroy@netcom.com
/386BSD/device-drivers /fd /qic /clock /documentation
___________________________________________________________________________