*BSD News Article 4904

Newsgroups: comp.unix.bsd
Path: sserve!manuel!munnari.oz.au!spool.mu.edu!caen!hellgate.utah.edu!fcom.cc.utah.edu!cs.weber.edu!terry
From: terry@cs.weber.edu (A Wizard of Earth C)
Subject: Re: Fixed: Runs at 8MHz, Crashes at 33MHz, 386bsd
Message-ID: <1992Sep11.222258.2144@fcom.cc.utah.edu>
Sender: news@fcom.cc.utah.edu
Organization: Weber State University  (Ogden, UT)
References: <1992Sep8.070731.21159@bernina.ethz.ch> <1992Sep11.200736.20247@qualcomm.com>
Date: Fri, 11 Sep 92 22:22:58 GMT
Lines: 93

In article <1992Sep11.200736.20247@qualcomm.com> karn@servo.qualcomm.com (Phil Karn) writes:
>In article <1992Sep8.070731.21159@bernina.ethz.ch> torda@igc.ethz.ch (Andrew Torda) writes:
>>
>>     At 8 MHz, my machine appears perfectly stable.
>>     At 33 MHz, I get repeated trap type 12 panics.
>[...]
>>The most concrete suggestions were to either add wait states or buy
>>faster memory. Couldn't add any more wait states, but I managed to
>>swap 8Mb of 80ns simms for 70 ns simms.
>>
>>Instantly, I could rebuild kernels or run my little crash program
>>which simply allocated ever increasing amounts of memory and scribbled
>>through it.
>>The peculiarity is that with the old memory, I had been able to run
>>dos, windows in enhanced mode and even SCO unix.
>>It would still be nice to know what the cause is and why 386bsd
>>provokes the problem.
>
>Very interesting. I've been having similar problems with my 486-50
>(with 16 meg, Adaptec SCSI controller and NE-2000). A good way to
>crash it is to go into one of the source trees and run make. Often I
>couldn't get through half a dozen nroff's of man pages before a panic,
>usually a message from vm_fault() that I interpret to be the kernel
>dereferencing a bogus pointer. Sometimes it wouldn't even get through
>the reboot before it would panic again. Applying every patch in sight
>didn't seem to help the problem.
>
>So, inspired by your note, I just tried hitting my machine's Turbo
>switch, knocking its clock speed down to 10 Mhz (at least that's what
>the display on the front panel says). And the machine now seems *much*
>more stable. It's gotten through several source directories without
>incident so far, albeit much more slowly.
>
>One possible theory (stress *theory*): many modern PC chipsets provide
>registers to control things like bus clock speeds, memory wait states,
>etc. Much more convenient than the hardware jumpers on old motherboards.
>Since these are usually set by the BIOS setup program and forgotten,
>perhaps something in 386BSD is scribbling over them (or their CMOS
>save areas) unintentionally? Going to faster memory, or slowing the
>machine down, would let the machine run with these unintentionally
>changed settings. This theory would also explain why the same machine
>could run other systems at full speed without problem, because they
>leave the control registers alone.
>Comments?

One.  Bus mastering controllers using DMA.

	Most of these controllers have clocks you can set to tell them how
long they *MUST* relinquish the bus for and how frequently they have to
do it.  I ran into this problem while writing an Am33C93A SCSI interface
driver for a WD7000-FASST2.  The system would crash occasionally.

From the Western Digital documentation [with comments]:

	"The maximum on time [where the controller owns the bus] should
	 be 15uS less all overhead time required to allow the host to
	 service memory refresh cycles, including DMA bus arbitration
	 time."
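
The arithmetic behind that rule can be sketched in a few lines of C.
This is only an illustration: the 15 uS ceiling is the figure quoted
above, but the function name and the two overhead parameters are
hypothetical, and real overhead values would have to come from the
controller and chipset documentation.

```c
#include <assert.h>

/* Ceiling quoted from the Western Digital documentation: 15 uS. */
#define WD_MAX_BUS_ON_NS 15000L

/*
 * Hypothetical helper: return the longest bus-on time, in nanoseconds,
 * that still leaves room for the host to service memory refresh and
 * for DMA bus arbitration.  Returns 0 if the overheads alone use up
 * the whole 15 uS budget.
 */
long max_bus_on_ns(long refresh_service_ns, long dma_arbitration_ns)
{
    long budget;

    /* Negative overheads make no sense; treat them as a caller bug. */
    assert(refresh_service_ns >= 0);
    assert(dma_arbitration_ns >= 0);

    budget = WD_MAX_BUS_ON_NS - refresh_service_ns - dma_arbitration_ns;
    return budget > 0 ? budget : 0;
}
```

With made-up overheads of 1000 ns for refresh service and 2000 ns for
arbitration, max_bus_on_ns(1000, 2000) leaves 12000 ns of bus-on time.
A driver that programs the controller with anything longer is eating
into the time the host needs for refresh.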

My theory is that the aha1542b isn't letting the memory refresh.  When
you start actually using a lot of memory (say during a compile), you get
up into the region that isn't being refreshed (since the refresh proceeds
upward only to the point where the bus is grabbed away, the lower the
memory, the "safer" it is).

*This* is why the "memory problem" can't be identified with a memory
test program (other than 386bsd, of course ;-)).  SCO either makes
pessimistic assumptions about the speed of the machine or actually tests
to see how much time it can grab, and so doesn't have a problem.
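
For reference, a "crash program" of the kind described in the quoted
article (allocate ever-increasing amounts of memory and scribble through
it) might look roughly like the sketch below.  It is not the original
program; the function names, the doubling schedule and the byte pattern
are all assumptions made for illustration.  On its own, a test like this
is expected to report zero mismatches on a healthy machine; under the
theory above it would only turn up errors while a bus master is hogging
the bus at the same time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Fill `size` bytes with a position-dependent pattern, then read it
 * back.  Returns the number of mismatched bytes, or -1 if the
 * allocation fails.
 */
long scribble_and_verify(size_t size)
{
    volatile unsigned char *buf;  /* volatile: keep the read-back honest */
    size_t i;
    long bad = 0;

    assert(size > 0);
    buf = malloc(size);
    if (buf == NULL)
        return -1;

    for (i = 0; i < size; i++)
        buf[i] = (unsigned char)(i * 31u + 7u);
    for (i = 0; i < size; i++)
        if (buf[i] != (unsigned char)(i * 31u + 7u))
            bad++;

    free((void *)buf);
    return bad;
}

/*
 * Double the allocation each round, from 4 KB up to `max_bytes`.
 * Returns the total number of mismatches seen; stops early if an
 * allocation fails.
 */
long stress(size_t max_bytes)
{
    size_t size;
    long total = 0;

    for (size = 4096; size <= max_bytes; size *= 2) {
        long bad = scribble_and_verify(size);
        if (bad < 0)
            break;          /* out of memory: stop growing */
        total += bad;
    }
    return total;
}
```

Calling stress() with, say, 8 * 1024 * 1024 walks allocations from 4 KB
up to 8 MB; any nonzero return means a byte read back differently from
what was written.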

Test:  Anyone have a non-SCSI system that has the "works OK at 8MHz but
not at 33MHz" problem?  I realize this isn't a definitive test, as I
might get responses from someone running 200ns RAM saying "Yeah; funny...
no one else seems to have the problem", but it should give a weight of
SCSI-with-problem vs. not-SCSI-with-problem.

Not to discount the "low core being overwritten" theory, but if you
were getting the problem after warm boot *only*, then I could see it;
otherwise, it's unlikely that low core would be getting blown on one
machine and not another.


					Terry Lambert
					terry_lambert@gateway.novell.com
					terry@icarus.weber.edu
---
Any opinions in this posting are my own and not those of my present
or previous employers.
-- 
-------------------------------------------------------------------------------
                                        "I have an 8 user poetic license" - me
 Get the 386bsd FAQ from agate.berkeley.edu:/pub/386BSD/386bsd-0.1/unofficial
-------------------------------------------------------------------------------