*BSD News Article 4622


Newsgroups: comp.unix.bsd
Path: sserve!manuel!munnari.oz.au!uunet!mcsun!sunic!psinntp!psinntp!dg-rtp!ponds!rivers
From: rivers@ponds.uucp (Thomas David Rivers)
Subject: Some more on NMI problems (some meager advancement)
Message-ID: <1992Sep7.014351.946@ponds.uucp>
Date: Mon, 7 Sep 1992 01:43:51 GMT
Lines: 63


Well, I thought I would relay my current status with the NMI
investigation.

  Right now, I'm thinking it has something to do with the IDE
 controller/disk drive, so I have been examining the wd.c driver
 trying to divine what it might be, without much luck. (I
 know very little about the IDE/WD disk controllers.)

  The common thread seems to be:
    1) It happens during some prolonged disk I/O
         (i.e. rebuilding the kernel over-and-over, or building X)
    2) It happens with IDE drives.  Other people have run my test
        (rebuilding the kernel) with a SCSI drive, a 486-33 and 16 MB
        of memory without seeing any NMIs.

 I have tried several switches on my controller:

   1) Having the disk drive/controller assert IOCHRDY (by default it
      doesn't.)
   2) Changing the "precompensation" (which I don't believe is
      related to disk precompensation) from 125ns to 187ns.
   3) Changing the speed of the processor from 20 MHz to 8 MHz.

  None of these changes seems to affect the problem.

  Several people have suggested it could be a cache problem, but I'm
 running on a very old 20 MHz 386 that doesn't have any caches.

  I'm still reluctant to believe it's actually a memory problem, since
 
   1) It doesn't occur with version 0.0
   2) It only occurs *once*; after I get one NMI, it never happens
        again.  You wouldn't think the memory could repair itself...
   3) It happens within 2 hours of running the kernel compiles, often
       within two minutes.  38+ hours of memory tests (reading and
       writing double/single words randomly) found nothing.
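
 For anyone who wants to try the same kind of memory check, here is a
 minimal user-level sketch of a random read/write test.  It is not the
 test program used for the 38+ hours above; the function names, the
 pattern, and the small random-number generator are all assumptions made
 up for illustration, and it only exercises 32-bit (double) words.

```c
#include <stddef.h>
#include <stdint.h>

/* Small linear congruential generator so a run can be repeated exactly. */
static uint32_t lcg_next(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state;
}

/* Value that word i is expected to hold for a given key (hypothetical pattern). */
static uint32_t expected_word(size_t i, uint32_t key)
{
    return ((uint32_t)i * 2654435761u) ^ key;
}

/*
 * Fill buf with a known pattern, then visit words in pseudo-random order,
 * checking each one against the pattern and writing it back.
 * Returns the number of mismatches seen.
 */
unsigned long random_word_test(volatile uint32_t *buf, size_t nwords,
                               unsigned long iterations, uint32_t seed)
{
    uint32_t state = seed;
    unsigned long errors = 0;
    unsigned long n;
    size_t i;

    if (nwords == 0)
        return 0;

    for (i = 0; i < nwords; i++)
        buf[i] = expected_word(i, seed);

    for (n = 0; n < iterations; n++) {
        size_t idx = (size_t)(lcg_next(&state) >> 8) % nwords;

        if (buf[idx] != expected_word(idx, seed))
            errors++;                        /* read back something unexpected */
        buf[idx] = expected_word(idx, seed); /* rewrite the word */
    }
    return errors;
}
```

 A user process only touches whatever pages it is handed, so a clean run
 of something like this does not cover all of physical memory.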


 One last item; I did discover where the empty /var/log/messages line
 was produced, and why you only got the empty line on the console,
 without the NMI messages.

 In isa.c, the function that handles the Non-Maskable Interrupt (isa_nmi)
 calls log(), but the string contains an initial new-line.  Removing
 that new-line fixes that problem, at least.
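
 The effect of a leading new-line can be seen in a small user-level
 sketch.  This is not the kernel code: the two helper functions are
 hypothetical, and the exact message text in isa_nmi is not quoted here.
 The point is only that a message starting with a new-line has an empty
 first line, which is consistent with the blank line that showed up.

```c
#include <stddef.h>
#include <string.h>

/* Length of the first line of msg, i.e. the text before the first '\n'. */
size_t first_line_length(const char *msg)
{
    return strcspn(msg, "\n");
}

/* Return a pointer into msg just past any leading new-lines. */
const char *skip_leading_newlines(const char *msg)
{
    return msg + strspn(msg, "\n");
}
```

 With a made-up string such as "\nNMI message", first_line_length()
 returns 0, while the same string without the leading new-line gives a
 non-empty first line.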

 Again, suggestions are always welcome - I would especially appreciate
 it if someone with an IDE setup tries to compile the kernel over-and-over
 (i.e. in a shell "for"-loop) to see if the problem can be reproduced
 by more people.
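
 A shell "for"-loop is the simplest way to do this.  As an alternative,
 here is a small C sketch of the same idea, which runs a command
 repeatedly and stops at the first failure.  The function name is
 hypothetical, and the command to run (whatever rebuilds your kernel) is
 left to the caller, since it depends on how your source tree is set up.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Run cmd through the shell up to max_runs times, stopping at the first
 * run that does not exit with status 0.
 * Returns the number of runs that completed successfully.
 */
int run_repeatedly(const char *cmd, int max_runs)
{
    int done;

    for (done = 0; done < max_runs; done++) {
        int status = system(cmd);   /* standard C library call */

        if (status != 0) {
            fprintf(stderr, "run %d of \"%s\" failed (status %d)\n",
                    done + 1, cmd, status);
            break;
        }
    }
    return done;
}
```

 If the NMI shows up, the console message and the count of completed
 builds give a rough idea of how long the machine ran before it hit.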

 My next approach is to replace the wd.c driver with Tom Ivar Helbekkmo's
 new driver, to see if he has altered things enough to either make
 the problem go away or make its occurrence more reliable.  Unfortunately,
 I don't seem to be able to get to barsoom.nhh.no right now...
 (trans-atlantic links are difficult at best.)

   - Still trying!! -

  - Dave Rivers -
   (rivers@ponds.uucp)