*BSD News Article 13436


Path: sserve!newshost.anu.edu.au!munnari.oz.au!news.Hawaii.Edu!ames!agate!howland.reston.ans.net!gatech!rutgers!uwvax!oka.cs.wisc.edu!jcargill
From: jcargill@oka.cs.wisc.edu (Jon Cargille)
Newsgroups: comp.os.386bsd.bugs
Subject: Re: PROBLEMS WITH PATCHKIT 0.2.2 - Advice/help needed :-(
Message-ID: <1993Mar26.201921.28420@cs.wisc.edu>
Date: 26 Mar 93 20:19:21 GMT
References: <C4BLJ8.GA5@ns1.nodak.edu>
Sender: news@cs.wisc.edu (The News)
Organization: Univ. of Wisconsin CS Dept
Lines: 85

In article <C4BLJ8.GA5@ns1.nodak.edu> tjon@plains.NoDak.edu (Christopher C. Tjon) writes:

>I installed patchkit 0.2.2 this weekend and had the following
>experiences.  I would like to say first that this situation is an
>improvement over the past.
>
>first: I have a AMD 386dx/40 system with 16mb RAM, adaptec 1542B with
>maxtor LXT340s and Wangtek 5150es.  I have a et4000 super vga card
>with 1mb and an IIt 387 coprocessor.  As I was having many stability
>problems prior to this patchkit I decided to go back to a vanilla
>system and start over.  I went all the way back to low level
>formatting my drive, installed bin01 and src01.  after applying all
>the patches I ran afterinstall.sh(or whatever it was called) and all
>seemed well.  I booted the new kernel etc. and began to run the
>buildworld.sh script.  Here is where the trouble starts.

	[...description of various kernel traps deleted--jmc...]
>
>Occasionally pieces of my filesystem will disappear.  I will cd to
>some directory only to find it saying ". not found".  I can do a cd /
>and get out but the directory is gone.  If I reboot it mysteriously
>returns as if it had always been there.  the same thing occurs with
>random individual files.  I will look at them and find them filled
>with garbage and when I reboot they are fine.  This is TOO weird :-(
>:-(.  

This sounds like a slight variant of a hardware problem that occurs
on motherboards which do not invalidate the cache correctly.

The problem goes something like this.  A bus-mastering SCSI
controller, such as your 1542B, writes a page in memory (suppose it's a
disk buffer) via DMA.  The memory which was DMA'ed to is also held in
the SRAM cache, but the motherboard does not invalidate the cached
copy.  Then some program reads from that disk buffer.  However, instead
of reading the fresh-off-disk data, the program is given the stale
garbage still sitting in the cache.
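
To make the sequence concrete, here is a toy model of it in Python.
This is only a sketch: the Machine class, the buffer address, and the
invalidate_on_dma flag are names I made up for illustration, and none
of it is 386BSD or Adaptec code.  It assumes the buffer was touched by
the CPU earlier, so an old copy is already sitting in the cache when
the DMA arrives.

```python
# Toy model of a DMA write that the cache never hears about.
# All names here are hypothetical, chosen only for illustration.

class Machine:
    def __init__(self, disk, invalidate_on_dma):
        self.disk = dict(disk)   # block number -> contents on disk
        self.ram = {}            # buffer address -> contents in main memory
        self.cache = {}          # buffer address -> contents in SRAM cache
        self.invalidate_on_dma = invalidate_on_dma

    def cpu_write(self, addr, value):
        # Write-through: the CPU updates both the cache and main memory.
        self.cache[addr] = value
        self.ram[addr] = value

    def cpu_read(self, addr):
        # A cache hit is served from the cache without looking at RAM.
        if addr not in self.cache:
            self.cache[addr] = self.ram[addr]
        return self.cache[addr]

    def dma_from_disk(self, block, addr):
        # The bus-mastering controller writes RAM directly, behind the CPU's back.
        self.ram[addr] = self.disk[block]
        if self.invalidate_on_dma:
            # A correct motherboard notices the write and drops the stale line.
            self.cache.pop(addr, None)

    def reboot(self):
        # Memory and cache start out empty again; the disk is untouched.
        self.ram.clear()
        self.cache.clear()


def read_block(machine, block, addr=0x1000):
    machine.dma_from_disk(block, addr)
    return machine.cpu_read(addr)


def demo(invalidate_on_dma):
    m = Machine({7: "good data"}, invalidate_on_dma)
    m.cpu_write(0x1000, "old buffer contents")  # buffer was used before
    first = read_block(m, 7)
    m.reboot()
    after_reboot = read_block(m, 7)
    return first, after_reboot


print("broken board: ", demo(invalidate_on_dma=False))
print("correct board:", demo(invalidate_on_dma=True))
```

On the "broken board" run the first read returns the old buffer
contents even though the disk block is fine, and the same read after
the reboot returns the good data, which matches the symptoms you
describe.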

Unfortunately, since this is a hardware problem, none of your options
are very attractive.  You can:
	(1) put up with rebooting, and never count on having a
	    reliable system
	(2) buy a new motherboard
	(3) buy a less-smart SCSI controller.  Are any non-DMA SCSI
	    controllers supported by Julian's drivers yet?

>one would think that if they were corrupted they would stay
>that way.  Bear in mind that fsck reports no errors sometimes.  

That's because the problem was not on disk at all, but rather in the
memory buffers.

>Overall I think my system is more stable now than it has ever been.
>At least you can read the error message rather than missing it before
>the system reboots as before.

I'm amazed that your system is as stable as you say it is, given
the severity of the motherboard problem...  Having 16M of memory may
reduce the frequency of the problems, since a smaller percentage of
your memory is in the cache at any time, reducing the probability that
the memory being DMA'ed to is cached.
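
As a rough back-of-the-envelope check of that argument, the fraction
of memory that can be cached at once is just cache size over RAM
size.  The 64K cache size below is purely my assumption (you didn't
say what your board has), and cached_fraction is a made-up helper.
It is also a crude estimate, since recently used disk buffers are
more likely to be cached than a random page.

```python
# Crude estimate: what share of main memory can sit in the cache at once?
# ASSUMPTION: a 64 KB external cache; the real size was not stated.

def cached_fraction(cache_kb, ram_mb):
    """Return cache size as a fraction of total RAM."""
    return cache_kb / (ram_mb * 1024)

for ram_mb in (4, 8, 16):
    share = cached_fraction(64, ram_mb)
    print(f"{ram_mb:2d} MB RAM: {share:.2%} of memory can be cached at once")
```

Under that assumption the share drops from about 1.56% with 4M to
about 0.39% with 16M, so a given DMA target is less likely to have a
cached copy, but the problem never goes away entirely.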

>Do I just need to suffer through the rest of the compile before
>things will stabilize or what?

I don't think it will improve when everything is compiled.

>Keep in mind that I was having mysterious reboots and filesystem
>problems prior to 0.2.2.  I find it difficult to believe that there
>is anything wrong with my equipment.  It is all brand new and checks
>out on every diagnostic I can find to run on it.  Other OSes such as
>linux and sco 3.2.4 have had no troubles on it at all.

Now *THAT* I find puzzling.  Shouldn't Linux and SCO also have
problems with a motherboard that doesn't invalidate the cache
correctly?

Sorry not to have happier news.   :-(

#include <std-disclaimer.h>

Jon
-- 
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.
Jon Cargille		jcargill@cs.wisc.edu
Want your .sig compressed?  Reasonable rates
and fast turnaround. Call today!