*BSD News Article 69078



Path: euryale.cc.adfa.oz.au!newshost.anu.edu.au!harbinger.cc.monash.edu.au!news.mira.net.au!news.vbc.net!samba.rahul.net!rahul.net!a2i!news.PBI.net!decwrl!elroy.jpl.nasa.gov!swrinde!newsfeed.internetmci.com!newsxfer2.itd.umich.edu!agate!dan
From: dan@math.berkeley.edu (Dan Strick)
Newsgroups: comp.unix.bsd.freebsd.misc
Subject: Re: Data corruption on an ASUS P/I-P55TP4N motherboard (summary)
Date: 21 May 1996 20:30:03 GMT
Organization: University of California, Berkeley
Lines: 43
Message-ID: <4nt94b$8me@agate.berkeley.edu>
References: <4mvvst$5jf@agate.berkeley.edu> <4n3s10$b2h@news.csie.nctu.edu.tw> <4ncg6e$ol1@vidar.diku.dk>
NNTP-Posting-Host: math.berkeley.edu

A few weeks ago I submitted the following article:

>>: Immediately after installing FreeBSD on a modern pentium PC, I became
>>: afflicted with core dumping programs and apparent file system damage.
>>: Then I discovered that repeated "fsck -n" produced variable results,
>>: sometimes showing no file system damage at all.
>>:
>>: I am currently experimenting with turning off the motherboard cache
>>: to see if the problem goes away, but results are so far inconclusive
>>: since the problem is intermittent.
>>:
>>: Is anyone familiar with the ASUS P/I-P55TP4N motherboard and its
>>: foibles?  The manual says, "This motherboard features Intel's
>>: 430FX PCI chipsets with I/O subsystems."  Are these chipsets known
>>: to have problems?  What are my options?

I received several email responses and there were several followups.
Nobody knew of any general problems with the ASUS P/I-P55TP4N motherboard.

I have since spent a lot of time screwing around with this motherboard,
attempting to produce unambiguous symptoms.  I was eventually able to
reproduce the problem by running repeated "fsck -n" on /usr while running
a quickly hacked memory pattern test program.
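
For anyone who wants to try the same approach, here is a minimal sketch of
what a memory pattern test of this kind might look like.  It is not the
program I actually ran; the function names, the two seed patterns, and
the buffer sizing are assumptions chosen for illustration.  It writes an
index-dependent pattern into a buffer, reads it back, and accumulates the
bits that came back wrong.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: fill the buffer with a pattern that differs from
 * word to word, so a flipped or stuck bit cannot hide behind a constant. */
void fill_pattern(uint32_t *buf, size_t nwords, uint32_t seed)
{
    assert(buf != NULL);
    for (size_t i = 0; i < nwords; i++)
        buf[i] = seed ^ (uint32_t)i;
}

/* Hypothetical helper: re-read the buffer and compare against the expected
 * pattern.  Returns the number of mismatching words and ORs every differing
 * bit into *bad_bits, which helps spot a single unreliable bit position. */
size_t verify_pattern(const volatile uint32_t *buf, size_t nwords,
                      uint32_t seed, uint32_t *bad_bits)
{
    size_t errors = 0;

    assert(buf != NULL && bad_bits != NULL);
    for (size_t i = 0; i < nwords; i++) {
        uint32_t diff = buf[i] ^ (seed ^ (uint32_t)i);
        if (diff != 0) {
            errors++;
            *bad_bits |= diff;
        }
    }
    return errors;
}

/* Run `passes` fill/verify passes, alternating between a pattern and its
 * complement so every bit is exercised as both 0 and 1.  Returns the total
 * number of bad words seen, or (size_t)-1 if the buffer cannot be allocated. */
size_t run_memory_test(size_t nwords, unsigned passes, uint32_t *bad_bits)
{
    uint32_t *buf = malloc(nwords * sizeof *buf);
    size_t errors = 0;

    assert(bad_bits != NULL);
    *bad_bits = 0;
    if (buf == NULL)
        return (size_t)-1;

    for (unsigned p = 0; p < passes; p++) {
        uint32_t seed = (p & 1) ? 0xAAAAAAAAu : 0x55555555u;
        fill_pattern(buf, nwords, seed);
        errors += verify_pattern(buf, nwords, seed, bad_bits);
    }
    free(buf);
    return errors;
}
```

Calling run_memory_test() in a loop with a buffer much larger than the
external cache, while the disk is kept busy by something like repeated
"fsck -n", is roughly the combination that worked for me.  A nonzero
return with the same bit set in bad_bits on every failure points at
hardware rather than software.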

It turned out that even though the data corruption seemed to occur only
under FreeBSD (and not under Windows 95) and only during EIDE disk I/O,
it was in fact caused by an unreliable bit in the motherboard's external
cache expansion module.  I replaced the cache module this morning and
the system seems to be working correctly.

I estimate that I spent over a week of my time solving this problem.
It would have taken a lot less if I had had proper testing equipment and
spare parts for experimental parts swapping, but it still would have
consumed a lot of my time.

There is an important moral to this story.  Tell your motherboard vendors
that you will PAY EXTRA for error checking in main memory, external cache,
and I/O busses.  Tell them that you are willing to BUY FROM SOMEONE ELSE
to get these features.  Let them know that you care.  Only then will they
begin to produce reliable products.

Thanks,
	Dan Strick