*BSD News Article 72466


Path: euryale.cc.adfa.oz.au!newshost.anu.edu.au!harbinger.cc.monash.edu.au!news.bhp.com.au!mel.dit.csiro.au!munnari.OZ.AU!news.ecn.uoknor.edu!news.eng.convex.com!newshost.convex.com!cs.utexas.edu!math.ohio-state.edu!howland.reston.ans.net!vixen.cso.uiuc.edu!usenet
From: Vlad <roubtsov@uiuc.edu>
Newsgroups: comp.unix.bsd.freebsd.misc
Subject: URGENT help request: hard disk errors
Date: Sun, 30 Jun 1996 18:30:47 -0500
Organization: Baphomet's Throne
Lines: 74
Message-ID: <31D70E03.41C67EA6@uiuc.edu>
NNTP-Posting-Host: mossberg-95.slip.uiuc.edu
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
X-Mailer: Mozilla 3.0b4 (X11; I; FreeBSD 2.1.0-RELEASE i386)

Hi:
	I would appreciate any speedy responses regarding my plight, detailed
below:

	In short: since yesterday I've been getting kernel messages of this
kind:
Jun 30 02:01:17 throne /kernel: wd0s2f: hard error reading fsbn 740797
of 740784-740799 (wd0s2 bn 956029; cn 948 tn 7 sn 4)wd0: status
59<seekdone,drq,err> error 40<uncorr> 

Some of this looks cryptic, but it seems clear that there's a problem
reading cylinder 948, track 7, sector 4. The problem first showed
itself during cron's 2am run of /etc/daily, which does a system-wide
find of setuid files: I got two error messages on my console and I
didn't like it one bit. Since then I've been doing something like
ls -R / and every now and then I get "hard" or "soft" errors (some
mention ECC correction, and once or twice I even had an "interrupt
timeout" error). I have never had anything like this before.
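
As a rough sketch of how the kernel message quoted above can be pulled
apart, the Python below extracts the numeric fields and decodes the
status and error values into bit names. It assumes the two values are
hexadecimal and that the bits follow the standard ATA status and error
register layout; the short names in the two tables are labels chosen
for illustration, not taken from the driver source. Under those
assumptions status 59 also has the 0x40 (drive ready) bit set, which
the kernel's bracketed list does not show.

```python
import re

# Standard ATA status register bits (names are illustrative labels).
STATUS_BITS = {
    0x80: "busy", 0x40: "ready", 0x20: "wrtflt", 0x10: "seekdone",
    0x08: "drq", 0x04: "ecc_cor", 0x02: "index", 0x01: "err",
}
# Standard ATA error register bits (names are illustrative labels).
ERROR_BITS = {
    0x80: "badblk", 0x40: "uncorr", 0x20: "mc", 0x10: "id_nf",
    0x08: "mcr", 0x04: "abort", 0x02: "tr000", 0x01: "no_dam",
}

# The message from the post, with the mail line wraps joined back up.
MSG = ("wd0s2f: hard error reading fsbn 740797 of 740784-740799 "
       "(wd0s2 bn 956029; cn 948 tn 7 sn 4)"
       "wd0: status 59<seekdone,drq,err> error 40<uncorr>")

PATTERN = re.compile(
    r"(?P<part>\w+): (?P<kind>hard|soft) error reading fsbn (?P<fsbn>\d+)"
    r"(?: of (?P<first>\d+)-(?P<last>\d+))?"
    r" \((?P<slice>\w+) bn (?P<bn>\d+);"
    r" cn (?P<cn>\d+) tn (?P<tn>\d+) sn (?P<sn>\d+)\)"
    r"\s*\w+: status (?P<status>[0-9a-fA-F]+)<[^>]*>"
    r" error (?P<error>[0-9a-fA-F]+)<[^>]*>"
)


def decode_bits(value, table):
    """Return the names of all bits set in value, highest bit first."""
    return [table[mask] for mask in sorted(table, reverse=True) if value & mask]


def parse(msg):
    """Split one disk error message into a dict of fields."""
    m = PATTERN.search(msg)
    if m is None:
        raise ValueError("not a recognised disk error message")
    f = m.groupdict()
    out = {"partition": f["part"], "kind": f["kind"], "slice": f["slice"]}
    for key in ("fsbn", "first", "last", "bn", "cn", "tn", "sn"):
        out[key] = int(f[key]) if f[key] is not None else None
    # Assumption: the kernel prints both register values in hexadecimal.
    out["status_bits"] = decode_bits(int(f["status"], 16), STATUS_BITS)
    out["error_bits"] = decode_bits(int(f["error"], 16), ERROR_BITS)
    return out


info = parse(MSG)
print(info["partition"], info["kind"], info["fsbn"])
print("offset within the transfer:", info["fsbn"] - info["first"])
print("status bits:", info["status_bits"])
print("error bits:", info["error_bits"])
```

The same parse() could be run over every matching line in the system
log to see whether the errors cluster around a few block numbers or
are scattered across the slice.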

	This is a Pentium-75 system with a Tyan Triton III m/b, a 1.2G Western
Digital drive, a Cirrus SVGA video card, and 16M of 70ns RAM. I run
2.1-RELEASE. The kernel has been recompiled to remove hardware that I
don't have (automatic end-of-interrupt for the primary interrupt
controller is enabled as well, if that's relevant). To be able to
partition the disk for DOS and FreeBSD I had to set up the BIOS to use
CHS mode for int 13h, and right now DOS takes up 400M at the beginning
of the drive with the rest given to FreeBSD. I know that problems with
the higher disk modes (3 and 4) have been reported (including Triton
chipset + WD drives, I think), so I can tell you that the mode is set
to "auto" in BIOS setup and mode 4 is selected -- but this is
irrelevant to FreeBSD, which doesn't use the BIOS after boot-up.
	I have tried to look at these same sectors/cylinders from DOS, using
Norton DE (of course, since "Large"-mode translation has been switched
to "CHS" in BIOS setup I couldn't access cylinders beyond 1024 from DOS,
but I did look at the ones named in FreeBSD's errors, which are closer
to the beginning of my disk) -- well, they read fine, no errors. I admit
this baffles me. Surely if there's an ECC error in a sector then DOS's
int 13h will complain?...
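
One arithmetic check may bear on this. The sketch below assumes a
translated geometry of 16 heads and 63 sectors per track, neither of
which is stated anywhere above, and a zero-based sector number. Under
that assumption the cn/tn/sn triple from the kernel message converts
exactly to 956029, the block number the message gives relative to
wd0s2, so the triple may be counted from the start of the FreeBSD
slice rather than from the start of the disk. The slice offset used in
the second half is a hypothetical round figure derived from "400M",
not a value read from the partition table.

```python
# Assumed geometry; not given in the post.
HEADS = 16
SECTORS_PER_TRACK = 63


def chs_to_block(cn, tn, sn, heads=HEADS, spt=SECTORS_PER_TRACK):
    """Convert cylinder/track/sector (sector counted from 0) to a block number."""
    return (cn * heads + tn) * spt + sn


def block_to_chs(bn, heads=HEADS, spt=SECTORS_PER_TRACK):
    """Convert a block number back to (cylinder, track, sector from 0)."""
    cn, rest = divmod(bn, heads * spt)
    tn, sn = divmod(rest, spt)
    return cn, tn, sn


# The triple from the kernel message reproduces the slice-relative block.
slice_relative = chs_to_block(948, 7, 4)
print(slice_relative)  # 956029

# Hypothetical offset of the FreeBSD slice: a 400M DOS slice in 512-byte
# sectors. The real offset would have to come from the partition table.
ASSUMED_SLICE_OFFSET = 400 * 1024 * 1024 // 512

absolute = slice_relative + ASSUMED_SLICE_OFFSET
print(block_to_chs(absolute))
```

If the slice offset is anywhere near that figure, the absolute
cylinder would be well past 1024, and the sectors examined from DOS at
cylinder 948 would not be the ones FreeBSD failed to read. This is
only a guess built on the assumed geometry.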
	Before I decide that I need to replace my disk, maybe there are more
checks that I could do?
	
(1) Booting into single-user mode and running fsck doesn't show any
problems with the file system. Are there any other utils I can run?

(2) Is there a UNIX/FreeBSD utility/program that will do a direct read
of a given block number to verify whether it's bad?

(3) Any UNIX/FreeBSD utils that can check the physical state of my drive?

(4) Since the kernel buffers disk access, I find it difficult to pinpoint
which files may have been affected -- any suggestions?

(5) Can two IRQs from my WD controller and video card/something else
interfere somehow? If yes, why only since yesterday?

(6) Does FreeBSD handle bad blocks and if yes, how? Are EIDE controllers
smart enough to do automatic bad sector substitution without the OS's
knowledge?

(7) I can try to dump most of my FreeBSD stuff to the cylinders freed
from under DOS before re-newfs'ing it -- I tried making a new fs in
place of my DOS partition but am regrettably inexperienced with
disklabel etc. When I do disklabel [-e] wd0s1 it says
disklabel: ioctl DIOCGDINFO: Invalid argument
and quits. Basically, how can I add another partition to my existing
ones? I need a crash course on disklabeling etc., and the manual isn't
too explicit.

(8) Is there any hope or "Darkness and Decay and the Red Death held
illimitable dominion over all"?...
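
For question (2), the kind of direct read being asked about can be
sketched as follows. This is a minimal illustration rather than an
existing utility: read_block() and scan() are hypothetical helper
names, the 512-byte sector size is an assumption, and the device path
in the usage comment is a guess at the raw device for the slice named
in the kernel message. Reading through a raw (unbuffered) device is
what would keep the kernel's buffer cache out of the way; whether a
given path is such a device is up to the reader to confirm.

```python
import errno
import os

SECTOR_SIZE = 512  # assumed sector size


def read_block(path, block, sector_size=SECTOR_SIZE):
    """Read one sector at the given block number from path.

    Raises OSError if the read fails or returns fewer bytes than a sector.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        os.lseek(fd, block * sector_size, os.SEEK_SET)
        data = os.read(fd, sector_size)
    finally:
        os.close(fd)
    if len(data) != sector_size:
        raise OSError(errno.EIO, "short read at block %d" % block)
    return data


def scan(path, first, last):
    """Try every block in [first, last]; return (block, errno) for each failure."""
    bad = []
    for block in range(first, last + 1):
        try:
            read_block(path, block)
        except OSError as exc:
            bad.append((block, exc.errno))
    return bad


# Hypothetical usage, run as root, with a device path that is only a guess:
#   for block, err in scan("/dev/rwd0s2", 956016, 956031):
#       print(block, os.strerror(err))
```

The block numbers passed in must be relative to whatever device is
opened: the "bn" value in the kernel message is relative to wd0s2,
while the "fsbn" range is relative to the wd0s2f partition.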

Vlad

До