*BSD News Article 22910

Newsgroups: comp.os.386bsd.bugs
Path: sserve!newshost.anu.edu.au!munnari.oz.au!news.Hawaii.Edu!ames!agate!howland.reston.ans.net!pipex!uunet!world!hd
From: hd@world.std.com (HD Associates)
Subject: Re: SCSI disk I/O error
Message-ID: <CFK6Fu.Grw@world.std.com>
Organization: The World Public Access UNIX, Brookline, MA
References: <1993Oct23.203652.4718@diana.ocunix.on.ca>
Date: Wed, 27 Oct 1993 13:50:17 GMT
Lines: 52

In article <1993Oct23.203652.4718@diana.ocunix.on.ca>,
Dyane Bruce <db@diana.ocunix.on.ca> wrote:
>I am having a problem with NetBSD 0.9 on a ISA 486 DX/66
>Adaptec 1542C Controller, internal Seagate ST-2383N, external
>Panasonic LF-7010 (optical R/W) and NEC Multispin CDR-74-1 (CDROM).
>
>I sometimes get a "sd0:reset" console error with subsequent consistent
>"I/O error" on any command from the shell. Extracting gsrc
>triggers this everytime. This then forces the use of the
>"big red switch" on the machine. Before I dig into the SCSI driver has
>anyone seen this as well? Or yet better fixed this? I have noted problems
>intermittent soft errors under NeXTSTEP and this same machine
>(I am dumping NeXTSTEP 3.1 for NetBSD 0.9) which NeXTSTEP was
>able to recover from. I have been completely unable to determine
>where these errors are coming from. (Yes, I have checked terminations
>put brand new cables in, the works. The point is NeXTSTEP was able
>to deal with these errors and NetBSD 0.9 doesn't.)

When you get UNIT ATTENTION ("removable medium may have been changed or
the target has been reset") from the disk, the sd driver sets a "not
valid" flag and disallows further I/O to the disk until it is fully
closed and reopened.  Thus the big red switch.
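
The behaviour described above can be modelled with a toy state machine.
This is only a sketch and is not taken from any driver source: the
struct, the function names and the flag value are all made up for
illustration. Only the SDVALID name comes from this thread.

```c
#include <errno.h>

#define SDVALID 0x01            /* flag name from the thread; value is made up */

/* Hypothetical, minimal stand-in for per-disk driver state. */
struct toy_sd {
    int flags;                  /* SDVALID set while the medium is trusted */
    int open_count;             /* number of current opens */
};

/* Only the first open after a full close marks the disk valid again. */
void toy_sd_open(struct toy_sd *sd)
{
    if (sd->open_count++ == 0)
        sd->flags |= SDVALID;
}

void toy_sd_close(struct toy_sd *sd)
{
    if (sd->open_count > 0)
        sd->open_count--;
}

/* UNIT ATTENTION seen: stop trusting the disk. */
void toy_sd_unit_attention(struct toy_sd *sd)
{
    sd->flags &= ~SDVALID;
}

/* Every I/O request is refused while the flag is clear. */
int toy_sd_io(const struct toy_sd *sd)
{
    return (sd->flags & SDVALID) ? 0 : EIO;
}
```

In this model a second open of an already-open disk does not restore
the flag. Only closing every reference and opening again does.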

I've had problems with multiple initiators on a SCSI bus because some
of the initiators always insist on resetting the bus.  The sd driver
then does what you are seeing.

I think the sd driver could be extended to look at the additional sense
code.  ASC=0x28 is "Not ready to ready transition, medium may have
changed" and ASC=0x29 is "Power on, reset, or bus device reset
occurred".  We could ignore ASC=0x29 and treat ASC=0x28 the same way as
we are now, that is, no more I/O to an open device that someone may
have changed.
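
A minimal sketch of that check, assuming the driver has the fixed-format
sense data from REQUEST SENSE in a byte buffer (sense key in the low
nibble of byte 2, additional length in byte 7, ASC in byte 12). The
function and enum names are hypothetical and not from any sd driver.
Falling back to the current "invalidate" behaviour whenever the ASC is
missing or unrecognised is a conservative choice made here, not
something the proposal requires.

```c
#include <stddef.h>
#include <stdint.h>

#define SENSE_KEY_UNIT_ATTENTION 0x06
#define ASC_MEDIUM_CHANGED       0x28  /* not ready to ready transition */
#define ASC_RESET_OCCURRED       0x29  /* power on, reset, or bus device reset */

enum ua_action {
    UA_NOT_UNIT_ATTENTION,  /* some other condition; handle elsewhere */
    UA_RETRY,               /* reset only: keep the device valid */
    UA_INVALIDATE           /* medium may have changed: block further I/O */
};

/* Hypothetical helper: decide what to do with a UNIT ATTENTION. */
enum ua_action classify_unit_attention(const uint8_t *sense, size_t len)
{
    uint8_t code;

    if (len < 3)
        return UA_NOT_UNIT_ATTENTION;

    /* 0x70 = current error, 0x71 = deferred error (fixed format). */
    code = sense[0] & 0x7f;
    if (code != 0x70 && code != 0x71)
        return UA_NOT_UNIT_ATTENTION;

    if ((sense[2] & 0x0f) != SENSE_KEY_UNIT_ATTENTION)
        return UA_NOT_UNIT_ATTENTION;

    /* Byte 7 counts the bytes after it; the ASC needs at least 5 more. */
    if (len < 13 || sense[7] < 5)
        return UA_INVALIDATE;

    if (sense[12] == ASC_RESET_OCCURRED)
        return UA_RETRY;

    /* ASC_MEDIUM_CHANGED and anything unrecognised: same as today. */
    return UA_INVALIDATE;
}
```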

Both Sun and SGI are more tolerant of SCSI bus resets.

Two points:

1. I just looked through the source and don't see the "sd0:reset"
message anywhere in any of the revs I have.  My NetBSD sources are
packed up right now, though, so that tree may have been changed to
print it.  You want to look around for the SDVALID flag.

2. Is your disk really giving back a UNIT ATTENTION?  If so, why?  It
would be interesting to dump the full sense information when you get
that condition and see what your drive is telling you.
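
One way to dump it, sketched as a standalone formatter. This assumes
fixed-format sense data already sitting in a buffer. The function name
is hypothetical, and a real driver would print from its own sense
structure and console routine rather than through snprintf.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format sense key, ASC, ASCQ and a raw hex dump
 * into buf.  Returns the number of characters written, or -1 if buf is
 * too small.
 */
int format_sense(char *buf, size_t buflen, const uint8_t *sense, size_t len)
{
    size_t off, i;
    int n;

    if (buflen == 0)
        return -1;

    if (len >= 14)
        n = snprintf(buf, buflen, "key=0x%x asc=0x%02x ascq=0x%02x raw:",
                     (unsigned)(sense[2] & 0x0f),
                     (unsigned)sense[12], (unsigned)sense[13]);
    else
        n = snprintf(buf, buflen, "short sense (%zu bytes) raw:", len);

    if (n < 0 || (size_t)n >= buflen)
        return -1;
    off = (size_t)n;

    /* Append every byte so nothing the drive reported is lost. */
    for (i = 0; i < len; i++) {
        n = snprintf(buf + off, buflen - off, " %02x", (unsigned)sense[i]);
        if (n < 0 || (size_t)n >= buflen - off)
            return -1;
        off += (size_t)n;
    }
    return (int)off;
}
```

Keeping the raw bytes next to the decoded fields makes it possible to
check vendor-specific bytes later without reproducing the fault.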

Peter
-- 
Peter Dufault               Real Time Machine Control and Simulation
HD Associates               Voice: 508 433 6936
hd@world.std.com            Fax:   508 433 5267