*BSD News Article 5532

Newsgroups: comp.unix.bsd
Path: sserve!manuel!munnari.oz.au!spool.mu.edu!agate!dog.ee.lbl.gov!news!nosc!crash!fpm
From: fpm@crash.cts.com (Frank Maclachlan)
Subject: 386BSD 0.1 wd.c patch is WRONG!
Organization: CTS Network Services (crash, ctsnet), El Cajon, CA
Date: 23 Sep 92 15:48:44 PDT
Message-ID: <1992Sep23.154844.13509@crash>
Summary: Previously posted wd.c hard disk driver patch causes problems
Keywords: 386BSD,patch
Lines: 118

Greetings:

My face is red!  I posted a putative 'fix' to '/sys/i386/isa/wd.c'
(Message-ID: <1992Aug19.163236.16943@crash>).  This was subsequently
included in Terry Lambert's patchkit-0.1 as patch00004.  Well, the
patch is wrong-headed!  The patch as posted was:

>From crash!fpm Wed Aug 19 16:46:34 PDT 1992
>Article: 4113 of comp.unix.bsd
>Newsgroups: comp.unix.bsd
>Path: crash!fpm
>From: fpm@crash.cts.com (Frank Maclachlan)
>Subject: wd.c problem (w/ Patch)
>Organization: CTS Network Services (crash, ctsnet), El Cajon, CA
>Date: 19 Aug 92 16:32:35 PDT
>Summary: bad144 bug fixed
>Keywords: bad144, ESDI
>
>Greetings:
>
>I think there is a problem in 'usr/src/sys.386bsd/i386/isa/wd.c'
>when the first sector in a multi-sector read or write request is
>present in the bad144 table.  The bad144 table search code at
>line 382 finds the sector in the bad144 table and replaces the
>block number, cylinder, head, and sector addresses with values
>corresponding to the replacement sector.  At line 434, the sector
>count register is loaded with the number of sectors in the entire
>transfer.  This is wrong; it *MUST* be set to *one* sector.  A
>read would return the wrong data in sectors after the first; a
>write would *overwrite* other replacement sectors or even the
>bad144 table on the last track.
>
>You could fix this by changing lines 382 and 383 from
>
>	if ((du->dk_flags & (/*DKFL_SINGLE|*/DKFL_BADSECT))
>		== (/*DKFL_SINGLE|*/DKFL_BADSECT))
>
>to
>	if ((du->dk_flags & (DKFL_SINGLE|DKFL_BADSECT))
>		== (DKFL_SINGLE|DKFL_BADSECT))
>
>This prevents the defective sector from being replaced the first
>time through (DKFL_SINGLE is not set).  Assuming the sector is
>marked bad, the read or write of the defective sector will fail,
>and the interrupt routine (wdintr) will retry the operation with
>the DKFL_SINGLE bit set.  This causes the sector to be replaced
>and the transfer count is properly set to 1.
>
>Another solution is to leave the above two lines unchanged and
>apply the following patch.  This has the advantage of immediately
>replacing a defective sector at the start of a transfer without
>requiring that the operation first fail.
>
>---- snip snip ----
>*** wd.c.ORIG	Wed Aug 19 15:43:21 1992
>--- wd.c	Wed Aug 19 16:16:18 1992
>***************
>*** 402,407 ****
>--- 402,408 ----
>  			cylin = blknum / secpercyl;
>  			head = (blknum % secpercyl) / secpertrk;
>  			sector = blknum % secpertrk;
>+ 			du->dk_flags |= DKFL_SINGLE;
>  #ifdef	WDDEBUG
>  			    printf("new = %d\n", blknum);
>  #endif
>---- snip snip ----
>
>Note that the bad144 code *doesn't* work unless *all* sectors
>present in the bad144 table are marked bad (any attempt to read
>or write any of them will return an immediate error - usually a
>BAD BLOCK error code).
>
>	[irrelevant stuff deleted]
>--
>UUCP: {hplabs!hp-sdd ucsd nosc}!crash!fpm
>ARPA: crash!fpm@nosc.mil
>INET: fpm@crash.cts.com

The patch creates a new problem.  When processing a multi-sector
read or write request involving a bad block which is not the first
block in the request, setting the DKFL_SINGLE flag causes the hard
disk controller task file (registers) to be loaded and a new read
or write command to be issued while the original read or write
operation is still in progress.  Some controllers do not react
rationally to this sort of abuse (mine included).

The first method given in the above posting eliminates this
problem.  Here it is in patch format:

*** wd.c.ORIG	Tue Jul 14 17:55:21 1992
--- wd.c	Wed Sep 23 16:16:14 1992
***************
*** 379,386 ****
  	 * See if the current block is in the bad block list.
  	 * (If we have one, and not formatting.)
  	 */
! 	if ((du->dk_flags & (/*DKFL_SINGLE|*/DKFL_BADSECT))
! 		== (/*DKFL_SINGLE|*/DKFL_BADSECT))
  	    for (bt_ptr = du->dk_bad.bt_bad; bt_ptr->bt_cyl != -1; bt_ptr++) {
  		if (bt_ptr->bt_cyl > cylin)
  			/* Sorted list, and we passed our cylinder. quit. */
--- 379,386 ----
  	 * See if the current block is in the bad block list.
  	 * (If we have one, and not formatting.)
  	 */
! 	if ((du->dk_flags & (DKFL_SINGLE|DKFL_BADSECT))
! 		== (DKFL_SINGLE|DKFL_BADSECT))
  	    for (bt_ptr = du->dk_bad.bt_bad; bt_ptr->bt_cyl != -1; bt_ptr++) {
  		if (bt_ptr->bt_cyl > cylin)
  			/* Sorted list, and we passed our cylinder. quit. */
--

Sorry for any problems that my screwup may have caused.
--
UUCP: {hplabs!hp-sdd ucsd nosc}!crash!fpm
ARPA: crash!fpm@nosc.mil
INET: fpm@crash.cts.com