*BSD News Article 62184



Newsgroups: comp.unix.bsd.freebsd.misc,comp.os.linux.development.system
Path: euryale.cc.adfa.oz.au!newshost.anu.edu.au!harbinger.cc.monash.edu.au!news.bhp.com.au!mel.dit.csiro.au!munnari.OZ.AU!spool.mu.edu!howland.reston.ans.net!newsfeed.internetmci.com!EU.net!sun4nl!rnzll3!sys3.pe1chl!rob
From: rob@pe1chl.ampr.org (Rob Janssen)
Subject: Re: The better (more suitable)Unix?? FreeBSD or Linux
Reply-To: pe1chl@wab-tis.rabobank.nl
Organization: PE1CHL
Message-ID: <DMv9HD.8Lv@pe1chl.ampr.org>
References: <4er9hp$5ng@orb.direct.ca> <311250C2.2781E494@public.uni-hamburg.de> <strenDM7Gr4.Cn2@netcom.com> <DMD8rr.oIB@isil.lloke.dna.fi> <4f9skh$2og@dyson.iquest.net> <DMI5Mt.768@pe1chl.ampr.org> <4fophn$ahl@park.uvsc.edu> <DMrCtI.3KC@pe1chl.ampr.org> <4ftl64$fjs@park.uvsc.edu>
Date: Fri, 16 Feb 1996 11:34:24 GMT
Lines: 108
Xref: euryale.cc.adfa.oz.au comp.unix.bsd.freebsd.misc:14367 comp.os.linux.development.system:18032

In <4ftl64$fjs@park.uvsc.edu> Terry Lambert <terry@lambert.org> writes:

>rob@pe1chl.ampr.org (Rob Janssen) wrote:
>] No, I am referring to the situation where an application has written
>] data to a file, the system crashes, and then the file contains other
>] (garbage) data after restart, while fsck reports no errors.

>This is a situation which is possible only through administrative
>error and acceptance of non-default actions in the process of a
>fsck.

>The block which was not written should not have been present in
>the bitmap, and thus the file should not have referenced the block
>unless the administrator overrode the default in a manual fsck.

The fsck was running in "automatically repair minor defects" mode
on rebooting the system.

>] I once spent quite some time tracking down why UUCP was hanging.  The
>] system had crashed at the moment uux had created a lockfile.  A file
>] with 10 bytes of binary garbage existed on the disk after the restart.
>] This is clearly an indication of this problem.  The file would not be
>] 10 bytes if the application hadn't done the correct write (and probably
>] even the close), yet the data was not the expected ASCII PID.
>] What made this one nasty is that the UUCP programs read the file, do an
>] atoi() on it, and then use kill() to check whether this PID exists, to
>] know whether the lockfile is valid or stale.  This failed to work because
>] atoi() returned zero, UUCP (Taylor) did a kill(-1,0) which of course
>] succeeded, and thus the lockfile was assumed to be valid and never
>] removed.

>The call would have been "kill( 0, 0)" if atoi returned 0, which
>would, of course, have returned 0 (success).

Sorry, it was indeed kill(pid,0) -> kill(0,0).  I checked the man page
for kill; I had written the above from memory and got it wrong.
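A minimal sketch of the naive staleness test described above, assuming the lockfile contents have already been read into a NUL-terminated string. The helper name `naive_lock_is_valid` is hypothetical and this is not Taylor UUCP's actual code; it only shows why binary garbage defeats the check: atoi() finds no leading digits and returns 0, and kill(0, 0) probes the caller's own process group, which normally succeeds.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical reconstruction of a naive stale-lock test.
 * Returns nonzero if the lock is considered held by a live process. */
static int naive_lock_is_valid(const char *contents)
{
    /* atoi() stops at the first non-digit; binary garbage yields 0. */
    pid_t pid = (pid_t) atoi(contents);

    /* Signal 0 is never delivered; kill() only checks that the target
     * exists and may be signalled.  For pid == 0 the target is the
     * caller's own process group, so the garbage lock looks "valid". */
    return kill(pid, 0) == 0;
}
```

Called on a buffer such as "\x01\x02\x7f garbage", this reports the lock as valid, which is exactly the hang described in the quoted text.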

>Bogus lockfile data is the reason modern implementations use a
>4 byte binary integer instead of an ASCII representation of the
>number.  At the very least, a well-written program would check
>for the special cases and ensure the PID was > 1 to avoid kill()
>side-effects.  At worst, it would have done an isdigit() on the
>least significant digit in the lockfile read buffer.
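A sketch of the minimal checks suggested in the quoted text, assuming the lockfile holds an ASCII PID in a NUL-terminated buffer: reject the buffer unless its least significant character (ignoring trailing spaces and newlines) is a digit, and reject any PID of 1 or less. The function name is hypothetical. The range check matters because kill() treats these values specially: 0 means the caller's process group, -1 means every process the caller may signal, and PID 1 (init) always exists.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: minimal sanity checks on an ASCII lockfile buffer.
 * Returns the PID, or -1 if the contents are rejected. */
static long pid_from_lock_minimal(const char *buf)
{
    size_t len = strlen(buf);

    /* Skip trailing padding to find the least significant digit. */
    while (len > 0 && (buf[len - 1] == '\n' || buf[len - 1] == ' '))
        len--;
    if (len == 0 || !isdigit((unsigned char) buf[len - 1]))
        return -1;

    /* 0, negative values and 1 cannot identify a lock owner. */
    long pid = atol(buf);
    if (pid <= 1)
        return -1;
    return pid;
}
```

These checks only look at one end of the buffer and the parsed value, so garbage that happens to start with a plausible number and end in a digit would still be accepted.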

Huh?  Binary is better than ASCII?  And why is it "modern"?
I think an ASCII representation with proper checks is much more
resistant to errors than a binary representation (which can only be
checked on range).
My suggestion is to check that every character is a space, a digit or \n.
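A sketch of that stricter check, assuming the PID is stored as ASCII decimal, optionally padded with spaces and followed by a newline. The helper name is hypothetical. Every byte must be a space, a digit or '\n', there must be exactly one run of digits, and the resulting PID must be greater than 1 before it is handed to kill().

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper: accept the lockfile only if every byte is a space,
 * a digit or '\n' and the digits form a single number greater than 1.
 * Returns the PID, or -1 if the contents are rejected. */
static long pid_from_lock_ascii(const char *buf)
{
    int digits = 0;

    for (const char *p = buf; *p != '\0'; p++) {
        if (isdigit((unsigned char) *p))
            digits++;
        else if (*p != ' ' && *p != '\n')
            return -1;              /* binary garbage or other stray byte */
    }
    if (digits == 0)
        return -1;                  /* empty or blank file: no PID at all */

    char *end;
    long pid = strtol(buf, &end, 10);   /* skips leading spaces/newlines */
    for (const char *p = end; *p != '\0'; p++)
        if (isdigit((unsigned char) *p))
            return -1;              /* a second number, e.g. "12 34" */

    return pid > 1 ? pid : -1;      /* 0 and 1 are special for kill() */
}
```

Unlike a 4-byte binary PID, which can only be range-checked, this format lets almost any corruption be detected, so a garbage lockfile can be treated as stale instead of valid.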

>That aside, a system crash could only result in truncated file
>contents if the proper administrative recovery options were
>taken: for a 10-byte file, it was either an immediate file
>(the 10 bytes were in the inode and so were valid) or it was
>a direct block (the block pointer is required to point to a
>buffer in the block allocation bitmap).

>For a newly allocated block, the block allocation bitmap is
>required to show the block as unallocated *in the on disk copy*,
>which was written using a synchronous write and so is known to
>be on disk, before the allocation is allowable.

>Therefore the administrator must have set the "allocated" bit
>during the fsck (overriding the default action), rather than
>causing the inode to have the block revoked.
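A toy model of the consistency rule argued in the quoted text, not real fsck code. It assumes, as the argument does, that the on-disk bitmap is updated synchronously before an inode may reference a new block, so after a crash any direct pointer to a block the bitmap shows as free must be bogus. The default repair revokes the pointer; the non-default override marks the block allocated and so keeps whatever unwritten data it holds. All names and sizes (`toy_inode`, `NBLOCKS`, `NDIRECT`) are hypothetical, and the "immediate file" case (data stored in the inode itself) is not modelled.

```c
#include <stdbool.h>
#include <stddef.h>

#define NBLOCKS  16
#define NDIRECT  4
#define NO_BLOCK 0u   /* block 0 is never a data block in this toy model */

struct toy_inode {
    unsigned direct[NDIRECT];   /* direct block pointers, NO_BLOCK if unused */
};

/* Checks one inode against the allocation bitmap and returns the number
 * of repairs made. */
static int toy_fsck_inode(struct toy_inode *ip, bool bitmap[NBLOCKS],
                          bool override_mark_allocated)
{
    int repairs = 0;

    for (size_t i = 0; i < NDIRECT; i++) {
        unsigned b = ip->direct[i];

        if (b == NO_BLOCK)
            continue;                   /* unused slot */
        if (b >= NBLOCKS) {
            ip->direct[i] = NO_BLOCK;   /* out of range: always revoke */
            repairs++;
            continue;
        }
        if (bitmap[b])
            continue;                   /* consistent: block is allocated */

        if (override_mark_allocated)
            bitmap[b] = true;           /* non-default: garbage block kept */
        else
            ip->direct[i] = NO_BLOCK;   /* default: revoke block from inode */
        repairs++;
    }
    return repairs;
}
```

In this model garbage can only reappear in a file through the override branch, which is the crux of the "administrator error" claim; the reply below says that override was not taken.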

I am sure that this did not happen, but I don't know how it went
wrong.  This was a commercial system and we did not have source.  It
may well be that the manufacturer had "improved" things, but he certainly
did not improve the performance.
(Extracting a big tar file, expiring the news, or merely unpacking a
news batch was dreadfully slow on this system: clearly an effect of
synchronous metadata updates.)

>You were bitten by administrator error, in combination with
>several application errors, not a "sync vs. async" error.


>PS:	What happened to the commands in your /etc/rc to remove
>	lock files following a reboot?

>PPS:	Why was the application violating the UUCP locking
>	protocol by not ignoring the lockfile if it was older
>	than 90 minutes anyway?
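A sketch of what an age-based staleness rule like the one mentioned in the PPS could look like, using the lockfile's modification time from stat(). The helper name and the explicit `now` parameter (passed in so the check is deterministic) are assumptions for illustration; whether any particular UUCP implements such a rule is disputed in the reply, and Taylor UUCP is said not to.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper: true if the file at path was last modified more
 * than max_age_secs before now (a 90-minute rule would pass 90 * 60).
 * Returns false if stat() fails, e.g. because the lockfile is gone. */
static bool lock_older_than(const char *path, time_t max_age_secs, time_t now)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return false;
    return now - st.st_mtime > max_age_secs;
}
```

A caller would typically pass `time(NULL)` as `now`; a lockfile holding garbage would then be ignored after 90 minutes regardless of what atoi() and kill() made of its contents.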

I am not discussing the lockfile problem here, but the mere fact that
garbage went into it.
Sure, there were many bugs that together made this a problem, but
they are not related to the topic under discussion.

FYI, Taylor UUCP does not check the validity of the file contents, and
there is no "90 minutes" check in the code.  The /etc/rc did not remove
the lockfile because the system was installed with a different lockfile
naming convention than would have been correct, causing port lockfiles
to be named LK* and uux lockfiles named LCK*.  This conflicted with the
remove in /etc/rc.
But this is all irrelevant to the discussion.  The lockfile should have
been nonexistent, empty, or correct.  Then the kill would have detected
the staleness and there would have been no problem.
I was disappointed that this system, while being so slow at mass file
creation, still allowed this error.

Rob
-- 
+------------------------------------+--------------------------------------+
| Rob Janssen         rob@knoware.nl | BBS: +31-302870036 (2300-0730 local) |
| AMPRnet:       rob@pe1chl.ampr.org | AX.25 BBS: PE1CHL@PI8WNO.#UTR.NLD.EU |
+------------------------------------+--------------------------------------+