Xref: sserve comp.bugs.4bsd:1909 comp.unix.bsd:5500 comp.sys.sun.admin:5366
Path: sserve!manuel!munnari.oz.au!yoyo.aarnet.edu.au!news.adelaide.edu.au!cs.adelaide.edu.au!cagney
From: cagney@cs.adelaide.edu.au (Andrew Cagney - aka Noid)
Newsgroups: comp.bugs.4bsd,comp.unix.bsd,comp.sys.sun.admin
Subject: Bug fix for amd, stop it `hanging machines'
Message-ID: <19p2vdINNb4i@huon.itd.adelaide.edu.au>
Date: 23 Sep 92 06:31:09 GMT
Reply-To: cagney@cs.adelaide.edu.au (Andrew Cagney - aka Noid)
Followup-To: comp.unix.bsd
Organization: Comp Sci, Uni of Adelaide, Australia
Lines: 75
NNTP-Posting-Host: winnie.cs.adelaide.edu.au
[ Followups to: comp.unix.bsd ]
Below is a patch for amd5.3beta that fixes one bug that results in amd
`hanging' a unix system. The bug was present in earlier versions of amd.
A machine which has this problem will have an error-hook in the amq output
and many many processes in the wait state.
The problem occurs when (for a given mount map):
1. Upon evaluating a poorly constructed mount map entry
   (e.g. cd /bug/bad), amd finds that no members match, so it creates
   an error-hook, e.g.:
       bad     type:=link;fs:=/usr/local;host==edam
   (the host here is not edam, so nothing matches -> error-hook)
2. Another entry in the same map (e.g. cd /bug/good) is evaluated. It
   may or may not result in a mount; however, one of its members
   also doesn't match.
       good    type:=link;fs:=/usr/local;host==achilles \
               || type:=link;fs:=/no.such.app
   (the first member misses as the host isn't achilles either; the
   second member is ok)
3. A reference to the first entry (e.g. cd /bug/bad) is made, all
   within 19 seconds, which is the timeout for the error-hook entry.
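To make the matching in steps 1 and 2 concrete, here is a rough sketch in C.
It is not amd code: the function names are made up for illustration, only the
host== selector is modelled, and the members of an entry are assumed to have
been split into an array already. It only shows why `bad' yields no member at
all while `good' falls through to its second member.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of amd: does one member of a map entry
 * (e.g. "type:=link;fs:=/usr/local;host==edam") apply on host `me'?
 * Only the host== selector is modelled; a member without one always applies.
 */
int member_matches(const char *member, const char *me)
{
    static const char key[] = "host==";
    const char *sel;
    size_t len;

    assert(member != NULL && me != NULL);
    sel = strstr(member, key);
    if (sel == NULL)
        return 1;                   /* no selector: always matches */
    sel += sizeof key - 1;          /* skip past "host==" */
    len = strcspn(sel, ";");        /* value ends at ';' or end of string */
    return strlen(me) == len && strncmp(sel, me, len) == 0;
}

/*
 * Return the index of the first member that applies, or -1 when none do.
 * In the terms used above, -1 is the case where an error-hook is created.
 */
int first_matching_member(const char *const members[], size_t n, const char *me)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (member_matches(members[i], me))
            return (int)i;
    return -1;
}
```

Called on any host other than edam or achilles, first_matching_member()
returns -1 for the single member of `bad' and 1 for the two members of `good'.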
Things go wrong at point 2. Amd maintains an internal list of all the `mounts'
including error-hook mounts. During 2, amd attempts to re-use the error-hook
created in 1. In doing this the routine find_mntfs() incorrectly changes the
error-hook status to (effectively) `being mounted in the background'. This is
something that will never finish :-).
From this point on, any lookup on the above map that finds the error-hook
will be marked as `being mounted in the background' and will hence never
return.
The patch below stops amd from modifying the error-hook entry when
find_mntfs() finds an error-hook mount (I don't see any reason for
modifying it).
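The failure and the effect of the patch can be modelled with a toy version of
the lookup. This is only a sketch: the struct, the enum, toy_find() and the
`patched' switch are all hypothetical stand-ins, and the shared tail that marks
the entry as in progress is my simplification of what the real find_mntfs()
does to an entry it decides to re-use.

```c
#include <assert.h>
#include <stddef.h>

/* All names below are hypothetical stand-ins, not amd's real declarations. */
enum toy_ops { TOY_ERROR_OPS, TOY_LINK_OPS };

struct toy_mntfs {
    enum toy_ops ops;   /* which kind of mount this entry is */
    int in_progress;    /* 1 = "being mounted in the background" */
    int refs;           /* reference count */
};

/*
 * Model of the lookup. With patched == 0 an existing error-hook falls
 * through to the shared tail and is marked as in progress (the bug).
 * With patched == 1 it is handed back untouched apart from the extra
 * reference, which is what returning a duplicate reference achieves.
 */
struct toy_mntfs *toy_find(struct toy_mntfs *list, size_t n,
                           enum toy_ops want, int patched)
{
    size_t i;

    assert(list != NULL);
    for (i = 0; i < n; i++) {
        struct toy_mntfs *mf = &list[i];

        if (want == TOY_ERROR_OPS) {
            if (mf->ops != TOY_ERROR_OPS)
                continue;
            if (patched) {
                mf->refs++;         /* re-use the error-hook as it is */
                return mf;
            }
        } else if (mf->ops == TOY_ERROR_OPS) {
            continue;
        }
        /* Shared tail: prepare the entry as if a mount were starting. */
        mf->in_progress = 1;
        mf->refs++;
        return mf;
    }
    return NULL;
}
```

With patched == 0, looking up an existing error-hook leaves in_progress set,
and nothing ever clears it. With patched == 1 the entry comes back with
in_progress still 0.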
I should note that this patch only fixes the above case. Similar problems
occur (on rare occasions) when a remote mount is being slow. Does anyone
have a more general fix?
Andrew Cagney
Computer Science
Adelaide University
*** mntfs.c.orig Mon Aug 10 17:58:09 1992
--- mntfs.c Mon Aug 10 18:52:45 1992
***************
*** 171,184 ****
--- 171,186 ----
  if (ops == &efs_ops) {
      /*
       * If the existing ops are not efs_ops
       * then continue...
       */
      if (mf->mf_ops != &efs_ops)
          continue;
+     else
+         return dup_mntfs(mf);
  } else /* ops != &efs_ops */ {
      /*
       * If the existing ops are efs_ops
       * then continue...
       */
      if (mf->mf_ops == &efs_ops)
          continue;
.......................