*BSD News Article 61847

Path: euryale.cc.adfa.oz.au!newshost.anu.edu.au!harbinger.cc.monash.edu.au!bunyip.cc.uq.oz.au!munnari.OZ.AU!news.ecn.uoknor.edu!news.ysu.edu!usenet.ins.cwru.edu!agate!howland.reston.ans.net!newsfeed.internetmci.com!inet-nntp-gw-1.us.oracle.com!news.caldera.com!news.cc.utah.edu!park.uvsc.edu!usenet
From: Terry Lambert <terry@lambert.org>
Newsgroups: comp.unix.bsd.freebsd.misc,comp.os.linux.development.system
Subject: Re: The better (more suitable)Unix?? FreeBSD or Linux
Date: 12 Feb 1996 00:41:05 GMT
Organization: Utah Valley State College, Orem, Utah
Lines: 117
Message-ID: <4fm2b1$ivs@park.uvsc.edu>
References: <4er9hp$5ng@orb.direct.ca> <strenDM7Gr4.Cn2@netcom.com> <DMD8rr.oIB@isil.lloke.dna.fi> <4f9skh$2og@dyson.iquest.net> <4fg8fe$j9i@pell.pell.chi.il.us>
NNTP-Posting-Host: hecate.artisoft.com
Xref: euryale.cc.adfa.oz.au comp.unix.bsd.freebsd.misc:14119 comp.os.linux.development.system:17740

orc@pell.chi.il.us (Orc) wrote:
]    Do you have any concrete evidence to back up this assertion?

How about proof by induction?

If I have N metadata writes outstanding, then in case of a crash,
I must resolve the inconsistency caused by N(N-1) potentially
"correct" states for the outstanding metadata (we can assume one
write per state if we assume idempotent operations occur atomically).

Ext2fs allows N pending metadata writes.

UFS allows 1 pending metadata write.

For Ext2fs, the number remains N(N-1).

For UFS, the number is 1(1-1), or 0 (which is to say, the recovery
process is deterministic).
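
As a minimal sketch of the arithmetic above, the following Python
tabulates N(N-1) for a few values of N.  The function name
candidate_states is made up for illustration; it only evaluates the
expression and does not model any real file system.

```python
def candidate_states(pending_writes: int) -> int:
    """Evaluate N(N-1), the count of potentially "correct" states
    used in the argument above, for N pending metadata writes."""
    if pending_writes < 1:
        raise ValueError("need at least one pending metadata write")
    return pending_writes * (pending_writes - 1)


# N = 1 corresponds to the single pending write described for UFS;
# larger N corresponds to the N pending writes described for Ext2fs.
for n in (1, 2, 4, 8):
    print(n, candidate_states(n))
```

With N = 1 the count is zero, which is the deterministic case; the
count grows quadratically as N increases.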


]    No, this isn't a Linux vs FreeBSD debate, though it's certainly
] one of the things that makes FreeBSD less attractive for my news
] machine; I keep people on one side stating that writing metadata
] out of order is safer than treating metadata like anything else,

You mean "in order".  Synchronous metadata writes ensure that the
writes are "in order" with respect to other metadata, which is
to say that the FS structure may be deterministically recovered
in case of a failure to write metadata.

Synchronous writes are actually unnecessary, as long as a delayed
ordered write mechanism is employed to ensure idempotence.  They
are just the easiest way to implement ordering guarantees.

It is the ordering guarantees that are important, not the
synchronicity or non-synchronicity of the underlying mechanism
for making those guarantees.
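
To make the distinction concrete, here is a toy sketch in Python of a
delayed but ordered write queue.  Everything in it (the class name
OrderedWriteQueue, the string "writes", the list standing in for a
disk) is an assumption made for illustration.  A single FIFO is the
simplest possible ordering policy and is not how any particular file
system implements it.

```python
from collections import deque


class OrderedWriteQueue:
    """Toy write-back queue: submits return at once, but writes reach
    the 'disk' strictly in the order they were submitted."""

    def __init__(self):
        self.pending = deque()  # writes not yet on disk, oldest first
        self.disk = []          # writes that reached stable storage

    def submit(self, write):
        # Delayed: the caller does not wait for the disk.
        self.pending.append(write)

    def flush(self, max_writes=None):
        # Commit up to max_writes pending writes, oldest first.
        count = len(self.pending)
        if max_writes is not None:
            count = min(max_writes, count)
        for _ in range(count):
            self.disk.append(self.pending.popleft())
        return count


q = OrderedWriteQueue()
for w in ("alloc inode", "init inode", "add dir entry"):
    q.submit(w)

q.flush(max_writes=2)  # pretend the machine crashes after two writes
print(q.disk)          # a prefix of the submitted sequence
```

However many writes make it out before the crash, what is on the disk
is a prefix of the submitted sequence, so the caller never had to
block and the order is still known at recovery time.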

UFS in Solaris and in SVR4 ES/MP (UnixWare 2.x) uses a delayed
ordered write mechanism as part of the file system multithreading
for support of Symmetric MultiProcessing.  Use of synchronicity
to provide ordering guarantees precludes reentrancy for metadata
operations.

Other facilities, such as journalling and logging, *also* provide
ordering guarantees.

The best paper I have seen on this so far is Gregory R. Ganger
and Yale N. Patt's paper "Metadata Update Performance in File
Systems", where they propose a mechanism they term "soft updates".

A related paper, Eric H. Herrin II and Raphael A. Finkel's "The
Viva File System" goes into some detail on what constitutes an
idempotent vs. a non-idempotent operation, and where you must
guarantee order atomicity -- as does the UCB "SPRITE" paper.


] and I've seen people on the other side mentioning that writing the
] metadata, then, at some distant time in the future, coming back and
] putting the data down opens a wonderful window of opportunity for
] squeaky-clean-looking but completely garbaged files.

Yes and no.  Yes in the case of a recovery, since with N outstanding
metadata writes (N>1), there are O((N-1)^2) potential "consistent"
states to which the file system may be restored by the post-event
recovery process.

No, in that async I/O on non-metadata data will
potentially cause it to be corrupt anyway -- just not in such a
way as to cause the file system to be inconsistent, and therefore
unrunnable.

UFS is concerned that the recovered state match the intended
state prior to the crash down to the granularity of a single
operation.  It is also concerned with strict implementation
of POSIX update semantic guarantees.

Ext2fs is concerned that the recovered state match the intended
state prior to the crash down to the granularity of the number
of operations potentially outstanding at the far end of the
sync frequency window.

]  And my experience running news on filesystems without
] synchronous metadata writes certainly hasn't shown any
] vulnerability, even when I've been running beta software like
] a software disk array that showed the distressing tendency to
] lock up and die when being driven hard.  (Okay, so it's possible
] that every time it died it caused filesystem problems only on
] the articles which I didn't read, but it certainly never
] corrupted the directory structure; that's only happened when
] I've foolishly dropped too many power eaters into the machine
] and had the disks starve in the middle of a metadata write.)

Most likely you haven't hit the window.  The disk syncing window
on ext2fs is smaller than the UFS window (ie: it is synced more
frequently in an attempt to foreshorten the window).  This reduces
the probability in direct proportion to the MTBF of your power
supply or other event that may cause a spontaneous reboot (or
require a user-directed reset without a normal shutdown).

This does not mean that the window is not there.


As far as successful recovery following a soft failure: all file
system recovery tools will, when run, result in a consistent file
system structure.  The question is what is the probability of
arriving at the "correct" consistent state given a large number
of "potential" consistent states resulting from the permutations
of predicted outcome for all potential outstanding metadata
operations at the time of the crash.
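
A rough way to put a number on that question, under an assumption
that is mine and not established anywhere above: suppose the recovery
tool is equally likely to land on any one of the N(N-1) candidate
states counted earlier in this post.  The Python below is a toy model
only, and the function name chance_of_correct_state is made up for
illustration.

```python
from fractions import Fraction


def chance_of_correct_state(pending_writes: int) -> Fraction:
    """Toy model: probability of recovering the 'correct' state when
    every one of the N(N-1) candidate states is assumed equally likely."""
    if pending_writes < 1:
        raise ValueError("need at least one pending metadata write")
    states = pending_writes * (pending_writes - 1)
    if states == 0:
        # No competing candidates: recovery is deterministic.
        return Fraction(1)
    return Fraction(1, states)


for n in (1, 2, 5, 10):
    print(n, chance_of_correct_state(n))
```

Real recovery tools do not choose uniformly at random, so treat the
figures as an illustration of the trend, not a measurement.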


					Regards,
                                        Terry Lambert
                                        terry@cs.weber.edu
---
Any opinions in this posting are my own and not those of my present
or previous employers.