*BSD News Article 62510


Path: euryale.cc.adfa.oz.au!newshost.anu.edu.au!newshost.telstra.net!act.news.telstra.net!psgrain!iafrica.com!pipex-sa.net!plug.news.pipex.net!pipex!weld.news.pipex.net!pipex!tank.news.pipex.net!pipex!news.mathworks.com!newsfeed.internetmci.com!usenet.eel.ufl.edu!nntp.neu.edu!camelot.ccs.neu.edu!nntp.ccs.neu.edu!albert
From: albert@krakatoa.ccs.neu.edu (Albert Cahalan)
Newsgroups: comp.unix.bsd.freebsd.misc,comp.os.linux.development.system
Subject: Re: The better (more suitable)Unix?? FreeBSD or Linux
Date: 27 Feb 1996 20:59:49 GMT
Organization: Northeastern University, College of Computer Science
Lines: 40
Message-ID: <ALBERT.96Feb27155949@krakatoa.ccs.neu.edu>
References: <4er9hp$5ng@orb.direct.ca> <311C5EB4.2F1CF0FB@freebsd.org>
	<CBITMEAD.96Feb26173656@versant.versant.com.au>
	<4gt6mb$pv@park.uvsc.edu>
NNTP-Posting-Host: krakatoa.ccs.neu.edu
In-reply-to: Terry Lambert's message of 26 Feb 1996 20:54:35 GMT
Xref: euryale.cc.adfa.oz.au comp.unix.bsd.freebsd.misc:14632 comp.os.linux.development.system:18381

>>>>> "T" == Terry Lambert <terry@lambert.org> writes:

T> cbitmead@versant.versant.com.au wrote:
T> ] >Sync metadata is an implementation of ordered writes.  It's
T> ] >about as trivial an implementation as you can possibly devise,
T> ] >but it *is* one.
T> ]
T> ] Except that it is the wrong order.  The correct way is to write the
T> ] data first and then the meta-data. This ensures consistent data.


T> How, in your proposed implementation, would you distinguish allocated
T> blocks that have been written from allocated blocks that have not been
T> written in a two user delete/create case?

T> Which is to say, Bob deletes file "foo", Jim copies secure file "fum",
T> writes some sensitive data to "fum" in a block that belonged to "foo", and
T> the system crashes before "foo" is really deleted.

Nope, blocks are not deallocated until the metadata is written.
The filesystem always has a consistent state - all fsck needs
to do is check a few CRC-protected, timestamped blocks to make
sure they all agree.  This is so trivial that the kernel can
do it at mount time.  This ideal filesystem wastes space though.
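
To make the mount-time check concrete, here is a rough Python sketch.
The block layout (a timestamp, three block pointers, and a CRC32) and
the names pack_root, unpack_root, and mount_check are all made up for
illustration; a real filesystem would choose its own format.

```python
import struct
import zlib

# Hypothetical layout: timestamp, then pointers to the root directory,
# inode table, and free block list, followed by a CRC32 of those fields.
BODY = struct.Struct("<QIII")
CRC = struct.Struct("<I")


def pack_root(timestamp, root_dir, inode_table, free_list):
    """Serialize one copy of the commit block with its CRC appended."""
    body = BODY.pack(timestamp, root_dir, inode_table, free_list)
    return body + CRC.pack(zlib.crc32(body))


def unpack_root(raw):
    """Return (timestamp, root_dir, inode_table, free_list), or None if damaged."""
    if len(raw) != BODY.size + CRC.size:
        return None
    body = raw[:BODY.size]
    (crc,) = CRC.unpack(raw[BODY.size:])
    if zlib.crc32(body) != crc:
        return None  # half-written or corrupted copy
    return BODY.unpack(body)


def mount_check(copies):
    """Pick the newest intact copy and report whether all copies agree."""
    valid = [r for r in map(unpack_root, copies) if r is not None]
    if not valid:
        raise OSError("no intact copy of the commit block")
    newest = max(valid, key=lambda r: r[0])
    clean = len(valid) == len(copies) and all(r == newest for r in valid)
    return newest, clean
```

If the copies disagree, the newest one that passes its CRC wins, and
the others can simply be rewritten from it.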

How it works:
When a user changes a file, the new data is written into free space
on disk rather than over the old blocks.  This changes the
directory/inode/whatever, so those data structures are also copied
into free space.  Note that nothing yet points to this new
information, so the changes would not exist if the system crashed.
At some point, a block is written that points to the root directory,
inode table, and free block list.  Since this is the one point of
failure (the block could be half written at a crash), the block is
written in several places with timestamps and CRC codes.  When this
block is successfully written, the filesystem has advanced to a new
state and all changes on disk are committed.  Only at this time can
"deallocated" blocks get put back into the free pool.
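
The scheme above can be sketched in Python against an in-memory
"disk".  Everything here is illustrative: the ShadowFS class, the three
fixed slots for the commit block, and the use of a generation counter
in place of a wall-clock timestamp are assumptions of the sketch, not
part of any real filesystem.

```python
import struct
import zlib

ROOT_SLOTS = (0, 1, 2)         # hypothetical fixed homes of the commit block copies
BODY = struct.Struct("<QIII")  # generation, root dir, inode table, free list


class ShadowFS:
    def __init__(self, nblocks):
        self.disk = {}                                  # block number -> bytes
        self.free = set(range(len(ROOT_SLOTS), nblocks))
        self.pending_free = set()                       # retired, not yet reusable
        self.generation = 0

    def write_new(self, data):
        """Write data into free space; nothing on disk points to it yet."""
        blk = min(self.free)
        self.free.remove(blk)
        self.disk[blk] = data
        return blk

    def retire(self, blk):
        """Mark an old block as deallocated, but keep it out of the free pool."""
        self.pending_free.add(blk)

    def commit(self, root_dir, inode_table, free_list):
        """Write the commit block to every slot, then release retired blocks."""
        self.generation += 1
        body = BODY.pack(self.generation, root_dir, inode_table, free_list)
        raw = body + struct.pack("<I", zlib.crc32(body))
        for slot in ROOT_SLOTS:
            self.disk[slot] = raw
        # Only now is the new state committed, so only now may the
        # old blocks be handed out again.
        self.free |= self.pending_free
        self.pending_free.clear()
```

In the Bob/Jim case, the blocks of "foo" sit in pending_free until the
commit that records the delete, so write_new can never hand one of them
to "fum" before that point.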
--

Albert Cahalan
albert@ccs.neu.edu