*BSD News Article 37411


Xref: sserve comp.os.386bsd.misc:3929 comp.os.linux.misc:28653
Path: sserve!newshost.anu.edu.au!harbinger.cc.monash.edu.au!news.cs.su.oz.au!metro!wabbit.cc.uow.edu.au!picasso.cssc-syd.tansu.com.au!newshost!chrisb
From: chrisb@stork.cssc-syd.tansu.com.au (Chris Bitmead)
Newsgroups: comp.os.386bsd.misc,comp.os.linux.misc
Subject: Re: Meta-data (Was:LINUX SUCKS!!!!)
Date: 3 Nov 94 15:04:35
Organization: Telecom Australia - CSSC
Lines: 85
Distribution: world
Message-ID: <CHRISB.94Nov3150435@stork.cssc-syd.tansu.com.au>
References: <395u3t$ape@epiwrl.entropic.com> <397n15$3fe@goliat.eik.bme.hu>
NNTP-Posting-Host: stork.cssc-syd.tansu.com.au
In-reply-to: pink@fsz.bme.hu's message of 2 Nov 1994 09:44:05 GMT

In article <397n15$3fe@goliat.eik.bme.hu> pink@fsz.bme.hu (Szabolcs Szigeti (PinkPanther)) writes:

>In article ape@epiwrl.entropic.com, kenh@entropic.com (Ken Hornstein) writes:
>>In article <CHRISB.94Nov1123540@stork.cssc-syd.tansu.com.au>,
>>Chris Bitmead <chrisb@stork.cssc-syd.tansu.com.au> wrote:
>>>In article <38up48INN1o5e@rs1.rrz.Uni-Koeln.DE> se@FileServ1.MI.Uni-Koeln.DE (Stefan Esser) writes:
>>>
>>>>The filesystem is one of the parts, where 
>>>>BSD is far more advanced than Linux, in 
>>>>both speed and robustness (nobody in their
>>>>right mind would use the option to switch 
>>>>off synchronous metadata updates under BSD,
>>>>since this might void your filesystem in 
>>>>case of a crash, as is the default under
>>>>Linux).
>>>
>>>Nobody in their right mind would want it turned on since it could cause
>>>crap meta-data if the system crashes. Better to do it the other way round.
>>>Write your data first and then update your meta-data.
>>
>>There's one thing about this approach that I don't understand: if you write
>>your data blocks first and your system dies before the meta-data gets written,
>>how do you know where the data blocks are?  

You don't know. And you don't care. That's the whole point. If the data
didn't get out to disk *completely* intact, including indirect blocks etc.,
then it is suspect and should be thrown away. It's like a database: if a
transaction didn't succeed completely then you should throw away the lot.

>>If you write the meta-data first,
>>your filesystem recovery program can at least figure out if your meta-data
>>is bogus or not.

But it can't figure it out, that's the point. If an indirect block
pointer points at a data block, how do you tell if it is the real data or
crap that just happens to be sitting there? This could even cause security
problems (the crap could be the old contents of somebody else's file).

>>Polite replies welcome.
>>
>
>Right. If you write data first, then metadata, and your system crashes in between,
>then you might end up in a situation, when the filesystem seems normal, but 
>somewhere there is wrong data written. In other words, that write, when the os
>crashed, looks to be completed successfully, when in fact it isn't. 

No it can't. Think about it. If you always wrote data, then indirect
blocks, then double indirect blocks, then inodes, then nothing can *ever*
point to the wrong thing, because everything a pointer refers to is already
on disk before the pointer is. The worst that can happen is that if you
wrote something to the filesystem and the system crashes, you could lose
it if it hadn't been flushed to disk.
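
To make the ordering argument concrete, here is a rough sketch in Python. It
is a toy in-memory "disk", not how any real filesystem lays things out, and
every name in it (ordered_write_plan, crash_after, dangling_pointers,
check_all_crash_points) is made up for illustration. It crashes the machine
after every possible number of writes and looks for pointers that refer to
blocks which were never written.

```python
# Toy model: the "disk" is a dict of block number -> contents.
# All names and block numbers here are hypothetical, for illustration only.

STALE = "stale"  # whatever junk was in a block before we wrote to it


def ordered_write_plan():
    """Writes for one new file, in dependency order: data, indirect, inode."""
    return [
        (10, ("data", "hello")),
        (11, ("data", "world")),
        (20, ("indirect", [10, 11])),  # points at the data blocks
        (1, ("inode", [20])),          # points at the indirect block
    ]


def crash_after(plan, n):
    """Return the on-disk state if the machine dies after the first n writes."""
    disk = {blk: STALE for blk, _ in plan}
    for blk, contents in plan[:n]:
        disk[blk] = contents
    return disk


def dangling_pointers(disk):
    """List (from, to) pairs where a written block points at a stale block."""
    bad = []
    for blk, contents in disk.items():
        if contents == STALE or contents[0] == "data":
            continue  # stale blocks and data blocks hold no pointers
        for target in contents[1]:
            if disk[target] == STALE:
                bad.append((blk, target))
    return bad


def check_all_crash_points(plan):
    """Map each crash point n to the dangling pointers found at that point."""
    return {n: dangling_pointers(crash_after(plan, n))
            for n in range(len(plan) + 1)}
```

With the plan as written, every crash point comes back with an empty list.
Feed it list(reversed(ordered_write_plan())), i.e. meta-data first, and the
early crash points show an inode or indirect block pointing at stale blocks,
which is exactly the case a recovery program cannot tell apart from real data.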

If the system has written some data blocks to disk which haven't got any
inodes or indirect blocks pointing to them yet, well, fsck will just tag
them as free blocks. At least you don't get crap in files, which you can get
with synchronous meta-data writes.
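
The "tag them as free" step is basically a reachability walk. A minimal
sketch in Python, assuming a simplified model where inodes and indirect
blocks are just lists of block numbers; reachable_blocks and reclaim_orphans
are hypothetical names, and this is not the real fsck code.

```python
def reachable_blocks(inodes, indirect_blocks):
    """Collect every block number reachable from the given inodes.

    inodes:          dict of inode number -> list of block numbers it points at
    indirect_blocks: dict of block number -> list of block numbers it points at
    """
    seen = set()
    stack = [blk for ptrs in inodes.values() for blk in ptrs]
    while stack:
        blk = stack.pop()
        if blk in seen:
            continue
        seen.add(blk)
        # If this block is an indirect block, follow its pointers too.
        stack.extend(indirect_blocks.get(blk, []))
    return seen


def reclaim_orphans(allocated, inodes, indirect_blocks):
    """Return the allocated blocks that nothing points at (safe to free)."""
    return set(allocated) - reachable_blocks(inodes, indirect_blocks)
```

For example, if blocks 30 and 31 were written just before a crash but no
inode or indirect block got as far as pointing at them,
reclaim_orphans({10, 11, 20, 30, 31}, {1: [20]}, {20: [10, 11]}) hands back
those two blocks and leaves the intact file alone.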

>If you write metadata first, this is much less likely to happen.

Nope.

>Since i haven't used linux extesively, i can't comment on it, but i did
>experiment

Linux does not implement the above scheme of writing data before
meta-data, but since it does everything asynchronously, both meta-data and
ordinary data are more likely to get to disk at the same time, thus
lessening the chances of corruption.

>with NetBSD a lot, including testing fs reliability (for example by pressing
>reset in middle of heavy disk activity) and never ever lost files. The worst thing
>that ever happened is that one file ended up in lost+found. 

People have done similar tests with Linux, also with excellent results. As
always with this sort of thing, YMMV.

>By "never losing files" i mean that i never lost any file that was already on 
>the disk, and never got corrupted files.
>So for example during a compile i either got a good object or no object
>at all.

>This means that if the object is there, then it is not corrupted.
>
>BTW.: anyone experimented with NetBSD's (4.4BSD's) lfs, concerning
>reliability?