*BSD News Article 70971

Path: euryale.cc.adfa.oz.au!newshost.anu.edu.au!harbinger.cc.monash.edu.au!nntp.coast.net!howland.reston.ans.net!swrinde!newsfeed.internetmci.com!in2.uu.net!news.ner.bbnplanet.net!das-news2.harvard.edu!margo
From: margo@eecs.harvard.edu (Margo Seltzer)
Newsgroups: comp.unix.bsd.misc
Subject: Re: Q: fast file system
Date: 14 Jun 1996 15:03:30 GMT
Organization: Harvard University EECS Machines
Lines: 56
Message-ID: <4prv02$j36@necco.harvard.edu>
References: <31BDB7F7.41C67EA6@uiuc.edu> <31BE51CC.41C67EA6@uiuc.edu> <31C0B472.37FEDA0E@lambert.org> <31C10FB0.41C67EA6@uiuc.edu>
NNTP-Posting-Host: cognac.harvard.edu

>No. Not by canonical definition of "internal fragmentation". Clusters
>improve throughput + reduce the size of certain data structures.

Clustering does not reduce the size of any internal data structures
(as currently implemented in 4.4BSD and its derivatives).  The clustering
is used only to decide the unit of IO between the disk and the buffer
cache.  I believe you are thinking of extent-based systems where a large
number of contiguous blocks are referenced by a single pointer.  That
optimization does not appear in any of the 4.4BSD derivatives of which
I am aware.

>>  * allocated as only as many fragments of a large block as are
>>  * necessary.  The file system format retains only a single pointer
>>  * to such a fragment, which is a piece of a single large block that
>>  * has been divided.  The size of such a fragment is determinable from
>>  * information in the inode, using the ``blksize(fs, ip, lbn)'' macro.
>>  *
>>  * The file system records space availability at the fragment level;
>
>I need to know details of how this is done-------------^^^^^^^^^^^^^
>That's the crux of my problem....I want to start mentally at an inode
>and eventually get to all the blocks + this last fragment !


The disk is addressed in terms of fragments.  For example, consider
a simple disk (ignoring cylinder groups and other overhead on the disk)
of 1 GB.  On an 8K/1K file system (one with 8 KB blocks and 1 KB fragments)
there are 2^20 fragments (sometimes, confusingly referred to as "disk
blocks"), addressed as "blocks" 0 - 2^20-1..

The allocation bitmaps in FFS are stored in terms of these "disk blocks"

Pointers in inodes contain the address of the first fragment of the
unit referenced by the pointer.

In order to determine if the unit referenced is a block, fragment, or
set of fragments, you examine the file size.  If the file is larger than
(NDADDR * blocksize), then all allocated units are file system blocks
(8K in this example).

If the file is smaller than (NDADDR * blocksize), you take the file
size modulo the block size.  This number tells you how many fragments
are referenced by this block pointer.

Consider an 11 KB file.  It would have two block pointers, the first
would point to a whole 8KB block; the second would point to 3, 1KB
fragments, contiguously allocated.

By examining the bitmaps, you cannot tell which fragments are allocated
to which files; you can only figure out this information by traversing
from the inode pointers (therefore, the question of, "how do you know
that 1 frag of this block belongs to file X while the other 7 belong
to file Y," is moot).

- Margo Seltzer
margo@eecs.harvard.edu