Xref: sserve comp.arch.storage:1752 comp.unix.bsd:12768
Newsgroups: comp.arch.storage,comp.unix.bsd
Path: sserve!newshost.anu.edu.au!munnari.oz.au!news.Hawaii.Edu!ames!olivea!apple.com!amd!netcomsv!netcom.com!gordoni
From: gordoni@netcom.com (Gordon Irlam)
Subject: ufs'93 [File size survey results]
Message-ID: <gordoniCF4E3K.7LI@netcom.com>
Followup-To: comp.arch.storage
Organization: NETCOM On-line Communication Services (408 241-9760 guest)
Date: Tue, 19 Oct 1993 01:14:07 GMT
Lines: 405
A static analysis of unix file systems circa 1993
-------------------------------------------------
Thanks to everyone who responded to my previous request for help in
gathering data on unix file sizes.
I received file size data for 6.2 million files, residing on 650 file
systems, on over 100 machines, with a total size of 130 gigabytes.
File sizes
----------
There is no such thing as an average file system. Some file systems
have lots of little files. Others have a few big files. However, as
a mental model the notion of an average file system is invaluable.
The following table gives a breakdown of file sizes and the amount of
disk space they consume.
   file size    #files  %files  %files  disk space  %space  %space
(max. bytes)                      cum.        (Mb)            cum.
           0     87995     1.4     1.4         0.0     0.0     0.0
           1      2071     0.0     1.5         0.0     0.0     0.0
           2      3051     0.0     1.5         0.0     0.0     0.0
           4      6194     0.1     1.6         0.0     0.0     0.0
           8     12878     0.2     1.8         0.1     0.0     0.0
          16     39037     0.6     2.5         0.5     0.0     0.0
          32    173553     2.8     5.3         4.4     0.0     0.0
          64    193599     3.1     8.4         9.7     0.0     0.0
         128    167152     2.7    11.1        15.6     0.0     0.0
         256    321016     5.2    16.4        58.5     0.0     0.1
         512    695853    11.3    27.7       307.7     0.2     0.3
        1024    774911    12.6    40.2       616.6     0.4     0.7
        2048    999024    16.2    56.5      1496.6     1.1     1.8
        4096    831283    13.5    70.0      2415.3     1.8     3.6
        8192    607046     9.9    79.8      3540.7     2.6     6.1
       16384    474483     7.7    87.5      5549.4     4.0    10.2
       32768    321283     5.2    92.8      7519.0     5.5    15.6
       65536    196954     3.2    96.0      9118.5     6.6    22.2
      131072    114489     1.9    97.8     10607.5     7.7    29.9
      262144     64842     1.1    98.9     11906.2     8.6    38.5
      524288     34655     0.6    99.4     12707.5     9.2    47.7
     1048576     18493     0.3    99.7     13515.1     9.8    57.5
     2097152      9329     0.2    99.9     13429.1     9.7    67.3
     4194304      4002     0.1   100.0     11602.7     8.4    75.7
     8388608      1323     0.0   100.0      7616.6     5.5    81.2
    16777216       558     0.0   100.0      6389.5     4.6    85.8
    33554432       274     0.0   100.0      6470.9     4.7    90.5
    67108864       126     0.0   100.0      6255.9     4.5    95.1
   134217728        27     0.0   100.0      2490.5     1.8    96.9
   268435456         9     0.0   100.0      1819.7     1.3    98.2
   536870912         7     0.0   100.0      2495.7     1.8   100.0
A number of observations can be made:
- the distribution is heavily skewed towards small files
- but it has a very long tail
- the average file size is 22k (a quick check of this figure follows
  the list)
- pick a file at random: it is probably smaller than 2k
- pick a byte at random: it is probably in a file larger than 512k
- 89% of files take up 11% of the disk space
- 11% of files take up 89% of the disk space
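(As promised, the 22k average follows directly from the totals above -
130 gigabytes spread over 6.2 million files:

    awk 'BEGIN { printf "%.0f bytes\n", 130 * 2^30 / 6.2e6 }'

which prints 22514 bytes.)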
Such a heavily skewed distribution of file sizes suggests that if one
were to design a file system from scratch it might make sense to
employ radically different strategies for small and large files.
The seductive power of mathematics allows us to treat a 200 byte file
and a 2M file in the same way. But do we really want to? Are there
any problems in engineering where the same techniques would be used
to handle physical objects that span 6 orders of magnitude?
A quote from sci.physics that has stuck with me: "When things change
by 2 orders of magnitude you are actually dealing with fundamentally
different problems".
People I trust say they would have expected the tail of the above
distribution to have been even longer. They expect at least some
files in the 1-2G range. They point out that DBMS shops with really
large files might have been less inclined to respond to a survey like
this than some other sites. This would bias the disk space figures,
but it would have no appreciable effect on file counts. The results
gathered would still be valuable because many static disk layout
issues are determined by the distribution of small files and are
largely independent of the potential existence of massive files.
Block sizes
-----------
The last block of a file is usually only partially occupied, so as
block sizes increase, so does the amount of wasted disk space.
The following historical values for the design of the BSD FFS are
given in "the Daemon book":
fragment size  overhead
      (bytes)       (%)
          512       4.2
         1024       9.1
         2048      19.7
         4096      42.9
Files have clearly grown larger since then; I obtained the following
results:
fragment size  overhead
      (bytes)       (%)
          128       0.3
          256       0.5
          512       1.1
         1024       2.4
         2048       5.3
         4096      11.8
         8192      26.3
        16384      57.6
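For anyone wanting to repeat this measurement, the overhead can be
computed directly from a list of file sizes. The following is a
minimal sketch of the calculation (not the exact code used to produce
the table), reading one size per line and reporting the space lost in
partially occupied final fragments as a percentage of the space
allocated; "file-sizes" is a stand-in for wherever your size list
lives:

    awk '
        { used += $1
          alloc += int(($1 + FRAG - 1) / FRAG) * FRAG }
        END { printf "overhead = %.1f%%\n",
                     100 * (alloc - used) / alloc }
    ' FRAG=1024 file-sizes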
By default the BSD FFS typically uses a 1k fragment size. Perhaps
this size is no longer optimal and should be increased.
[The FFS block size is constrained to be no more than 8 times the
fragment size. Clustering is a good way to improve throughput for FFS
based file systems but it doesn't do very much to reduce the not
insignificant FFS computational overhead.]
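For those wanting to experiment, both values are chosen when a file
system is created. Assuming a 4.3BSD-style newfs, where -b sets the
block size and -f the fragment size (the device name below is made
up), an 8k/2k file system would be created with:

    newfs -b 8192 -f 2048 /dev/rsd0g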
It is interesting to note that even though most files are smaller than
2k, a 2k fragment size wastes very little space, because disk space
consumption is so heavily dominated by large files.
Inode ratio
-----------
The BSD FFS allocates inodes statically. By default one inode is
allocated for every 2k of disk space. Since an inode occupies 128
bytes, this means that by default 6.25% of the disk is consumed by
inodes.
It is important not to run out of inodes, since any remaining disk
space is then effectively wasted. Despite this, allocating one inode
for every 2k is excessive.
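The ratio is likewise fixed when the file system is created. Assuming
a 4.3BSD-style newfs, the -i flag gives the number of bytes of data
space per inode (again, the device name is made up):

    newfs -i 6144 /dev/rsd0g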
For each file system studied I worked out the minimum sized disk it
could be placed on at a given inode density. Most file systems needed
a disk only marginally larger than the size of their files, but a few,
having much smaller files than average, needed a much larger disk -
a smaller disk would have had insufficient inodes.
bytes per  overhead
    inode       (%)
     1024      12.5
     2048       6.3
     3072       4.2
     4096       3.5
     5120       3.1
     6144       3.1
     7168       3.2
     8192       3.5
     9216       4.0
    10240       4.7
    11264       5.4
    12288       6.5
    13312       7.8
    14336       9.4
    15360      11.0
    16384      12.9
    17408      14.9
    18432      17.3
    19456      19.9
    20480      22.6
Clearly the current default of one inode for every 2k of data is too
small; the table above suggests that something closer to one inode for
every 5k or 6k would minimize the overhead.
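The shape of the table can be reproduced with a back-of-envelope
calculation. The sketch below (the method in miniature, with made-up
figures for a single file system) finds the smallest disk satisfying
both constraints: a disk of d bytes provides d/bpi inodes and
d - 128*d/bpi bytes of data space, and the overhead is read as the
fraction of the disk not holding file data:

    awk 'BEGIN {
        nfiles = 250000         # number of files (figure made up)
        data = 5500000000       # total bytes of file data (made up)
        bpi = 2048              # bytes of disk per inode
        d1 = nfiles * bpi               # enough inodes
        d2 = data / (1 - 128 / bpi)     # enough data space
        d = (d1 > d2) ? d1 : d2
        printf "min disk = %.0f bytes, overhead = %.2f%%\n",
               d, 100 * (d - data) / d
    }'

With these figures the data-space constraint binds and the overhead
comes out at 6.25%, which is where the 6.3% in the 2048 row above
comes from.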
Misc.
-----
I am keen to gather additional file size data from anybody who might
not have taken part in the original file size survey. If you can find
time to run the script that follows it would be much appreciated.
It is not possible to condense all the potentially useful information
I gathered down to a few tables. Every file system is unique. The
full data comprising this survey can be obtained by anonymous ftp as
cs.dartmouth.edu:pub/file-sizes/ufs93.tar.gz.
Gordon Irlam
gordoni@netcom.com
+1 (415) 336 5889
------------------------------------------------------------------------
#!/bin/sh
#
# sizes: interactive script to measure file sizes.
#
# Should preferably, though not essentially, be run as root.
#
# Tested on:
# AIX 3.2
# DEC OSF/1 1.3
# HP-UX 8.02
# NetBSD 0.9
# SunOS 4.1.2, 5.1, 5.3.
ME=gordoni@netcom.com
TMP=/tmp/sizes
echo "This program gathers statistics on file sizes that can be used"
echo "to help design and tune file systems."
echo
echo "'df' is used to identify locally mounted file systems. You are"
echo "given the opportunity to exclude some of these file systems."
echo "'find' then generates a list of file sizes and the results are"
echo "summarized. This program may be safely aborted at any time."
echo
echo "Please exclude CD-ROM file systems from the list below."
echo "Don't worry, 'find' will not cross NFS or other mount points."
# Search for disks and record the mount points
echo
echo "Press return to search for disks"
read junk
echo "Searching for locally mounted disks ..."
df | \
sed 's|\(/[A-Za-z0-9_/\-]*\)[^/]*\(/[A-Za-z0-9_/\-]*\).*|\1 \2|
/\/dev\// !d
s|\(.*\) \(.*/dev/.*\)|\2 \1|' > $TMP.df
DISKS=`awk '{print $1}' $TMP.df`
MPTS=`awk '{print $2}' $TMP.df`
rm -f $TMP.*
if [ `echo $DISKS | wc -w` -ne `echo $MPTS | wc -w` ]; then
    echo "Unable to identify disks!"
    exit 1
fi
# Give the user a chance to skip some of the disks
i=1
for m in $MPTS; do
    if [ -d $m ]; then
        NUMS="$NUMS $i"
    fi
    i=`expr $i + 1`
done
echo
while :; do
    if [ -z "$NUMS" ]; then
        echo "No disks!"
        exit 1
    fi
    echo "    device        mount point"
    for i in $NUMS; do
        d=`echo $DISKS | awk '{print $'$i'}'`
        m=`echo $MPTS | awk '{print $'$i'}'`
        echo "    $i)" $d "    " $m
    done
    echo
    echo "Enter the numbers of any disks to ignore, or press return to start processing"
    read nums
    if [ -z "$nums" ]; then break; fi
    for n in $nums; do
        # Skip anything that is not a number, rather than letting a
        # failed test silently drop disks from the list.
        case "$n" in
        ''|*[!0-9]*)
            echo "Ignoring \"$n\" - not a number"
            continue
            ;;
        esac
        OLD_NUMS=$NUMS
        NUMS=
        for i in $OLD_NUMS; do
            if [ "$n" -ne $i ]; then
                NUMS="$NUMS $i"
            fi
        done
    done
done
# Work out find flags to limit search to current disk and to list files
echo > $TMP.f
# 4.3 BSD and friends
find $TMP.f -type f -xdev -print > /dev/null 2>&1 && MFLAG="-xdev"
# SVR3 and friends
find $TMP.f -type f -mount -print > /dev/null 2>&1 && MFLAG="-mount"
# SVR3 and friends - slow
[ `ls -ilds $TMP.f 2> /dev/null | wc -w` -eq 11 ] && \
    LFLAG="-exec ls -ilds {} ;"
# 4.0 BSD and friends - slow
[ `ls -gilds $TMP.f 2> /dev/null | wc -w` -eq 11 ] && \
    LFLAG="-exec ls -gilds {} ;"
# 4.3 BSD and friends - fast
find $TMP.f -type f -ls > /dev/null 2>&1 && \
    LFLAG="-ls"
rm $TMP.f
if [ -z "$MFLAG" -o -z "$LFLAG" ]; then
    echo "find does not support -mount or -ls!"
    exit 1
fi
# Search each disk recording file sizes
# ignoring repeat hard links and holey files
for i in $NUMS; do
    d=`echo $DISKS | awk '{print $'$i'}'`
    m=`echo $MPTS | awk '{print $'$i'}'`
    echo "Processing $d $m"
    echo "This may take a while. Please wait ..."
    echo BEGIN_DATA > $TMP.$i
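    # In the 'ls' style listing, field 2 is the allocated size in
    # blocks and field 7 the size in bytes. The first awk drops files
    # with holes (more bytes than allocated blocks, taking blocks to
    # be at most 1k each), sort|uniq drops repeated hard links (same
    # size and inode), and the remaining stages reduce the list to
    # "size count" pairs.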
    find $m $MFLAG \( -type f -o -type d \) $LFLAG 2> /dev/null | \
        awk '{if ($2 * 1024 >= $7) print $7, $1}' | sort -n | uniq | \
        awk '{print $1}' | uniq -c | awk '{print $2, $1}' \
        >> $TMP.$i
    echo END_DATA >> $TMP.$i
    echo
done
echo 'Phew! All done. Results are in "'$TMP'.*"'
# Display a summary of the results
echo
echo "Summarizing results. Please wait ..."
for i in $NUMS; do
    cat $TMP.$i
done | sort -n | awk '
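    # Input lines are "size count" pairs (the BEGIN_DATA/END_DATA
    # markers fail the /^[0-9]/ test and are ignored), already sorted
    # so sizes arrive in ascending order. Files are bucketed by
    # powers of two; the space used by each bucket is estimated as
    # the number of files in it times the bucket midpoint.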
    BEGIN {
        p = 1;
        for (i = 0; i < 32; i++) {
            pow2[i] = p;
            p = 2 * p;
        }
        s = 0;
    }
    /^[0-9]/ {
        size = $1;
        num = $2;
        while (size > pow2[s]) {
            s++;
        }
        count[s] += num;
    }
    END {
        for (i = 0; i < 32; i++) {
            if (count[i] > 0) {
                limit = i;
            }
        }
        for (i = 0; i <= limit; i++) {
            files += count[i];
            space[i] = (pow2[i] + pow2[i-1]) / 2.0 * count[i];
            spaces += space[i];
        }
        if (spaces == 0) {
            print "No results!";
            exit 1;
        }
        print
        printf("%12s %12s %7s %12s %7s\n", \
            "file size", "#files", "%files", "disk space", "%space");
        printf("%12s %12s %7s %12s %7s\n", \
            "(max. bytes)", "", "", "(Mb)", "");
        for (i = 0; i <= limit; i++) {
            printf("%12d %12d %7.1f %12.1f %7.1f\n", \
                pow2[i], count[i], 100.0 * count[i] / files, \
                space[i] / 1.0e6, 100.0 * space[i] / spaces);
        }
    }' || exit
# Return the results
echo
echo "The results of this survey may be automatically or manually"
echo "returned by mail. Doing so will provide data that will be useful"
echo "in the design and tuning of file systems."
echo
while :; do
    echo "Do you wish to automatically return the results (y/n)?"
    read RESPONSE junk
    case "$RESPONSE" in
    [yY]*)
        echo
        echo "Mailing results to $ME ..."
        for i in $NUMS; do
            if mail $ME < $TMP.$i; then
                rm $TMP.$i
            else
                REQUEST=y
            fi
        done
        break
        ;;
    [nN]*)
        REQUEST=y
        break
        ;;
    esac
done
echo
if [ -n "$REQUEST" ]; then
    echo 'Please mail files "'$TMP'.*" to '$ME
else
    echo "Thank you!"
fi
exit