Xref: sserve comp.arch.storage:1752 comp.unix.bsd:12768
Newsgroups: comp.arch.storage,comp.unix.bsd
Path: sserve!newshost.anu.edu.au!munnari.oz.au!news.Hawaii.Edu!ames!olivea!apple.com!amd!netcomsv!netcom.com!gordoni
From: gordoni@netcom.com (Gordon Irlam)
Subject: ufs'93 [File size survey results]
Message-ID: <gordoniCF4E3K.7LI@netcom.com>
Followup-To: comp.arch.storage
Organization: NETCOM On-line Communication Services (408 241-9760 guest)
Date: Tue, 19 Oct 1993 01:14:07 GMT
Lines: 405
A static analysis of unix file systems circa 1993
-------------------------------------------------
Thanks to everyone who responded to my previous request for help in
gathering data on unix file sizes.
I received file size data for 6.2 million files, residing on 650 file
systems, on over 100 machines, with a total size of 130 gigabytes.
File sizes
----------
There is no such thing as an average file system. Some file systems
have lots of little files. Others have a few big files. However, as
a mental model the notion of an average file system is invaluable.
The following table gives a breakdown of file sizes and the amount of
disk space they consume.
   file size    #files  %files  %files  disk space  %space  %space
(max. bytes)                      cum.        (Mb)            cum.
           0     87995     1.4     1.4         0.0     0.0     0.0
           1      2071     0.0     1.5         0.0     0.0     0.0
           2      3051     0.0     1.5         0.0     0.0     0.0
           4      6194     0.1     1.6         0.0     0.0     0.0
           8     12878     0.2     1.8         0.1     0.0     0.0
          16     39037     0.6     2.5         0.5     0.0     0.0
          32    173553     2.8     5.3         4.4     0.0     0.0
          64    193599     3.1     8.4         9.7     0.0     0.0
         128    167152     2.7    11.1        15.6     0.0     0.0
         256    321016     5.2    16.4        58.5     0.0     0.1
         512    695853    11.3    27.7       307.7     0.2     0.3
        1024    774911    12.6    40.2       616.6     0.4     0.7
        2048    999024    16.2    56.5      1496.6     1.1     1.8
        4096    831283    13.5    70.0      2415.3     1.8     3.6
        8192    607046     9.9    79.8      3540.7     2.6     6.1
       16384    474483     7.7    87.5      5549.4     4.0    10.2
       32768    321283     5.2    92.8      7519.0     5.5    15.6
       65536    196954     3.2    96.0      9118.5     6.6    22.2
      131072    114489     1.9    97.8     10607.5     7.7    29.9
      262144     64842     1.1    98.9     11906.2     8.6    38.5
      524288     34655     0.6    99.4     12707.5     9.2    47.7
     1048576     18493     0.3    99.7     13515.1     9.8    57.5
     2097152      9329     0.2    99.9     13429.1     9.7    67.3
     4194304      4002     0.1   100.0     11602.7     8.4    75.7
     8388608      1323     0.0   100.0      7616.6     5.5    81.2
    16777216       558     0.0   100.0      6389.5     4.6    85.8
    33554432       274     0.0   100.0      6470.9     4.7    90.5
    67108864       126     0.0   100.0      6255.9     4.5    95.1
   134217728        27     0.0   100.0      2490.5     1.8    96.9
   268435456         9     0.0   100.0      1819.7     1.3    98.2
   536870912         7     0.0   100.0      2495.7     1.8   100.0
A number of observations can be made:
- the distribution is heavily skewed towards small files
- but it has a very long tail
- the average file size is 22k (a quick check of this figure follows
  the list)
- pick a file at random: it is probably smaller than 2k
- pick a byte at random: it is probably in a file larger than 512k
- 89% of files take up 11% of the disk space
- 11% of files take up 89% of the disk space
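(As promised, the 22k average follows directly from the totals above -
130 gigabytes spread over 6.2 million files:

    awk 'BEGIN { printf "%.0f bytes\n", 130 * 2^30 / 6.2e6 }'

which prints 22514 bytes.)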
Such a heavily skewed distribution of file sizes suggests that if one
were to design a file system from scratch it might make sense to
employ radically different strategies for small and large files.
The seductive power of mathematics allows us to treat a 200 byte file
and a 2M file in the same way. But do we really want to? Are there
any problems in engineering where the same techniques would be used
to handle physical objects that span 6 orders of magnitude?
A quote from sci.physics that has stuck with me: "When things change
by 2 orders of magnitude you are actually dealing with fundamentally
different problems".
People I trust say they would have expected the tail of the above
distribution to have been even longer. They expect at least some
files in the 1-2G range. They point out that DBMS shops with really
large files might have been less inclined to respond to a survey like
this than some other sites. This would bias the disk space figures,
but it would have no appreciable effect on file counts. The results
gathered would still be valuable because many static disk layout
issues are determined by the distribution of small files and are
largely independent of the potential existence of massive files.
Block sizes
-----------
The last block of a file is usually only partially occupied, so as
block sizes increase, so does the amount of wasted disk space.
The following historical values for the design of the BSD FFS are
given in "the Daemon book":
fragment size  overhead
      (bytes)       (%)
          512       4.2
         1024       9.1
         2048      19.7
         4096      42.9
Files have clearly grown larger since then; I obtained the following
results:
fragment size  overhead
      (bytes)       (%)
          128       0.3
          256       0.5
          512       1.1
         1024       2.4
         2048       5.3
         4096      11.8
         8192      26.3
        16384      57.6
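For anyone wanting to repeat this measurement, the overhead can be
computed directly from a list of file sizes. The following is a
minimal sketch of the calculation (not the exact code used to produce
the table), reading one size per line and reporting the space lost in
partially occupied final fragments as a percentage of the space
allocated; "file-sizes" is a stand-in for wherever your size list
lives:

    awk '
        { used += $1
          alloc += int(($1 + FRAG - 1) / FRAG) * FRAG }
        END { printf "overhead = %.1f%%\n",
                     100 * (alloc - used) / alloc }
    ' FRAG=1024 file-sizes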
By default the BSD FFS typically uses a 1k fragment size. Perhaps
this size is no longer optimal and should be increased.
[The FFS block size is constrained to be no more than 8 times the
fragment size. Clustering is a good way to improve throughput for FFS
based file systems but it doesn't do very much to reduce the not
insignificant FFS computational overhead.]
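For those wanting to experiment, both values are chosen when a file
system is created. Assuming a 4.3BSD-style newfs, where -b sets the
block size and -f the fragment size (the device name below is made
up), an 8k/2k file system would be created with:

    newfs -b 8192 -f 2048 /dev/rsd0g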
It is interesting to note that even though most files are smaller than
2k, a 2k fragment size wastes very little space, because disk space
consumption is so heavily dominated by large files.
Inode ratio
-----------
The BSD FFS allocates inodes statically. By default one inode is
allocated for every 2k of disk space. Since an inode occupies 128
bytes, this means that by default 6.25% of the disk is consumed by
inodes.
It is important not to run out of inodes, since any remaining disk
space is then effectively wasted. Despite this, allocating one inode
for every 2k is excessive.
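The ratio is likewise fixed when the file system is created. Assuming
a 4.3BSD-style newfs, the -i flag gives the number of bytes of data
space per inode (again, the device name is made up):

    newfs -i 6144 /dev/rsd0g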
For each file system studied I worked out the minimum sized disk it
could be placed on at a given inode density. Most file systems needed
a disk only marginally larger than the size of their files, but a few,
having much smaller files than average, needed a much larger disk -
a smaller disk would have had insufficient inodes.
bytes per  overhead
    inode       (%)
     1024      12.5
     2048       6.3
     3072       4.2
     4096       3.5
     5120       3.1
     6144       3.1
     7168       3.2
     8192       3.5
     9216       4.0
    10240       4.7
    11264       5.4
    12288       6.5
    13312       7.8
    14336       9.4
    15360      11.0
    16384      12.9
    17408      14.9
    18432      17.3
    19456      19.9
    20480      22.6
Clearly the current default of one inode for every 2k of data is too
small; the table above suggests that something closer to one inode for
every 5k or 6k would minimize the overhead.
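The shape of the table can be reproduced with a back-of-envelope
calculation. The sketch below (the method in miniature, with made-up
figures for a single file system) finds the smallest disk satisfying
both constraints: a disk of d bytes provides d/bpi inodes and
d - 128*d/bpi bytes of data space, and the overhead is read as the
fraction of the disk not holding file data:

    awk 'BEGIN {
        nfiles = 250000         # number of files (figure made up)
        data = 5500000000       # total bytes of file data (made up)
        bpi = 2048              # bytes of disk per inode
        d1 = nfiles * bpi               # enough inodes
        d2 = data / (1 - 128 / bpi)     # enough data space
        d = (d1 > d2) ? d1 : d2
        printf "min disk = %.0f bytes, overhead = %.2f%%\n",
               d, 100 * (d - data) / d
    }'

With these figures the data-space constraint binds and the overhead
comes out at 6.25%, which is where the 6.3% in the 2048 row above
comes from.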
Misc.
-----
I am keen to gather additional file size data from anybody who might
not have taken part in the original file size survey. If you can find
time to run the script that follows it would be much appreciated.
It is not possible to condense all the potentially useful information
I gathered down to a few tables. Every file system is unique. The
full data comprising this survey can be obtained by anonymous ftp as
cs.dartmouth.edu:pub/file-sizes/ufs93.tar.gz.
Gordon Irlam
gordoni@netcom.com
+1 (415) 336 5889
------------------------------------------------------------------------
#!/bin/sh
#
# sizes: interactive script to measure file sizes.
#
# Should preferably, though not essentially, be run as root.
#
# Tested on:
# AIX 3.2
# DEC OSF/1 1.3
# HP-UX 8.02
# NetBSD 0.9
# SunOS 4.1.2, 5.1, 5.3.
ME=gordoni@netcom.com
TMP=/tmp/sizes
echo "This program gathers statistics on file sizes that can be used"
echo "to help design and tune file systems."
echo
echo "'df' is used to identify locally mounted file systems. You are"
echo "given the opportunity to exclude some of these file systems."
echo "'find' then generates a list of file sizes and the results are"
echo "summarized. This program may be safely aborted at any time."
echo
echo "Please exclude CD-ROM file systems from the list below."
echo "Don't worry, 'find' will not cross NFS or other mount points."
# Search for disks and record the mount points
echo
echo "Press return to search for disks"
read junk
echo "Searching for locally mounted disks ..."
df | \
sed 's|\(/[A-Za-z0-9_/\-]*\)[^/]*\(/[A-Za-z0-9_/\-]*\).*|\1 \2|
/\/dev\// !d
s|\(.*\) \(.*/dev/.*\)|\2 \1|' > $TMP.df
DISKS=`awk '{print $1}' $TMP.df`
MPTS=`awk '{print $2}' $TMP.df`
rm -f $TMP.*
if [ `echo $DISKS | wc -w` -ne `echo $MPTS | wc -w` ]; then
    echo "Unable to identify disks!"
    exit 1
fi
# Give the user a chance to skip some of the disks
i=1
for m in $MPTS; do
    if [ -d $m ]; then
        NUMS="$NUMS $i"
    fi
    i=`expr $i + 1`
done
echo
while :; do
    if [ -z "$NUMS" ]; then
        echo "No disks!"
        exit 1
    fi
    echo "    device        mount point"
    for i in $NUMS; do
        d=`echo $DISKS | awk '{print $'$i'}'`
        m=`echo $MPTS | awk '{print $'$i'}'`
        echo "    $i)" $d "    " $m
    done
    echo
    echo "Enter the numbers of any disks to ignore, or press return to start processing"
    read nums
    if [ -z "$nums" ]; then break; fi
    for n in $nums; do
        # Skip anything that is not a number, rather than letting a
        # failed test silently drop disks from the list.
        case "$n" in
        ''|*[!0-9]*)
            echo "Ignoring \"$n\" - not a number"
            continue
            ;;
        esac
        OLD_NUMS=$NUMS
        NUMS=
        for i in $OLD_NUMS; do
            if [ "$n" -ne $i ]; then
                NUMS="$NUMS $i"
            fi
        done
    done
done
# Work out find flags to limit search to current disk and to list files
echo > $TMP.f
# 4.3 BSD and friends
find $TMP.f -type f -xdev -print > /dev/null 2>&1 && MFLAG="-xdev"
# SVR3 and friends
find $TMP.f -type f -mount -print > /dev/null 2>&1 && MFLAG="-mount"
# SVR3 and friends - slow
[ `ls -ilds $TMP.f 2> /dev/null | wc -w` -eq 11 ] && \
    LFLAG="-exec ls -ilds {} ;"
# 4.0 BSD and friends - slow
[ `ls -gilds $TMP.f 2> /dev/null | wc -w` -eq 11 ] && \
    LFLAG="-exec ls -gilds {} ;"
# 4.3 BSD and friends - fast
find $TMP.f -type f -ls > /dev/null 2>&1 && \
    LFLAG="-ls"
rm $TMP.f
if [ -z "$MFLAG" -o -z "$LFLAG" ]; then
    echo "find does not support -mount or -ls!"
    exit 1
fi
# Search each disk recording file sizes
# ignoring repeat hard links and holey files
for i in $NUMS; do
    d=`echo $DISKS | awk '{print $'$i'}'`
    m=`echo $MPTS | awk '{print $'$i'}'`
    echo "Processing $d $m"
    echo "This may take a while. Please wait ..."
    echo BEGIN_DATA > $TMP.$i
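    # In the 'ls' style listing, field 2 is the allocated size in
    # blocks and field 7 the size in bytes. The first awk drops files
    # with holes (more bytes than allocated blocks, taking blocks to
    # be at most 1k each), sort|uniq drops repeated hard links (same
    # size and inode), and the remaining stages reduce the list to
    # "size count" pairs.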
    find $m $MFLAG \( -type f -o -type d \) $LFLAG 2> /dev/null | \
        awk '{if ($2 * 1024 >= $7) print $7, $1}' | sort -n | uniq | \
        awk '{print $1}' | uniq -c | awk '{print $2, $1}' \
        >> $TMP.$i
    echo END_DATA >> $TMP.$i
    echo
done
echo 'Phew! All done. Results are in "'$TMP'.*"'
# Display a summary of the results
echo
echo "Summarizing results. Please wait ..."
for i in $NUMS; do
    cat $TMP.$i
done | sort -n | awk '
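    # Input lines are "size count" pairs (the BEGIN_DATA/END_DATA
    # markers fail the /^[0-9]/ test and are ignored), already sorted
    # so sizes arrive in ascending order. Files are bucketed by
    # powers of two; the space used by each bucket is estimated as
    # the number of files in it times the bucket midpoint.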
    BEGIN {
        p = 1;
        for (i = 0; i < 32; i++) {
            pow2[i] = p;
            p = 2 * p;
        }
        s = 0;
    }
    /^[0-9]/ {
        size = $1;
        num = $2;
        while (size > pow2[s]) {
            s++;
        }
        count[s] += num;
    }
    END {
        for (i = 0; i < 32; i++) {
            if (count[i] > 0) {
                limit = i;
            }
        }
        for (i = 0; i <= limit; i++) {
            files += count[i];
            space[i] = (pow2[i] + pow2[i-1]) / 2.0 * count[i];
            spaces += space[i];
        }
        if (spaces == 0) {
            print "No results!";
            exit 1;
        }
        print
        printf("%12s %12s %7s %12s %7s\n", \
            "file size", "#files", "%files", "disk space", "%space");
        printf("%12s %12s %7s %12s %7s\n", \
            "(max. bytes)", "", "", "(Mb)", "");
        for (i = 0; i <= limit; i++) {
            printf("%12d %12d %7.1f %12.1f %7.1f\n", \
                pow2[i], count[i], 100.0 * count[i] / files, \
                space[i] / 1.0e6, 100.0 * space[i] / spaces);
        }
    }' || exit
# Return the results
echo
echo "The results of this survey may be automatically or manually"
echo "returned by mail. Doing so will provide data that will be useful"
echo "in the design and tuning of file systems."
echo
while :; do
    echo "Do you wish to automatically return the results (y/n)?"
    read RESPONSE junk
    case "$RESPONSE" in
    [yY]*)
        echo
        echo "Mailing results to $ME ..."
        for i in $NUMS; do
            if mail $ME < $TMP.$i; then
                rm $TMP.$i
            else
                REQUEST=y
            fi
        done
        break
        ;;
    [nN]*)
        REQUEST=y
        break
        ;;
    esac
done
echo
if [ -n "$REQUEST" ]; then
    echo 'Please mail files "'$TMP'.*" to '$ME
else
    echo "Thank you!"
fi
exit