*BSD News Article 82685


Path: euryale.cc.adfa.oz.au!newshost.carno.net.au!harbinger.cc.monash.edu.au!munnari.OZ.AU!news.Hawaii.Edu!news.uoregon.edu!hammer.uoregon.edu!news-peer.gsl.net!news.gsl.net!news.mathworks.com!uunet!in3.uu.net!nwnews.wa.com!news1.halcyon.com!usenet
From: "Duane H. Hesser" <dhh@androcles.com>
Newsgroups: comp.unix.bsd.freebsd.misc
Subject: Re: Problems with HTDIG on 2.1.5R
Date: Mon, 11 Nov 1996 22:06:09 -0800
Organization: Northwest Nexus Inc.
Lines: 78
Message-ID: <328813D1.41C67EA6@androcles.com>
References: <56007m$s7f@service3.uky.edu>
NNTP-Posting-Host: androcles.com
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Mailer: Mozilla 3.0Gold (X11; I; FreeBSD 2.1.5-STABLE i386)

John Soward wrote:
> 
> I'm experiencing some problems with htdig (3.0.4 and 3.0.5) under FreeBSD
> 2.1.5R. Everything compiles and runs fine -- but when I attempt to index a
> large server, I get a memory allocation error (in 3.0.5 I get 'out of memory
> in 'new''). I have 128M in the machine with 256M of swap. I'm running as
> 'root'...top shows I'm only using about 8M of memory.
> 
> I've compiled the same code with gcc/g++ 2.7.2 on an HPUX10 machine and it
> completes the index fine...
> 
> I've tried installing gcc/g++ 2.7.2.1 and libg++2.7.2 on the FreeBSD machine
> -- but get the same error...is this a gnumalloc problem?
> 
> Anyone else have this problem?
> 
> thanx,
> --
> John Soward             <a href="http://neworder.cc.uky.edu/">JpS</a>
> Systems Programmer      'The Midnight sun will burn you up.'
> University of Kentucky    (NeXT and MIME mail OK)         -R. Smith

I have experienced similar problems with Htdig under Ultrix 4.3 and
HPUX 9.03.  There are two possible sources of the problem (if your
problem is similar to mine).

The first problem I would consider most likely, except that you say
that "top" does not report large memory usage.  This problem occurs in
'htmerge'.  Look in the file 'htmerge/doc.cc'.  After reading all URLs
into a linked list, a 'while' loop reads each document into a structure
'ref', processes it, then reads the next document into 'ref'.  As this
loop proceeds, the entire web structure is read into memory and LEFT
THERE.  Try adding 'delete ref' at the bottom of the loop.  If this
isn't your problem, someday it will be.
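To show what I mean, here is a rough sketch of the shape of that loop.
The names below are made up (the real ones are in htmerge/doc.cc and
the htdig library); the only point is the 'delete ref' at the bottom:

    // Hypothetical stand-ins for the real htdig types and readers in
    // htmerge/doc.cc; only the shape of the loop matters here.
    struct DocumentRef { /* url, title, excerpt, ... */ };

    static int docs_left = 3;                 // pretend document database

    DocumentRef *ReadNextDocument()           // returns 0 when exhausted
    {
        return docs_left-- > 0 ? new DocumentRef : 0;
    }

    void MergeDocument(DocumentRef *)         // stand-in for the processing
    {
    }

    int main()
    {
        DocumentRef *ref;
        while ((ref = ReadNextDocument()) != 0)
        {
            MergeDocument(ref);
            delete ref;   // the fix: without this, every ref read so far
                          // stays allocated, and the whole web ends up
                          // resident in memory by the end of the merge
        }
        return 0;
    }

Since your 'top' really does show only 8M, this one probably isn't
what's biting you, but it's cheap to check.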

The other problem is 'sort'.  Actually, there are two possible
problems.  The first is temp directory space.  Htdig may easily require
more than 100 megabytes of disk space for a sort; if you are sorting in
/usr/tmp, make sure it's big enough, or set TMPDIR to someplace big
enough.  It should be easy enough to determine whether you're running
out of sort filespace.  Sorry I can't be more specific--all of the
relevant files and notes are at work, and you know what they say--the
memory is the first thing to go :).
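If you'd rather not fiddle with your login environment, one way to
force it is a tiny wrapper that sets TMPDIR before running the
dig/merge.  This is only a sketch--'/big/scratch' is a made-up path,
and it assumes the sort being run honors TMPDIR (most do):

    #include <stdlib.h>
    #include <unistd.h>

    // Run its arguments with TMPDIR pointed at a filesystem that has
    // room for >100MB of sort temp files.  "/big/scratch" is a
    // placeholder; substitute something big on your machine.
    int main(int argc, char **argv)
    {
        if (argc < 2)
            return 1;
        setenv("TMPDIR", "/big/scratch", 1);
        execvp(argv[1], argv + 1);    // e.g.  ./bigtmp htmerge
        return 1;                     // reached only if the exec failed
    }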

There is another possible problem with 'sort', which revealed itself
only under Ultrix (not HPUX).  The System 5'ish sort uses internal
buffers which adapt to large memory requirements; those requirements
will overwhelm the older BSD'ish (actually Version 7) 'sort'.  Ultrix
has both styles, in different paths, but Htdig uses a configurable path
to 'sort' in some places and a hard-coded path in others.  It was
necessary to ensure that the System 5'ish 'sort' was used everywhere.
I'm just not sure whether the GNU sort used by FreeBSD shares this
problem or not.
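Roughly what I ended up wanting was one knob for the sort path instead
of literals scattered through the source.  Again, just a sketch with
placeholder paths (I don't have the real Ultrix paths or the htdig
sources in front of me):

    #include <stdio.h>
    #include <stdlib.h>

    // One compile-time setting for the sort program, so the System
    // 5'ish sort gets used everywhere.  The default is a placeholder.
    #ifndef SORT_PROG
    #define SORT_PROG "/usr/bin/sort"
    #endif

    int sort_file(const char *in, const char *out)
    {
        char cmd[1024];
        snprintf(cmd, sizeof cmd, "%s %s > %s", SORT_PROG, in, out);
        return system(cmd);       // every call site goes through SORT_PROG
    }

    int main(int argc, char **argv)
    {
        return (argc == 3) ? sort_file(argv[1], argv[2]) : 1;
    }

Then a single -D flag (or one line in the configuration) at build time
picks the right sort on Ultrix.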

All of this is pretty vague, I realize, but it should give you some
places to look.  If necessary, send me some mail, and I can try to
generate some diffs.

One last thing that I don't remember is the exact version of HTdig that
I have.  I CAN verify that gcc 2.7.2 was used for both compiles.

-- 
Duane H. Hesser
dhh@androcles.com