*BSD News Article 95241

Path: euryale.cc.adfa.oz.au!newshost.carno.net.au!harbinger.cc.monash.edu.au!news.mel.connect.com.au!news.syd.connect.com.au!news.bri.connect.com.au!corolla.OntheNet.com.au!not-for-mail
From: Tony Griffiths <tonyg@OntheNet.com.au>
Newsgroups: comp.unix.bsd.freebsd.misc
Subject: Re: mSQL SLOOOOW???
Date: Wed, 14 May 1997 12:38:44 +1000
Organization: On the Net (ISP on the Gold Coast, Australia)
Lines: 40
Message-ID: <337925B4.7490@OntheNet.com.au>
References: <5krqn6$cm@ocean.silcom.com> <5l66d0$l1m@ocean.silcom.com>
Reply-To: tonyg@OntheNet.com.au
NNTP-Posting-Host: swanee.nt.com.au
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Mailer: Mozilla 3.0 (WinNT; I)
To: David Carmean <dlc@silcom.com>
Xref: euryale.cc.adfa.oz.au comp.unix.bsd.freebsd.misc:40809

David Carmean wrote:
> 
> Thanks for the mail replies.
> 
> The machine actually had 128MB; I just didn't yet know enough about
> FreeBSD to configure the kernel to see it.  Now after the first
> query, the file is cached in RAM and the queries take about
> four *seconds* instead of four minutes.

Glad to see that speed has improved _somewhat_  ;-))

> 
> I must still have an I/O problem, I think.

What makes you think this?

> 
> And should a simple 124000-line, 13MB text table *really* grow to
> 85MB when I import it into MSQL?

Ah, the joys of using RDBMS-type databases...  You said that you are
using the mSQL 2.0 beta, which is supposed to handle variable-length
strings better, so the blowout does seem a little excessive.

A database system that uses a _REALLY_ good storage technique is Mumps,
which is both a language and a db.  It "compresses" the key data so that
the leading sub-string shared with the previous key is not stored again.
E.g.

^FRED("key 1","key 2",42,"key 3")="Something-or-other"
^FRED("key 1","key 2",42,"foobar")="Something-else"

would result in the common sub-string {"key 1","key 2",42} disappearing
from the second node stored, since it can be recovered from the previous
node.  All that is stored is the Common_Character_Count (how many leading
characters are shared with the previous key) plus the
Uncommon_Character_Count followed by the remainder of the key string.
Of course the first node in an index block contains the full key string,
but subsequent nodes only contain partial strings!
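
As a rough sketch of the idea (this is not how any real Mumps
implementation lays out its blocks; the flat string form of the keys and
the names common_prefix_len, front_encode and front_decode are
assumptions made purely for illustration), the scheme could be written
in Python something like this:

```python
def common_prefix_len(a, b):
    """Count the leading characters that a and b share."""
    count = 0
    for ch_a, ch_b in zip(a, b):
        if ch_a != ch_b:
            break
        count += 1
    return count


def front_encode(keys):
    """Turn a list of key strings into (common, uncommon, suffix) triples.

    common   = characters shared with the previous key (0 for the first key)
    uncommon = length of the part that differs
    suffix   = the differing part itself
    """
    encoded = []
    previous = ""
    for key in keys:
        common = common_prefix_len(previous, key)
        suffix = key[common:]
        encoded.append((common, len(suffix), suffix))
        previous = key
    return encoded


def front_decode(encoded):
    """Rebuild the full keys by borrowing the prefix from the previous key."""
    keys = []
    previous = ""
    for common, uncommon, suffix in encoded:
        key = previous[:common] + suffix[:uncommon]
        keys.append(key)
        previous = key
    return keys


# Hypothetical flat form of the two subscript lists from the example above.
keys = [
    '"key 1","key 2",42,"key 3"',
    '"key 1","key 2",42,"foobar"',
]

for triple in front_encode(keys):
    print(triple)
```

The first triple carries the whole key with a common count of zero; the
second carries only the tail that differs.  Because each node leans on
the one before it, a block has to be read from its first node forward to
rebuild any full key.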

Tony