*BSD News Article 9200


Received: by minnie.vk1xwt.ampr.org with NNTP
	id AA5364 ; Wed, 23 Dec 92 15:17:06 EST
Xref: sserve comp.unix.bsd:9257 comp.os.linux:20107
Newsgroups: comp.unix.bsd,comp.os.linux
Path: sserve!manuel.anu.edu.au!munnari.oz.au!sgiblab!darwin.sura.net!paladin.american.edu!news.univie.ac.at!hp4at!mcsun!sun4nl!freya.let.rug.nl!thor!s0356514
From: s0356514@let.rug.nl (H.H. Bergman)
Subject: Re: UNICODE (was Dumb Americans (was INTERNATIONALIZATION:...)
Message-ID: <1992Dec21.194942.16107@let.rug.nl>
Sender: news@let.rug.nl (news manager)
Nntp-Posting-Host: thor.let.rug.nl
Organization: Faculteit der Letteren, Rijksuniversiteit Groningen, NL
X-Newsreader: TIN [version 1.1 PL6]
References: <1992Dec18.165905.8414@unislc.uucp>
Date: Mon, 21 Dec 1992 19:49:42 GMT
Lines: 47

Ed Carp (erc@unislc.uucp) wrote:
: Richard L. Goerwitz (goer@kimbark.uchicago.edu) wrote:
: 
: : One of the big criticism leveled at US Engineers is that they are either
: : too dumb or lazy to build into their software support for non-Western
: : scripts.
: 
: I'd be more than happy to build internationalization into my code, if I knew how
: to do it... <sigh>
Plan 9 supports Unicode/UTF. They have PostScript manuals available somewhere
too. Unicode uses 16-bit characters, with the usual 7-bit ASCII set as the
first 128 codes. UTF is a special encoding that is compatible with normal
7-bit ASCII files, as long as you use only the regular ASCII chars. For
higher characters it uses a multibyte sequence.
The problem with Unicode is, of course, that to support it fully you'll need
to adapt all existing programs. [And displaying non-ASCII characters on a
simple terminal is out of the question.]
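
To make the multibyte idea concrete, here is a rough sketch of an encoder
for one 16-bit character. It assumes the bit layout now standardized as
UTF-8, which may not match every detail of the Plan 9 manuals; the name
utf_encode is made up for this example. It does no validity checking.

```c
#include <stddef.h>

/*
 * Hypothetical helper: encode one 16-bit character into buf and return
 * the number of bytes written (1 to 3). Assumes the UTF-8 bit layout.
 * buf must have room for at least 3 bytes.
 */
size_t utf_encode(unsigned int c, unsigned char *buf)
{
    c &= 0xFFFFu;                 /* only 16-bit characters are handled */

    if (c < 0x80) {               /* plain ASCII: stored unchanged */
        buf[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {              /* 11 bits: 110xxxxx 10xxxxxx */
        buf[0] = (unsigned char)(0xC0 | (c >> 6));
        buf[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    /* 16 bits: 1110xxxx 10xxxxxx 10xxxxxx */
    buf[0] = (unsigned char)(0xE0 | (c >> 12));
    buf[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
    buf[2] = (unsigned char)(0x80 | (c & 0x3F));
    return 3;
}
```

An ASCII file passes through unchanged because every code below 0x80 is
written as a single identical byte; only the higher characters grow.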

There are two books published by the Unicode Consortium that describe
the Unicode standard. I've only seen the second part, but from reading
that, the first part seems to be pretty essential. They cost about $40
each here, I think, so I haven't bought them yet.

It would be possible to map a subset of the Unicode characters to ASCII or
Latin-1 in order to display text on simple non-graphic terminals.
X Windows seems to support 16-bit characters too, but I have no
info about that at all.
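
The simplest version of such a mapping relies on the fact that the first
256 Unicode codes coincide with Latin-1. A minimal sketch, with a made-up
function name and '?' as an arbitrary placeholder for everything that has
no Latin-1 equivalent:

```c
/*
 * Hypothetical helper: map a 16-bit Unicode character to a Latin-1 byte
 * for display on a simple terminal. Codes 0x00..0xFF are the same in
 * Unicode and Latin-1; anything higher becomes '?' (an arbitrary choice).
 */
unsigned char to_latin1(unsigned int c)
{
    return (c <= 0xFF) ? (unsigned char)c : (unsigned char)'?';
}
```

A real mapping would probably want a table so that, say, accented letters
outside Latin-1 fall back to their unaccented form instead of '?'.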

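*Lots* of programs [I dare say, nearly every C program in existence]
assume that one character fits in one char, i.e. one byte. [sizeof(char)
itself is still 1, by definition.] With Unicode this is no longer true:
a character takes 16 bits, or several bytes in UTF, causing lots of
problems for characters > 127.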
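
A small illustration of where that assumption breaks, again assuming the
UTF-8 bit layout for UTF: strlen() counts bytes, so it overstates the
number of characters as soon as a string contains anything above 127.
The function name utf_charcount is made up for this sketch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: count characters (not bytes) in a NUL-terminated
 * UTF-encoded string. Continuation bytes have the form 10xxxxxx, so every
 * byte that does not match that pattern starts a new character.
 * The input is assumed to be well formed; nothing is validated.
 */
size_t utf_charcount(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}
```

For the five bytes "caf\xC3\xA9" (the word "cafe" with an accented e),
strlen() gives 5 while utf_charcount() gives 4. Any code that indexes a
string by character position or truncates it at a fixed byte count runs
into the same mismatch.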

If enough people want this, I may consider writing a support library for
Unicode/UTF to provide basic manipulations. [But I'll first have to do
some more hacking on my QIC-02 driver...] Somebody want to donate Unicode
fonts to GNU? [They also need other free fonts btw.]

Once the GNU text/file/bin utils support Unicode/UTF, others will probably
follow, but it would still require *lots* of effort.

Linus, how would you feel about having Unicode support in the kernel?
Having Unicode filenames would be really cool. ;-)

: -- 
: Ed Carp			erc@apple.com, erc@saturn.upl.com	801/538-0177

--Hennus Bergman