*BSD News Article 8705



Newsgroups: comp.unix.bsd
Path: sserve!manuel.anu.edu.au!munnari.oz.au!hp9000.csc.cuhk.hk!saimiri.primate.wisc.edu!sdd.hp.com!spool.mu.edu!agate!netsys!pagesat!spssig.spss.com!news.oc.com!eff!news.byu.edu!ux1!fcom.cc.utah.edu!cs.weber.edu!terry
From: terry@cs.weber.edu (A Wizard of Earth C)
Subject: Re: [386BSD] What about localisation?
Message-ID: <1992Dec8.214215.24804@fcom.cc.utah.edu>
Sender: news@fcom.cc.utah.edu
Organization: Weber State University  (Ogden, UT)
References: <1992Dec7.182103.1799@rdrel.relcom.msk.su>
Date: Tue, 8 Dec 92 21:42:15 GMT
Lines: 95

In article <1992Dec7.182103.1799@rdrel.relcom.msk.su> sir@rdrel.relcom.msk.su (Sergey I.Ryzhkov) writes:
>Gentelmens!
>
>Are anybody attempted to make true POSIX-style localisation of
>386BSD for some language? I find only "C-language frames" for "setlocale"
>ans "strcoll" function in c-lib sources and noothing about localisation
>in docs. I plans to do the complete implementation of this feature
>for Russian language but, it seems to me that somebody in the world
>must do something analogous (for some other language, may be)
>I could not find something about except of description of function call
>in POSIX which is do not complete as my minds (may be I do not have all the
>discription) and I well be glad to get the an example of their
>implementation or to contact with some person who can explain me how
>they must be implemented.

Internationalization has been discussed in some detail; the languages I'm
interested in are Spanish and German (handled by ISO Latin-1) and Greek
and Japanese (definitely requiring better localization than that provided
by XPG3).

The last discussion on it ended with pretty much unanimous agreement
that Unicode was the way to go, but with no agreement on the storage
mechanism or on in-program versus in-system overhead.  That led to a
discussion of Runes (a multibyte encoding), which pretty much ensures that
non-ASCII characters will take multiple bytes, and that 16-bit Unicode
characters will take 8-24 bits to encode (a poor trade-off for anyone using
a majority of non-ASCII characters, and rather centered on
internationalization for Western Europe/North and South America).
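
To make the 8-24 bit figure concrete, here is a minimal sketch of such a
multibyte encoder.  It assumes the byte layout now standardized as UTF-8;
whether that matches the exact Rune encoding from the earlier thread is an
assumption on my part, and the name rune_encode is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode one 16-bit character value into buf, using a 1-3 byte
 * multibyte layout (assumed here to be the one now known as UTF-8).
 * buf must have room for at least 3 bytes.
 * Returns the number of bytes written: 1, 2 or 3.
 * Surrogate values get no special treatment; this is only a sketch.
 */
size_t rune_encode(uint16_t c, unsigned char *buf)
{
    if (c < 0x80) {                     /* 7-bit ASCII: 1 byte (8 bits) */
        buf[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {                    /* 11 bits: 2 bytes (16 bits) */
        buf[0] = (unsigned char)(0xC0 | (c >> 6));
        buf[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    /* everything else in the 16-bit range: 3 bytes (24 bits) */
    buf[0] = (unsigned char)(0xE0 | (c >> 12));
    buf[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
    buf[2] = (unsigned char)(0x80 | (c & 0x3F));
    return 3;
}
```

Under this layout, ASCII stays at one byte, Latin-1 accented letters,
Greek and Cyrillic take two, and Japanese takes three, which is the
trade-off complained about above.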

Last message was:

>
> From: keld@login.dkuug.dk (Keld Jørn Simonsen)
> Subject: Re: multibyte character representations and Unicode
> Message-ID: <keld.722285494@login.dkuug.dk>
>
> The Plan 9 encoding of ISO 10646 is planned for inclusion
> in POSIX .2b standard, and is thus on a standards track.

In any case, I would prefer any Unicode standard, however badly implemented,
to XPG3, which would fail to deal with anything but Western Europe and
North and South America, in my opinion.

Another thing which requires consideration is a set of standardized
messages, translated into all supported languages through whatever
localization mechanism we end up using, for messages in the shell, in
programs, and for perror and family.  This will go a long way towards
usability in an international forum -- and probably constitutes our best
bet for high return on the effort we invest, guaranteeing at least base
functionality in supported languages.
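
A rough sketch of what a translated perror-family lookup could look like,
whatever mechanism ends up underneath it.  The table layout, the two-letter
language tags, the function name localized_strerror and the sample
translations are all illustrative assumptions, not an existing interface.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* One translated message: language tag, errno value, message text. */
struct msg_entry {
    const char *lang;
    int         errnum;
    const char *text;
};

/* Sample entries only; a real catalog would be supplied by translators. */
static const struct msg_entry msg_table[] = {
    { "es", ENOENT, "No existe el fichero o el directorio" },
    { "es", EACCES, "Permiso denegado" },
    { "de", ENOENT, "Datei oder Verzeichnis nicht gefunden" },
    { "de", EACCES, "Keine Berechtigung" },
};

/*
 * Return the message for errnum in the requested language, falling
 * back to the C library's strerror() text when no translation exists.
 */
const char *localized_strerror(const char *lang, int errnum)
{
    size_t i;

    for (i = 0; i < sizeof(msg_table) / sizeof(msg_table[0]); i++) {
        if (msg_table[i].errnum == errnum &&
            strcmp(msg_table[i].lang, lang) == 0)
            return msg_table[i].text;
    }
    return strerror(errnum);
}
```

The fallback matters: an untranslated message still comes out in the
default language, so base functionality does not depend on every catalog
being complete.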

I think that this eventually assumes an X environment and a full Unicode
"fixed" font; this is ~250K for a 5x8, <1M for a 10x20 (non-default).
Does such a font already exist?

The other fundamental assumption is a multibyte data stream to the tty, and
appropriate localization by the tty itself.  This is an easy mod for
xterm, but requires spanning sets within a given non-PC-ASCII driver
for, say, a downloaded Cyrillic font in a VGA/EGA card.  This
would be, fundamentally, a 16-bit to 8-bit "mapchan".
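
As a sketch of that 16-bit to 8-bit mapping, assume the downloaded font
uses the code page 866 layout for its Cyrillic glyphs.  That choice, the
function name mapchan_cyrillic and the '?' fallback are my assumptions
for illustration; a real driver would load its table per font.

```c
#include <stdint.h>

#define MAPCHAN_FALLBACK '?'   /* shown for anything the font lacks */

/*
 * Map one 16-bit Unicode value to an 8-bit glyph slot, assuming the
 * downloaded font is laid out like code page 866.
 */
unsigned char mapchan_cyrillic(uint16_t c)
{
    if (c < 0x80)                       /* ASCII passes straight through */
        return (unsigned char)c;
    if (c >= 0x0410 && c <= 0x043F)     /* U+0410..U+043F -> 0x80..0xAF */
        return (unsigned char)(0x80 + (c - 0x0410));
    if (c >= 0x0440 && c <= 0x044F)     /* U+0440..U+044F -> 0xE0..0xEF */
        return (unsigned char)(0xE0 + (c - 0x0440));
    if (c == 0x0401)                    /* capital IO */
        return 0xF0;
    if (c == 0x0451)                    /* small io */
        return 0xF1;
    return MAPCHAN_FALLBACK;            /* outside the spanning set */
}
```

Anything outside the spanning set collapses to the fallback glyph, which
is exactly the limitation an 8-bit font imposes on a 16-bit stream.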

I suggest we attack it in this order:

1)	Pick a standard for encoding (I vote Unicode).
2)	Pick a standard for storage (I vote character set attributed files
	to avoid stream encoding while maintaining the benefits of 8-bit
	storage for most languages).
3)	Create an X environment capable of supporting all languages by
	default (Again, I vote Unicode).
4)	Build some tools for running two character sets simultaneously
	(requires combination of Anglicized/Localized encoding and
	data entry mechanisms).
5)	Provide basic error message and prompting translations (requires
	fluently bilingual volunteers).
6)	Perform a code integration (probably at the 0.2 level, although
	this may drag on until 0.3).
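
To illustrate item 2, here is a minimal sketch of reading a character set
attributed file: the attribute says how the bytes are stored, and the
reader widens them to 16-bit values for the program.  The enum, the
function name widen_stored and the big-endian order for 16-bit storage
are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-file attribute naming the stored character set. */
enum charset_attr {
    CS_ISO8859_1,   /* one byte per character */
    CS_UNICODE16    /* two bytes per character, big-endian assumed */
};

/*
 * Widen nbytes of stored data into 16-bit characters in dst, which
 * must have room for nbytes elements.
 * Returns the number of characters produced, or (size_t)-1 on error.
 */
size_t widen_stored(enum charset_attr cs, const unsigned char *src,
                    size_t nbytes, uint16_t *dst)
{
    size_t i;

    switch (cs) {
    case CS_ISO8859_1:
        /* Latin-1 values coincide with the first 256 Unicode values. */
        for (i = 0; i < nbytes; i++)
            dst[i] = src[i];
        return nbytes;
    case CS_UNICODE16:
        if (nbytes % 2 != 0)
            return (size_t)-1;          /* truncated character */
        for (i = 0; i < nbytes / 2; i++)
            dst[i] = (uint16_t)((src[2 * i] << 8) | src[2 * i + 1]);
        return nbytes / 2;
    }
    return (size_t)-1;                  /* unknown attribute */
}
```

The point of the attribute is that a Latin-1 file stays at one byte per
character on disk with no stream encoding, while programs still see
uniform 16-bit characters.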

Anything else that needs to be handled?

Unlike most OS products (with a possible exception for NT, which is Unicode
aware), we have a chance to do this right before the product is too
mature to let us do things "the right way".  We should take the opportunity
while it still presents itself.


					Terry Lambert
					terry@icarus.weber.edu
					terry_lambert@novell.com
---
Any opinions in this posting are my own and not those of my present
or previous employers.
-- 
-------------------------------------------------------------------------------
                                        "I have an 8 user poetic license" - me
 Get the 386bsd FAQ from agate.berkeley.edu:/pub/386BSD/386bsd-0.1/unofficial
-------------------------------------------------------------------------------