Newsgroups: comp.unix.bsd
Path: sserve!manuel.anu.edu.au!munnari.oz.au!spool.mu.edu!sdd.hp.com!cs.utexas.edu!hellgate.utah.edu!fcom.cc.utah.edu!cs.weber.edu!terry
From: terry@cs.weber.edu (A Wizard of Earth C)
Subject: Re: INTERNATIONALIZATION: JAPAN, FAR EAST
Message-ID: <1992Dec16.221634.4879@fcom.cc.utah.edu>
Keywords: Han Kanji Katakana Hirugana ISO10646 Unicode Codepages
Sender: news@fcom.cc.utah.edu
Organization: Weber State University (Ogden, UT)
References: <1992Dec14.185028.9757@fcom.cc.utah.edu> <1gksolINNmkg@frigate.doc.ic.ac.uk> <mathias.724467456@sune.stacken.kth.se>
Date: Wed, 16 Dec 92 22:16:34 GMT
Lines: 128
In article <mathias.724467456@sune.stacken.kth.se> mathias@stacken.kth.se (Mathias Bage) writes:
>In <1gksolINNmkg@frigate.doc.ic.ac.uk> kd@doc.ic.ac.uk (Kostis Dryllerakis) writes:
[ ... Re: INTERNATIONALIZATION ... ]
>> Preliminary attempts have already been made (I personally work
>>under X-windows with Greek ISO-standard characters without many
>>problems) but a coordinated effort for internationalisation is indeed
>>necessary. Note that the rest of the operating systems are currently
>>"externally touched" in order to support the Greek language i.e. by
>>hacking your way out.
>
> Has anyone in this newsgroup ever heard of the Unicode/ISO10646
>(UCS) standard? It exists today and has everything (almost), even
>though the Japanese don't like the sort order of the Kanji
>characters... Look/ask in comp.internat.std for more info. See also
>RFC 1345.
I mentioned Unicode as the proposed 386BSD target standard, with ISO
character set attribution on specific files *within* the file system
as a means of avoiding eating huge chunks of storage in languages
with existing 8-bit representations (i.e., the to/from translation would
be done in a file system layer, perhaps the VFS syscall layer, common
to all file systems).
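To put a rough number on the storage argument, here is a minimal Python
sketch. It assumes a per-file character set attribute (the hypothetical
`charset` argument) and lets Python's built-in codecs stand in for the
translation layer. The function names are made up for illustration; none
of this is 386BSD code.

```python
# Sketch only: a per-file charset attribute decides how the stored 8-bit
# data maps to Unicode.  All names here are hypothetical.

def read_as_unicode(stored: bytes, charset: str) -> str:
    """Translate stored 8-bit data to Unicode on the way out of the FS layer."""
    return stored.decode(charset)


def write_from_unicode(text: str, charset: str) -> bytes:
    """Translate Unicode text back to the file's 8-bit character set."""
    return text.encode(charset)


def size_as_16bit(text: str) -> int:
    """Bytes needed if the same text were stored as raw 16-bit code units."""
    return len(text.encode("utf-16-be"))


greek = "αβγδ"
on_disk = write_from_unicode(greek, "iso8859_7")

# Compare the 8-bit on-disk size with the 16-bit size of the same text.
print(len(on_disk), size_as_16bit(greek))
```

For text that fits an existing 8-bit set, the attributed file takes half
the space of the raw 16-bit form, and the round trip through the layer
gives back the original text.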
I would be more likely to endorse Unicode than the 10646 draft standard
(which includes Unicode) simply because ISO-10646 *is* still a draft.
Unicode (from 5 of the 7 responses garnered so far) is pretty much
uniformly hated in Japan; the Japanese seem to prefer the JIS encoding
(a la kterm and jterm). While this *is* embodied in an existing
standard (XPG4), it has the drawback of preventing a unified character
glyph space, such as that provided by Unicode.
I suspect this preference stems from the existing equipment, state
tables, and IBM VGA support for JIS more than any real prejudice
against the standard for technical reasons.
The unvarnished facts are:
1) Microsoft NT is Unicode based.
2) Unicode provides a ROMable X font (we'd have to build one;
it's actually the fact of the non-overlapping glyph space
that provides an advantage over JIS).
3) Unicode provides a means of simultaneous storage of multilingual
documents on the same system.
4) Use of Unicode within the file system's directory service name
space provides a means of internationalizing 386BSD itself.
5) A "Unicode outline font" project is currently under way in
China.
6) Unicode allows for "localization ready" as opposed to simply
"internationalizable" UNIX tools and utilities.
7) Fixed field lengths are observed in utilities/programs regardless
of the localized language (i.e., 80 English characters = 80 Greek
characters = 80 Cyrillic characters = 80 Kanji characters). A runic
implementation would cause field lengths to vary, perhaps radically.
8) Support for nearly all written human languages, with a proposed
expansion for a larger set.
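Point 7 is easy to check with a small Python sketch. I am taking "runic"
to mean a variable-length multibyte encoding and using EUC-JP as a
stand-in for one; that choice, the sample characters, and the helper
names are my assumptions, not anything from a real implementation.

```python
# Sketch only: byte length of an 80-character field in a fixed-width
# 16-bit encoding versus a variable-length multibyte one (EUC-JP here).

FIELD_WIDTH = 80  # characters, as in point 7


def fixed_width_bytes(text: str) -> int:
    """Field size when every character is one 16-bit code unit."""
    return len(text.encode("utf-16-be"))


def multibyte_bytes(text: str) -> int:
    """Field size in a variable-length multibyte encoding."""
    return len(text.encode("euc_jp"))


samples = {"English": "A", "Greek": "α", "Cyrillic": "я", "Kanji": "漢"}

for name, ch in samples.items():
    field = ch * FIELD_WIDTH
    print(name, fixed_width_bytes(field), multibyte_bytes(field))

# A field mixing scripts shows the variation within a single record.
mixed = "A" * 40 + "漢" * 40
print("mixed", fixed_width_bytes(mixed), multibyte_bytes(mixed))
```

The fixed-width column is the same for every script, while the multibyte
column depends on which characters happen to be in the field.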
The drawbacks are:
1) Non-compliance with XPG4.
2) Probable non-compliance with ISO-10646 (due to it being incomplete).
3) Japanese engineers don't like it (probable reason: current JIS
investment in man hours/money).
4) "Connection rules" for languages (like Tamil and Arabic) do not
translate readily into X display technology.
5) A rewrite is necessary for most of the JIS input tables and
semantics to give an identical key sequence/Kanji presentation
for Japanese.
The arguments are:
1) Non-compliance with XPG4 is not a problem, since it is impossible
to comply with both XPG4 and ISO-10646.
2) By utilizing the ISO-10646 draft, conflicts with the completed
standard can be minimized.
3) This is sticky. If the reason Japanese engineers dislike Unicode
is simply embedded technology (JIS/XPG4-JIS), then we don't have
a problem... the technology used should not be apparent to the
user in any case. If the JIS technology is preferred over the
Unicode technology because of engineering simplification for
romaji/kana conversion to kanji, then the problem is a little
more difficult, but is surmountable with ~16K of conversion
vector tables (small overhead compared to the memory taken by a
single font). If the JIS ordering is preferred because it aids
in stroke-count analysis for symbol recognition, *then* we have
a problem.
4) Connection rules for, for instance, Tamil, cannot be resolved
adequately using any of the existing character technologies for
X; thus it is not at issue.
5) A rewrite will be necessary for these tables regardless, even were
we to choose XPG4-JIS encoding, if only because the encoding is
going to vary when the character tables are offset to form a
Unicode-like non-intersecting glyph set (necessary for "localization
ready" as opposed to "internationalizable" OS and tools).
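One way to picture the conversion vector tables from argument 3 is a
flat lookup array indexed by JIS row and cell. The Python sketch below
is only my reading of the idea: it fills the table from Python's
`euc_jp` codec instead of a hand-built table, and the `UNDEFINED`
marker and function names are hypothetical. JIS X 0208 is a 94 x 94
grid, so at 16 bits per entry the table comes to 8836 * 2 bytes, which
is in the same range as the ~16K figure.

```python
# Sketch only: a flat JIS X 0208 -> Unicode vector table, 16 bits/entry.
from array import array

ROWS = CELLS = 94       # JIS X 0208 is a 94 x 94 grid
UNDEFINED = 0xFFFF      # marker for unassigned cells (my choice)


def build_jis_to_unicode() -> array:
    """Populate the table by decoding every row/cell pair via EUC-JP."""
    table = array("H", [UNDEFINED]) * (ROWS * CELLS)
    for row in range(1, ROWS + 1):
        for cell in range(1, CELLS + 1):
            euc = bytes([0xA0 + row, 0xA0 + cell])
            try:
                ch = euc.decode("euc_jp")
            except UnicodeDecodeError:
                continue  # unassigned position, leave the marker in place
            if len(ch) == 1 and ord(ch) < UNDEFINED:
                table[(row - 1) * CELLS + (cell - 1)] = ord(ch)
    return table


def jis_to_unicode(table: array, row: int, cell: int) -> int:
    """Look up the Unicode value for a JIS row/cell (both 1-based)."""
    return table[(row - 1) * CELLS + (cell - 1)]


table = build_jis_to_unicode()
print(len(table) * 2)                        # table size in bytes
print(hex(jis_to_unicode(table, 20, 33)))    # row 20, cell 33
```

The lookup is a single index computation, so the cost is the table
memory and not much else.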
Definitions:
    localization ready:     Missing per-locale translation of text
                            strings. All work has been done to
                            display drivers & environment to support
                            drop in message databases in the local
                            language.

    internationalizable:    Missing per-locale translation of text
                            strings. Missing OS/FS support for
                            local language representation. May
                            run "localized" apps like jterm/kterm.
A significant advantage of a "localization ready" OS is the ability to
supply a "default" environment through a static which is modified by
examination of the "LOCALE" or other language specification mechanism
in the user's environment. Thus all applications written on the
system are already "enabled" by virtue of their use of the C library;
this assumes use of "unichar" types, etc., within the applications.
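A minimal Python sketch of that "default plus drop-in" arrangement
follows. It is an illustration under assumptions, not a proposed
interface: the catalog layout, the message key, the sample Greek
translation, and the `message` function are all hypothetical, and it
reads a "LOCALE" variable only because that is the name used above.

```python
# Sketch only: built-in default strings, overridden by a drop-in
# per-locale message database selected from the user's environment.
import os

# Defaults compiled into the program (the "default" environment).
DEFAULT_MESSAGES = {
    "no_such_file": "No such file or directory",
}

# Drop-in message databases, keyed by locale name (illustrative content).
CATALOGS = {
    "el": {"no_such_file": "Δεν υπάρχει τέτοιο αρχείο ή κατάλογος"},
}


def message(key, environ=None):
    """Return the localized text for key, falling back to the default."""
    env = os.environ if environ is None else environ
    locale = env.get("LOCALE", "")
    return CATALOGS.get(locale, {}).get(key, DEFAULT_MESSAGES[key])


print(message("no_such_file", {"LOCALE": "el"}))
print(message("no_such_file", {}))
```

An application that only ever calls the lookup needs no changes when a
new catalog is dropped in; a locale with no catalog simply gets the
default text.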
Terry Lambert
terry@icarus.weber.edu
terry_lambert@novell.com
---
Any opinions in this posting are my own and not those of my present
or previous employers.
--
-------------------------------------------------------------------------------
"I have an 8 user poetic license" - me
Get the 386bsd FAQ from agate.berkeley.edu:/pub/386BSD/386bsd-0.1/unofficial
-------------------------------------------------------------------------------