*BSD News Article 9545



Received: by minnie.vk1xwt.ampr.org with NNTP
	id AA5951 ; Sat, 02 Jan 93 02:05:03 EST
Path: sserve!manuel.anu.edu.au!munnari.oz.au!spool.mu.edu!sgiblab!nec-gw!nec-tyo!wnoc-tyo-news!cs.titech!titccy.cc.titech!necom830!mohta
From: mohta@necom830.cc.titech.ac.jp (Masataka Ohta)
Newsgroups: comp.unix.bsd
Subject: Re: Dumb Americans (was INTERNATIONALIZATION: JAPAN, FAR EAST)
Keywords: Han Kanji Katakana Hirugana ISO10646 Unicode Codepages
Message-ID: <2615@titccy.cc.titech.ac.jp>
Date: 4 Jan 93 18:57:31 GMT
References: <id.M2XV.VTA@ferranti.com> <1992Dec18.043033.14254@midway.uchicago.edu> <1992Dec18.212323.26882@netcom.com> <1992Dec19.083137.4400@fcom.cc.utah.edu> <2564@titccy.cc.titech.ac.jp> <1992Dec28.062554.24144@fcom.cc.utah.edu>
Sender: news@titccy.cc.titech.ac.jp
Organization: Tokyo Institute of Technology
Lines: 131

In article <1992Dec28.062554.24144@fcom.cc.utah.edu>
	terry@cs.weber.edu (A Wizard of Earth C) writes:

>|> Do you know what Shift JIS is? It's a de facto standard for character encoding
>|> established by microsoft, NEC, ASCII etc. and common in Japanese PC market.
>
>I am aware of JIS; however, even you must agree that the Japanese hardware
>and software markets have not reached the level of "commodity hardware"
>found elsewhere in the world (ie: the US and Europe).

Sigh... With DOS/V you can use Japanese on YOUR "commodity hardware".

>I think other mechanisms, such as ATOK, Wnn, and KanjiHand deserve to be
>examined.  One method would be to adopt exactly the input mechanism of
"Ichi-Taro" (the most popular NEC 98 word processor).

They also run on the IBM PC.

>|> In the workstation market in Japan, some vendors support Shift JIS, some
>|> support EUC and some support both. Of course, many US companies
>|> sell Japanized UNIX on their workstations.
>
>I think this is precisely what we want to avoid -- localization.  The basic
>difference, to my mind, is that localization involves the maintenance of
>multiple code sets, whereas internationalization requires maintenance of
>multiple data sets, a much smaller job.

>This I don't understand.  The maximum translation table from one 16 bit value
>to another is 16k.

WHAAAAT? It's 128KB, not 16k.
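
The arithmetic behind that figure, as a minimal sketch. It assumes a flat lookup table with one 16-bit output entry for every possible 16-bit input value; the names are illustrative only.

```python
# A flat table mapping every 16-bit code to another 16-bit code.
ENTRIES = 2 ** 16        # number of possible 16-bit input values
BYTES_PER_ENTRY = 2      # each output value is also 16 bits wide

table_bytes = ENTRIES * BYTES_PER_ENTRY
print(table_bytes, "bytes =", table_bytes // 1024, "KB")
```

A sparse or range-compressed table could be smaller, but that is a different data structure from the "maximum translation table" under discussion.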

>This means 2 16k tables for translation into/out of
>Unicode for Input/Output devices,

I'm afraid you don't know what Unicode is. What do you mean by "tables for
translation"?

>I don't see why the storage mechanism in any way affects the validity of the
>data

*I* don't see why the storage mechanism in any way affects the validity of the
data

>and thus I don't understand *why* you say "with Unicode, we can't
>achieve internationalization."

Because we can't process data that mixes Japanese and Chinese.

>I don't understand this, either.  This is like saying PC ASCII can not cover
>both the US and the UK because the American and English pound signs are not
>the same, or that it can't cover German or Dutch because of the 7 characters
>difference needed for support of those languages.

Wrong. The US and UK pound signs are the same character, though they might be
assigned different code points in different countries.

Thus, in a universal coded character set, it is correct to assign a
single code point to the single pound sign, even though the character
is used both in the US and the UK.

But corresponding characters in China and Japan, which do not share the
same graphical representation even on moderate-quality printers and are
thus different characters, are assigned the same code point in Unicode.
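
A small illustration of the point, as a sketch only. U+76F4 is an assumed example of an ideograph that Chinese and Japanese typefaces commonly draw differently; the variable names are hypothetical.

```python
import unicodedata

# U+76F4: assumed example of a unified ideograph whose usual printed
# shape differs between Chinese and Japanese typefaces.
ch = "\u76f4"
name = unicodedata.name(ch)
print(hex(ord(ch)), name)

# Text meant as Japanese and text meant as Chinese hold the identical
# code point, so the coded text alone cannot say which glyph is wanted.
japanese_text = ch
chinese_text = ch
same = japanese_text == chinese_text
print(same)
```

Whatever selects the glyph has to come from outside the character code, for example a font choice or a language setting.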

>|> Of course, it is possible to LOCALIZE Unicode so that it produces
>|> Japanese characters only or Chinese characters only. But don't we
>|> need internationalization?
>
>The point of an internationalization effort (as *opposed* to a localization
>effort) is the coexistence of languages within the same processing means.
>The point is not to produce something which is capable of "only English" or
>"only French" or "only Japanese" at the flick of an environment variable;
>the point is to produce something which is *data driven* and localized by
>a change of data rather than by a change of code.  To do otherwise would
>require the use of multiple code trees for each language, which was the
>entire impetus for an internationalization effort in the first place.

That is THE problem of Unicode.

I was informed that MicroSoft will provide a LOCALIZATION mechanism
to print corresponding Chinese/Japanese characters of Unicode
differently.

So, HOW can we MIX Chinese and Japanese without LOCALIZATION?

>your argument that the lexical order of the
>target language affects the usability of a storage standard is invalid.

My argument has nothing to do with lexical ordering.

>Sure, the translation mechanisms may be *easier* to code given localization
>of lexical ordering, but that doesn't mean they *can't* be coded otherwise;

Of course, any coding is equally OK for translation.

>This involves yet another
>set of localization-specific storage tables to translate from an ISO or
>other local font to Unicode and back on attributed file storage.

FILE ATTRIBUTE!!!!!????? *IT* *IS* *EVIL*. Do you really know UNIX?

How can you "cat" two files with different file attributes?

What attribute can you attach to a semi-binary file, in which some fields
contain an ASCII string and other fields contain a JIS string?
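
The "cat" problem can be sketched as follows, assuming one file stored in Shift JIS and another in EUC, each of which would need its own per-file encoding attribute. The byte strings stand in for file contents.

```python
text = "\u3042"  # HIRAGANA LETTER A

sjis_file = text.encode("shift_jis")  # contents of a Shift JIS file
euc_file = text.encode("euc_jp")      # contents of an EUC file

# What "cat a b > c" produces: a plain byte concatenation.
combined = sjis_file + euc_file

# No single attribute on c describes it: either choice misreads half.
as_sjis = combined.decode("shift_jis", errors="replace")
as_euc = combined.decode("euc_jp", errors="replace")

print(as_sjis == text * 2, as_euc == text * 2)
```

Neither interpretation recovers the original two characters, because the encoding information lived in the attributes rather than in the data.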

>To do
>otherwise would require 16 bit storage of files, or worse, runic encoding
>of any non-US ASCII characters in a file.  This either doubles the file
>size for all text files (something the west _will_not_accept_),

Do you know what UTF is?
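
A sketch of why a UTF-style transformation format avoids the doubling. It uses UTF-8 as an assumption, being the member of the UTF family available in Python; the UTF variants current when this was posted differ in detail.

```python
ascii_text = "plain western text"

utf8 = ascii_text.encode("utf-8")         # variable-length transformation format
fixed16 = ascii_text.encode("utf-16-le")  # 16 bits for each of these characters

print(len(ascii_text), len(utf8), len(fixed16))
```

ASCII-only text keeps its size and its byte values under the variable-length format, while the fixed 16-bit form takes twice the space.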

>or
>"pollutes" the files (all files except those stored in US-ASCII have file
>sizes which no longer reflect true character counts on the file).

That's already true for languages like Japanese, whose characters are
NOT ALWAYS (but sometimes) represented with a single byte.

But, what's wrong with that?
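
For instance, with Shift JIS, as a minimal sketch; the sample string is made up for illustration.

```python
kanji = "\u65e5\u672c\u8a9e"  # three kanji, two bytes each in Shift JIS
halfwidth = "\uff71"          # HALFWIDTH KATAKANA LETTER A, one byte in Shift JIS
mixed = kanji + "abc" + halfwidth

encoded = mixed.encode("shift_jis")
print(len(mixed), "characters,", len(encoded), "bytes")
```

The byte count and the character count already disagree for everyday Japanese text, and existing tools cope with that.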

>Admittedly, these mechanisms are adaptable for XPG4 (not widely available)
>and XPG3 (does not support eastern languages), but the MicroSoft adoption
>of Unicode tells us that at least 90% of the market is now committed to
>Unicode, if not now, then in the near future.

Do you think MicroSoft will use file attributes?

						Masataka Ohta