*BSD News Article 62948


Path: euryale.cc.adfa.oz.au!newshost.anu.edu.au!harbinger.cc.monash.edu.au!bunyip.cc.uq.oz.au!munnari.OZ.AU!news.ecn.uoknor.edu!paladin.american.edu!zombie.ncsc.mil!news.mathworks.com!uunet!in1.uu.net!news.iij.ad.jp!sranha.sra.co.jp!sranhc.sra.co.jp!sran230.sra.co.jp!soda
From: soda@sra.CO.JP (Noriyuki Soda)
Newsgroups: comp.unix.bsd.freebsd.misc
Subject: Re: conflicting types for `wchar_t'
Date: 07 Mar 1996 20:01:47 GMT
Organization: Software Research Associates, Inc., Japan
Lines: 128
Distribution: world
Message-ID: <SODA.96Mar8050147@srapc133.sra.CO.JP>
References: <4eijt4$5sa@dali.cs.uni-magdeburg.de> <SODA.96Feb17221233@srapc133.sra.CO.JP>
	<SODA.96Mar1044003@srapc133.sra.CO.JP> <4h7vh1$q3j@park.uvsc.edu>
NNTP-Posting-Host: srapc133.sra.co.jp
In-reply-to: Terry Lambert's message of 1 Mar 1996 22:59:45 GMT

Your article only reached here yesterday; the news feed is slooow. :-<

>>>>> On 1 Mar 1996 22:59:45 GMT,
	Terry Lambert <terry@lambert.org> said:

> ] XmString cannot be used to interchange multi-script text,
> ] COMPOUND_TEXT is used for this purpose.
> 
> Actually, it can, by using font sets.
> 
> This makes the XmString an acceptable display encoding (note:
> not storage encoding and not process encoding).

But XmString is not acceptable as an inter-client communication
encoding.  COMPOUND_TEXT is used for that.

> The COMPOUND_TEXT abstraction is just "Yet Another Incompatible
> Standard", IMO.

Incompatible? No, COMPOUND_TEXT is *fully* upward compatible with
ISO-8859-1.
COMPOUND_TEXT is also compatible with EUC-chinese, EUC-taiwanese,
EUC-korean, EUC-japanese, ISO-2022-KR, ISO-2022-JP and ISO-2022-JP2,
because these are all based on the ISO-2022 framework.

# Unicode doesn't have such compatibility. :-)

> ] IMHO, COMPOUND_TEXT (and ISO-2022 based encoding) is worth using,
> ] though it is not ideal encoding. :-)
> 
> It's just another display encoding.  It's not terribly useful
> for process or storage encoding.

No, ISO-2022 based encodings are widely used as storage encodings and
network encodings in Asia.

# Note: ISO-8859-1 is also compatible with the ISO-2022 framework, so
#	ISO-2022 based encodings are widely used in America/Europe too. :-)

> ] For example, ISO-2022 provides straight forward way to use
> ] different font for each script.
> 
> There is only a need to change fonts in a multilingual
> document where the character sets intersect (more on this
> below).  

I think you are misunderstanding my argument.
(Probably because my English is not clear :-<)

ISO-2022 represents each character as a combination of a code-set and
a code-point. So we can use the code-set information to select the
X11 font encoding, and pass the code-point information as the X11
DrawString data.

This fits the X11 fontset abstraction well; Unicode does not.
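
To make the code-set/code-point split concrete, here is a minimal
sketch in C. It is not taken from Xlib or any real library: the names
`enum charset`, `struct run` and `split_runs` are hypothetical. It
assumes ISO-2022-JP input and handles only the three-byte designation
sequences ESC ( B, ESC ( J, ESC $ @ and ESC $ B; anything else is
passed through as data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical code-set tags, for illustration only (not an X11 API). */
enum charset { CS_ASCII, CS_JISX0201_ROMAN, CS_JISX0208 };

struct run {
    enum charset cs;            /* code-set: would select the font */
    const unsigned char *data;  /* code-points: would go to DrawString */
    size_t len;                 /* length of data in bytes */
};

/* Recognise a three-byte designation sequence starting at p (p[0] is ESC). */
static int designated(const unsigned char *p, enum charset *cs)
{
    if (p[1] == '(' && p[2] == 'B') { *cs = CS_ASCII; return 1; }
    if (p[1] == '(' && p[2] == 'J') { *cs = CS_JISX0201_ROMAN; return 1; }
    if (p[1] == '$' && (p[2] == '@' || p[2] == 'B')) {
        *cs = CS_JISX0208;
        return 1;
    }
    return 0;                   /* unknown sequence: caller treats as data */
}

/* Record one run if it is non-empty and there is room; return new count. */
static size_t flush_run(struct run *out, size_t nruns, size_t max_runs,
                        enum charset cs, const unsigned char *p, size_t len)
{
    if (len == 0 || nruns >= max_runs)
        return nruns;
    out[nruns].cs = cs;
    out[nruns].data = p;
    out[nruns].len = len;
    return nruns + 1;
}

/* Split s[0..n) into runs of a single code-set each; return the run count. */
size_t split_runs(const unsigned char *s, size_t n,
                  struct run *out, size_t max_runs)
{
    enum charset cur = CS_ASCII;    /* ISO-2022-JP text starts in ASCII */
    enum charset next;
    size_t nruns = 0, start = 0, i = 0;

    assert(s != NULL && out != NULL);
    while (i < n) {
        if (s[i] != 0x1B || n - i < 3 || !designated(s + i, &next)) {
            i++;                    /* ordinary data byte */
            continue;
        }
        /* Close the run that ends just before this escape sequence. */
        nruns = flush_run(out, nruns, max_runs, cur, s + start, i - start);
        cur = next;
        i += 3;                     /* skip ESC and its two final bytes */
        start = i;
    }
    return flush_run(out, nruns, max_runs, cur, s + start, n - start);
}
```

A caller would walk the returned runs, choose a font from each run's
`cs` tag, and hand `data`/`len` to the drawing routine. No such
per-run tag falls out of a plain Unicode string, which is the point
of the argument above.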

> ] >>> 32 bits is a silly value for the size, since XDrawString16 exists
> ] >>> and XDrawString32 does not.
> ] 
> ] I think this way (use only one font for all scripts) is only
> ] acceptable on low-end application.
> 
> One font set.  The difference is that the fonts are only
> encoded in a document in a compound document architecture
> (what the Unicode Standard calls a complex document).

I wasn't talking about complex documents. What I want to say is that
using one *fontset* to display multi-script text is a good thing, and
using one *font* to display multi-script text (using Unicode and
XDrawString16) is a bad thing. (see below.)

The lack of XDrawString32 has nothing to do with sizeof(wchar_t), IMHO.

> ] I'm sorry that I miss your point. What does "frequently
> ] non-intersecting" means ?
> 
> It means that characters from multiple languages will generally
> use only a single round-trip standard instead of several
> standards that resolve to the same code point in the Unicode
> character set.
> 
> For instance, If I have a multilingual document containing
> English and Japanese, there exists a standard JIS-208 such
> that I can display the document with a single font without
> compromising the characters of either language.

Certainly not. Japanese text mainly uses a combination of JIS-X0201
(representing English) and JIS-X0208 (representing Japanese).
Displaying a document with a single font is done only in low-end
applications. Middle/high-end applications use different fonts for
English and Japanese, because far more fonts exist for the English
character set than for the Japanese one (making a font which contains
256 characters is easier than making a font which contains 8000
characters).

So...

> The "multinationalization argument" complains that I can't
> do this for Japanese and Chinese simultaneously.  What it
> neglects to note is that there is not a single character
> set standard that encodes Japanese and Chinese separately,
> so there is no way to encode round-trip information in any
> case.

This argument is pointless.

> ] Even if you use 16 bit Unicode, wchar_t should be 32bit. Because 32
> ] bit wchar_t is right way to handle surrogated pairs of Unicode.
> 
> I don't understand why such pairs can't be handled by way
> of in-band tokens, just like shift-JIS or other escape-based
> compound document frameworks.

Because that is the multibyte-character approach (multi-16-bit-character
in this case). The benefit of wide characters is that each character is
treated as a single object. If you use multiple objects to represent
one character, why choose wchar_t at all?  Just use multibyte characters.

# We Japanese have been using multibyte characters and wide characters
# for more than 10 years, so we know about them :-)
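
As a rough illustration of "one character, one object", here is a
minimal sketch in C that folds a 16-bit surrogate pair into a single
32-bit value. The function name `utf16_to_wide` is hypothetical, and
`uint32_t` stands in for a 32-bit wchar_t; the sketch assumes the
caller provides an output buffer of at least n elements and simply
passes unpaired surrogates through unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Convert n 16-bit code units into 32-bit wide characters.
 * A high surrogate (0xD800..0xDBFF) followed by a low surrogate
 * (0xDC00..0xDFFF) becomes one value in 0x10000..0x10FFFF.
 * Returns the number of wide characters written to out.
 */
size_t utf16_to_wide(const uint16_t *in, size_t n, uint32_t *out)
{
    size_t i = 0, k = 0;

    assert(in != NULL && out != NULL);
    while (i < n) {
        uint32_t u = in[i++];

        if (u >= 0xD800 && u <= 0xDBFF && i < n &&
            in[i] >= 0xDC00 && in[i] <= 0xDFFF) {
            /* Two 16-bit objects collapse into one 32-bit object. */
            u = 0x10000 + ((u - 0xD800) << 10) + (in[i++] - 0xDC00);
        }
        out[k++] = u;
    }
    return k;
}
```

After this conversion, indexing `out[j]` always yields a whole
character. With a 16-bit wchar_t the pair would stay as two elements,
and every routine would have to check for it, just as with multibyte
strings.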

> I'm against using the ISO 10646 16 bit code page identifier to
> segregate on the basis of language.  8-(.

I think your opinion on this point would be quite correct if Unicode
had no design problems. But the current Unicode specification has
several problems which cannot be resolved without language
segregation. :-<
--
soda@sra.co.jp  Software Research Associates, Inc.  曽田哲之 (Soda Noriyuki)