Multiscript Environment
Volume Number: 14
Issue Number: 9
Column Tag: Toolbox Techniques
Making Your Application Run Well in a Multiscript
Environment
by Nat McCully, Senior Software Engineer, Apple Computer, Inc.
Edited by the MacTech Editorial Staff
A few techniques that will give your application polish on
international systems
Many applications are sold in the U.S. and Europe without any major changes to their
codebases for a specific country or region. Sometimes this means that the non-U.S.
user runs into oddities of design or implementation that aren't quite right for his or
her language or region, because the code assumes a U.S.-centric design. Hard-coding ","
as the thousands separator in a number field, or "/" as the separator in a short date,
are two examples. These defects are not that serious, and in fact are avoidable if the
program uses the Mac OS International Utilities functions to extract region-specific
data, such as the correct thousands separator or date separator, from resources in the
System.
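As a rough illustration of the difference between hard-coding a separator and taking it from region data, here is a minimal sketch in portable C. The RegionFormat struct and the format_grouped function are hypothetical names invented for this example. In a real Mac OS application the two separator values would come from the System through the International Utilities rather than from a hand-filled struct.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for region-specific data that a real application
   would read from the System instead of hard-coding. */
typedef struct {
    char thousands_sep;  /* ',' in the U.S., '.' in Germany */
    char date_sep;       /* '/' in the U.S., '.' in Germany */
} RegionFormat;

/* Write value into out using the region's thousands separator.
   Returns the number of characters written, or 0 if out is too small. */
size_t format_grouped(unsigned long value, const RegionFormat *fmt,
                      char *out, size_t out_size)
{
    char digits[32];
    size_t len, total, i, j = 0;
    int n = snprintf(digits, sizeof digits, "%lu", value);

    if (n < 0 || (size_t)n >= sizeof digits)
        return 0;
    len = (size_t)n;
    total = len + (len - 1) / 3;      /* digits plus separators */
    if (total + 1 > out_size)
        return 0;

    for (i = 0; i < len; i++) {
        /* Insert a separator wherever a multiple of three digits remains. */
        if (i > 0 && (len - i) % 3 == 0)
            out[j++] = fmt->thousands_sep;
        out[j++] = digits[i];
    }
    out[j] = '\0';
    return j;
}
```

The formatting routine never mentions "," itself. Passing a different RegionFormat is enough to produce "1.234.567" instead of "1,234,567", which is the property the article is asking for.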
Once the product is to be distributed in a region that uses a different script system
from the one used in the U.S., things can get a bit more complicated. The U.S. uses the
Roman script system to display text and other data, as does most of Europe. Japan,
however, uses the Japanese script system, and therefore some products that assume
Roman script behavior will not function properly there, or worse, will not function
at all.
This article will illustrate some techniques you can use in your application so it will
run properly in a multiscript environment. These techniques allow your code to be
easily localizable into any language in any script system, which will increase your
possible user base and therefore your product's revenue potential.
Mojibake ("moh-jee-bah-keh") is the Japanese word for what happens when a run of text
is displayed in the wrong script system, producing garbage characters that make no
sense. An example is below:
Figure 1. Mojibake example: the same text in the correct (Japanese) font and in the
wrong (Roman) font.
This is one of the most common problems in applications that support multiple fonts.
The bytes in the text stream above have not changed; only the font used to render them
has. The problem is a side effect of the way in which the Mac OS supports so many
languages, by grouping languages and their fonts into script systems. A run of the same
raw text data will effectively change its meaning (or lose it completely) depending on
which font is used to display it.
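To make the point concrete, the sketch below counts the characters in one fixed run of bytes under two interpretations: a one-byte script, where every byte is a character, and a two-byte script, where certain lead bytes pair up with the byte that follows. The lead-byte ranges are an assumption (the Shift-JIS-style ranges 0x81-0x9F and 0xE0-0xFC) and the function names are made up for this example. A real application would ask the Script Manager about the font's script rather than hard-code ranges.

```c
#include <stddef.h>

/* Assumed lead-byte ranges for a Shift-JIS-style two-byte encoding. */
static int is_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* One-byte script: every byte is a character of its own. */
size_t count_chars_single_byte(const unsigned char *text, size_t len)
{
    (void)text;
    return len;
}

/* Two-byte script: a lead byte and the byte after it form one character. */
size_t count_chars_two_byte(const unsigned char *text, size_t len)
{
    size_t i = 0, count = 0;

    while (i < len) {
        if (is_lead_byte(text[i]) && i + 1 < len)
            i += 2;   /* lead byte plus trail byte */
        else
            i += 1;   /* one-byte character, or a dangling lead byte */
        count++;
    }
    return count;
}
```

Given the four bytes 0x93 0xFA 0x96 0x7B, the two-byte reading finds two characters while the one-byte reading finds four. The data is identical in both cases. Only the interpretation chosen by the font's script differs.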
This means that unless the application makes an effort to protect the user from
mojibake, it will likely happen, and this is a bad thing. Users may think that their
data has become corrupted, and may panic, telling all their friends how buggy
your software is.
It turns out that protecting the user from mojibake is not such a big deal. The problem
can be easily defined and scoped, so you will always know the 'right' thing to do when
you are handling text in multiple fonts in a multiscript environment. For example,
you only need to worry about mojibake when:
• The user's system has more than one script system installed.
• There are characters in the extended ASCII (hi-ASCII) range (byte values
greater than 127) in the text.
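The two conditions above translate into a very cheap test. Here is a minimal sketch. The function names and the installed_script_count parameter are assumptions for illustration, and a real application would get the number of installed script systems from the Script Manager rather than pass it in by hand.

```c
#include <stddef.h>

/* Returns 1 if any byte in the run is above 127 (hi-ASCII), else 0. */
int has_high_ascii(const unsigned char *text, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (text[i] > 127)
            return 1;
    }
    return 0;
}

/* Mojibake is only a concern when both conditions hold:
   more than one script system is installed AND the text has hi-ASCII. */
int mojibake_possible(int installed_script_count,
                      const unsigned char *text, size_t len)
{
    return installed_script_count > 1 && has_high_ascii(text, len);
}
```

Text made up entirely of bytes 0 through 127 passes through untouched, so the more expensive script-matching work can be skipped for it.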
Further, there are four possible situations in which mojibake can occur:
A. When the user chooses a keyscript different from the script of the
current font and begins to type hi-ASCII characters.
B. When the user selects text and chooses a font from a different script than
the text's current font AND there are hi-ASCII characters in the selection.
C. When there is hi-ASCII in the text of the user interface of your
application and you default to drawing it in a font that can change (i.e. the
appFond) depending on the main script of the system.
D. When you are importing text without font information and must set the