December 93 - INTERNATIONAL NUMBER FORMATTING
INTERNATIONAL NUMBER FORMATTING
NORBERT LINDENBERG
Have you ever wondered how to get your program to display numbers in a way that
satisfies Macintosh users all around the world? This article tells you what users
expect and shows you how to use the Macintosh Toolbox to correctly format numbers,
taking the needs of both your program and the user into account. It also shows how to
interpret numbers entered by the user.
When you develop an application, you usually have some opinion about the format in
which numbers should be presented to the user. However, number formatting
standards differ from country to country (and sometimes even within a country), and
users also may have their own ideas on the subject. Macintosh system software
provides support to format numbers in ways that accommodate both the needs of your
application and local standards, and -- starting with System 7.1 -- also lets the user
control some aspects of number formatting using the Numbers control panel.
This article shows two different ways to format numbers: using a default format for
simple number display, and following the user's specification for more sophisticated
number display. It also shows how to interpret numeric input correctly. This issue's
CD contains an application called Numbers Test that lets you try out these two different
methods of formatting numbers and enter numbers for interpretation. The CD also
contains BuildNumbers, an MPW script that builds an MPW tool that's functionally
equivalent to the application.
WHAT USERS EXPECT
Users expect to see numbers in a format that makes sense to them. This challenges the
programmer to accommodate the variations on number formatting that occur around
the world.
The most common system for writing numbers is the decimal system, where numbers
are formed from ten different numerals, with the position of each digit within a
number defining a multiplier for it: 123 = 1*100 + 2*10 + 3*1. However, there
are many local variations on this scheme, and there are some writing systems that
prefer a different style of writing numbers, in which case decimal numbers may or
may not be an acceptable alternative. Systems besides the decimal system that users
may require include Roman numerals (used in many languages to number topics or
title pages) and hexadecimal numbers (familiar to everybody who's ever dropped into
MacsBug), as well as the Japanese and Chinese systems. For details on how these
number formatting systems differ from one another, see "Number Formatting
Variations."
Computers may complicate matters even more by providing multiple character codes
for the same digit. For example, the Macintosh Japanese character set provides both
1-byte and 2-byte encodings of the Latin characters (which are called "Romaji" in
Japanese). They can easily be distinguished on the screen: the 1-byte versions are
narrower than the 2-byte versions, which take up the same width as Kanji characters.
For interpretation as numbers, however, these different encodings should be
considered equivalent.
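As a rough illustration of treating different encodings of a digit as equivalent, here is a minimal Python sketch. It's an assumption-laden stand-in, not the Toolbox approach: it uses Unicode fullwidth digits in place of the 2-byte Romaji digits of the Macintosh Japanese character set, and the helper name to_ascii_digits is invented for this example.

```python
import unicodedata


def to_ascii_digits(text: str) -> str:
    """Replace every character that Unicode classifies as a decimal digit
    with the corresponding ASCII digit; leave all other characters alone."""
    out = []
    for ch in text:
        value = unicodedata.decimal(ch, None)  # None if ch is not a decimal digit
        out.append(str(value) if value is not None else ch)
    return "".join(out)


narrow = "123"
wide = "\uff11\uff12\uff13"  # fullwidth 1, 2, 3 (stand-in for 2-byte Romaji digits)

# Both spellings denote the same number once the digits are normalized.
same_number = int(to_ascii_digits(wide)) == int(narrow)
```

Normalizing before interpretation keeps the parsing code itself unaware of how many encodings of a digit a script happens to provide.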
Another example is the Macintosh Arabic character set, which defines a set of Arabic
digits with right-to-left orientation in addition to the ASCII digits, which have
left-to-right orientation and are usually displayed with Arabic glyphs when an Arabic
font has been chosen. The right-to-left digits are intended only for text that doesn't
have a numeric meaning, such as software version strings and part numbers, and are
needed to obtain proper line layout in these cases. However, users may not be aware of
this intention and may try to enter numbers using these digits. Later we'll discuss how
to deal with this.
WHAT YOUR APPLICATION NEEDS
Depending on how sophisticated your application is with regard to numbers, you'll
need to support variations on number formatting in three different situations: simple
number display, number display in a user-specified format, and numeric input.
For simple number display, your application needs to show a given number in a default
format that makes sense to the user. This kind of formatting may suffice for many
applications and is commonly used for dialogs.
For other number-display situations, your application might need to format numbers
according to the user's specification. The user might specify which representation to
use for the number (for example, decimal or traditional Chinese; Thai, Arabic, or
Latin glyphs), the number of digits after the decimal separator, how to indicate
negative numbers, whether to use thousands separators, which currency symbol to
use, and where to place it. This kind of formatting is needed, for example, for
spreadsheets, databases, and page layout applications.
Numeric input is needed in almost any application -- for example, to specify the width
of a page, the number of a page to jump to, or the size of a font. Ideally, your
application should be able to interpret a numeric string in any format that might make
sense to the user, independent of the display formats you use.
NUMBER FORMATTING VARIATIONS
Local variations on the decimal system include variations on the shapes of the digits,
representation of negative numbers, the decimal separator, and the thousands
separator.
• The shapes of the digits: The glyphs used with the Latin writing system
differ from those used with the Arabic writing system, and several other
writing systems come with their own glyphs.
(Illustration: the digits as written with Latin, Arabic, and Thai glyphs.)
• Representation of negative numbers: The minus sign can be used before or
after the number, or the number can be parenthesized.
• The decimal separator: Either a period or a comma can be used to mark off
the integer part of the number from the fractional part.
• The thousands separator: A space, a comma, a period, or some other
character can be used to mark off the thousands place from the hundreds place,
the millions place from the hundred thousands place, and so on. Sometimes the
thousands separator isn't used at all.
Many other variations exist, especially for noninteger numbers. Here's a sample of
local variations on how one negative number is represented in the decimal system:
Arabic        (example in Arabic glyphs not reproduced)
French        -1 234,56
German        -1.234,56
Greek         (1 234.56)
Japanese      (1,234.56)
Swiss French  -1'234.56
Thai          (example in Thai glyphs not reproduced)
U.S.          -1,234.56
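To make the variations concrete, the following Python sketch formats one value with a configurable decimal separator, thousands separator, and negative-number style. It only illustrates the idea and is not how the Text Utilities routines work; the function name format_decimal, its parameters, and the fixed two fraction digits are all assumptions made for this example.

```python
from decimal import Decimal


def format_decimal(value, decimal_sep=".", thousands_sep=",", negative="leading"):
    """Format a number with two fraction digits.

    negative selects the style for negative numbers:
    "leading" (-1), "trailing" (1-), or "parens" ((1)).
    """
    number = Decimal(str(value))
    digits = f"{abs(number):.2f}"          # plain form, e.g. "1234.56"
    integer, fraction = digits.split(".")

    # Split the integer part into groups of three digits, from the right.
    groups = []
    while len(integer) > 3:
        groups.insert(0, integer[-3:])
        integer = integer[:-3]
    groups.insert(0, integer)

    body = thousands_sep.join(groups) + decimal_sep + fraction
    if number >= 0:
        return body
    if negative == "parens":
        return f"({body})"
    if negative == "trailing":
        return body + "-"
    return "-" + body


french = format_decimal("-1234.56", decimal_sep=",", thousands_sep=" ")
german = format_decimal("-1234.56", decimal_sep=",", thousands_sep=".")
swiss_french = format_decimal("-1234.56", thousands_sep="'")
japanese = format_decimal("-1234.56", negative="parens")
```

Passing an empty thousands separator gives the case where no thousands separator is used at all.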
In the Roman system, numbers are formed from letter digits representing the
numbers shown below.
M 1000
D 500
C 100
L 50
X 10
V 5
I 1
Originally the digits of the number were simply added up to arrive at the value of the
number, and digits were sorted in decreasing order within the number (so 9 = VIIII).
Later a convention was added that positioning one of the digits C, X, or I before a
higher-valued digit means that its value is to be subtracted instead of added (so 9 = IX).
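Both conventions can be sketched in a few lines of Python. This is an illustration only; the function name to_roman and its subtractive flag are invented here, and the table of subtractive pairs is simply the set that results from placing C, X, or I before the next two higher digits.

```python
# Letter digits from the table above, largest first.
ADDITIVE = [(1000, "M"), (500, "D"), (100, "C"), (50, "L"),
            (10, "X"), (5, "V"), (1, "I")]
# Pairs formed by placing C, X, or I before a higher-valued digit.
SUBTRACTIVE = [(900, "CM"), (400, "CD"), (90, "XC"), (40, "XL"),
               (9, "IX"), (4, "IV")]


def to_roman(number: int, subtractive: bool = True) -> str:
    """Convert a positive integer to Roman numerals.

    With subtractive=False, digits are only added up (the original style).
    """
    if number <= 0:
        raise ValueError("Roman numerals need a positive integer")
    table = ADDITIVE + SUBTRACTIVE if subtractive else ADDITIVE
    out = []
    # Greedily take the largest value that still fits.
    for value, symbol in sorted(table, reverse=True):
        count, number = divmod(number, value)
        out.append(symbol * count)
    return "".join(out)


later_style = to_roman(9)                        # subtractive convention
original_style = to_roman(9, subtractive=False)  # purely additive
```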
The Japanese and Chinese systems represent numbers in various ways. In horizontal
writing, the decimal system with Latin glyphs is commonly used. Ten thousands
separators were once used instead of thousands separators and are still used in some
very traditional quarters, but accountants in Japan now use thousands separators
instead. In the traditional vertical writing preferred by native speakers, however,
Chinese characters are used without separators. A mapping of decimal numbers to
Chinese digits is acceptable; however, a direct representation of the numbers as they
are spoken is preferred. The number 45000, for example, is represented in the
decimal style on the left and in the traditional style on the right:
Decimal style    Traditional style
4                4
5                ten thousand
0                5
0                thousand
0
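The difference between the two styles can be sketched in Python as follows. This is a deliberately simplified illustration, not something the system software provides: the function names are invented, only values below 100,000,000 are handled, and real usage has further conventions (such as dropping a leading "one" before ten, hundred, or thousand, or marking skipped positions) that this sketch ignores.

```python
DIGITS = "〇一二三四五六七八九"        # characters for 0 through 9
SMALL_UNITS = ["", "十", "百", "千"]  # 1, 10, 100, 1000 within a group
MYRIAD = "万"                         # ten thousand


def digit_mapping(number: int) -> str:
    """Decimal style: map each decimal digit to the corresponding character."""
    return "".join(DIGITS[int(d)] for d in str(number))


def as_spoken(number: int) -> str:
    """Traditional style: write the number the way it is spoken, in groups
    of ten thousand. Simplified; always writes a leading 'one'."""
    if not 0 < number < 10**8:
        raise ValueError("this sketch handles 1 through 99,999,999 only")
    high, low = divmod(number, 10_000)
    parts = []
    for group, group_unit in ((high, MYRIAD), (low, "")):
        if group == 0:
            continue
        text = ""
        for position in (3, 2, 1, 0):
            digit = group // 10**position % 10
            if digit:
                text += DIGITS[digit] + SMALL_UNITS[position]
        parts.append(text + group_unit)
    return "".join(parts)


decimal_style = digit_mapping(45000)   # 4, 5, 0, 0, 0
traditional_style = as_spoken(45000)   # 4, ten thousand, 5, thousand
```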
WHAT MACINTOSH SYSTEM SOFTWARE PROVIDES
Macintosh system software supports number formatting with international resources,
the Numbers control panel (in System 7.1), and the Text Utilities routines.
Unfortunately, the functionality provided doesn't cover all the needs just described --
it's limited to decimal numbers and a maximum of two encodings per script. This
means that, for instance, Chinese vertical numbers aren't supported; with the advent
of QuickDraw GX, which supports vertical text, this problem is becoming more urgent.
There are some interesting details you'll have to understand to make the best use of the
functionality provided.
INTERNATIONAL RESOURCES
International resources of two types, 'itl0' and 'itl4', provide data that helps in
formatting numbers.
• Resources of type 'itl0' contain separator symbols (decimal separator and
thousands separator) and information about a simple default format. These
resources allow for 1-byte characters only and don't support more
sophisticated layout.
• Resources of type 'itl4' contain a number parts table used by the Text
Utilities routines to interpret format specification strings entered or selected
by the user. They also contain a table of alternate digits that can be used
instead of the default ASCII digits and that may be 2-byte characters. If there
are no alternate digits for the script, the ASCII digits are repeated in this
table.
A system file can contain multiple resources of either type. Each regional version
of system software comes with a default resource of each type, as well as the U.S.
versions of the resources; more resources can be added.
If multiple scripts are installed on one machine, each script has at least one resource
of each type and designates one resource of each type as the default for the script. The
default resources for the system script (the script that supports the language your
system is localized for) define the systemwide default. If GetIntlResource (IUGetIntl)
is used to access a resource, the script whose resources are returned depends on the
font in the current graphics port and the settings of the international resources
selection flag. To avoid surprises, it's usually better to ask for resources of specific
scripts; the InitializeDefaultNumberSeparators routine, discussed later, does this.
All Macintosh scripts support the use of the ASCII digits ($30-$39), and some
scripts provide an additional set of digits in an alternate numeral table. The Japanese
'itl4' resource contains the 2-byte Romaji digits; the Arabic 'itl4' resource, the
right-to-left digits; and the Thai 'itl4' resource, the Thai digits. Because only one
alternate numeral table is allowed per 'itl4' resource, you won't find in the Japanese
'itl4' resource the Chinese numerals used in the Japanese script. Unfortunately, not
all scripts that have multiple sets of digits define them in the 'itl4' resource; for
instance, the Chinese versions of System 7.1 don't make use of the alternate numeral
table but only support the ASCII digits.
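The idea of an alternate numeral table can be sketched in Python as a ten-character string that replaces the ASCII digits one for one. This is only a model of the concept: the dictionary, its keys, and the function name are invented for the example, Unicode characters stand in for the Macintosh script encodings, and the "chinese" entry simply mirrors the case described above where the ASCII digits are repeated.

```python
ASCII_DIGITS = "0123456789"

# One alternate table of ten digits per script (hypothetical stand-in for
# the alternate numeral table of an 'itl4' resource).
ALTERNATE_DIGITS = {
    "thai": "".join(chr(0x0E50 + i) for i in range(10)),      # Thai digits
    "japanese": "".join(chr(0xFF10 + i) for i in range(10)),  # fullwidth digits
    "chinese": ASCII_DIGITS,  # no alternate digits: the ASCII digits repeat
}


def with_alternate_digits(text: str, script: str) -> str:
    """Replace ASCII digits in an already formatted number with the
    script's alternate digits; separators and signs are left untouched."""
    table = str.maketrans(ASCII_DIGITS, ALTERNATE_DIGITS[script])
    return text.translate(table)


thai_number = with_alternate_digits("-1,234.56", "thai")
```

Because the substitution is applied after formatting, the same formatting code serves every script that has such a table.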
THE NUMBERS CONTROL PANEL
The Numbers control panel (in System 7.1) lets users select the default number
format and define customized decimal and thousands separators, as well as the
currency symbol. In earlier systems, the International control panel (which was
shipped only with certain localized versions of system software) allowed the user to
select the default number format but didn't provide for customization. (See Figure
1.)
Figure 1. The International and Numbers Control Panels
To correctly access the international resources and interpret their contents, it helps
to know how the control panels affect the resources. The behavior of the control panels
has changed significantly from system software versions 7.0 and 7.0.1 to version 7.1.
The International control panel in versions 7.0 and 7.0.1 lets the user select only the
language whose number formatting rules apply; it does not allow modification of the
rules. Selecting a language makes the corresponding region's 'itl0' resource the default
resource for its script, so that all its features take effect. The 'itl4' resources are not
affected.
The Numbers control panel in System 7.1 lets the user select a predefined regional
version or define a custom version of the number format. The first time the control
panel is opened after installing System 7.1, it creates a new 'itl0' resource in the
System file based on the predefined default 'itl0' resource of this version of system
software and makes this new resource the default for the system script. From then on
it keeps the user's format definition in this personalized 'itl0' resource, whether it's
selected from predefined formats or defined as a custom format. When the user selects
a different regional version, all items of that region's 'itl0' resource that are
represented in the control panel are copied into the personalized 'itl0' resource; other
features defined in the 'itl0' resource are ignored. This means, for example, that
the choice between showing negative numbers with a minus sign or in parentheses is
not affected by the selection. The default selection of the 'itl4' resource isn't changed;
however, the default 'itl4' resource is modified to use the personalized 'itl0'
resource's decimal and thousands separators in its number parts table.
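The copying behavior just described can be modeled with a small Python sketch. Everything in it is hypothetical: the NumberFormat class is a stand-in for an 'itl0' resource, not its actual layout, and the list of copied fields is taken from the items the article says the control panel offers (decimal separator, thousands separator, currency symbol).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class NumberFormat:
    """Hypothetical stand-in for the contents of an 'itl0' resource."""
    decimal_sep: str
    thousands_sep: str
    currency_symbol: str
    negative_in_parens: bool  # a feature not represented in the control panel


# Only items represented in the control panel are copied.
PANEL_FIELDS = ("decimal_sep", "thousands_sep", "currency_symbol")


def select_region(personal: NumberFormat, region: NumberFormat) -> NumberFormat:
    """Return the personalized format after the user picks a regional version."""
    updates = {name: getattr(region, name) for name in PANEL_FIELDS}
    return replace(personal, **updates)


personal = NumberFormat(".", ",", "$", negative_in_parens=False)
region = NumberFormat(",", ".", "DM", negative_in_parens=True)
result = select_region(personal, region)
# The separators and currency symbol follow the region; the negative-number
# style stays as it was in the personalized format.
```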
There's one problem with the Numbers control panel that you have to be aware of: it
doesn't impose any constraints on the selections for the decimal and thousands
separators, other than not allowing the user to enter 2-byte characters. The user can,
for example, select a digit, the minus sign, no character at all, or a character that
conflicts with the inner workings of the Text Utilities routines for interpreting
format specifications. To make sure that your application functions correctly, you
have to check whether the separators make sense before using them.
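A check of this kind might look like the following Python sketch. The exact rules are assumptions made for illustration: it requires a one-character decimal separator, allows the thousands separator to be absent, rejects digits and sign characters, and requires the two separators to differ. It doesn't attempt to detect characters that conflict with the Text Utilities routines, since the article doesn't list them at this point.

```python
def separators_are_usable(decimal_sep: str, thousands_sep: str) -> bool:
    """Return True if the two separators can be used safely for formatting.

    Assumed rules (not taken from system software documentation):
    - the decimal separator is exactly one character;
    - the thousands separator is one character or empty (not used);
    - neither is a digit or a sign character;
    - the two separators differ.
    """
    if len(decimal_sep) != 1 or len(thousands_sep) > 1:
        return False
    for sep in (decimal_sep, thousands_sep):
        if sep and (sep.isdigit() or sep in "+-"):
            return False
    return decimal_sep != thousands_sep


# Fall back to a known-good pair when the user's choice doesn't make sense.
user_choice = ("5", ",")
decimal_sep, thousands_sep = (
    user_choice if separators_are_usable(*user_choice) else (".", ",")
)
```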
The sample code discussed in this article assumes that you don't check for changes of
the resources while your application is running, so it gets all necessary information
at launch time and caches it. This way, changes made with the control panel will not
be immediately reflected in your application, but you also avoid the problem of
inconsistent updates. This problem can arise if you always use the most current
information and the user changes, say, the decimal separator while your application
is displaying numbers in a window: after a partial redraw, the updated region could
show one decimal separator while the rest of the window still shows the other.
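The caching strategy can be sketched in Python as reading the settings once and handing out the same immutable snapshot from then on. The names NumberSettings and load_settings_once, and the read_current callback that stands in for reading the international resources, are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class NumberSettings:
    """Immutable snapshot of the separators taken at launch time."""
    decimal_sep: str
    thousands_sep: str


_cached: Optional[NumberSettings] = None


def load_settings_once(read_current: Callable[[], NumberSettings]) -> NumberSettings:
    """Read the settings on the first call only; later calls return the cache."""
    global _cached
    if _cached is None:
        _cached = read_current()
    return _cached


# Simulated system settings that the user changes while the program runs.
system = {"decimal_sep": ".", "thousands_sep": ","}
at_launch = load_settings_once(lambda: NumberSettings(**system))
system["decimal_sep"] = ","
later = load_settings_once(lambda: NumberSettings(**system))
# 'later' is the same snapshot as 'at_launch', so every redraw uses "."
```

The trade-off is the one described above: the display stays consistent, but a change made in the control panel takes effect only the next time the application is launched.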