March 97 - Gearing Up for Asia With the Text Services Manager and TSMTE
Tague Griffith
Are you eyeing Asian markets for your application? If so, the
smartest thing you can do to gear up is to enlist the aid of the Text
Services Manager (TSM), introduced with System 7.1 to help
applications communicate with utilities that provide text
services. Making your application TSM-aware, an easy matter if
you're using TextEdit and TSMTE, will enable it to use the services
of utilities designed to handle text input in Chinese, Japanese, and
Korean. Your application will also be poised to take advantage of
the wide variety of text services that eventually will be supported
by the Text Services Manager.
Localizing your application for Asian markets, or for Asian language-speaking
customers in the United States, may seem like a daunting task to you, but take heart:
the Text Services Manager (TSM) makes one aspect of localization, handling keyboard
input, easier than you might imagine. Part of the WorldScript technology in the
Macintosh Toolbox, the Text Services Manager enables applications and text service
utilities to communicate without knowing anything about each other's internal
structures or identities. When you make your application TSM-aware, you make it
possible for your Asian language-speaking customers to use your application in
concert with a utility program that does the necessary conversion of keyboard input.
This article shows you how to modify your TextEdit-based application to make it
TSM-aware -- that is, so that it makes the appropriate calls to the Text Services
Manager. It doesn't take a lot of modifications, as you'll see from the sample
application (called InlineInputSample) that accompanies this article on this issue's CD
and develop's Web site. Our application uses TSMTE (an extension that's shipped with
the system), which extends TextEdit to handle the details of TSM awareness with
minimal effort on the part of application writers. Using TSMTE should be sufficient
for most applications; however, for intensive text-processing applications or
applications using a different text-editing engine, you may need to handle all TSM
processing yourself.
Before we look at the changes you need to make to your application to make it
TSM-aware, I'll briefly explain how keyboard input works for Chinese, Japanese, and
Korean. If you'd like to read more about common problems of localization, see "Writing
Localizable Applications" in develop Issue 14. For details on the Text Services
Manager, consult Chapter 7 of Inside Macintosh: Text.
ASIAN LANGUAGES AND KEYBOARD INPUT
As you can guess, supporting keyboard input for Asian languages isn't the same as
handling English, because the two are written in different scripts. A script is a writing
system that can be used to represent one or more human languages.
English and other European languages are written in the Roman script, which is an
alphabetic script. In alphabetic scripts, the various characters of the script are
combined in different ways to form words. Alphabetic scripts have a small repertoire
of characters compared to other types of writing systems. It's a simple matter to
represent all the characters in an alphabetic script on a keyboard. Because there are
fewer than 256 characters in such scripts, it takes only one byte to uniquely identify
each character, so these scripts are known as 1-byte scripts.
Asian languages are quite different, being written in scripts that include ideographic
characters borrowed from ancient China. An ideograph is a symbolic character that
usually represents a single concept, action, or thing. Figure 1 shows some examples of
Japanese and simplified Chinese ideographs. Because each character represents a
single concept, there are -- by necessity -- many, many more characters than in the
Roman script. Most literate Chinese speakers know around 5000 ideographs, and a
literate Japanese speaker knows around 3000. Two bytes are required to uniquely
identify each character in an ideographic script, and thus these scripts are known as
2-byte scripts. Chinese, Japanese, and Korean also incorporate alternative script
systems based on syllabic or phonetic characters (characters that represent certain
sounds).
Figure 1. Some Japanese and Chinese ideographs and their English translations
INPUT METHODS
How is it possible for users of 2-byte script systems to get by with a standard
Macintosh keyboard? Obviously, they can't simply press the key corresponding to the
one character they want out of 3000 or 5000 characters. Enter the text service
utility known as an input method or a front-end processor (FEP). An input method
allows users to type phonetic or syllabic characters on a standard keyboard and
automatically converts what they type into ideographic representations.
For Chinese speakers, the appropriate input method converts keyboard input from
Pinyin (Roman) or Zhuyinfuhao (phonetic, also known colloquially as Bopomofo) to
ideographic Hanzi. For Japanese speakers, the input method converts input from
phonetic Katakana or Hiragana into ideographic Kanji, as illustrated by the example in
Figure 2. The input method for Korean speakers converts phonetic Jamo into
nonideographic Hangul (complex clusters of Jamo).
Figure 2. The same sentence as entered in Hiragana and as converted to Kanji
Apple currently ships four 2-byte keyboard input methods: Kotoeri (Japanese),
Power Input Method (Korean), Traditional Chinese (as used in Taiwan), and
Simplified Chinese (as used in the People's Republic of China). The same input
methods are shipped with the Apple Language Kits, and third-party input methods are
also available.
Regardless of the language, all input methods have a similar user interface. When
more than one script is installed on the Mac OS, as is always the case on systems
localized for Chinese, Japanese, or Korean (every system also has the Roman script
installed), the Keyboard menu appears in the menu bar. Each available keyboard layout and input method is listed in
the Keyboard menu; the icon for the active keyboard layout or input method appears as
the menu's title in the menu bar. Figure 3 shows a Keyboard menu displaying items for
Apple's Simplified Chinese and Kotoeri (Japanese) input methods, as well as keyboard
layouts from some other script systems. The Simplified Chinese input method is
active; it's checked in the menu and its icon appears highlighted in the menu bar. The
pencil icon in the menu bar is displayed only when an input method is active (in other
words, not when the user is typing in English or another language that doesn't require
an input method); it's the title for the menu belonging to that input method. Some input
methods use a different icon, but it appears in the same place as the pencil icon.
Figure 3. Input method icons in the Keyboard menu and the menu bar
BOTTOMLINE VS. INLINE INPUT
When the user begins typing, the raw text appears on the screen as entered, either in a
floating input window that's usually displayed in the lower portion of the screen or in
the application window where the text is intended to appear. The first style of text
entry is known as bottomline input, while the second is called inline input (see Figure
4). Applications that aren't TSM-aware can make indirect use of the Text Services
Manager's floating window service to enable bottomline input (as explained on page
7-13 of Inside Macintosh: Text), but users generally prefer inline input, which only
TSM-aware applications can offer. TSM-aware applications can also offer bottomline
input, which users may prefer if the size of the text displayed in the document makes
reading the characters difficult.
Figure 4. Bottomline input vs. inline input
In the case of inline input, the just-entered text appears in what is known as the
active input area or inline hole. Text in the active input area or the floating input
window is underlined in gray or highlighted in some other manner, depending on the
application.
With either bottomline or inline input, when the user gives a signal, such as pressing
the space bar after entering a sequence of characters, the raw text is converted from its
phonetic or syllabic representation to ideographic or complex syllabic characters. The
gray underline (if there is one) turns black or changes in some other manner determined
by the application. There may be more than one possible reading of a
given character sequence, in which case the input method will display a list of
candidates in a candidate window, as shown in Figure 5. When the user selects one of
the candidate readings, the raw text is converted.
Figure 5. Selecting a conversion option for inline input in a candidate window
The user then confirms the converted text, generally by pressing Return. (In Korean,
conversion happens continuously and automatically, and the text is confirmed when the
user presses either Return or the space bar.) In the case of bottomline input, the
confirmed text is flushed from the input window and sent to the application as
key-down events. For inline input, the confirmed text is copied into the application's
text buffer (as shown in Figure 5) and the active input area is closed. When the user
begins typing again, the underline beneath the confirmed text disappears entirely and a
new active input area opens.
Before you start feeling overwhelmed by all this, realize that most of the user
interface elements I've just described are handled by the input method or TSMTE and
not your application. The input method takes all the keystrokes and processes them;
your application simply draws the input method's text buffer in the application
window. All you need to do to get the benefit of this kind of text service is to make a few
modifications to your application. Once your application is TSM-aware, it can work
with any input method regardless of language, so you can offer your Asian
language-speaking customers the convenience of inline input.
MAKING YOUR APPLICATION TSM-AWARE
Making your application TSM-aware is a matter of adding calls to send information to
input methods by way of the Text Services Manager. Most of the popular text-editing
engines for the Mac OS other than TextEdit are already TSM-aware. One of these,
WASTE (the WorldScript-Aware Styled Text Engine, developed by Marco Piovanelli),
makes all but four of the necessary calls: InitTSMAwareApplication,
CloseTSMAwareApplication, TSMEvent, and SetTSMCursor. These calls need to be made
by the application. Optionally, a WASTE-based application can install pre- and
post-TSM-update callback routines. If you use WASTE for your text-editing engine,
most of the techniques described in this section apply. The WASTE source code is
available online at many popular Macintosh ftp sites; I highly recommend looking at it
for examples of how to handle the TSM protocol directly.
Using TSMTE, as our application InlineInputSample does, you can make your
TextEdit-based application TSM-aware with a few modifications to your
event-handling, cursor-handling, window, and menu code. Most of the changes are
quite simple and limited to particular subroutines of the application, as demonstrated
by InlineInputSample. Our application is a version of TESample, a program written by
Apple's Developer Technical Support group and provided with many development
environments as part of the example code (it's also on this issue's CD). The code that
makes our version of the program TSM-aware is conditionalized with qInline
conditionals so that you can easily pick it out. You might want to take a look at that code
as you read this section.
To see the full capabilities of the InlineInputSample application, you
need a Macintosh with System 7.1 or later localized for Chinese, Japanese, or
Korean, or with one or more of the Asian language kits installed.
TESTING FOR THE TEXT SERVICES MANAGER AND TSMTE
Before using the Text Services Manager and TSMTE, we need to check whether they're
available. The Text Services Manager is available in System 7.1 and later. However,
TSMTE ships only with localized versions of the system and with the
Apple Language Kits for Chinese, Japanese, and Korean. The support for inline input
discussed in this article will be active only while you're using one of these languages.
Listing 1 shows the code we use to check for availability of the Text Services Manager
and TSMTE. If we were writing our own TSM protocol handlers instead of using TSMTE,
we would eliminate the gestaltTSMTEAttr test.
______________________________
Listing 1. Testing for TSM and TSMTE availability
static void CheckForTextServices(void)
{
    long gestaltResponse;

    gHasTextServices = false;   // unless proven otherwise
    gHasTSMTE = false;          // unless proven otherwise

    if (TrapAvailable(_Gestalt)) {
        // Is any version of the Text Services Manager installed?
        if ((Gestalt(gestaltTSMgrVersion, &gestaltResponse) == noErr)
                && (gestaltResponse >= 1)) {
            gHasTextServices = true;
            // Is TSMTE installed? Test its "present" bit in the response.
            if (Gestalt(gestaltTSMTEAttr, &gestaltResponse) == noErr)
                gHasTSMTE = BTst(gestaltResponse, gestaltTSMTEPresent);
        }
    }
}
______________________________
The selector gestaltTSMgrVersion returns the version number of the Text Services
Manager if it's installed. You should test that the version is greater than or equal to
1, the current version of the Text Services Manager, rather than exactly equal to it;
that way your application will also work with future TSM versions.
INITIALIZING THE APPLICATION
Once we've established that the Text Services Manager and TSMTE are available, we
need to extend our Toolbox initialization sequence to initialize the Text Services
Manager. This is done by calling InitTSMAwareApplication. We also want to store the
current state of the Script Manager's smFontForce variable (the font force flag) and
set it to false while our application is running. This flag ensures the correct
text-handling behavior in applications that don't use the Script Manager. Since we're
using the Script Manager to support text in different languages, we should turn this
off, as shown in Listing 2.
______________________________
Listing 2. Initializing as a TSM-aware application
if (!(gHasTSMTE && InitTSMAwareApplication() == noErr)) {
    // If this happens, just move on without text services.
    gHasTextServices = false;
    gHasTSMTE = false;
}

// Get global font force flag; make sure it's off whenever we run.
// Do this even if text services don't exist.
gSavedFontForce = GetScriptManagerVariable(smFontForce);
(void) SetScriptManagerVariable(smFontForce, 0);
______________________________
Of course, since we do this work at initialization, we need to clean up when our
application quits. In our termination routine, we restore the value of the font force
flag and call CloseTSMAwareApplication. The font force flag also needs to be restored
whenever control passes from our application back to the system while fonts may be
in use; in particular, it should be restored when we receive a suspend event.
EXTENDING THE DOCUMENT STRUCTURE
Now we need to extend our document record to store the additional data structures
related to TSM awareness. Our application's original DocumentRecord data structure is
extended to include two additional fields, as follows:
typedef struct {
    WindowRecord    docWindow;
    TEHandle        docTE;
    ControlHandle   docVScroll;
    ControlHandle   docHScroll;
    TEClickLoopUPP  docClick;
    Boolean         modified;
    TSMTERecHandle  docTSMTERecHandle;  // added
    TSMDocumentID   docTSMDoc;          // added