Support Multibyte Text
Volume Number: 14
Issue Number: 1
Column Tag: Toolbox Techniques
Supporting Multi-byte Text in Your Application
by Nat McCully, Senior Software Engineer, Claris Corporation
How to make the most efficient use of the Script Manager and
international APIs in your text engine
Introduction
You have developed the next greatest widget or application and you want to distribute it
on the Net. Your application has a text engine, maybe simple TextEdit, or one of the
more sophisticated engines available for license, or even one you wrote yourself. One
day, you get an e-mail from someone in Japan:
Dear Mr. McCully,
My name is Takeshi Yamamoto and I use your program, MagicBook, everyday. But, I
have a problem using Japanese characters in it. When I hit the delete key, I get weird
garbage characters. My friends and I wish to use both Japanese and English in your
program, but it does not work properly. Please fix it!
T. Yamamoto
Suddenly, there are people on the other side of the world who want to use your
application in their language, and you are faced with a dilemma. You have no first-hand
knowledge of the language itself, but you may be somewhat familiar with the
Macintosh's ability to handle multiple languages in a single document with ease.
Simply cracking open Inside Macintosh: Text seems daunting. How will these new
routines affect the performance of your program? Will you introduce unwanted
instability and anger your existing user base? Where can you find information on how
to use which routines best, not just a description of what each routine does?
This article attempts to address some of these issues and, more generally, to familiarize
you with some of the features that make the Mac an excellent international computing
environment. Intelligent use of the Macintosh's international routines, WorldScript,
and the other managers in the Toolbox can be the difference between a US-only
application and a truly "world-ready" tool that any user, anywhere, can utilize as soon
as they download it to their hard disk. Although this article deals primarily with
Japanese language issues, the concepts outlined here apply to any
multi-lingual environment.
What is WorldScript?
WorldScript is the set of patches to the system that enables the correct display and
measurement of multi-lingual text. Over time, many of these patches have been rolled
into the base system software, but even in Mac OS 7.6.1, you will find a set of
WorldScript extensions in the Extensions folder when you install one of the Language
Kits available from Apple. The concepts and code snippets in this article will work
equally well on, for example, the Japanese localized Mac OS, or on a standard U.S.
system with the Japanese Language Kit (JLK). A good source of localized system
software and Language Kits is the Apple Developer Mailing CD-ROM, available from
the Apple Developer Catalog. WorldScript is one of the Apple-only technologies that
make multi-lingual computing far easier on the Mac than on other platforms. When
it comes to having Chinese, Korean and Japanese all in the same document,
WorldScript, on Mac OS, is the only thing out there.
Multi-byte Text on the Mac
OK, let's get to the meat, you say. How do you make your text engine handle two-byte
characters? Well, before giving you a bunch of code, let's explain how the Mac handles
two-byte text.
What is a Script?
The languages that the Mac supports are grouped into categories called "scripts." For
example, English and the other Roman letter-based languages like French and German
all belong to the Roman script. Japanese belongs to the Japanese script. Character
glyphs in the Roman script are each represented by a single-byte character code.
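
The grouping of languages into scripts can be pictured with a small, purely illustrative C model. This sketch does not call the Script Manager; every name in it (ScriptSketch, LanguageSketch, ScriptForLanguage, RomanCharCount) is hypothetical and invented for this example. The real Toolbox defines its own script and language constants, such as smRoman and smJapanese. The sketch shows only the two ideas described above: several languages share one script, and in the Roman script each character code occupies one byte.

```c
#include <stddef.h>

/* Hypothetical model, NOT the Script Manager API. */
typedef enum {
    kSketchScriptRoman,
    kSketchScriptJapanese
} ScriptSketch;

typedef enum {
    kSketchLangEnglish,
    kSketchLangFrench,
    kSketchLangGerman,
    kSketchLangJapanese
} LanguageSketch;

/* Map a language to the script it belongs to. English, French and
   German all share the Roman script; Japanese has its own script. */
ScriptSketch ScriptForLanguage(LanguageSketch lang)
{
    switch (lang) {
        case kSketchLangJapanese:
            return kSketchScriptJapanese;
        case kSketchLangEnglish:
        case kSketchLangFrench:
        case kSketchLangGerman:
        default:
            return kSketchScriptRoman;
    }
}

/* In the Roman script every character code is a single byte, so the
   number of characters in a run equals its length in bytes. This
   shortcut is valid ONLY for single-byte scripts. */
size_t RomanCharCount(const char *text, size_t byteLength)
{
    (void)text; /* the bytes themselves need not be inspected */
    return byteLength;
}
```

A text engine that assumes the RomanCharCount shortcut everywhere is exactly the kind that breaks on two-byte text, because byte offsets and character offsets stop being the same thing once a script uses more than one byte per character.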