PDF Intro
Volume Number: 15
Issue Number: 9
Column Tag: Emerging Technologies
Portable Document Format: An Introduction for
Programmers
by Kas Thomas
Get to know the internals of Adobe's new document
interchange standard.
With the growing popularity of the World Wide Web (and the growing complexity of
computer-created documents), the need for an extensible, platform-independent
standard for document interchange has never been greater. More people need to share
more kinds of information than ever before.
But that same complexity has led to a kind of
free-for-all where data formats are concerned. Bridging the many font technologies,
imaging models, data types, and compression standards currently in use (while
maintaining a document's "look and feel" across operating systems, output devices, and
CPU architectures) would seem to be a fundamentally intractable problem. How can
one ever hope to reconcile so many competing "standards," while enforcing consistency
of appearance?
Rich Text Format was an early attempt to bring consistency to the digital page. But RTF
- introduced by Microsoft in 1987, before the Web existed - was not designed to
accommodate non-text data types. Hypertext Markup Language (HTML) addressed that need
while bringing hypertext links into everyday use. But by abstracting font metrics out
of the picture,
HTML's creators unwittingly fostered implementation-dependent page appearances - a
critical flaw in any system of information display that values consistency.
Adobe PostScript® was the first page description language to tackle the dual problems
of consistency and fidelity head-on. The key to its success was the abandonment of old
paradigms based on artificial distinctions between text and graphics. In the PostScript
world, everything is graphical - especially text.
PostScript embodied a procedural model for graphics, in which typefaces were simply
collections of curves. In PostScript, a page consisting of text and graphics was sent to a
printer as a series of lineto and arcto commands; the printer would interpret the
commands, create a display list, and rasterize the individual graphic elements to
recreate the page. Any graphic element that couldn't be described in vector terms
(that is, with operators like lineto or arcto) would simply be treated as a bitmap.
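
To make the procedural model concrete, here is a minimal sketch in Python (not PostScript) of the pipeline just described: path commands are interpreted into a display list, and the display list is then rasterized into a bitmap. The function names interpret and rasterize, the tuple-based command format, and the 7 x 7 page are assumptions made for illustration; a real interpreter also handles curves, fills, coordinate transforms, and much more.

```python
# Toy model of a PostScript-style pipeline: interpret path commands,
# build a display list, then rasterize it into a bitmap.
# All names here are illustrative; this is not Adobe's implementation.

def interpret(commands):
    """Turn (operator, x, y) commands into a display list of line segments."""
    display_list = []
    current = None  # the current point, kept as interpreter state
    for op, x, y in commands:
        if op == "moveto":
            current = (x, y)
        elif op == "lineto":
            if current is None:
                raise ValueError("lineto with no current point")
            display_list.append((current, (x, y)))
            current = (x, y)
        else:
            raise ValueError(f"unsupported operator: {op}")
    return display_list


def rasterize(display_list, width, height):
    """Render each segment into a height x width grid of 0/1 pixels."""
    bitmap = [[0] * width for _ in range(height)]
    for (x0, y0), (x1, y1) in display_list:
        # Sample the segment once per pixel along its longer axis.
        steps = max(abs(x1 - x0), abs(y1 - y0), 1)
        for i in range(steps + 1):
            x = round(x0 + (x1 - x0) * i / steps)
            y = round(y0 + (y1 - y0) * i / steps)
            if 0 <= x < width and 0 <= y < height:
                bitmap[y][x] = 1
    return bitmap


# A square outline described purely as path commands.
page = [
    ("moveto", 1, 1),
    ("lineto", 5, 1),
    ("lineto", 5, 5),
    ("lineto", 1, 5),
    ("lineto", 1, 1),
]
segments = interpret(page)
bitmap = rasterize(segments, 7, 7)
for row in bitmap:
    print("".join("#" if px else "." for px in row))
```

The page itself carries no pixels, only instructions; the bitmap exists only after the interpreter has run. That is what lets the same description be rendered at whatever resolution the output device offers.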
Limitations of PostScript®
As a vector-graphics language, PostScript was - and still is, in many ways - without
equal. But there are aspects of the language that make it less than ideal as the basis of a
universal document-interchange format. For example:
Lack of searchability: Most users of electronic documents expect to be able to search
text using keywords or traverse an index or table of contents, then jump quickly to
relevant sections. PostScript was not designed to allow hypertext links. Random access
to data is, in general, problematic because of the freeform way in which PostScript
files are organized.
Font substitution: Fonts are not always present in the file. Unsightly font substitutions
occur when needed fonts are not found on the target system.
Poor editability: PostScript files are not easily edited, annotated, or updated. When a
PostScript file needs to be changed, it is usually rewritten from scratch.
No support for multimedia data types: PostScript files do not accommodate QuickTime
movies, slideshows, sound bites, etc.
No support for restricted access: Security features (such as encryption, password
protection, and digital signatures) were not part of PostScript's design specification.
Large file size: Ironically, what was once one of PostScript's strengths (compact
representation of complex imagery) has been turned on its head as file size and
document complexity have grown hand-in-hand. PostScript files are now often
monstrously large.
Slow execution: Large files containing complex graphics can be slow to parse, which
would lead to unacceptable latency in a viewer program.
Unpredictable errors: Variations in PostScript interpreters and in the quality of
PostScript code generated by applications ensure that end users will see errors -
errors that are sometimes not handled gracefully. One bad line of code in a large
PostScript file can - and often does - render the entire file unusable.
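
The last point, and the random-access problem noted earlier, both follow from the fact that a PostScript file is a program executed in order. The Python sketch below is a toy stack-based interpreter, not real PostScript: the three-operator vocabulary, the run function, and the idea that showpage "emits" a single number are simplifying assumptions. It shows how one misspelled operator can stop execution so that every later page is lost, and why reaching page three means running everything that precedes it.

```python
# Toy stack-based interpreter, loosely modeled on how a PostScript program
# is executed token by token. The operator set and the showpage behavior
# are simplified assumptions for illustration only.

def run(program):
    """Execute tokens in order; return (pages completed, error message or None)."""
    stack, pages, error = [], [], None
    for token in program.split():
        try:
            if token.lstrip("-").isdigit():
                stack.append(int(token))
            elif token == "add":
                b, a = stack.pop(), stack.pop()
                stack.append(a + b)
            elif token == "showpage":
                pages.append(stack.pop())  # "emit" a page holding one value
            else:
                raise NameError(f"undefined: {token}")
        except (IndexError, NameError) as exc:
            error = str(exc)
            break  # execution stops; later pages are never produced
    return pages, error


good = "1 2 add showpage 3 4 add showpage 5 6 add showpage"
bad = "1 2 add showpage 3 4 ad showpage 5 6 add showpage"  # typo on page 2

good_pages, good_error = run(good)
bad_pages, bad_error = run(bad)
print(good_pages, good_error)
print(bad_pages, bad_error)
```

In this sketch the third page of the bad program is perfectly valid, yet it is never rendered, because nothing in the file says where it begins independently of the code before it.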
Adobe faced a critical decision in coming up with a new document standard: whether to
modify the PostScript language to suit the needs of universal document-sharing
(which would mean significantly complicating the language), or come up with an
entirely new page description language designed specifically for document interchange.