PDFLib
Volume Number: 15
Issue Number: 12
Column Tag: Programming Techniques
by Kas Thomas
A great freeware library makes adding PDF support
to an app easy
Adobe's Portable Document Format (PDF) has become a de facto standard for electronic
document interchange, based on its ability to deliver graphically rich, structured
content in a consistent manner across multiple operating environments. Almost every
large web site offers at least some PDF-based content, making the Acrobat Reader one
of the most popular downloads on the web. (Incredibly, Adobe claims to average some
100,000 downloads of the Reader from its web site per day.)
Because of its support for vector graphics, font embedding, hypertext links, and other
advanced features, PDF is a powerful, far-reaching document standard. But that also
means it's a relatively complex standard (for details, see the September 1999
MacTech) - and therefore far from trivial to support in an application.
From a programming standpoint, one can talk about two types of PDF support: support
for PDF reading (import), and support for PDF writing (export). As with TIFF,
QuickTime, and many other complex formats, it's much easier to provide write
support than read support. A comprehensive PDF-read capability means implementing
the entire, rather ponderous PDF specification (see
http://partners.adobe.com/asn/developer/PDFS/TN/PDFSPEC.PDF), whereas a
write-only facility may mean implementing only a tiny subset of the PDF spec: the
subset of particular interest to your application. For example, if your application
primarily outputs ASCII text, there is no need to implement graphics embedding,
halftoning, transfer functions, etc., in order to support PDF output.
Adding a well-defined PDF-output capability to an application can be surprisingly
quick and easy, if you make full use of existing tools. For this article, I decided to add
PDF export capability to BBEdit (the popular text editor), with the aid of a
third-party freeware PDF library called PDFLib. Source code for the BBEdit plug-in
accompanies this article. (The complete CW Pro 5 project, including PDFLib and its
source files, can be found online at ftp://www.mactech.com.) But before we start
talking code, let's take a moment to review the basics of the PDF format, then look at
what kinds of development paths one might take to arrive at a PDF-export capability,
and what sorts of tools are currently available to make the programmer's life easier.
PDF Fundamentals
Adobe's Portable Document Format is a kind of gigantic, special-purpose markup
language, based largely on PostScript (the postfix-notation page description language)
but lacking PostScript's control-flow constructs. PDF is a sort of "unrolled" version
of PostScript, in which all graphics operations are inline (rather than relying on
loops) and therefore speedy. Lookups and indexing operations are likewise fast because
of PDF's extensive use of associative arrays (or "dictionaries," in Adobe parlance),
organized into treelike structures in which all nodes have forward and/or
back-pointers to other nodes; plus, every indirect object (of every kind) has an entry
in a giant 'xref' table, so that the byte offset of any object can be looked up instantly.
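To make the 'xref' idea concrete, here is a minimal sketch in C of how a PDF writer might record each object's byte offset as it emits the object, then write those offsets out as a cross-reference table. The names PdfSketch, emit, emit_object, emit_xref, and build_demo are hypothetical and are not part of PDFLib or of the plug-in described in this article. The sketch writes into a fixed-size memory buffer, treats overflow as a programming error, and stops short of the trailer and startxref lines that a complete PDF file also needs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_OBJECTS 16
#define BUF_SIZE    1024

/* Hypothetical helper type: output buffer plus one offset per object. */
typedef struct {
    char buf[BUF_SIZE];             /* the PDF bytes written so far       */
    long pos;                       /* number of bytes in buf             */
    long offsets[MAX_OBJECTS + 1];  /* indexed by object number; 0 unused */
    int  count;                     /* highest object number written      */
} PdfSketch;

/* Append a string and advance the byte counter. */
static void emit(PdfSketch *p, const char *s)
{
    size_t n = strlen(s);
    assert(p->pos + (long)n < BUF_SIZE);  /* sketch only: no growth logic */
    memcpy(p->buf + p->pos, s, n);
    p->pos += (long)n;
    p->buf[p->pos] = '\0';
}

/* Write "N 0 obj ... endobj", remembering where the object starts. */
static int emit_object(PdfSketch *p, const char *body)
{
    char header[32];
    assert(p->count < MAX_OBJECTS);
    p->count++;
    p->offsets[p->count] = p->pos;        /* offset of the "N 0 obj" line */
    sprintf(header, "%d 0 obj\n", p->count);
    emit(p, header);
    emit(p, body);
    emit(p, "\nendobj\n");
    return p->count;
}

/* Write the xref section: one fixed-width, 20-byte entry per object. */
static void emit_xref(PdfSketch *p)
{
    char line[32];
    int i;
    sprintf(line, "xref\n0 %d\n", p->count + 1);
    emit(p, line);
    emit(p, "0000000000 65535 f \n");     /* entry 0 heads the free list  */
    for (i = 1; i <= p->count; i++) {
        sprintf(line, "%010ld 00000 n \n", p->offsets[i]);
        emit(p, line);
    }
}

/* Build a tiny two-object fragment followed by its xref table. */
static void build_demo(PdfSketch *p)
{
    memset(p, 0, sizeof *p);
    emit(p, "%PDF-1.3\n");
    emit_object(p, "<< /Type /Catalog /Pages 2 0 R >>");
    emit_object(p, "<< /Type /Pages /Kids [] /Count 0 >>");
    emit_xref(p);
}
```

Calling build_demo on a PdfSketch and printing its buf shows the two objects followed by the table. Because every entry has the same width, a reader can jump straight to entry N and from there to the object itself, without scanning the file. If the bytes are later written to disk, the file should be opened in binary mode so that line-ending translation does not invalidate the recorded offsets.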
Pages are organized into sets of objects that describe a page's resources and content.
The objects are human-readable ASCII and look like:
4 0 obj
<<
/Type /Page
/Parent 1 0 R
/Resources 8 0 R
/MediaBox [0 0 612 792]
/Contents [5 0 R ]
>>
endobj
In this case, the top line tells us we're dealing with Object No. 4, revision zero. The
object is a dictionary object, as indicated by the double angle brackets, << and >>,
enclosing the object. The first entry in the dictionary is a label telling the type of
dictionary (in this case, a Page). The next label/value pair is a backpointer to the
parent of this object, namely Object No. 1. (A reference ending in 'R', such as 1 0 R, is
a pointer to an object.) The next entry tells where the page's resources can be found
(namely, in Object No. 8). The MediaBox entry gives the page's dimensions, in points
(72 points to the inch); here, 612 by 792 means that we're dealing with a standard
U.S. Letter-size page (8.5 by 11 inches). The final entry, in the above example,
shows where the page's Contents (probably a stream object) can be found, namely in