May 00 Online
Volume Number: 16
Issue Number: 5
Column Tag: MacTech Online
By Jeff Clites <online@mactech.com>
A few months ago, we briefly touched on the subject of XML, the biggest hype magnet
since Java. A quick glance at any IT-related publication or website will give you a
pretty good idea of the degree of interest there is in anything related to XML, and
companies are churning out press releases about their plans to utilize XML faster than
the press can cover them. XML isn't the Next Big Thing, it's the Current Big Thing. No
matter where you stand on the hype-versus-substance issue, as a developer you'll
need a working understanding of the technology to meet the demands of new standards
and new modes of application interoperability. Beneath the static there is great
potential in XML, and everyone will benefit from the attention it is drawing to issues
surrounding communication and standardization. This month we'll cover a few of the
resources you can turn to when working with this rapidly growing technology.
Documentation
Although there are many choices available, it is difficult to find the perfect printed
reference; many of the obvious choices turn out not to be very good, and the field is
evolving so rapidly that even the high-quality books quickly become out-of-date. For
the moment, though, there are a couple of winners. The XML Pocket Reference, from
O'Reilly and Associates, serves as both a beginner's introduction and a quick reference.
For an extensive survey of current (and developing) technologies and APIs, try
Professional XML from Wrox Press.
XML Pocket Reference
<http://www.oreilly.com/catalog/xmlpr/>
Professional XML
<http://www.wrox.com/Consumer/Store/Details.asp?ISBN=1861003110>
There is also no end to web-based coverage. An indispensable source is the XML FAQ,
which covers a wide range of topics and is well written. At the other end of the
difficulty spectrum is the XML specification itself. Although you'll want to avoid it as
long as possible, at some point you'll need to go back to the source to get the definitive
word on something. It's a difficult read, but there is a cleverly constructed annotated
version available, which makes deciphering the formal language a bit easier. This
latter reference is provided by XML.com, which is an excellent source of references
and ongoing coverage of the industry, with timely articles on emerging technologies.
Next, the XML Cover Pages are an exhaustive and neutral reference for all things XML
(and SGML) related. Also of general interest is the original "mission statement" of the
XML standard, which sets forth the design goals for its development. Finally, those
planning on developing or working with XML parsers should take a look at the Lark
parser, developed by Tim Bray, co-editor of the XML specification. It does not appear
to be under active development, but it does provide an interesting case study in parser
development in Java.
The XML FAQ
<http://www.ucc.ie/xml/>
The Annotated XML Specification
<http://www.xml.com/pub/axml/axmlintro.html>
XML.com
<http://www.xml.com/pub>
The XML Cover Pages
<http://www.oasis-open.org/cover/xml.html>
Design Principles for XML
<http://www.textuality.com/sgml-erb/dd-1996-0001.html>
An Introduction to XML Processing with Lark and Larval
<http://www.textuality.com/Lark/>
Programming XML
XML was designed to be easy to parse, and XML documents are often characterized as
self-describing, but if you need to develop an XML-processing application you'll
quickly find that it is nonetheless not a trivial job. Fortunately, there are a
number of parsers out there to help you with this task, and simple XML documents can
in fact be created and parsed easily. One of the first XML parsers, and perhaps the
most widely used, is James Clark's expat. It's a non-validating parser written in C,
and recently an Objective C wrapper has been created for it, so it should be
straightforward to use it from within Cocoa applications. (I believe that expat is also
being used internally within Mac OS X, but it isn't clear what sort of API Apple will
provide to access it.) Moving forward, it is likely that the Xerces parsers, which are
part of the Apache XML project, will be widely used. They are validating parsers, and
are available in Java, C++, and Perl. Much of their code originated from IBM's
alphaWorks project, and IBM continues to provide its own versions, XML4J and
XML4C (for Java and C++, respectively), which combine Xerces with IBM's own
Unicode classes, providing support for an expanded range of encodings. Apple is in
fact using XML4J in its recently released version 4.5 of WebObjects, and again it
isn't clear to what extent the parser will be accessible from other parts of Apple's
frameworks. Xerces-C contains a fair amount of code which must be customized when
porting it to new platforms, but a classic Mac OS port has been developed, and a BSD
version that is likely to compile under Mac OS X is in development. Also of note is the
gnome-xml parser, which is under active development. It falls under the umbrella of
the Linux Gnome project, but it is independent of the rest of Gnome and should be
portable to other environments and platforms. (For example, it is known to work
under Windows.) It is also a validating parser, but appears to be simpler than Xerces
and may be worth a look if you need to access a validating parser from C and related
languages. The gnome-xml web page also has links to several other articles to get you
started, and in particular you should take a look at the article on IBM's
developerWorks site.
expat - XML Parser Toolkit
<http://www.jclark.com/xml/expat.html>
Objective C wrapper for the expat XML Parser
<http://softrak.stepwise.com/Apps/WebObjects/Softrak.woa/1/wa/displayPackage?package=815&os=10>
The Apache XML Project
<http://xml.apache.org/>
IBM's XML Parser for Java (XML4J)
<http://www.alphaworks.ibm.com/tech/xml4j>
IBM's XML for C++ parser (XML4C)
<http://www.alphaworks.ibm.com/tech/xml4c>
The XML library for Gnome
<http://xmlsoft.org/>
Making application programming easy with GNOME libraries, Part 3
<http://www-4.ibm.com/software/developer/library/gnome3/>
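To give a feel for expat's event-driven style, here is a minimal sketch using Python's xml.parsers.expat module, which is a binding to the expat library. The C API follows the same pattern: you register start-element, end-element, and character-data callbacks (for example with XML_SetElementHandler and XML_SetCharacterDataHandler), and the parser invokes them as it reads. The sample document and the helper names collect_events and is_well_formed are hypothetical and exist only for illustration. Because expat is non-validating, the sketch checks well-formedness only, not conformance to a DTD.

```python
import xml.parsers.expat

# Hypothetical sample document, for illustration only.
SAMPLE = """<?xml version="1.0"?>
<catalog>
  <book publisher="Wrox">Professional XML</book>
  <book publisher="O'Reilly">XML Pocket Reference</book>
</catalog>"""

def collect_events(text):
    """Parse text with expat and return the callbacks it fired, in order."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    # Ask expat to merge adjacent text into one CharacterDataHandler call.
    parser.buffer_text = True

    # expat calls these handlers as it encounters each construct.
    def start_element(name, attrs):
        events.append(("start", name, attrs))

    def end_element(name):
        events.append(("end", name))

    def char_data(data):
        # Skip the whitespace-only runs between elements.
        if data.strip():
            events.append(("text", data))

    parser.StartElementHandler = start_element
    parser.EndElementHandler = end_element
    parser.CharacterDataHandler = char_data
    parser.Parse(text, True)  # True marks this as the final chunk of input
    return events

def is_well_formed(text):
    """Return True if expat accepts text as well-formed XML."""
    try:
        xml.parsers.expat.ParserCreate().Parse(text, True)
        return True
    except xml.parsers.expat.ExpatError:
        return False

if __name__ == "__main__":
    for event in collect_events(SAMPLE):
        print(event)
    print(is_well_formed("<a><b></a></b>"))  # mismatched tags are rejected
```

Nothing is kept in memory except what the handlers choose to record, which is why this style scales well to large documents.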
If you're experimenting with XML for the first time, or you need to process XML in a
web-related context, you should check out the numerous Perl modules available. A good
place to start is the libxml-perl package. Of particular interest are the XML-Grove
modules, which let you manipulate an XML document as a tree of objects and access
various parts of it using a path-like syntax. This is analogous to the DOM and XPath
APIs, and the grove interface will likely become obsolete after these have matured, but
in the short term it provides a convenient and powerful approach to XML processing.
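XML::Grove is a Perl module, and its own path syntax is not reproduced here. As a rough analogue of the "tree of objects plus path-like access" idea, the sketch below is written in Python rather than Perl. It uses xml.etree.ElementTree, which loads a document into a tree of Element objects and supports a limited subset of XPath in find, findall, iterfind, and findtext. ElementTree is not the W3C DOM, and the sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, for illustration only.
DOC = """<resources>
  <section name="Documentation">
    <link href="http://www.ucc.ie/xml/">The XML FAQ</link>
    <link href="http://www.xml.com/pub">XML.com</link>
  </section>
  <section name="Programming">
    <link href="http://xmlsoft.org/">The XML library for Gnome</link>
  </section>
</resources>"""

root = ET.fromstring(DOC)  # the whole document as a tree of Element objects

# Every <link> anywhere below the root.
all_titles = [link.text for link in root.iterfind(".//link")]

# <link> children of the <section> whose name attribute is "Programming".
prog_hrefs = [link.get("href")
              for link in root.findall("./section[@name='Programming']/link")]

# Text of the first <link> inside the first <section>.
first_title = root.findtext("section/link")

print(all_titles)
print(prog_hrefs)
print(first_title)
```

The path expressions address parts of the tree declaratively, so the calling code does not have to walk child lists by hand. That convenience is what the grove interface offers and what XPath is meant to standardize.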
There are a number of other APIs available for working with XML documents from
Perl, including DOM-based and SAX-based parser APIs (the former allows you to
access XML documents as a tree of objects, and the latter as a stream of events), as
well as support for various approaches to XML querying. IBM has two excellent
articles which are not to be missed; one gives a brief run-through of all of the
XML-related Perl tools available, and the other gives a detailed tutorial on
manipulating XML using Perl, including conversion of XML to HTML and XML-driven
database access.
libxml-perl
<http://bitsko.slc.ut.us/>
Essential tools and libraries for using XML with Perl
<http://www-4.ibm.com/software/developer/library/perl-xml-toolkit/>
Manipulating XML documents with Perl and other scripting languages
<http://www-4.ibm.com/software/developer/library/xml-perl/>
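The tree-versus-stream distinction between DOM and SAX is not specific to Perl. As a hedged illustration, the sketch below is written in Python, whose standard library includes a lightweight DOM implementation (xml.dom.minidom) and a SAX interface (xml.sax). It extracts the same information both ways. The sample document, the ValidatingNames handler class, and the two helper functions are hypothetical.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document, for illustration only.
DOC = b"""<?xml version="1.0"?>
<parsers>
  <parser lang="C" validating="no">expat</parser>
  <parser lang="Java" validating="yes">Xerces</parser>
  <parser lang="C" validating="yes">gnome-xml</parser>
</parsers>"""

def validating_names_dom(data):
    """DOM style: build the whole tree in memory, then query it."""
    doc = minidom.parseString(data)
    return [node.firstChild.data
            for node in doc.getElementsByTagName("parser")
            if node.getAttribute("validating") == "yes"]

class ValidatingNames(xml.sax.ContentHandler):
    """SAX style: react to events as the parser streams through the input."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._capturing = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "parser" and attrs.get("validating") == "yes":
            self._capturing = True
            self._buffer = []

    def characters(self, content):
        # characters() may be called several times for one run of text.
        if self._capturing:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "parser" and self._capturing:
            self.names.append("".join(self._buffer))
            self._capturing = False

def validating_names_sax(data):
    handler = ValidatingNames()
    xml.sax.parseString(data, handler)
    return handler.names

print(validating_names_dom(DOC))
print(validating_names_sax(DOC))
```

The DOM version is shorter because the tree is already built when the query runs. The SAX version has to track its own state between callbacks, but it never holds more than the current element's text in memory.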
When you grow weary of mucking around with the innards of the future web, cruise on
over to the MacTech Online web pages at <http://www.mactech.com/online/>, and let
your browser worry about the parsing for a while.