File, Disk Servers
Volume Number: 3
Issue Number: 4
Column Tag: Networking Issues
File Servers versus Disk Servers
By Tim Maroney, Software Designer, Centram Systems West
Your kindly editor, Dave Smith, has invited us to clear up some common
misperceptions about TOPS, and generally dispel the fog of confusion that surrounds
the whole area of networked file systems. A widely distributed Mac magazine recently
ran a short piece on the difference between file servers and disk servers that was
almost completely wrong, and at trade shows one often hears sales people giving out
incorrect information. This article should help people to navigate through the
dimly-lit coral reefs of networking.
There are three main approaches to sharing files over a network: file transfer,
disk service, and file service. (There is also disk transfer, known to initiates as "the
Frisbee method".)
The most venerable approach is file transfer. Most programmers have used file
transfer over phone lines; it's much the same over a network. Instead of dialing a
phone number, one types in or selects a machine name, but the basic sequence of
operations is the same. The user asks to send or receive a file to or from a remote
system. Cooperating software on both machines breaks the file up into small packets of
data and reliably transfers and acknowledges the packets over the serial line or
network. Each packet is sent with a "checksum" or "cyclic redundancy check" value
derived by performing a sequence of arithmetic operations on the bytes in the packet.
If the receiving machine finds that a packet doesn't match its checksum or CRC, it asks
for the packet to be sent again. In this way, the entire file is sent with guaranteed
correctness. On serial lines, the protocol is likely to be Kermit or XMODEM; on a
network, it is likely to be FTP (File Transfer Protocol).
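For the curious, the packet-and-resend loop described above can be sketched in a few lines of Python. This is only an illustration, not Kermit, XMODEM, or FTP: the 128-byte packet size, the function names, and the simulated noisy line are all assumptions made for the example, and zlib's CRC-32 stands in for whatever checksum a real protocol would use.

```python
import zlib

PACKET_SIZE = 128  # payload bytes per packet (an assumed, XMODEM-like size)

def make_packets(data: bytes, size: int = PACKET_SIZE):
    """Split data into (sequence number, payload, CRC) packets."""
    return [
        (seq, data[off:off + size], zlib.crc32(data[off:off + size]))
        for seq, off in enumerate(range(0, len(data), size))
    ]

def packet_ok(packet) -> bool:
    """Receiver side: recompute the CRC and compare it with the one sent."""
    _seq, payload, crc = packet
    return zlib.crc32(payload) == crc

def transfer(data: bytes, channel, max_tries: int = 10) -> bytes:
    """Send every packet through `channel`, resending until it arrives intact."""
    received = []
    for packet in make_packets(data):
        for _attempt in range(max_tries):
            arrived = channel(packet)
            if packet_ok(arrived):          # receiver acknowledges the packet
                received.append(arrived[1])
                break
            # otherwise the receiver asks for the packet to be sent again
        else:
            raise IOError(f"packet {packet[0]} failed after {max_tries} tries")
    return b"".join(received)

def noisy_channel_factory():
    """A hypothetical line that flips one byte in every third packet it carries."""
    count = 0
    def channel(packet):
        nonlocal count
        count += 1
        seq, payload, crc = packet
        if count % 3 == 0 and payload:
            payload = bytes([payload[0] ^ 0xFF]) + payload[1:]
        return (seq, payload, crc)
    return channel

if __name__ == "__main__":
    original = bytes(range(256)) * 4
    copy = transfer(original, noisy_channel_factory())
    print(copy == original)
```

The sender never needs to know why a packet was bad; a failed check simply triggers another attempt, which is all the "reliability" amounts to.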
An open secret of networking is that there is no such thing as a perfect guarantee
of correctness. It is possible for a packet to be completely garbled by line noise, but
to coincidentally fall together into an acceptable packet with a valid checksum. It is
also possible, as Gamow pointed out, for all the molecules of air in a room to randomly
wind up in the same corner of the room and leave any people in the room gasping in a
vacuum. It isn't particularly likely, and neither is a network or line error that yields
a valid checksum or CRC value.
File transfer is adequate for many applications, particularly keeping libraries of
software or literature which people want to download to their machines. However,
there are a number of file sharing applications which require a more dynamic
approach. For instance, if you have a distributed database, you don't want to have to
download it to your machine, make changes, and then upload it back to the original
machine. In a multiple-engineer programming project, it might be desirable to keep
the sources on a central machine and have everyone work from the same copies, while
actually using their own microcomputers. Sometimes you have only one hard disk but
three people need to have a few megabytes each at the same time. And so forth. For
these kinds of applications, disk service or file service is more suitable.
Disk service and file service look similar to a human user, but the
implementations are different in significant ways. In both cases, though, the idea is to
make disk storage that is connected to another machine seem to be directly connected to
your machine. On the Mac, that means (from a user's perspective) that a remote disk
volume appears with a disk icon in the Finder, and can also be seen inside the Standard
File Package, so that the files on the remote disk can be used just like local files. The
term "transparency" is usually used to refer to this kind of file access; the fact that
the file actually resides on another system is transparent (invisible) to software. In
other words, transparency of disk or file service means that old programs still work,
without having to put out new versions.
You can see that file transfer is actually a functional subset of disk service or file
service. Network file transfer can be done in the Finder on the Mac using TOPS or
MacServe, without requiring a special transfer utility; all you have to do is drag the
remote file to a local (or even another remote) volume or folder.
In just about every operating system on the planet, there are two levels of file
access. The programmer uses high-level file operations, like open, read, write, close,
and so forth. High-level file operations are translated by the operating system into
low-level disk operations involving physical disk block reads and writes. Low-level
operations are usually structured as calls to any of several lookalike disk drivers,
pieces of software in the OS that deal with the details of communicating with the disk
controller. An operating system is associated with a particular disk format, which is
the same from disk to disk. That is, regardless of whether you have a DataFrame or an
HD20 connected to your Mac Plus, the first two blocks on the disk contain system
startup information, the third volume information, and so forth, even though two
different disk drivers are used to talk to the disks, and the disks represent their blocks
differently at the physical level.
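To make the two levels concrete, here is a toy sketch of how a high-level read might be turned into block reads. It is not the Mac file system: the 512-byte block size, the ToyDisk class, and the idea of a file as a simple list of block numbers are assumptions chosen to keep the example short.

```python
BLOCK_SIZE = 512  # bytes per disk block (assumed for the example)

class ToyDisk:
    """Stands in for a disk driver: it only knows about numbered blocks."""
    def __init__(self, block_count: int):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(block_count)]
        self.reads = 0  # how many low-level reads have been issued

    def read_block(self, n: int) -> bytes:
        self.reads += 1
        return self.blocks[n]

    def write_block(self, n: int, data: bytes) -> None:
        assert len(data) == BLOCK_SIZE
        self.blocks[n] = data

def read_file(disk, block_list, offset: int, length: int) -> bytes:
    """High-level read: return `length` bytes starting at `offset` in a file
    whose contents live, in order, in the disk blocks named by `block_list`."""
    if length <= 0:
        return b""
    first = offset // BLOCK_SIZE                  # first file block touched
    last = (offset + length - 1) // BLOCK_SIZE    # last file block touched
    data = b"".join(disk.read_block(block_list[i]) for i in range(first, last + 1))
    start = offset - first * BLOCK_SIZE
    return data[start:start + length]

if __name__ == "__main__":
    disk = ToyDisk(8)
    disk.write_block(5, b"A" * BLOCK_SIZE)
    disk.write_block(2, b"B" * BLOCK_SIZE)
    # A four-byte read that straddles two blocks costs two low-level reads.
    print(read_file(disk, [5, 2], 510, 4), disk.reads)
```

The program calling read_file never sees block numbers, and the driver never sees file offsets; that division is exactly where the two kinds of network service part ways.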
Disk service intercepts file operations at the level of the disk driver. File
service, however, intercepts them at the level of the high-level operations. This leads to
some important differences in the power and performance of the two approaches.
In disk service, a disk (or possibly a simulated disk) on a remote machine is
accessed by the operating system just like a local disk; physical disk block reads and
writes go directly over the network. Disk service uses a disk driver that goes to the
network instead of to a local file device. You could say that the network is being used
like a long SCSI cable. Disk service is very simple to implement; a friend once claimed
that he could write a complete disk server in under two hours.
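A rough idea of why disk service is so simple: the whole protocol can be "read block n" and "write block n". The sketch below is a made-up wire format, not MacServe's or anyone else's; the five-byte request header, the class names, and the `send` transport (here just a direct function call) are all assumptions.

```python
import struct

BLOCK_SIZE = 512          # assumed block size
READ, WRITE = 0, 1        # invented operation codes

class DiskServer:
    """Server side: owns the real blocks and answers block requests."""
    def __init__(self, block_count: int):
        self.blocks = [bytes(BLOCK_SIZE)] * block_count

    def handle(self, request: bytes) -> bytes:
        # Request layout (invented): 1-byte opcode, 4-byte block number, data.
        op, n = struct.unpack(">BI", request[:5])
        if op == READ:
            return self.blocks[n]
        self.blocks[n] = request[5:5 + BLOCK_SIZE]
        return b""

class NetworkDiskDriver:
    """Client side: looks like a local disk driver, but every call is a message."""
    def __init__(self, send):
        self.send = send  # hypothetical transport: request bytes in, reply bytes out

    def read_block(self, n: int) -> bytes:
        return self.send(struct.pack(">BI", READ, n))

    def write_block(self, n: int, data: bytes) -> None:
        block = data.ljust(BLOCK_SIZE, b"\0")[:BLOCK_SIZE]  # pad or trim to one block
        self.send(struct.pack(">BI", WRITE, n) + block)

if __name__ == "__main__":
    server = DiskServer(4)
    driver = NetworkDiskDriver(server.handle)  # the "network" is a function call
    driver.write_block(2, b"hello")
    print(driver.read_block(2)[:5])
```

Notice what is missing: the server has no idea what a file or a directory is. Everything interesting happens on the client, which is why the client must understand the disk format.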
Clearly, disk service is bound by disk formats, and so it does not work very well,
if at all, between different operating systems. A Mac and a PC want to see very
different things on their disks. It is possible, though difficult and expensive, to let
each understand the other's format; for instance, an external file system could be
written on the Mac to understand PC-format disks. However, there is a combinatorial
explosion associated with adding more formats. Each new system's format has to be
implemented on each already supported system, requiring lots of coding effort, and lots
of code space overhead on each machine.
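A back-of-the-envelope count shows how quickly this grows. The sketch assumes the simplest possible model: with disk service, every system needs a translator for every other system's format, while a scheme built on one shared protocol needs only a translator to the protocol and one from it on each system. The function names are invented for the illustration.

```python
def pairwise_translators(n_systems: int) -> int:
    """Disk service: each system must understand every other system's format."""
    return n_systems * (n_systems - 1)

def shared_protocol_translators(n_systems: int) -> int:
    """Shared protocol: each system translates to it and from it, nothing more."""
    return 2 * n_systems

if __name__ == "__main__":
    for n in (2, 3, 5, 10):
        print(n, pairwise_translators(n), shared_protocol_translators(n))
```

With five systems that is twenty format translators against ten, and with ten systems ninety against twenty; each newcomer makes the gap wider.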
Disk service does not easily permit file sharing between users. Disk service
uses an existing, unmodified or only very slightly modified, disk format, the format
that came with the operating system. Disk formats do not typically allow easy
synchronization of multiple users, because they are intended to be used only by a
single local machine. This means that only one person can mount a network disk at one
time, unless elaborate operating system interceptions and synchronization protocols
are developed. If such interceptions and protocols are done, then disk service is no
longer simpler than file service; and this simplicity was really its only benefit.
One approach to inter-operating-system disk service is to partition a server's
disk, and format each partition to the dictates of a different OS. To share files between
operating systems, a special utility is used to copy across partitions. This is the
approach used by 3Com. This allows dynamic file sharing between machines using
the same operating system (sometimes), but between operating systems it is really
just file transfer. Using simulated disks has some of the same problems as
partitioning; for instance, you could allocate a one megabyte file on a VMS system and
use disk service software to make a Mac think this VMS file is really a
block-structured Mac disk, but people on VMS are not going to be able to get any useful
information out of the file without using special copying utilities.
In file service, high-level file system operations like open, read, write, lock,
and so forth go over the network instead of disk block requests. In many file service
protocols (e.g., TOPS, NFS, and CMU's VICE), a remote function call protocol is used
for support: this allows one machine to make function calls that are executed on
another machine. File service is usually very hard to implement well; TOPS took
about two years, and VICE took even longer to become a usable system.
However, once file service is done, there are some very tangible benefits over
disk service. The most tangible, to a naive end user, is that remote disks can be
shared; more than one person can have access to the same directories and volumes at
one time. No special synchronization protocol is needed.
Another very tangible benefit is an inter-operating-system capability. Most
operating systems have similar high-level file operations, like open, read, seek, and
so forth. There are differences, but they can almost always be bridged without losing
compatibility. TOPS is a standard for file system operations regardless of operating
system, and was simultaneously developed on two operating systems: that's how we
were able to get the PC and the Mac to communicate, and why our UNIX and forthcoming
OS implementations are proceeding smoothly. Some other file service protocols are
less OS independent; for instance, VICE is very specific to 4.2bsd UNIX, and a new
protocol, SNAP, had to be added to allow VICE machines to share files with
microcomputers.
Another benefit is that considerably more clever and powerful things can be done
in file service. VICE uses a whole-file local caching scheme that speeds up file access
tremendously for workstations that have their own disks. A file service protocol can be
extended easily to cope with the demands of new operating systems, without
encountering the combinatorial explosion of disk service. Disk servers, because of the
lack of file sharing, do not usually allow a machine to serve as both client and server,
or to function as one node in a homogeneous network namespace; these things can be
added to file service relatively easily.
File servers are often faster than disk servers. TOPS, a file server, is faster
than MacServe, a disk server, according to InfoWorld (11/86) and MacWorld
(10/86). This might seem puzzling, since disk service is simpler than file service.
We aren't entirely sure, but we think that it is the result of disk service's need to pass
directory and map blocks over the network to do directory, seek, and grow operations.
In file service, this is all done locally on the server machine, accomplishing in a single
network operation what takes two or more with disk service. Of course, local
operations are faster than network operations, just as eating everything at the table is
faster than walking to the kitchen to fetch each mouthful. Believe me, I have tried this
many times. So contrary to first impressions, a file server can often be expected to
perform better than a disk server.
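Our guess about round trips can be illustrated with a toy model. Nothing here is measured from TOPS or MacServe: the four-block "disk", its layout, and the CountingNetwork class are invented, server-side work is treated as free, and a real disk server would cache directory and map blocks rather than fetch them every time.

```python
class CountingNetwork:
    """Counts round trips; work done on the server itself is treated as free."""
    def __init__(self):
        self.round_trips = 0

    def call(self, func, *args):
        self.round_trips += 1
        return func(*args)

# A toy server disk (invented layout): block 0 is the directory, and each
# file has a map block listing the blocks that hold its data.
DISK = {
    0: {"report.txt": 1},   # directory block: file name -> map block number
    1: [2, 3],              # map block: data block numbers, in order
    2: b"hello, ",
    3: b"world",
}

def read_block(n):
    """The only operation a disk server offers."""
    return DISK[n]

def server_read_file(name):
    """File service: the server walks the directory and map blocks locally."""
    block_map = DISK[DISK[0][name]]
    return b"".join(DISK[n] for n in block_map)

def disk_service_read(net, name):
    """Client does the walking, paying one round trip per block."""
    directory = net.call(read_block, 0)
    block_map = net.call(read_block, directory[name])
    return b"".join(net.call(read_block, n) for n in block_map)

def file_service_read(net, name):
    """Client sends one high-level request."""
    return net.call(server_read_file, name)

if __name__ == "__main__":
    disk_net, file_net = CountingNetwork(), CountingNetwork()
    disk_service_read(disk_net, "report.txt")
    file_service_read(file_net, "report.txt")
    print(disk_net.round_trips, file_net.round_trips)
```

In this model the disk client pays for the directory block and the map block before it sees a byte of data, while the file client pays once. Large files would need several read calls under file service too, so the real ratio is less dramatic than the toy suggests.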
Some brief design notes might be helpful. TOPS is a name used for both a network
protocol and the TOPS product which implements the protocol. The TOPS protocol is
built on a lower-level protocol called RFP, for Remote Function Protocol. Using RFP,
it is possible to make function calls that will be executed on another system. RFP
itself is built on top of the Appletalk Transaction Protocol (ATP), and will soon be
ported to run on the Internet Transmission Control Protocol (TCP) as well. RFP is an
asymmetrical protocol; it has a client end, which makes remote calls and receives
their values, and a server end, which receives remote calls, executes them locally, and
returns the result to the client that initiated the call.
The TOPS protocol is a set of function definitions that are passed over the network
using RFP. These include functions to open files, read and write buffers, lock files and
byte ranges, get information on files and directories, and so forth. When some
software on the Mac makes a file system call that has to do with a remote file, this
system call is intercepted and translated into a TOPS call. RFP's client end is then used
to make this TOPS call remotely on the system where the file is actually stored. The
RFP server end on the machine containing the file executes the TOPS call locally,
which means calling the local file system, and returns the result to the client. Because
everything goes through TOPS, the two machines may have completely different
operating systems. All that is needed is for the TOPS client software to translate local
file system operations into TOPS operations, and for the TOPS server to translate TOPS
operations into local file system operations.
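The client-end and server-end arrangement might look something like the following sketch. To be clear, this is not the TOPS or RFP wire format: the JSON encoding, the operation names, the handle scheme, and the in-memory dictionary standing in for the server's local file system are all assumptions made to keep the example self-contained.

```python
import json

class FileServer:
    """Server end: receives a call, executes it locally, returns the result."""
    def __init__(self, files):
        self.files = files          # name -> bytearray; stands in for the local file system
        self.handles = {}
        self.next_handle = 1

    def op_open(self, name):
        handle = self.next_handle
        self.next_handle += 1
        self.handles[handle] = name
        return handle

    def op_read(self, handle, offset, count):
        data = self.files[self.handles[handle]][offset:offset + count]
        return data.decode("latin-1")   # JSON cannot carry raw bytes directly

    def op_write(self, handle, offset, text):
        data = text.encode("latin-1")
        buf = self.files[self.handles[handle]]
        buf[offset:offset + len(data)] = data
        return len(data)

    def op_close(self, handle):
        del self.handles[handle]
        return True

    def dispatch(self, message: bytes) -> bytes:
        """Decode one remote call, run the matching local operation, encode the reply."""
        call = json.loads(message)
        result = getattr(self, "op_" + call["func"])(*call["args"])
        return json.dumps({"result": result}).encode()

class RemoteFiles:
    """Client end: turns local-looking calls into messages for the server."""
    def __init__(self, transport):
        self.transport = transport  # hypothetical: request bytes in, reply bytes out

    def _call(self, func, *args):
        message = json.dumps({"func": func, "args": list(args)}).encode()
        return json.loads(self.transport(message))["result"]

    def open(self, name):
        return self._call("open", name)

    def read(self, handle, offset, count):
        return self._call("read", handle, offset, count).encode("latin-1")

    def write(self, handle, offset, data: bytes):
        return self._call("write", handle, offset, data.decode("latin-1"))

    def close(self, handle):
        return self._call("close", handle)

if __name__ == "__main__":
    server = FileServer({"notes": bytearray(b"hello world")})
    client = RemoteFiles(server.dispatch)   # the "network" is a function call
    h = client.open("notes")
    client.write(h, 6, b"TOPS!")
    print(client.read(h, 0, 11))
    client.close(h)
```

The client never learns how the server stores its files, and the server never learns what operating system the client runs. Only the messages in the middle have to be agreed upon.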
The Apple Filing Protocol (AFP) and Sun's Network File System (NFS) use
somewhat similar designs. With AFP, a protocol called ASP (Apple Session Protocol)
is used for remote function calling. Actually, ASP does less than RFP, since it does not
itself interpret the data in the packets, deferring this to a sort of implied remote
function call layer in AFP itself. (Pay attention; this will be on the test.) ASP sits on
top of ATP; like AFP, it was co-developed by Centram and Apple. NFS uses a protocol
called RPC (Remote Procedure Call), which uses a sub-protocol known as XDR
(External Data Representation) to define its data formats. RPC sits on top of the Internet
User Datagram Protocol (UDP).
There were some comments on TOPS from "MacoWaco" in the January 1987
issue of MacTutor which were not quite accurate regarding use of Apple's "File
Structure". I don't know what he means by "Apple's File structure". The most likely
interpretation seems to be the Apple Filing Protocol, AFP for short. This is an Apple
protocol for network file service. The design of AFP is similar to the design of the
TOPS protocol. AFP is still being refined within Apple; when it is finalized later this
year, TOPS will become fully compatible with it. MacServe, however, cannot be
compatible with AFP. Its disk service approach is fundamentally incompatible with
the file service approach employed by both TOPS and AFP. Another possible
interpretation of "Apple's File structure" pertains to disk format. TOPS uses disks as
they are, with no modifications needed, while MacServe requires reformatting and
partitioning disks before they can be used.
Appletalk runs at about one quarter megabit per second, because this is the
fastest speed the SCC will handle without special clocking. Ethernet runs at three or
ten megabits per second, twelve or forty times as fast. It should be noted, though, that
most network protocol implementations cannot drive a network at a full bandwidth of
multiple megabits per second, so an Ethernet tends to be idle a lot of the time. It is
still effectively many times faster than Appletalk, though also many times more
expensive to implement. The new Macs will allow customers to match their
pocketbooks with their bandwidth requirements.
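The arithmetic behind those ratios is easy to check. The sketch below uses the round figures quoted above and computes raw line-rate times only; as just noted, real protocol implementations will not reach these speeds, so treat the results as upper bounds. The one-megabyte file size is an arbitrary choice for the example.

```python
APPLETALK_MBPS = 0.25         # "about one quarter megabit per second"
ETHERNET_MBPS = (3.0, 10.0)   # the two Ethernet speeds mentioned

def speed_ratio(fast_mbps: float, slow_mbps: float) -> float:
    """How many times faster the first link is than the second."""
    return fast_mbps / slow_mbps

def transfer_seconds(size_bytes: int, megabits_per_second: float) -> float:
    """Time to move a file at the raw line rate, ignoring all protocol overhead."""
    return size_bytes * 8 / (megabits_per_second * 1_000_000)

if __name__ == "__main__":
    one_megabyte = 1_000_000  # arbitrary example file size
    for mbps in ETHERNET_MBPS:
        print(mbps, speed_ratio(mbps, APPLETALK_MBPS), transfer_seconds(one_megabyte, mbps))
    print(APPLETALK_MBPS, transfer_seconds(one_megabyte, APPLETALK_MBPS))
```

At the raw rate, a megabyte takes 32 seconds on Appletalk and under a second on ten-megabit Ethernet.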