Random Access Files
Volume Number: 2
Issue Number: 9
Column Tag: Basic School
By Dave Kelly, MacTutor Editorial Board
There are two types of data files that can be created and used by your MS Basic
program: sequential files and random access files. Sequential files are used more often
because they are easy to create, but random access files are more flexible and data can
be located faster. A discussion of sequential file I/O operation begins on page 45 of your
MS Basic manual (ver. 2.0 or greater). Random Access File I/O starts on page 48.
Before we begin our discussion of random access file I/O, I suggest that you refer to
those pages.
The purpose of this column is to help you develop an understanding of random
access I/O and how to use it in your own programs. It is very easy to understand how
data is structured in sequential files. It requires more work to organize a random
access file. The organization of the random access file is up to you. I'll try to outline
some steps you can use to help organize your file.
First, you should decide just what data you have to store. For example, if you
were setting up a mail list database you would need one field each for name, address,
city-state, and zipcode. Next decide how many characters will be allowed for each field
(25 for name, 30 for address, 25 for city-state, and 5 for zipcode). The total length of
an individual record would then be 85 characters.
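The arithmetic above can be checked with a few lines of Basic. This is only a sketch: the variable names and the figure of 500 records are assumptions chosen for illustration, not part of the sample programs.

```basic
' Field widths for the mail list example (variable names are hypothetical)
NameLen% = 25: AddrLen% = 30: CityLen% = 25: ZipLen% = 5
RecLen% = NameLen% + AddrLen% + CityLen% + ZipLen%
PRINT "Record length:"; RecLen%
' 500! is single precision, so the product cannot overflow an integer
PRINT "Disk space for 500 records:"; 500! * RecLen%; "bytes"
```

Because every record has the same length, the disk space needed is just the record length times the number of records.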
Now decide how many individual records you expect to have in the file. If you
don't require too many records and don't expect to ever expand the file, a sequential file
may be suitable. This is especially true if you have a lot of RAM to work with and a
comparatively small data file. There are some advantages and disadvantages to using a
sequential file this way. With a sequential file, all records are read into memory so the
disk is only accessed once. The program can then operate on the data much faster than if
it had to access the disk for each record. However, if the data had been changed at all,
the entire file would have to be stored back to the disk or the changes would be lost. In
the event of a power failure or some other system crash, a random access file would
contain all the changes, but a sequential file would not. Generally as files get larger,
they are better handled by random access methods. A large sequential file could take
quite a bit of time just to read and write to the disk.
Next you should consider how you want to access each record of your random
access file. You may want to be able to search for a name or sort the file by zip code. A
long and tedious way to do this would be to read through each and every record until the
desired record is found. If the user knows exactly which record to read then the access
time may be reduced significantly. One way to do this would be to create an index file.
For example, if you wanted to find a specific record and you know the contents of one of
the fields, you could look in the index file to find the matching field and record number.
For a mail list database you might set up an index file containing all of the names and
the record numbers corresponding to the names. Index files may be sequential or
random access (for relational databases) but should contain as few fields as possible to
optimize data access time. If the index is sequential it should be kept in memory and
updated as the random file is updated.
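A name index held in memory might be searched along the following lines. This is a minimal sketch with made-up data; the array names IndexName$ and IndexRec%, and the variables Count% and Wanted$, are assumptions for illustration only.

```basic
' Hypothetical in-memory index: names and their record numbers
DIM IndexName$(3), IndexRec%(3)
IndexName$(1) = "Adams": IndexRec%(1) = 3
IndexName$(2) = "Baker": IndexRec%(2) = 1
IndexName$(3) = "Clark": IndexRec%(3) = 2
Count% = 3
Wanted$ = "Baker"

' Search the index instead of reading every record from the disk
Found% = 0
FOR K% = 1 TO Count%
  IF IndexName$(K%) = Wanted$ THEN Found% = IndexRec%(K%)
NEXT K%
IF Found% > 0 THEN PRINT Wanted$; " is record"; Found% ELSE PRINT Wanted$; " is not in the index"
```

Once the record number is known, a single disk access fetches the record. If the same name appears more than once, this simple loop reports only the last match.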
Figure 1
If indexes are used, some thought must be taken as to updating and changing the
index file. If a record is to be deleted, you might want to delete its index entry, thus
removing any reference to the random access file record. This leaves an available
record for later addition of a new record (if you keep track of which records have been
deleted). If your file isn't expected to change very much you may not mind the wasted
space taken up by the deleted record. Ideally, you should keep track of the locations of
deleted records so that they can be reused when new records are added. Another way to
get rid of the wasted records (if you don't want to go to the trouble of keeping track of
the deleted records) is to write a program to do "Garbage Collection".
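One possible way to keep track of deleted records is a list of free record numbers. The sketch below is an assumption about how this could be done, not the method used by the programs in this column; Free%, FreeCount%, LastRec% and NewRec% are hypothetical names, and the numbers are sample values.

```basic
DIM Free%(100)          ' record numbers released by deletions
FreeCount% = 0
LastRec% = 10           ' highest record number in use (sample value)

' Delete record 4: remember that its slot is available
FreeCount% = FreeCount% + 1
Free%(FreeCount%) = 4

' Add a record: reuse a freed slot if there is one, else extend the file
IF FreeCount% > 0 THEN NewRec% = Free%(FreeCount%): FreeCount% = FreeCount% - 1 ELSE LastRec% = LastRec% + 1: NewRec% = LastRec%
PRINT "Write the new record at"; NewRec%
```

The free list would have to be saved along with the index so that it survives between runs of the program.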
Fig. 2 Garbage Collection
A "Garbage Collection" program reads all undeleted records and writes them to a
new file. You only have to do "Garbage Collection" when a lot of records have been
deleted and you need more space to add new records. "Garbage Collection" might be ok to
use if it is automatically performed (with no user intervention). It is NOT desirable
for the user of your program to have to keep track of this kind of file handling (when to
collect garbage and when not to).
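A garbage collection pass could look something like the sketch below. It rests on several assumptions that are not part of the sample programs: 85-byte records, the hypothetical file names "Mail List" and "Mail List New", a new file that does not exist yet, and a deleted record marked by CHR$(0) in its first byte.

```basic
' Copy every undeleted record from the old file to a new one
OPEN "Mail List" AS #1 LEN=85
OPEN "Mail List New" AS #2 LEN=85
FIELD #1, 85 AS Old$
FIELD #2, 85 AS New$
Total% = INT(LOF(1) / 85)     ' number of records in the old file
Kept% = 0
FOR R% = 1 TO Total%
  GET #1, R%
  ' Assumed convention: first byte of 0 means the record was deleted
  IF ASC(Old$) <> 0 THEN Kept% = Kept% + 1: LSET New$ = Old$: PUT #2, Kept%
NEXT R%
CLOSE #1: CLOSE #2
PRINT Kept%; "of"; Total%; "records kept"
```

Record numbers change when the file is compacted this way, so any index would have to be rebuilt afterward.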
When a record is added to the datafile, a new index entry should be created and
the new record should be added to the random access file (either as a new record or
replacing a previously deleted record). If an existing record is edited and changed the
index file should be updated accordingly. You may want to sort the index file before
writing it to the disk. Be sure to save the index file before quitting the program.
Now let's take a look at how the random access file is structured. When you open
a file in Basic, a buffer is allocated for each file opened. For random access files the
buffer should be set equal to the length of one record (the default buffer size is 128
bytes). It is through this buffer that Basic reads and writes to the disk. To help you
understand what a random access file "looks like", let's create a sample file to examine.
The Random Access File program included with this column will create a sample random
access file that we can analyze. It creates a random file named "Sample RA File" with a
record length of 64 bytes. One advantage of MS Basic random access files is that they
require less room on the disk, since Basic stores numbers in them in a packed binary
format. Sequential files store everything, numbers included, as a series of ASCII characters.
To facilitate the conversion of numbers to the packed binary format we must use
the MKI$, MKS$, and MKD$ functions. To convert the strings back to numbers we must
use the CVI, CVS, and CVD functions. These are somewhat easy to remember if you think of the MK
as MaKe and the CV as ConVert. Thus if we want to store an integer number we use
MKI$ to MaKe an Integer string and use CVI to ConVert the Integer back again. The
sample file shows an example of how to use these MaKe and ConVert commands for
integers, single precision and double precision numbers.
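The round trip can be tried on its own, without a file. This short sketch uses the same values as the sample program; the variable name Packed$ is an assumption for illustration.

```basic
Packed$ = MKI$(5)                 ' MaKe a string from the integer 5
PRINT "Length of packed integer:"; LEN(Packed$)
' Show the ASCII code of each byte in the packed string
PRINT "Bytes:"; ASC(LEFT$(Packed$, 1)); ASC(RIGHT$(Packed$, 1))
PRINT "Converted back:"; CVI(Packed$)
' Lengths of the packed single and double precision strings
PRINT LEN(MKS$(32769!)); LEN(MKD$(123456789#))
```

Printing the lengths is a quick way to confirm how many bytes to reserve for each kind of number in a record.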
As I already mentioned, when the file is opened a buffer is allocated (in this case
the length of all the fields is 64). The fields that we want to use must be memory
mapped to the buffer area. This is accomplished with the FIELD statement. You may use
as many FIELD statements as you like; however, each FIELD statement starts defining its
fields at the beginning of the buffer. If you define all your fields on one line
(one FIELD statement) then you won't have any problem, but if you have more fields
than you want to put in one statement then you will want to use a second FIELD
statement. The trick (which the manual does not show you how to do) is to define a
dummy variable with the cumulative length of all the previous FIELD statements
before defining your next field. In the sample program the first FIELD statement
defines three number fields with a total of 14 bytes. (Integer fields are converted to 2
bytes, single precision to 4 bytes, and double precision to 8 bytes). In the second
FIELD statement a dummy string is marking the first part of the buffer which has
already been defined so that the next field will begin after the previously defined fields.
If you didn't know to do this you could have some strange effects when you read your file
back as the field definitions would overlap.
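Applied to the 85-byte mail list record described earlier in this column, the dummy variable trick might look like the sketch below. The file name "Mail List" and the field variable names are assumptions for illustration.

```basic
OPEN "Mail List" AS #1 LEN=85
' First statement defines bytes 1 to 55 (name and address)
FIELD #1, 25 AS N$, 30 AS A$
' Second statement starts again at byte 1, so Dummy$ skips the
' 55 bytes already defined before city-state and zipcode follow
FIELD #1, 55 AS Dummy$, 25 AS C$, 5 AS Z$
CLOSE #1
```

The widths in the second statement add up to 85, the full record length, which is a useful check that nothing overlaps or is left out.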
The next important thing that the program must do is to put our data into the
buffer so it can be written to a record on the disk. This is accomplished with the LSET
or RSET statements. LSET will left justify the string within the defined field length (a
variable might actually be shorter than the field has available). The RSET statement
will right justify the string within the field. Every field must be set into the buffer
with one of these commands. You should use a different variable in defining the fields
and setting into the buffer than you use to manipulate your data. Be sure that you don't
use a defined field in an INPUT or LET type statement. This will redefine the location
that the variable points to (we want it to point to the buffer area). If a record is read
from the disk, all the fields defined in the buffer area will contain the data stored on
the disk for that record. You only have to reset those fields that you want to change. All
the rest of the fields will be left untouched until you read another record into the
buffer or set a new value into the field.
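The difference between working variables and field variables can be seen in a short sketch. The file name, the field names, and the sample values are assumptions for illustration; note that opening the file creates it if it does not already exist.

```basic
OPEN "Mail List" AS #1 LEN=85
FIELD #1, 25 AS N$, 30 AS A$, 25 AS C$, 5 AS Z$
' Manipulate the data in ordinary working variables...
WorkName$ = "Dave Kelly": WorkZip$ = "123"
' ...and copy them into the buffer only with LSET or RSET
LSET N$ = WorkName$        ' padded with spaces on the right
RSET Z$ = WorkZip$         ' padded with spaces on the left
PRINT "["; N$; "]"
PRINT "["; Z$; "]"
CLOSE #1
```

The brackets make the padding visible. Writing N$ = WorkName$ without LSET would detach N$ from the buffer, which is the mistake described above.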
To store a record to disk use PUT [# ]filenumber [, record-number ]. To read a
record from the disk use GET [# ]filenumber [, record-number ]. The PUT, GET
statements read and write the entire record in the buffer. You use PUT after you use
the MaKe string statements and use ConVert statements after using GET. You can find
more information on PUT and GET in your Basic manual (pages 220 and 146). Run the
sample program to create a random access file we can examine.
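Changing a single field of an existing record follows a read, modify, write pattern. The sketch below assumes a hypothetical "Mail List" file that already holds at least one 85-byte record; the field names and the new zipcode are sample values.

```basic
OPEN "Mail List" AS #1 LEN=85
FIELD #1, 25 AS N$, 30 AS A$, 25 AS C$, 5 AS Z$
Rec% = 1                   ' record number to change (sample value)
GET #1, Rec%               ' read the whole record into the buffer
PRINT "Old zipcode: "; Z$
LSET Z$ = "12345"          ' reset only the field that changes
PUT #1, Rec%               ' write the whole record back
CLOSE #1
```

The other three fields are written back exactly as they were read, since GET filled them and nothing has reset them.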
The second program included with this column is a random access utility that I
developed to analyze the data stored in a random access file. Programs like this have
saved me from a lot of problems in the past. I have been able to repair damaged
random access files and determine what buggy random access programs were doing with
utilities like this one.
The utility program opens with a menu which will allow you to open your file.
Choosing open from the File menu brings up the standard getfile dialog box from which
you can choose the file you want to examine. (You should choose "Sample RA File" for
this example). Next, the program asks for the length of the random access file record.
If you wrote the program you should have this available; however, if you don't know
what it is you can guess. The sample file is 64 bytes so enter a 64 for the length (then
click OK).
The File menu now has an active item named Edit (this may be confusing: it is an
item in the File menu, NOT the Edit menu). Selecting Edit from the File menu will
bring up a prompt for the record number you want to read. Enter a '1' to read record
number one (the sample file only has one record) (click OK). Next the record is read
into the buffer and displayed on the screen. The first EDIT FIELD shown displays the
file as it looks. Note that some of the ASCII characters are invisible and can't be seen
in the EDIT FIELD. The second EDIT FIELD shows the equivalent ASCII codes of
the record, so invisible characters can be seen there (for example a '0' is a null character).
Either of these two fields can be modified or examined as you like.
The hardest thing to analyze is the numbers which have been converted to strings
with the MaKe statements. To make this somewhat easier (though not foolproof) the
program provides a way to convert your numbers from strings to numbers and
numbers to strings to see how these ConVert/MaKe statements work. The third EDIT
FIELD provides the way to enter the number or string to be converted. For example,
enter a 5 in the field and select MKI$(integer) Convert from the Convert menu. The
integer 5 will be converted to the packed binary format string. Note that the first field
stored by our sample file is '0, 5' which was the two byte string made from the integer
5 (see the sample program if you don't follow this). The converted string has been
placed in the third EDIT FIELD. The characters there are invisible (0 and 5 ASCII do
not print). If you select CVI(string) Convert (2-bytes) from the Convert menu, the
string will be converted back to the integer equivalent and displayed.
The rest is up to you as to what you want to do with the utility. It is possible to
modify data in the random record by typing the change in one of the first two EDIT
FIELDs. Then select the button at the top of the window to write the record. When you
select 'OK', the EDIT FIELD which is active (the EDIT FIELD in which the cursor is
blinking) will be stored in place of the record. It is possible to convert a number in the
Convert EDIT FIELD then COPY the contents of the EDIT FIELD and PASTE it into the text
in the first EDIT FIELD. It may be somewhat difficult to COPY/PASTE invisible
characters (because you can't see them to select them) although it is possible. I
recommend that you display the converted ASCII equivalent and enter the ASCII
characters into the second EDIT FIELD and save the record to the disk.
That's all there is on random access files. Hopefully the utility will help you to
learn some things by experimentation about random access. Any questions may be
directed to me via MacTutor.
' Random Access File
' ©MacTutor 1986
' This program creates a sample Random Access File
Integer%=5: Single!=32769!: Double#=123456789#
Title$="MacTutor, The Macintosh Programming Journal"
OPEN "Sample RA File" AS #1 LEN=64
FIELD #1,2 AS I$,4 AS S$,8 AS D$
FIELD #1,14 AS Dummy$,50 AS T$
TEXTFACE(1)
PRINT "Our Variables are: Integer%=";Integer%;"Single!=";Single!
PRINT "Double#=";Double#
PRINT "Title$=";Title$
TEXTFACE(0)
WRIT: PRINT "We will now save them to record 1 (record length=64)."
LSET I$=MKI$(Integer%)
LSET S$=MKS$(Single!)
LSET D$=MKD$(Double#)
LSET T$=Title$
PUT #1,1
CLOSE #1
PRINT "Now clear all variables... and print them:"
Integer%=0:Single!=0:Double#=0:Title$=""
TEXTFACE(1)
PRINT "Our Variables are: Integer%=";Integer%;"Single!=";Single!
PRINT "Double#=";Double#
PRINT "Title$=";Title$
TEXTFACE(0)
PRINT "Now read them back again..."
OPEN "Sample RA File" AS #1 LEN=64
FIELD #1,2 AS I$,4 AS S$,8 AS D$ , 50 AS T$
GET #1,1
LET Integer%=CVI(I$)
LET Single!=CVS(S$)
LET Double#=CVD(D$)
LET Title$=T$
PRINT "Now close the file and print them all..."
CLOSE #1
TEXTFACE(1)
PRINT "Our Variables are: Integer%=";Integer%;"Single!=";Single!
PRINT"Double#=";Double#
PRINT "Title$=";Title$
TEXTFACE(0)
END
' Professor Mac's Random Access Utility
' ©MacTutor 1986
' By Dave Kelly
OPTION BASE 1
DEFINT a-z
WINDOW 1,"",(2,25)-(510,335),3
GOSUB WindowHeader
Recordnumber=1
MENU 1,0,1,"File"
MENU 1,1,1,"Open"
MENU 1,2,0,"Close"
MENU 1,3,0,"Edit"
MENU 1,4,1,"Quit"
MENU 3,0,0,""
MENU 4,0,0,""
MENU 5,0,0,""
False=0: True= NOT False
Fileopen = False
ON MENU GOSUB MenuEvent
MENU ON
WaitForEvent: GOTO WaitForEvent
MenuEvent:
MenuNumber = MENU(0)
MenuItem = MENU(1):MENU
ON MenuNumber GOSUB Filemenu,Editmenu,Convertmenu
RETURN
Filemenu:
ON MenuItem GOSUB OpenFile,CloseFile,FindRecord,Quititem
RETURN