Chapter 28: Data Files

The subject of file-handling in general, and how data is organized in files, is a major topic in itself. In this chapter we will cover only a selection of the facilities available in J.

J functions to read files produce results in the form of character-strings, and similarly functions to write files take strings as arguments. Such a string can be the whole data content of a file when the available memory of the computer is sufficient.

Our approach here will be to look first at some J functions for input and output of strings. Then we look at a few examples of dealing with strings as representing data in various formats. Finally we look at mapped files as an alternative to conventional file-handling.

28.1 Reading and Writing Files

28.1.1 Built-in Verbs

In the following, a filename is a string which is valid as a filename for the operating-system of the computer where we are running J. For example:

   F =: 'c:\temp\demofile.xyz'   NB. a filename

The built-in verb 1!:2 writes data to a file. The right argument is a boxed filename. The left argument is a character-string, the data to be written. The effect is that the file is created if it does not already exist, and the data becomes the whole content of the file. The result is null.

   'some data' 1!:2 < F   NB. write to file F

The built-in verb 1!:1 reads data from a file. The right argument is a boxed filename. The result is a character-string, the data read.

   data =: 1!:1 < F   NB. read from file F
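As a quick check that writing and reading are inverse operations, here is a minimal sketch of a round trip. The file name T is a hypothetical scratch file, and the verbs 1!:3 (append), 1!:4 (file size) and 1!:55 (erase) are further built-in file verbs that are not otherwise covered in this chapter.

```j
NB. A sketch only. T is a hypothetical scratch file in the current directory.
T =: 'roundtrip_demo.txt'

'some data' 1!:2 < T        NB. write: creates or overwrites T
'some data' -: 1!:1 < T     NB. 1 if the data read back matches what was written
' and more' 1!:3 < T        NB. append to the existing content
1!:1 < T                    NB. the content is now 'some data and more'
1!:4 < T                    NB. file size in bytes
1!:55 < T                   NB. erase the scratch file
```

The match verb -: is used for the comparison because it yields a single 1 or 0 even when the two strings differ in length.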
28.1.2 Screen and Keyboard As Files

Screen and keyboard can be treated as files, to provide a simple facility for user-interaction with a running program.

The expression x (1!:2) 2 writes the value of x to "file 2", that is, to the screen. A verb to display to the screen can be written as

   display =: (1!:2) & 2

For example, here is a verb to display the stages in the computation of the greatest common divisor by Euclid's algorithm.

   E =: 4 : 0
display x , y
if. y = 0 do. x else. (x | y) E x end.
)

   12 E 15
12 15
3 12
0 3
3 0
3

The value to be displayed by (1!:2) & 2 is not limited to strings: in the example above a list of numbers was displayed.

User-input can be requested from the keyboard by reading "file 1", that is, by evaluating (1!:1) 1. The result is a character-string containing the user's keystrokes. For example, a function for user-interaction might be:

   ui =: 3 : 0
display 'please type your name:'
n =. (1!:1) 1
display 'thank you ', n
''
)

and then after executing

   ui ''

a dialogue appears on the screen, like this:

please type your name:
Waldo
thank you Waldo
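When a message combines text with a number, the number must first be converted to characters, since a string cannot be joined directly to a numeric value. The following sketch uses the built-in format verb ": for the conversion and the built-in verb +. (which for integers gives the greatest common divisor) as a cross-check on the result of E above. The wording of the message is just for illustration.

```j
display =: (1!:2) & 2

NB. ": converts the number to a string so that it can be appended to the text
msg =: 'gcd of 12 and 15 is ' , ": 12 +. 15
display msg      NB. shows: gcd of 12 and 15 is 3
```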
28.1.3 Library Verbs

There are a number of useful verbs for file-handling in the "standard library" (Chapter 26). Here is a brief summary of a selection:
From now on we will use these library verbs for our file-handling.

The library verb fwrite writes data to a file. The right argument is a filename. The left argument is a character-string, the data to be written. The effect is that the file is created if it does not already exist, and the data becomes the whole content of the file.

   'some data' fwrite F   NB. file write
9

The result shows the number of characters written. A result of _1 shows an error: either the left argument is not a string or the right argument is not valid as a filename, or the specified file exists but is read-only.

   (3;4) fwrite F
_1

The library verb fread reads data from a file. The argument is a filename and the result is a character-string.
A result of _1 shows an error: the specified file does not exist, or is locked.
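Because fread signals failure through its result rather than by raising an error, a program will usually test the result before using it. A minimal sketch follows; the name readorempty is a hypothetical helper, not a library verb, and returning an empty string on failure is only one possible policy.

```j
NB. readorempty is a hypothetical helper name, not part of the standard library.
NB. y is a filename; the result is the file content, or '' if the read failed.
readorempty =: 3 : 0
d =. fread y
if. d -: _1 do. '' else. d end.
)

# readorempty 'no_such_file_here.xyz'    NB. 0, assuming no such file exists
```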
28.2 Large Files

For large files, the memory of the computer may not be sufficient to allow the file to be treated as a single string. We look at this case very briefly.

Write a file with some initial content:

   'abcdefgh' fwrite F
8

We can append some data to the file with library verb fappend.

   'MORE' fappend F
4

To see the effect of fappend (just for this demonstration, but not of course for a large file) we can read the whole file again:

   fread F
abcdefghMORE

We can read a selected slice of the file, say 8 bytes starting from byte 4. In this case we use fread with a right argument of the form filename;start,size.

   start =: 4
   size  =: 8
   fread F ; start, size
efghMORE
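The slice form of fread suggests a way to work through a large file one piece at a time. The following is a sketch under stated assumptions: chunks is a hypothetical verb name, the file size is obtained with the built-in verb 1!:4, and for demonstration the pieces are simply collected into a boxed list (a real program would process each piece and discard it).

```j
NB. chunks is a hypothetical name. x is the chunk size in bytes, y is a filename.
NB. The result is a boxed list of successive slices of the file.
chunks =: 4 : 0
n   =. 1!:4 < y                      NB. total size of the file in bytes
r   =. 0 $ a:                        NB. empty boxed list to collect slices
pos =. 0
while. pos < n do.
  len =. x <. n - pos                NB. the last slice may be shorter
  r   =. r , < fread y ; pos , len   NB. read len bytes starting at pos
  pos =. pos + len
end.
r
)

T =: 'chunks_demo.txt'               NB. hypothetical scratch file
'abcdefghMORE' fwrite T
5 chunks T                           NB. boxes containing abcde, fghMO and RE
ferase T
```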
28.3 Data Formats

We look now at a few examples of how data may be organized in a file, that is, represented by a string. Hence we look at converting between character strings, with various internal structures, and J variables.

We take it that files are read and written for the purpose of exchanging data between programs. Two such programs we can call "writer" and "reader". Questions which arise include:
28.3.1 The Binary Representation for J-Only Files

Suppose we aim to handle certain files only in J programs, so that we are free to choose any file format convenient for the J programmer. The "binary representation" is particularly convenient. For any array A,

   A =: 'Thurs'; 19 4 2001

the binary representation of A is a character string. There are built-in verbs to convert between arrays and binary representations of arrays.

   arrbin =: 3!:1   NB. array to binary rep.
   binarr =: 3!:2   NB. binary rep. to array

If B is the binary representation of A, we see that B is a character string, with a certain length.
We can write B to a file, read it back, and do the inverse conversion to recover the value of A :
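The round trip can be sketched as follows. The file name T is a hypothetical scratch file; arrbin and binarr are the names defined above for 3!:1 and 3!:2.

```j
arrbin =: 3!:1            NB. array to binary rep.
binarr =: 3!:2            NB. binary rep. to array

A =: 'Thurs' ; 19 4 2001
B =: arrbin A             NB. B is a character string

T =: 'binrep_demo.bin'    NB. hypothetical scratch file
B fwrite T                NB. result is the number of characters written
A -: binarr fread T       NB. 1 if the recovered array matches A
ferase T
```

The match verb -: compares the whole boxed structure, so a result of 1 confirms that both the string and the numeric list survived the trip through the file.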
From J4.06 on, there are variations of the binary representation verbs above to allow for different machine architectures: see the Dictionary under 3!:1.

28.3.2 Text Files

The expression a. (lower-case a dot) is a built-in noun, a character-string containing all 256 single-byte characters in sequence. The first 128 of these are the ASCII characters.
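A few expressions illustrate how a. relates characters to their numeric codes. This is only a sketch; the particular characters chosen are arbitrary.

```j
# a.              NB. 256, the number of characters in a.
a. i. 'A'         NB. 65, the position (code) of the character A
65 66 67 { a.     NB. ABC, the characters at positions 65 66 67
a. i. 'J files'   NB. the codes of each character in a string
```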
In the ASCII character set, that is, in a., the character at position 0 is the null, at position 10 is line-feed and at position 13 is carriage return. In J, the names CR and LF are predefined in the standard profile to mean the carriage-return and linefeed characters.

   a. i. CR,LF
13 10

We saw fread and fwrite used for reading and writing character files. Text files are a special kind of character file, in that lines are delimited by CR and/or LF characters. On some systems the convention is that lines of text are delimited by a single LF and on other systems a CR,LF pair is expected. Regardless of the system on which J is running, for J text variables, the convention is always followed of delimiting a line with single LF and no CR.

Here is an example of a text variable.

   t =: 0 : 0
There is physics
and there is 
stamp-collecting.
)

Evidently it is a string (that is, a 1-dimensional character list) with 3 LF characters and no CR characters.
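One way to check the line-ending convention of a string is to count its LF and CR characters. The sketch below builds two small sample strings by hand (the names s and w and their contents are invented for illustration) and also shows a simple conversion from the CR,LF convention to the single-LF convention by removing every CR. That conversion assumes CR occurs only as part of a CR,LF pair.

```j
s =: 'first line' , LF , 'second line' , LF              NB. single-LF convention
w =: 'first line' , CR , LF , 'second line' , CR , LF    NB. CR,LF convention

+/ s = LF        NB. 2  (number of LF characters in s)
+/ s = CR        NB. 0  (no CR characters in s)
+/ w = CR        NB. 2
s -: w -. CR     NB. 1  (removing every CR from w gives s)
```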
If we aim to write this text variable t to a text file, we must choose between the single-LF or CRLF conventions. There are two useful library verbs, fwrites and freads to deal with this situation.
For convenience in dealing with a text variable such as t, we can cut it into lines. A verb for this purpose is cut (described more fully in Chapter 17).

   cut =: < ;. _2

cut produces a boxed list of lines, removing the LF at the end of each line.

   lines =: cut t
   lines
+----------------+-------------+-----------------+
|There is physics|and there is |stamp-collecting.|
+----------------+-------------+-----------------+

The inverse of cut we can call uncut. It restores the LF at the end of each box and then razes to make a string.

   uncut =: ; @: (,&LF &. >)
   uncut lines
There is physics
and there is 
stamp-collecting.
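The following sketch checks that uncut undoes cut, and that a text variable survives a trip through a text file written with fwrites and read back with freads. The variable txt and the file name T are invented for this illustration. The round trip through the file is expected to give back the same string whichever line-ending convention the host system uses, on the understanding that freads converts what it reads to the single-LF convention.

```j
cut   =: < ;. _2
uncut =: ; @: (,&LF &. >)

txt =: 'There is physics' , LF , 'and there is' , LF , 'stamp-collecting.' , LF

# cut txt               NB. 3 lines
txt -: uncut cut txt    NB. 1, uncut restores the original string

T =: 'text_demo.txt'    NB. hypothetical scratch file
txt fwrites T           NB. write using the line-ending convention of the host
txt -: freads T         NB. 1, reading back gives the single-LF form again
ferase T
```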
28.3.3 Fixed Length Records with Binary Data

Suppose our data is in two J variables: a table cnames, of customer-names, and a list amts in customer order with for each customer an amount, a balance say.
Now suppose the aim is to write this data to a file, formatted in 16-byte records. Each record is to have two fields: customer-name in 12 bytes followed by amount in 4 bytes, as a signed integer. Here is a possible approach.

The plan is to construct, from cnames and amts, an n-by-16 character table, to be called records. For this example, n=2, and records will look like this:

Mr Rochester####
Jane        ####

where #### represents the 4 characters of an integer in binary form.

We build the records table by stitching together side by side an n-by-12 table for the customer names field, and an n-by-4 table for the amounts field.

For the customer-names field we already have cnames which is suitable, since it is 12 bytes wide:

   $ cnames
2 12

For the amounts field we convert amts to characters, using ci4 from Chapter 27. The result is a single string, which is reshaped to be n-by-4.

   ci4 =: 2 & (3!:4)   NB. integer to 4 char
   amtsfield =: ((# amts) , 4) $ ci4 amts

Now we build the n-by-16 records table by stitching together side-by-side the two "field" tables:

   records =: cnames ,. amtsfield

To inspect records, here is a utility verb which shows a non-printing character as #

   inspect =: 3 : ('A=.a.{~32+i.96';'(A i.y) { A,''#''')
The outgoing string to be written to the file is the ravel of the records.

   (, records) fwrite F
32

The inverse of the process is to recover J variables from the file. We read the file to get the incoming string.

   instr =: fread F

Since the record-length is known to be 16, the number of records is

   NR =: (# instr) % 16

Reshape the incoming string to get the records table.

   inspect records =: (NR,16) $ instr
Mr Rochester####
Jane        ####

and extract the data. The customer-names are obtained directly, as columns 0-11 of records.

   cnames =: (i.12) {"1 records

For the amounts, we extract columns 12-15, ravel into a single string and convert to integers with c4i.

   c4i =: _2 & (3!:4)   NB. 4 char to integer
   amts =: c4i , (12+i.4) {"1 records
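The steps above can be gathered into a pair of verbs, sketched here under stated assumptions. The names pack and unpack, the sample variables cn and am, and the amounts themselves are all invented for illustration; the record layout (12 bytes of name followed by a 4-byte integer) is the one described in the text.

```j
NB. pack and unpack are hypothetical names for the two directions of the conversion.
NB. x pack y : x is an n-by-12 character table, y is a list of n integers.
NB.            The result is an n-by-16 character table of records.
pack =: 4 : 'x ,. ((# y) , 4) $ 2 (3!:4) y'

NB. unpack y : y is an n-by-16 records table.
NB.            The result is a boxed pair: names table ; amounts list.
unpack =: 3 : '(12 {."1 y) ; _2 (3!:4) , 12 }."1 y'

cn =: > 'Mr Rochester' ; 'Jane'    NB. 2-by-12 table of names (sample data)
am =: 10000 _2500                  NB. sample amounts, invented for this sketch

$ cn pack am                       NB. 2 16
(cn ; am) -: unpack cn pack am     NB. 1, the data survives the round trip
```

The ravel of cn pack am is the string that would be written to the file, and a string read back from the file would be reshaped to n-by-16 before being given to unpack.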
28.4 Mapped Files

A file is said to be mapped when the file is temporarily incorporated into the virtual-address-translation mechanism of an executing program. The data in a mapped file appears to the J programmer directly as the value of a J variable - an array. Changes to the value of the variable are changes to the data in the file.

In such a case, we can say, for present purposes, that the file is mapped to the variable or, equivalently, that the variable is mapped to the file.

Mapped files offer the following advantages:
There are two cases. In the first case, any kind of existing file can be mapped to a variable. We take as given the structure of the data in the file, and then the J program must supply a description of the desired mapping. For example, a file with fixed-length records could be mapped to a character table.

In the second case, a file can be created in J in a special format (called "jmf") specifically for the purpose of mapping to a variable. In this case, the description is automatically derived from the variable and stored in the file along with the data. Thus a "jmf" file is self-describing.

We look first at creating jmf files, and then at mapping given files.

28.4.1 Library Script for Mapped Files

There is a library script, jmf.ijs, for handling mapped files. For present purposes it is simplest to download it directly from the J Application Library. Here is a link to jmf.ijs.

Assuming we have downloaded it into, say, directory C:\temp, we can load it into our J session with:

   load 'c:\temp\jmf.ijs'

The script will load itself into the locale jmf.

28.4.2 jmf Files and Persistent Variables

Suppose we have constructed an array V with some valuable data, which from now on we aim to use and maintain over a number of J sessions. Perhaps V is valuable now, or perhaps it will become valuable over subsequent sessions as it is modified and added-to.

Our valuable data V can be an array of numbers, of characters, or of boxes. For a simple example we start with V as a table of numbers.

   ] V =: 2 2 $ 1 2 3 4
1 2
3 4

We can make a persistent variable from V as follows.

Step 1 is to estimate the size, in bytes, of a file required for the value of V. Since we expect that over time V may grow from its present size ultimately to, say, 64 KB, our estimate S is

   S =: 64000

If in doubt, allow plenty. The size must be given as a positive integer (not a float) and therefore less than 2147483648 (2 GB) on a 32-bit machine.
Step 2 is to choose a file-name and, for convenience, define a variable F to hold the file name as a string. For example:

   F =: 'c:\temp\persis.jmf'

Step 3 is to create file F as a jmf file, large enough to hold S bytes of data. For this purpose the utility function createjmf is available (in locale jmf) so we can write:

   createjmf_jmf_ F;S

(On your system, with a different version of J, you may see a response different from what is shown here.)

At this point, file F exists. If we inspect it we see its actual size is a little larger than S, to accommodate a header record which makes the file self-describing.

   fdir F
+----------+------------------+-----+---+------+
|persis.jmf|2012 12 16 8 37 22|64284|rw-|-----a|
+----------+------------------+-----+---+------+

The content of file F is initially set by createjmf_jmf_ to represent a J value, in fact a zero-length list. The important point is that file F now contains a definite value.

Step 4 is to map the content of file F to a new variable, for which we choose the name P.

   map_jmf_ 'P'; F

This statement means, in effect:

   P =: value-currently-in-file-F

and we can verify that P is now an empty list:
Notice particularly that the effect of mapping file F to variable P is to assign the value in F to P and not the other way around. Hence we avoided mapping file F directly onto our valuable array V because V would be overwritten by the preset initial value in F, and lost.

Step 5 is to assign to P the desired value, that of V

   P =: V

Variable P is now a persistent variable, since it is mapped to file F. We can amend P, for example by changing the value at row 0 column 1 to 99.
or by appending a new row:

   ] P =: P , 0 0
1 99
3  4
0  0

Step 6 is needed before we finish the current session. We unmap variable P, to ensure file F is closed.

   unmap_jmf_ 'P'
0

The result of 0 indicates success. The variable P no longer exists:
To demonstrate that the value of P persists in file F we repeat the mapping, processing and unmapping in this or another session.

The name P we chose for our persistent variable is only for this session. In another session, the persistent variable in file F can be mapped to any name. This time we choose the name Q for the persistent variable. We map file F to Q:

   map_jmf_ 'Q' ; F
   Q
1 99
3  4
0  0

modify Q:

   ] Q =: Q , 7 8
1 99
3  4
0  0
7  8

and unmap Q to close file F.

   unmap_jmf_ 'Q'
0

28.4.3 Mapped Files are of Fixed Size

Recall that we created file F large enough for S bytes of data.

   S
64000
   fdir F
+----------+------------------+-----+---+------+
|persis.jmf|2012 12 16 8 37 22|64284|rw-|-----a|
+----------+------------------+-----+---+------+

The variable in file F is currently much smaller than this, and the unused trailing part of the file is filled with junk. However, if we continue to modify Q by appending to it, we reach a limit, by filling the file, and encounter an error. To demonstrate, with a verb fill for the purpose:

   fill =: 3 : 0
try. while. 1 do. Q =: Q , 99 99 end.
catch. 'full'
end.
)

   map_jmf_ 'Q'; F
   fill ''
full

The amount of data now in Q can be estimated as 4 bytes per integer (since Q is integer) multiplied by the number of integers, that is, altogether 4 * */$ Q. This result for the final size of Q accords with our original size estimate S.
   unmap_jmf_ 'Q'
0
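The size estimate can be turned around to predict roughly how many rows will fit before the file is full. The following arithmetic is a sketch only: it assumes 4 bytes per integer, as in the text (a 64-bit J system uses 8 bytes per integer), and it ignores any small overhead within the file, so the actual limit may differ slightly. The names bytes and cols are introduced here for the illustration.

```j
S     =: 64000      NB. data space requested when the file was created
bytes =: 4          NB. assumption: 4 bytes per integer
cols  =: 2          NB. Q has 2 integers per row

S % bytes * cols    NB. 8000 rows, approximately, can be held
S % 8 * cols        NB. 4000 rows if integers occupy 8 bytes
```

Since J evaluates from right to left, S % bytes * cols divides S by the number of bytes in one row.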
28.4.4 Given Files

Now we look at mapping ordinary data files (that is, files other than the special jmf-format files we considered above). The way the data is laid out in the file we take as given, and our task is to specify how this layout is to be represented by the type, rank and shape of a J variable, that is, to specify a suitable mapping.

For example, suppose we aim to read a given file G with its data laid out in fixed-length records, each record being 8 characters. Suppose file G was originally created by, say:

   G =: 'c:\temp\data.xyz'
   'ABCD0001EFGH0002IJKL0003MNOP0004' fwrite G
32

The next step is to decide what kind of a variable will be suitable for mapping the data in file G. We decide on an n-by-8 character table. The number of rows, n, will be determined by the amount of data in the file, so we do not specify n in advance.

It is convenient to start with a small example of an n-by-8 character table, which we call a prototype. The choice of n is unimportant.

   prototype =: 1 8 $ 'a'

Now the mapping can be defined by:

   ] mapping =: ((3!:0) ; (}. @: $)) prototype
+-+-+
|2|8|
+-+-+

We see that mapping is a boxed list. The first item is the data-type. Here 2, meaning "character", is produced by 3!:0 prototype. The second item is the trailing dimensions (that is, all but the first) of the prototype. Here 8 is all but the first of 1 8, produced by (}.@:$) prototype. Thus mapping expresses or encodes "n-by-8 characters".

Now mapping is supplied as left argument to (dyadic) map_jmf_. We map file G onto a variable for which we choose the name W thus:

   mapping map_jmf_ 'W'; G

We see that W is now a variable. Its value is the data in the file.
We can amend the data in the usual way:

   ] W =: 'IJKL9999' 2 } W
ABCD0001
EFGH0002
IJKL9999
MNOP0004

What we cannot do is add another row to the data, because all the space in file G is occupied by the data we already have.
We close file G by unmapping variable W:

   unmap_jmf_ 'W'
0
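The same prototype technique gives descriptions for other layouts. The sketch below names the expression used above as mapspec (a hypothetical name) and applies it to a few prototypes, to show how the type code produced by 3!:0 depends on the values used to build the prototype. Note in particular that a prototype built from 0 has boolean type, so a prototype meant to describe integers should be built from a value such as 2. Whether a given description is acceptable to map_jmf_ is not checked here.

```j
NB. mapspec is a hypothetical name for the expression used in the text.
mapspec =: (3!:0) ; (}. @: $)

mapspec 1 8 $ 'a'     NB. type 2 (character), trailing dimension 8
mapspec 1 3 $ 2       NB. type 4 (integer),   trailing dimension 3
mapspec 1 3 $ 1.5     NB. type 8 (floating point)
mapspec 1 3 $ 0       NB. type 1 (boolean), probably not what was intended
```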
28.4.5 Mapped Variables Are Special

Mapping files to variables offers the programmer significant advantages in functionality and convenience. The price to be paid for these advantages is that there are some considerations applying to mapped variables which do not apply to ordinary variables. The programmer needs to be aware of, and to manage, these considerations. This is our topic in this section and the next.

If A is an ordinary variable, not mapped, then in the assignment B =: A the value of A is in effect copied to B. A subsequent change to A does not affect the value of B.
By contrast, consider a variable mapped to a file. If the file is very large, there may not be enough space for another copy of the value. Hence copying is to be avoided. Compare the previous example with the case when A is a mapped variable.

   map_jmf_ 'A';F
We see that B changes with changes to A. In effect B =: A means that B is another name for A, not a copy of the value of A. That is, both A and B refer to the same thing - the value in the file. Hence it is also the case that A changes with changes to B.
Consider now an explicit verb applied to a mapped variable. Here y becomes another name for the data in the file. Hence assignment to y (even a local assignment) may cause an unintended change to the mapped variable in the file. For example

   foo =: 3 : ' 3 * y =. y + 1'
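If the hazard lies in the assignment to y, as described above, then one way to avoid it is simply not to assign to y inside the verb. The following sketch shows two variants, with hypothetical names safefoo and safefoo2, which compute the same result as foo. They are demonstrated here on an ordinary list, not on a mapped variable.

```j
NB. Hypothetical alternatives to foo which never assign to y.
safefoo  =: 3 : '3 * y + 1'          NB. no assignment at all
safefoo2 =: 3 : '3 * z =. y + 1'     NB. the intermediate result gets its own local name z

safefoo 1 2 3      NB. 6 9 12
safefoo2 1 2 3     NB. 6 9 12
```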
28.4.6 Unmapping Revisited

The current status of mapped files and variables is maintained by the J system in a "mapping table". The mapping table can be displayed by entering the expression showmap_jmf_ but for present purposes here is a utility function to display only selected columns.

   status =: 0 1 9 & {"1 @: showmap_jmf_
   status ''
+-------+------------------+----+
|name   |fn                |refs|
+-------+------------------+----+
|A_base_|c:\temp\persis.jmf|3   |
+-------+------------------+----+

We see that currently variable A in locale base is mapped to file F (persis.jmf).

Under "refs", the value 3 means that the data in file F is the target of 3 references. One of these is variable A, a second is the variable B (which we know to be another name for A) and the third is for the system itself.

Variables A and B are both in existence:
For the sake of simplicity, a recommended procedure for closing the file is first to erase all variables such as B which are alternative names for the originally-mapped variable A

   erase <'B'
1

The status shows the number of references is reduced.

   status ''
+-------+------------------+----+
|name   |fn                |refs|
+-------+------------------+----+
|A_base_|c:\temp\persis.jmf|2   |
+-------+------------------+----+

Now we can unmap A.

   unmap_jmf_ 'A'
0

The result of 0 means the file is closed and A erased. The status table shows no entries, that is, that no files are mapped.

   status ''
+----+--+----+
|name|fn|refs|
+----+--+----+

Let us recreate the situation in which A is mapped to F and B is another name for A, so there are 3 references to (the data in) file F.

   map_jmf_ 'A'; F
   B =: A
   status ''
+-------+------------------+----+
|name   |fn                |refs|
+-------+------------------+----+
|A_base_|c:\temp\persis.jmf|3   |
+-------+------------------+----+

What happens if we erase all the variables referring to F?

   erase 'A';'B'
1 1
   status ''
+-------+------------------+----+
|name   |fn                |refs|
+-------+------------------+----+
|A_base_|c:\temp\persis.jmf|1   |
+-------+------------------+----+

We see there is still a single reference, under the name A even though there is no variable A. This single reference reflects the fact that file F is not yet unmapped. Thus when we said earlier that file F gets mapped to variable A, it would be more accurate to say that file F gets mapped to the name A, and a variable of that name is created.

Even though the variable is subsequently erased, the name A still identifies the mapped file, and can be used as an argument to unmap.

   unmap_jmf_ 'A'
0
   status ''
+----+--+----+
|name|fn|refs|
+----+--+----+

For more information, see the "Mapped Files" lab.

This is the end of Chapter 28
The examples in this chapter
were executed using J version 701.
This chapter last updated 16 Dec 2012
Copyright © Roger Stokes 2012.
This material may be freely reproduced,
provided that this copyright notice is also reproduced.