Chapter 27: Representations and Conversions

In this chapter we look at various transformations of functions and data.

27.1 Classes and Types

If we are transforming things into other things, it is useful to begin with functions which tell us what sort of thing we are dealing with.

27.1.1 Classes

Given an assignment, name =: something, then something is an expression denoting a noun or a verb or an adverb or a conjunction. That is, there are 4 classes to which something may belong. There is a built-in verb 4!:0 which here we can call class.

   class =: 4!:0

We can discover the class of something by applying class to the argument <'name'. For example,
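A minimal sketch of such a session follows. The names n and v and their values are assumptions chosen for illustration; they are not necessarily those of the original example.

```j
class =: 4!:0        NB. as defined in the text

n =: 6               NB. assumed example: a noun
v =: +/ % #          NB. assumed example: a verb

class <'n'           NB. 0 : n is a noun
class <'v'           NB. 3 : v is a verb
```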
The result of 0 for the class of n means that n is a noun. The cases are:

   0   noun
   1   adverb
   2   conjunction
   3   verb

and two more cases: the string 'n' is not a valid name, or n is valid as a name but no value is assigned to n.

   _2   invalid
   _1   unassigned

For example:
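The remaining cases can be sketched as follows. The names A, C and nosuchname are assumptions for illustration, and nosuchname is assumed to have no value assigned in the current session.

```j
class =: 4!:0

A =: /                  NB. an adverb
C =: &                  NB. a conjunction

class <'A'              NB. 1 : adverb
class <'C'              NB. 2 : conjunction
class <'1+2'            NB. _2 : the string 1+2 is not a valid name
class <'nosuchname'     NB. _1 : valid as a name, but nothing assigned
```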
The argument of class identifies the object of interest by quoting its name to make a string, such as 'C'. Why is the argument not simply the object? Because, by the very purpose of the class function, the object may be a verb, noun, adverb or conjunction, and an adverb or conjunction cannot be supplied as argument to any other function. Why not? Suppose the object of interest is the conjunction C. No matter how class is defined, whether verb or adverb, any expression of the form (class C) or (C class) is a bident or a syntax error. In no case is function class applied to argument C. Hence the need to identify C by quoting its name.

27.1.2 Types

A noun may be an array of integers, or of floating-point numbers or of characters, and so on. The type of any array may be discovered by applying the built-in verb 3!:0. For example
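A small sketch, with the name type and the two argument values assumed for illustration. Giving 3!:0 a name avoids a pitfall: in 3!:0 0.1 the adjacent numbers 0 0.1 would be read as a single list.

```j
type =: 3!:0     NB. assumed name for the built-in verb

type 0.1         NB. 8
type 'abc'       NB. 2
```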
The result of 8 means floating-point and the result 2 means character. Some of the possible cases for the result are:
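Some of the cases can be seen by trying arguments of different kinds. This is a sketch with assumed example values, and it is not a complete list of the possible results.

```j
type =: 3!:0     NB. assumed name, as a convenience

type 1=1         NB. 1   : boolean
type 'abc'       NB. 2   : character
type 7           NB. 4   : integer
type 2.5         NB. 8   : floating-point
type 3j4         NB. 16  : complex
type <'abc'      NB. 32  : boxed
type 10x         NB. 64  : extended integer
type 1r3         NB. 128 : rational
```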
There is also a useful verb datatype in the standard library which produces not a number but a name for the type of its argument.
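For instance, assuming the standard library is loaded (as it normally is in a J session), we might see:

```j
datatype 1=1       NB. boolean
datatype 7         NB. integer
datatype 0.1       NB. floating
datatype 'abc'     NB. literal
datatype <'abc'    NB. boxed
```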
27.2 Execute

There is a built-in verb ". (doublequote dot, called "Execute"). Its argument is a character-string representing a valid J expression, and the result is the value of that expression.

   ". '1+2'
3

The string can represent an assignment, and the assignment is executed:
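A minimal sketch, with the name w assumed for illustration:

```j
". 'w =: 1 + 2'     NB. the assignment is executed; the result is 3
w                   NB. 3
```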
If the string represents a verb or adverb or conjunction, the result is null, because Execute is itself a verb and therefore its results must be nouns. However we can successfully Execute assignments to get functions.
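Both points can be sketched as follows; the name mean is an assumption for illustration.

```j
# , ". '+/ % #'          NB. 0 : Executing a verb expression gives an empty noun

". 'mean =: +/ % #'      NB. the assignment is executed, so mean is now a verb
mean 1 2 3 4             NB. 2.5
```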
27.3 Representations

When an expression is entered at the keyboard, a value is computed and displayed on-screen. More precisely, it is a representation of that value which is displayed, as a sequence of characters. For example, if we define a function foo:

   foo =: +/ % #

and then view the definition of foo:

   foo
+-----+-+-+
|+-+-+|%|#|
||+|/|| | |
|+-+-+| | |
+-----+-+-+

we see on the screen some representation of foo. What we see depends on which option is chosen from several possibilities. The default option for the representation of a function is as a boxed structure, as in the example of foo above. There are other possibilities, which we look at next.

27.3.1 Representations of Nouns for Display

For a noun of type character, its value can be represented to the user simply by the J interpreter writing its characters to the screen (with line-breaks at appropriate places). By contrast, for nouns of other datatypes (numbers, boxes or symbols) a displayable representation requires converting the data value to characters. For this purpose there is a built-in function ": (double-quote colon, called Format). We have already met Format in Chapter 19 where it was used for formatting numbers. The Format verb produces nouns which are character representations looking identical to the argument:

   ] n1 =: 'toujours' ; 'l''audace'
+--------+--------+
|toujours|l'audace|
+--------+--------+
   ] r1 =: ": n1   NB. a representation of n1
+--------+--------+
|toujours|l'audace|
+--------+--------+

but n1 and its representation r1 are of different datatypes, and different dimensions.
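The difference can be checked by comparing types and shapes. This is a sketch using the built-in verbs 3!:0 (type) and $ (shape):

```j
n1 =: 'toujours' ; 'l''audace'
r1 =: ": n1

3!:0 n1      NB. 32   : n1 is boxed
3!:0 r1      NB. 2    : r1 is character
$ n1         NB. 2    : a list of two boxes
$ r1         NB. 3 19 : a table of 3 rows of 19 characters
```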
27.3.2 Drawing Boxes with the Format Verb

The Format ": verb does three things:
We can specify the characters used for drawing boxes. In this book I have drawn boxes with '+ | -' characters because only these can I rely on to be correctly rendered in all web-browsers. However, in a J session, you will probably see boxes drawn with other characters, giving a more pleasing appearance. The box-drawing characters used by the Format verb are specified by a global parameter. The value of this parameter can be inspected with 9!:6 :

   9!:6 ''
+++++++++|-

and we see the eleven characters currently in effect. The parameter can be set with 9!:7 . Here is a verb which Formats the right argument with box-drawing characters given by the left argument. Notice that it saves and restores the global parameter.

   fmt =: 4 : 0
assert. 11 = # x
t =. 9!:6 ''     NB. save current box-draw chars
9!:7 x           NB. set box-draw chars to new value
z =. ": y
9!:7 t           NB. restore original box-draw chars
z
)

To show which of the eleven characters goes where, we can draw boxes with the characters 0123456789A

   ] w =: 2 2 $ < '    '
+----+----+
|    |    |
+----+----+
|    |    |
+----+----+
   '0123456789A' fmt w
0AAAA1AAAA2
9    9    9
3AAAA4AAAA5
9    9    9
6AAAA7AAAA8

Dyadic Format allows control over where the data is placed in the box. For details see the Dictionary.

27.3.3 Representations of Functions

There are several options for producing representations of functions, that is, representations of verbs, adverbs or conjunctions. By default the current option is the "boxed representation", so we see the verb foo (defined above) depicted graphically as a structure of boxes.

   foo
+-----+-+-+
|+-+-+|%|#|
||+|/|| | |
|+-+-+| | |
+-----+-+-+

Other options are available, described below. To select and make current an option for representing functions on-screen, enter one of the following expressions:

   (9!:3) 2  NB. boxed (default)
   (9!:3) 5  NB. linear
   (9!:3) 6  NB. parenthesized
   (9!:3) 4  NB. tree
   (9!:3) 1  NB. atomic

The current option remains in effect until we choose a different option.
27.3.4 Linear Representation

If we choose the linear representation, and look at foo again:

   (9!:3) 5  NB. linear
   foo
+/ % #

we see foo in a form in which it could be typed in at the keyboard, that is, as an expression. Notice that the linear form is equivalent to the original definition, but not necessarily textually identical: it tends to minimize parentheses.

   bar =: (+/) % #
   bar
+/ % #

Functions, that is, verbs, adverbs and conjunctions, are shown in the current representation. By contrast, nouns are always shown in the representation produced by the Format verb, regardless of the current option. Even though linear is current, we see:

   noun =: 'abc';'pqr'
   noun
+---+---+
|abc|pqr|
+---+---+
27.3.5 Parenthesized Representation

The parenthesized representation is like linear in showing a function as an expression. Unlike linear, the parenthesized form helpfully adds parentheses to make the logical structure of the expression more evident.

   (9!:3) 6  NB. parenthesized
   zot =: f @: g @: h
   zot
(f@:g)@:h
27.3.6 Tree Representation

Tree representation is another way of displaying structure graphically:

   (9!:3) 4  NB. tree
   zot
              +- f
       +- @: -+- g
-- @: -+- h

27.3.7 Atomic Representation

For completeness, the atomic representation is mentioned here. We will come back to it below. Before continuing, we return the current representation option to linear.

   (9!:3) 5
27.4 Representation Functions

Regardless of the current option for showing representations on-screen, any desired representation may be generated as a noun by applying a suitable built-in verb. If y is a name with an assigned value, then a representation of y is a noun produced by applying one of the following verbs to the argument <'y'

   br =: 5!:2  NB. boxed
   lr =: 5!:5  NB. linear
   pr =: 5!:6  NB. parenthesized
   tr =: 5!:4  NB. tree
   ar =: 5!:1  NB. atomic

For example, the boxed and parenthesized forms of zot are shown by:
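A sketch of such a session follows. It assumes, as in the earlier definition of zot, that f, g and h are names with no values assigned.

```j
br =: 5!:2
pr =: 5!:6

zot =: f @: g @: h

br <'zot'        NB. a boxed noun depicting the structure of zot
pr <'zot'        NB. the character string (f@:g)@:h
```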
We can get various representations of a noun, for example the boxed and the linear:
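A sketch using the noun defined earlier. The linear form of a noun is a string which, when Executed, gives back the noun.

```j
br =: 5!:2
lr =: 5!:5

noun =: 'abc';'pqr'

br <'noun'                 NB. boxed representation
lr <'noun'                 NB. linear representation, a character string
noun -: ". lr <'noun'      NB. 1 : Executing the linear form recovers noun
```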
Representations produced by 5!:n are themselves nouns. The linear form of verb foo is a character-string of length 6.
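This can be checked as follows, with s as the name for the string:

```j
lr  =: 5!:5
foo =: +/ % #

s =: lr <'foo'
s              NB. +/ % #
# s            NB. 6
3!:0 s         NB. 2 : s is a character noun
```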
The 6 characters of s represent an expression denoting a verb. To capture the verb expressed by string s, we could prefix the string with characters to make an assignment, and Execute the assignment.
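A minimal sketch, in which the name avg is an assumption and s is entered directly with the same value as the linear form of foo:

```j
s =: '+/ % #'          NB. same characters as the linear form of foo

". 'avg =: ' , s       NB. Execute the string avg =: +/ % #
avg 1 2 3 4            NB. 2.5
```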
27.4.1 Atomic Representation

We saw in Chapter 10 and Chapter 14 that it is useful to be able to form sequences of functions. By this we mean, not trains of verbs, but gerunds. A gerund, regarded as a sequence of verbs, can for example be indexed to find a verb applicable in a particular case of the argument. To be indexable, a sequence must be an array, a noun. Thus we are interested in transforming a verb into a noun representing that verb, and vice versa. A gerund is a list of such nouns, containing atomic representations. The atomic representation is suitable for this purpose because it has an inverse. None of the other representation functions have true inverses. The atomic representation of anything is a single box with inner structure. For an example, suppose that h is a verb defined as a hook. (A hook is about the simplest example of a verb with non-trivial structure.)

   h =: + %

Compare the boxed and the atomic representations of h
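The two can be generated as follows. The displays are not reproduced in this sketch; the last two lines check only that the atomic representation is a single box.

```j
br =: 5!:2
ar =: 5!:1

h =: + %

br <'h'           NB. boxed representation
ar <'h'           NB. atomic representation
# $ ar <'h'       NB. 0  : the result is an atom ...
3!:0 ar <'h'      NB. 32 : ... of boxed type, that is, a single box
```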
The inner structure is an encoding which allows the verb to be recovered from the noun efficiently without reparsing the original definition. It mirrors the internal form in which a definition is stored. It is NOT meant as yet another graphic display of structure. The encoding is described in the Dictionary. We will not go into much detail here. Very briefly, in this example we see that h is a hook (because 2 is an encoding of "hook") where the first verb is + and the second is %. The next example shows that we can generate atomic representations of a noun, a verb, an adverb or a conjunction.

   N =: 6
   V =: h
   A =: /
   C =: &
27.4.2 Inverse of Atomic Representation

The inverse of representation is sometimes called "abstraction" (in the sense that, for example, a number is an abstract mathematical object represented by a numeral). The inverse of atomic representation is 5!:0 which we can call ab.

   ab =: 5!:0

ab is an adverb, because it must be able to generate any of noun, verb, adverb or conjunction. For example, we see that the abstraction of the atomic representation of h is equal to h
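A sketch of the round trip, in which the name k for the recovered verb is an assumption:

```j
ar =: 5!:1
ab =: 5!:0

h =: + %
k =: (ar <'h') ab     NB. abstract the atomic representation of h

2 h 4                 NB. 2.25
2 k 4                 NB. 2.25 : k behaves as h does
```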
and similarly for an argument of any type. For example for noun N or conjunction C
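For the noun and the conjunction, a sketch might look like this. The name amp is an assumption for illustration.

```j
ar =: 5!:1
ab =: 5!:0

N =: 6
C =: &

(ar <'N') ab             NB. 6
amp =: (ar <'C') ab      NB. amp is the conjunction &
4!:0 <'amp'              NB. 2 : the class of amp is conjunction
(+ amp 1) 5              NB. 6 : same as (+ & 1) 5
```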
27.4.3 Summary of Representation Functions
27.4.4 Execute Revisited

Here is another example of the use of atomic representations. Recall that Execute evaluates strings expressing nouns but not verbs. Since Execute is itself a verb it cannot deliver verbs as its result.
To evaluate strings expressing values of any class we can define an adverb eval say, which delivers its result by abstracting an atomic representation of it.

   eval =: 1 : 0
". 'w =. ' , u
(ar < 'w') ab
)
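A sketch of eval in use, repeating the definitions so that the example stands alone. The name mean and the argument strings are assumptions for illustration.

```j
ar =: 5!:1
ab =: 5!:0

eval =: 1 : 0
". 'w =. ' , u
(ar < 'w') ab
)

'1+2' eval                 NB. 3 : a string expressing a noun
mean =: '+/ % #' eval      NB. a string expressing a verb
mean 1 2 3 4               NB. 2.5
```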
27.4.5 The Tie Conjunction Revisited

Recall from Chapter 14 that we form gerunds with the Tie conjunction `. Its arguments can be two verbs.

   G =: (+ %) ` h

Its result is a list of atomic representations. To demonstrate, we choose one, say the first in the list, and abstract the verb.
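This can be sketched as follows, with first as an assumed name for the abstracted verb:

```j
ab =: 5!:0

h =: + %
G =: (+ %) ` h

# G                       NB. 2 : a list of two atomic representations
first =: (0 { G) ab       NB. abstract the first one
2 first 4                 NB. 2.25 : first is the hook + %
```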
The example shows that Tie can take arguments of expressions denoting verbs. By contrast, the atomic representation function (ar or 5!:1) must take a boxed name to identify its argument. Here is a conjunction T which, like Tie, can take verbs (not names) as arguments and produces atomic representations.

   T =: 2 : '(ar <''u.'') , (ar <''v.'')'
27.5 Conversions for Binary Data

Binary data is, briefly, values represented compactly as character strings. Here we look at functions for converting between values in J arrays and binary data, with a view to handling files with binary data. Data files will be covered in Chapter 28.

In the following, a 32-bit PC is assumed, so that a character occupies one byte and a floating-point number occupies 8 bytes. A J array, of floating-point numbers for example, is stored in the memory of the computer. Storage is required to hold information about the type, rank and shape of the array, together with storage for each number in the array. Each floating-point number in the array needs 8 bytes of storage. There are built-in functions to convert a floating-point number to a character-string of length 8, and vice versa.
   cf8 =: 2 & (3!:5)   NB. float to 8 chars
   c8f =: _2 & (3!:5)  NB. 8 chars to float

In the following example, we see that the number n is floating-point, n is converted to give the string s which is of length 8, and s is converted back to give a floating-point number equal to n.
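A sketch of such an example, taking n to be 0.1 (an assumed value for illustration):

```j
cf8 =: 2 & (3!:5)      NB. float to 8 chars
c8f =: _2 & (3!:5)     NB. 8 chars to float

n =: 0.1
3!:0 n                 NB. 8 : n is floating-point
s =: cf8 n
# s                    NB. 8 : s is a string of length 8
c8f s                  NB. 0.1
```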
Characters in the result s are mostly non-printable. We can inspect the characters by locating them in the ASCII character-set:

   a. i. s
154 153 153 153 153 153 185 63

Now consider converting arrays of numbers. A list of numbers is converted to a single string, and vice versa:
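A minimal sketch, with the list a and its values assumed for illustration:

```j
cf8 =: 2 & (3!:5)
c8f =: _2 & (3!:5)

a =: 1.5 2.5 3.5
s =: cf8 a
# s                 NB. 24 : three numbers at 8 characters each
c8f s               NB. 1.5 2.5 3.5
```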
The monadic rank of cf8 is infinite: cf8 applies just once to its whole argument.

   RANKS =: 1 : 'u b. 0'
   cf8 RANKS
_ _ _

but the argument must be a scalar or list, or else an error results.
A floating-point number is convertible to 8 characters. There is an option to convert a float to and from a shorter 4-character string, sacrificing precision for economy of storage.

   cf4 =: 1 & (3!:5)   NB. float to 4 chars
   c4f =: _1 & (3!:5)  NB. 4 chars to float

As we might expect, converting a float to 4 characters and back again can introduce a small error.

   p =: 3.14159265358979323
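The size of the error can be examined as follows. This is a sketch in which the name q is an assumption, and the exact difference displayed is not reproduced here.

```j
cf4 =: 1 & (3!:5)      NB. float to 4 chars
c4f =: _1 & (3!:5)     NB. 4 chars to float

p =: 3.14159265358979323
# cf4 p                NB. 4
q =: c4f cf4 p         NB. convert to 4 chars and back
p - q                  NB. a small non-zero difference
```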
A J integer needs 4 bytes of storage. There are functions to convert between J integers and 4-character strings.

   ci4 =: 2 & (3!:4)   NB. integer to 4 char
   c4i =: _2 & (3!:4)  NB. 4 char to integer
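For example, with a list i of two integers (values assumed for illustration):

```j
ci4 =: 2 & (3!:4)      NB. integer to 4 char
c4i =: _2 & (3!:4)     NB. 4 char to integer

i =: 1 _100
s =: ci4 i
# s                    NB. 8
c4i s                  NB. 1 _100
```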
We see that the length of s is 8 because s represents two integers. Suppose k is an integer and c is the conversion of k to 4 characters.
Since characters in c are mostly non-printable, we inspect them by viewing their locations in the ASCII alphabet. We see that the characters are the base-256 digits in the value of k, stored in c in the order least-significant first (on a PC).
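A sketch with k taken to be 258 (an assumed value), on a machine which stores the least-significant byte first, as a PC does:

```j
ci4 =: 2 & (3!:4)      NB. integer to 4 char

k =: 258               NB. 258 is (1 * 256) + 2
c =: ci4 k
a. i. c                NB. 2 1 0 0 : base-256 digits, least-significant first
256 #. |. a. i. c      NB. 258 : reversing the digits recovers k
```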
Integers in the range _32768 to 32767 can be converted to 2-character strings and vice versa.

   ci2 =: 1 & (3!:4)   NB. integer to 2 char
   c2i =: _1 & (3!:4)  NB. 2 char to int
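A sketch using the two extremes of the range as example values:

```j
ci2 =: 1 & (3!:4)      NB. integer to 2 char
c2i =: _1 & (3!:4)     NB. 2 char to int

s =: ci2 _32768 32767
# s                    NB. 4 : two integers at 2 characters each
c2i s                  NB. _32768 32767
```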
Integers in the range 0 to 65535 can be converted to 2-character strings and vice versa. Such strings are described as "16-bit unsigned".

   ui2 =: ci2          NB. integer to 2-char, unsigned
   u2i =: 0 & (3!:4)   NB. 2 char to integer, unsigned
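The difference between the signed and unsigned readings of the same two characters can be sketched as follows. The values are assumed for illustration, and ui2 is written out in full so that the example stands alone.

```j
ui2 =: 1 & (3!:4)      NB. integer to 2-char, unsigned (same verb as ci2)
u2i =: 0 & (3!:4)      NB. 2 char to integer, unsigned
c2i =: _1 & (3!:4)     NB. 2 char to integer, signed

s =: ui2 40000 65535
u2i s                  NB. 40000 65535 : read as unsigned
c2i s                  NB. _25536 _1   : the same characters read as signed
```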
27.6 Unicode

In this section we look at J support for Unicode. There are three kinds of character data in J.
The following diagram shows the J functions available for converting character data from one kind to another. The functions are members of the u: family.

We have seen that J supports character data. For example

   C =: 'this is a string'

The built-in verb 3 !: 0 shows the type of a data value.

   3!:0 C
2

The result of 2 indicates that the data type of C is 8-bit characters, called "char". J also provides another data type with 16-bit characters, called "wchar" ("wide character"). The built-in function monadic u: converts char data to wchar.

   ] W =: u: C
this is a string

wchar data is displayed as before, but its data-type is shown as 131072

   3!:0 W
131072

A 16-bit wchar character can be one of the many characters in the Unicode standard. The built-in function 4&u: produces a wchar character specified by the argument, which is an integer in the range 0 to 65535, called a Unicode "code point". A code point is often given as 4 hex digits. For example, the code point for the Greek letter alpha is hex 03b1 which we can write as 16b03b1

   ] alpha =: 4&u: 16b03b1
α

alpha is a wchar:

   3!:0 alpha
131072

We can build a wchar string including alpha :

   ] U =: (u: 'the Greek letter alpha looks like this:  ') , alpha
the Greek letter alpha looks like this:  α

Suppose now that our wchar data U is to be exported, say by writing it to a data file. We will need to encode our 16-bit wchar data as a sequence of 8-bit bytes, according to some recognised standard encoding scheme. The UTF-8 standard is suitable. The built-in function 8&u: produces a character string which is a UTF-8 encoding of wchar data

   ] Z =: 8&u: U
the Greek letter alpha looks like this:  α

We see that Z is of data type 2 (that is, 8-bit char) and that the number of bytes in Z is one more than the number of characters in U, because alpha is encoded as two bytes.
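These observations can be checked as follows. The sketch rebuilds U from the code point so that it stands alone; the exact wording of the string is not significant here.

```j
alpha =: 4&u: 16b03b1      NB. wchar for the code point hex 03b1
U =: (u: 'the Greek letter alpha looks like this:  ') , alpha
Z =: 8&u: U                NB. UTF-8 encoding of U

3!:0 U                     NB. 131072 : wchar
3!:0 Z                     NB. 2 : 8-bit char
(# Z) - # U                NB. 1 : alpha takes two bytes in UTF-8
```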
The inverse of 8&u: is the built-in function 7&u: which produces wchar characters from a UTF-8 string.

   ] A =: 7&u: Z
the Greek letter alpha looks like this:  α

We can view the Unicode code-points of the letters in A. The built-in function 3&u: produces code-point integers from wchar data. If we look at the last few characters of A, we see as we expect that the code-point integer of alpha is decimal 945, that is, hex 03b1.

   ] L =: _6 {. A   NB. last few of A
is:  α
   3 & u: L
105 115 58 32 32 945

This is the end of Chapter 27
The examples in this chapter
were executed using J version 802 beta.
This chapter last updated 19 May 2015
Copyright © Roger Stokes 2014.
This material may be freely reproduced,
provided that acknowledgement is made.