
From J Wiki




Chapter 27: Representations and Conversions

In this chapter we look at various transformations of functions and data.

27.1 Classes and Types

If we are transforming things into other things, it is useful to begin with functions which tell us what sort of thing we are dealing with.

27.1.1 Classes

Given an assignment, name =: something, then something is an expression denoting a noun or a verb or an adverb or a conjunction. That is, there are 4 classes to which something may belong.

There is a built-in verb 4!:0 which here we can call class.

   class =: 4!:0

We can discover the class of something by applying class to the argument <'name'. For example,

   n =: 6
   n
6
   class < 'n'
0

The result of 0 for the class of n means that n is a noun. The cases are:

          0  noun
          1  adverb
          2  conjunction
          3  verb

and two more cases: the string 'n' is not a valid name, or n is valid as a name but no value is assigned to n.

         _2  invalid
         _1  unassigned

For example:

   C =: &
   C
&
   class <'C'
2
   class <'yup'
_1
   class <'1+2'
_2

The argument of class identifies the object of interest by quoting its name to make a string, such as 'C'.

Why is the argument not simply the object? Because, by the very purpose of the class function, the object may be a verb, noun, adverb or conjunction, and an adverb or conjunction cannot be supplied as argument to any other function.

Why not? Suppose the object of interest is the conjunction C. No matter how class is defined, whether verb or adverb, any expression of the form (class C) or (C class) is a bident or a syntax error. In no case is function class applied to argument C. Hence the need to identify C by quoting its name.
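To make the numeric results of class easier to read, here is a small sketch of a verb, classname, which reports the class of a name as a word. The name classname is our own invention for this illustration, not part of the J library; it simply looks up the code from class in the table of cases given above.

```j
class =: 4!:0
NB. classname (hypothetical helper): class of a name, as a word
classname =: 3 : 0
  codes =. _2 _1 0 1 2 3
  words =. 'invalid';'unassigned';'noun';'adverb';'conjunction';'verb'
  > (codes i. class < y) { words   NB. find the code, pick the word, unbox it
)

n =: 6
C =: &
classname 'n'      NB. noun
classname 'C'      NB. conjunction
classname '1+2'    NB. invalid
```

Note that the argument of classname is the name as a string, for the same reason that class must be given a quoted name.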

27.1.2 Types

A noun may be an array of integers, or of floating-point numbers or of characters, and so on. The type of any array may be discovered by applying the built-in verb 3!:0. For example:

   (3!:0) 0.1
8
   (3!:0) 'abc'
2

The result of 8 means floating-point and the result 2 means character. Some of the possible cases for the result are:

           1  boolean
           2  character (that is, 8-bit characters), also called "literal" in the Dictionary
           4  integer
           8  floating point
          16  complex
          32  boxed
          64  extended integer
         128  rational
       65536  symbol
      131072  wide character (16-bit)
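As a quick check of the list above, the following sketch applies 3!:0 to one small sample value of several of the types. The sample values are our own choices; the comments show the type codes we expect from the list.

```j
NB. One sample value per type; comments give the expected 3!:0 code
(3!:0) 1 0 1        NB. 1       boolean
(3!:0) 'abc'        NB. 2       character
(3!:0) 1 2 3        NB. 4       integer
(3!:0) 0.5          NB. 8       floating point
(3!:0) 1j2          NB. 16      complex
(3!:0) <'abc'       NB. 32      boxed
(3!:0) 123x         NB. 64      extended integer
(3!:0) 1r3          NB. 128     rational
(3!:0) u: 'abc'     NB. 131072  wide character
```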

There is also a useful verb datatype in the standard library which produces not a number but a name for the type of its argument.

   datatype 0.1
floating
   datatype 'abc'
literal

27.2 Execute

There is a built-in verb ". (doublequote dot, called "Execute"). Its argument is a character-string representing a valid J expression, and the result is the value of that expression.

   ". '1+2'
3

The string can represent an assignment, and the assignment is executed:

   ". 'w =: 1 + 2'
3
   w
3

If the string represents a verb, adverb or conjunction, the result is null, because Execute is itself a verb and therefore its results must be nouns. However, we can successfully Execute an assignment to get a function.

   ". '+'
   ". 'f =: +'
   f
+
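Here is a slightly fuller sketch of the same idea: we Execute an assignment to define a verb, then apply it in the ordinary way, and confirm with class (4!:0) that the new name denotes a verb. The name sq is chosen just for this illustration.

```j
". 'sq =: *:'      NB. Execute an assignment; sq is now the verb *: (square)
sq 3               NB. 9
(4!:0) < 'sq'      NB. 3, meaning sq is a verb
```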

27.3 Representations

When an expression is entered at the keyboard, a value is computed and displayed on-screen. More precisely, it is a representation of that value which is displayed, as a sequence of characters.

For example, if we define a function foo:

   foo =: +/ % #

and then view the definition of foo:

   foo
+-----+-+-+
|+-+-+|%|#|
||+|/|| | |
|+-+-+| | |
+-----+-+-+

we see on the screen some representation of foo. What we see depends on which of several options is currently chosen. The default option represents a function as a boxed structure, as in the example of foo above.

There are other possibilities, which we look at next.

27.3.1 Representations of Nouns for Display

For a noun of type character, its value can be represented to the user simply by the J interpreter writing its characters to the screen (with line-breaks at appropriate places).

By contrast, for nouns of other datatypes (numbers, boxes or symbols), a displayable representation requires converting the data value to characters.

For this purpose there is a built-in function ": (double-quote colon, called Format). We have already met Format in Chapter 19 where it was used for formatting numbers.

The Format verb produces a character noun which, when displayed, looks identical to its argument:

   ] n1 =: 'toujours' ; 'l''audace'
+--------+--------+
|toujours|l'audace|
+--------+--------+
   
   ] r1 =:  ": n1   NB. a representation of n1
+--------+--------+
|toujours|l'audace|
+--------+--------+
   

but n1 and its representation r1 are of different datatypes, and different dimensions.

   datatype n1
boxed
   datatype r1
literal
   $ n1
2
   $ r1
3 19
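The same holds when the argument is numeric. In this small sketch, Format applied to a list of three integers gives a character list of length 5, whose type code from 3!:0 is 2 (character).

```j
r =: ": 1 2 3      NB. the characters '1 2 3'
$ r                NB. 5
(3!:0) r           NB. 2, character
```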

27.3.2 Drawing Boxes with the Format Verb

The Format ": verb does three things:
  • It produces a character representation of its argument
  • It allows control over the way that numbers are shown. See Chapter 19.
  • It allows control over the way that boxes are shown. We look at this next.

We can specify the characters used for drawing boxes. In this book I have drawn boxes with '+ | -' characters because only these can I rely on to be correctly rendered in all web-browsers. However, in a J session, you will probably see boxes drawn with other characters, giving a more pleasing appearance.

The box-drawing characters used by the Format verb are specified by a global parameter. The value of this parameter can be inspected with 9!:6 :

   9!:6 ''
+++++++++|-
   

and we see the eleven characters currently in effect. The parameter can be set with 9!:7 . Here is a verb which Formats the right argument with box-drawing characters given by the left argument. Notice that it saves and restores the global parameter.

   fmt =: 4 : 0
    assert. 11 = # x
    t =. 9!:6 ''   NB. save current box-draw chars
    9!:7 x         NB. set box-draw chars to new value
    z =. ": y
    9!:7 t         NB. restore original box-draw chars
    z
)

To show which of the eleven characters goes where, we can draw boxes with the characters 0123456789A:

   ] w =: 2 2 $ < '    '
+----+----+
|    |    |
+----+----+
|    |    |
+----+----+
   
   '0123456789A' fmt w
0AAAA1AAAA2
9    9    9
3AAAA4AAAA5
9    9    9
6AAAA7AAAA8
   

Dyadic Format allows control over where the data is placed in the box. For details see the Dictionary.

27.3.3 Representations of Functions

There are several options for producing representations of functions, that is, representations of verbs, adverbs or conjunctions.

By default the current option is the "boxed representation", so we see the verb foo (defined above) depicted graphically as a structure of boxes.

   foo
+-----+-+-+
|+-+-+|%|#|
||+|/|| | |
|+-+-+| | |
+-----+-+-+

Other options are available, described below. To select and make current an option for representing functions on-screen, enter one of the following expressions:

            (9!:3) 2  NB. boxed (default)
            (9!:3) 5  NB. linear
            (9!:3) 6  NB. parenthesized
            (9!:3) 4  NB. tree
            (9!:3) 1  NB. atomic

The current option remains in effect until we choose a different option.

27.3.4 Linear Representation

If we choose the linear representation, and look at foo again:

   (9!:3) 5  NB. linear 

   foo
+/ % #

we see foo in a form in which it could be typed in at the keyboard, that is, as an expression.

Notice that the linear form is equivalent to the original definition, but not necessarily textually identical: it tends to minimize parentheses.

   bar =: (+/) % #
   
   bar
+/ % #

Functions, that is, verbs, adverbs and conjunctions, are shown in the current representation.

By contrast, nouns are always shown in the representation produced by the Format verb, regardless of the current option. Even though linear is current, we see:

   noun =: 'abc';'pqr'
   
   noun
+---+---+
|abc|pqr|
+---+---+

27.3.5 Parenthesized Representation

The parenthesized representation is like linear in showing a function as an expression. Unlike linear, the parenthesized form helpfully adds parentheses to make the logical structure of the expression more evident.

   (9!:3) 6  NB. parenthesized

   
   zot =: f @: g @: h
   
   zot
(f@:g)@:h

27.3.6 Tree Representation

Tree representation is another way of displaying structure graphically:

   (9!:3) 4  NB. tree

   zot
              +- f
       +- @: -+- g
-- @: -+- h       
   

27.3.7 Atomic Representation

For completeness, the atomic representation is mentioned here. We will come back to it below.

Before continuing, we return the current representation option to linear.

   (9!:3) 5

27.4 Representation Functions

Regardless of the current option for showing representations on-screen, any desired representation may be generated as a noun by applying a suitable built-in verb.

If y is a name with an assigned value, then a representation of y is a noun produced by applying one of the following verbs to the argument <'y':

   br =:  5!:2    NB. boxed 
   lr =:  5!:5    NB. linear
   pr =:  5!:6    NB. parenthesized
   tr =:  5!:4    NB. tree
   ar =:  5!:1    NB. atomic

For example, the boxed and parenthesized forms of zot are shown by:

   br < 'zot'
+--------+--+-+
|+-+--+-+|@:|h|
||f|@:|g||  | |
|+-+--+-+|  | |
+--------+--+-+
   pr < 'zot'
(f@:g)@:h
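To try the representation verbs in a fresh session, the names f, g and h need values. In the sketch below we give them arbitrary definitions of our own (double, square and negate) purely for illustration, so that zot is a concrete verb. We expect the parenthesized form shown above; the linear form is shown without a predicted output.

```j
lr =: 5!:5
pr =: 5!:6
f =: +:            NB. double   (illustrative definition)
g =: *:            NB. square   (illustrative definition)
h =: -             NB. negate   (illustrative definition)
zot =: f @: g @: h
lr < 'zot'         NB. linear form, with minimal parentheses
pr < 'zot'         NB. (f@:g)@:h
zot 3              NB. 18, that is, double (square (negate 3))
```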

We can get various representations of a noun, for example the boxed and the linear:

   br <'noun'
+---+---+
|abc|pqr|
+---+---+
   lr <'noun'
<;._1 ' abc pqr'

Representations produced by 5!:n are themselves nouns. The linear form of verb foo is a character-string of length 6.

   foo
+/ % #
   s =: lr <'foo'
   s
+/ % #
   $ s
6

The 6 characters of s represent an expression denoting a verb. To capture the verb expressed by string s, we could prefix the string with characters to make an assignment, and Execute the assignment.

   s
+/ % #
   $ s
6
   a =: 'f =: ' , s
   a
f =: +/ % #
   ". a
   f 1 2
1.5
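The same technique can copy a verb under a new name. In this sketch (the name avg is hypothetical) we build the assignment string from the linear representation of mean and Execute it:

```j
lr =: 5!:5
mean =: +/ % #
". 'avg =: ' , lr < 'mean'   NB. Executes the string 'avg =: +/ % #'
avg 1 2 3 4                  NB. 2.5
```

This works because the linear representation is an executable character string; it would not work with the boxed or tree representations.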

27.4.1 Atomic Representation

We saw in Chapter 10 and Chapter 14 that it is useful to be able to form sequences of functions. By this we mean, not trains of verbs, but gerunds. A gerund, regarded as a sequence of verbs, can for example be indexed to find a verb applicable in a particular case of the argument.

To be indexable, a sequence must be an array, a noun. Thus we are interested in transforming a verb into a noun representing that verb, and vice versa. A gerund is a list of such nouns, containing atomic representations. The atomic representation is suitable for this purpose because it has an inverse. None of the other representation functions have true inverses.

The atomic representation of anything is a single box with inner structure. For an example, suppose that h is a verb defined as a hook. (A hook is about the simplest example of a verb with non-trivial structure.)

   h =: + %

Compare the boxed and the atomic representations of h:

   br <'h'
+-+-+
|+|%|
+-+-+
   ar < 'h'
+---------+
|+-+-----+|
||2|+-+-+||
|| ||+|%|||
|| |+-+-+||
|+-+-----+|
+---------+

The inner structure is an encoding which allows the verb to be recovered from the noun efficiently without reparsing the original definition. It mirrors the internal form in which a definition is stored. It is NOT meant as yet another graphic display of structure.

The encoding is described in the Dictionary. We will not go into much detail here. Very briefly, in this example we see that h is a hook (because 2 is an encoding of "hook") where the first verb is + and the second is %.
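To look at this encoding a little more closely, the following sketch opens the single box of the atomic representation of h and inspects its two parts: the code for a hook, and the list of atomic representations of the two component verbs. It relies only on the structure displayed above; the name r is just for illustration.

```j
ar =: 5!:1
h =: + %
r =: > ar < 'h'    NB. open the single outer box: a list of 2 boxes
0 {:: r            NB. 2, the (character) code for a hook
# 1 {:: r          NB. 2, one atomic representation each for + and %
```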

The next example shows that we can generate atomic representations of a noun, a verb, an adverb or a conjunction.

   N =: 6
   V =: h
   A =: /
   C =: &

   ar <'N'
+-----+
|+-+-+|
||0|6||
|+-+-+|
+-----+
   ar <'V'
+-+
|h|
+-+
   ar <'A'
+-+
|/|
+-+
   ar <'C'
+-+
|&|
+-+

27.4.2 Inverse of Atomic Representation

The inverse of representation is sometimes called "abstraction" (in the sense that, for example, a number is an abstract mathematical object represented by a numeral). The inverse of atomic representation is 5!:0, which we can call ab.

   ab =: 5!:0

ab is an adverb, because it must be able to generate any of noun, verb, adverb or conjunction. For example, we see that the abstraction of the atomic representation of h is equal to h:

   h
+ %
   r =: ar < 'h'
   r
+---------+
|+-+-----+|
||2|+-+-+||
|| ||+|%|||
|| |+-+-+||
|+-+-----+|
+---------+
   r ab
+ %

The same holds for an argument of any class, for example for noun N or conjunction C:

   N
6
   rN =: ar <'N'
   rN
+-----+
|+-+-+|
||0|6||
|+-+-+|
+-----+
   rN ab
6
   C
&
   (ar <'C') ab
&
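We can also check the round trip directly: abstract the atomic representation of h to define a new verb, then confirm that the new verb behaves like h and has the same atomic representation. The name h2 is hypothetical, chosen for this sketch.

```j
ar =: 5!:1
ab =: 5!:0
h =: + %
h2 =: (ar < 'h') ab         NB. rebuild the hook from its atomic representation
h2 4                        NB. 4.25, that is, 4 + % 4, the same as h 4
(ar < 'h') -: ar < 'h2'     NB. 1, the representations match
```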

27.4.3 Summary of Representation Functions

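Each representation function is listed with its result when applied to a noun, and when applied to a verb, adverb or conjunction.

Format verb, monadic ":
    noun: result is always a character array, controlled by global
          parameters for box-drawing characters etc.
    verb, adverb or conjunction: not applicable

Automatic display in response to entering an expression
    noun: display is the result of the Format verb, controlled by
          global parameters
    verb, adverb or conjunction: display is the result of the Format
          verb applied to the result of the current representation
          function (5!:n), as chosen by (9!:3) n

Boxed representation verb 5!:2
    noun, verb, adverb or conjunction: result is boxed

Linear representation verb 5!:5
    noun, verb, adverb or conjunction: result is an executable
          character string

Parenthesized representation verb 5!:6
    noun, verb, adverb or conjunction: result is an executable
          character string with more parentheses

Tree representation verb 5!:4
    noun: not appropriate
    verb, adverb or conjunction: result is a character array
          depicting the structure

Atomic representation verb 5!:1
    noun, verb, adverb or conjunction: result is a boxed structure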

27.4.4 Execute Revisited

Here is another example of the use of atomic representations. Recall that Execute evaluates strings expressing nouns but not verbs. Since Execute is itself a verb it cannot deliver verbs as its result.

   ". '1+2'
3
   ". '+'

To evaluate strings expressing values of any class, we can define an adverb, eval say, which delivers its result by abstracting an atomic representation of it.

   eval =: 1 : 0
". 'w =. ' , u
(ar < 'w') ab
)
   

   '1+2' eval
3
   mean =: '+/ % #' eval
   mean
+/ % #
   mean 1 2
1.5

27.4.5 The Tie Conjunction Revisited

Recall from Chapter 14 that we form gerunds with the Tie conjunction `. Its arguments can be two verbs.

   G =: (+ %) ` h  

Its result is a list of atomic representations. To demonstrate, we choose one, say the first in the list, and abstract the verb.

   G
+---------+-+
|+-+-----+|h|
||2|+-+-+|| |
|| ||+|%||| |
|| |+-+-+|| |
|+-+-----+| |
+---------+-+
   r =: 0 { G
   r
+---------+
|+-+-----+|
||2|+-+-+||
|| ||+|%|||
|| |+-+-+||
|+-+-----+|
+---------+
   r ab
+ %

The example shows that Tie can take arguments of expressions denoting verbs. By contrast, the atomic representation function (ar or 5!:1) must take a boxed name to identify its argument.

Here is a conjunction T which, like Tie, can take verbs (not names) as arguments and produces atomic representations.

   T =: 2 : '(ar <''u.'') , (ar <''v.'')'
   

   (+ %) T h
+---------+-+
|+-+-----+|h|
||2|+-+-+|| |
|| ||+|%||| |
|| |+-+-+|| |
|+-+-----+| |
+---------+-+
   (+ %) ` h
+---------+-+
|+-+-----+|h|
||2|+-+-+|| |
|| ||+|%||| |
|| |+-+-+|| |
|+-+-----+| |
+---------+-+
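Returning to the idea that a gerund can be indexed to find a verb suitable for a particular argument, here is a small sketch. Indexing with { selects one atomic representation, which ab turns back into a verb; the Agenda conjunction @. does the selection automatically, using an index computed from the argument. The gerund G and the verb pick here are our own illustrative definitions.

```j
ab =: 5!:0
G =: - ` %              NB. gerund: negate, reciprocal
((1 { G) ab) 4          NB. 0.25, abstracting the second verb and applying it
pick =: G @. (0 > ])    NB. index 0 (negate) if y >: 0, index 1 (reciprocal) if y < 0
pick 4                  NB. _4
pick _4                 NB. _0.25
```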

27.5 Conversions for Binary Data

Binary data is, briefly, values represented compactly as character strings. Here we look at functions for converting between values in J arrays and binary data, with a view to handling files with binary data. Data files will be covered in Chapter 28.

In the following, a 32-bit PC is assumed: a character occupies one byte and a floating-point number occupies 8 bytes.

A J array, of floating-point numbers for example, is stored in the memory of the computer. Storage is required to hold information about the type, rank and shape of the array, together with storage for each number in the array. Each floating-point number in the array needs 8 bytes of storage.

There are built-in functions to convert a floating-point number to a character-string of length 8, and vice versa.

   cf8 =:   2 & (3!:5)   NB. float to 8 chars
   c8f =:  _2 & (3!:5)   NB. 8 chars to float 

In the following example, we see that the number n is floating-point, n is converted to give the string s which is of length 8, and s is converted back to give a floating-point number equal to n.

   n =: 0.1
   n
0.1
   $ s =: cf8 n
8
   c8f s
0.1

Characters in the result s are mostly non-printable. We can inspect the characters by locating them in the ASCII character-set:

   a. i. s 
154 153 153 153 153 153 185 63
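These codes are the 8 bytes of the IEEE double-precision form of 0.1, least-significant byte first. A simpler value makes the layout easier to see. In this sketch, which like the text assumes a little-endian PC, 0.5 has the hex form 3FE0000000000000, so only the two most significant bytes are non-zero:

```j
cf8 =: 2 & (3!:5)
a. i. cf8 0.5          NB. 0 0 0 0 0 0 224 63
|. a. i. cf8 0.5       NB. 63 224 0 0 0 0 0 0, that is, hex 3F E0 00 ... 00
```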

Now consider converting arrays of numbers. A list of numbers is converted to a single string, and vice versa:

   a =: 0.1 0.1
   a
0.1 0.1
   $ s =: cf8 a
16
   c8f s
0.1 0.1

The monadic rank of cf8 is infinite: cf8 applies just once to its whole argument.

   RANKS =: 1 : 'u b. 0'
   cf8 RANKS
_ _ _

but the argument must be a scalar or list, or else an error results.

   b =: 2 2 $ a
   b
0.1 0.1
0.1 0.1
   $ w =: cf8 b
error
   $ w =: cf8"1 b
2 16

A floating-point number is convertible to 8 characters. There is an option to convert a float to and from a shorter 4-character string, sacrificing precision for economy of storage.

   cf4 =:  1 & (3!:5)   NB. float to 4 chars
   c4f =: _1 & (3!:5)   NB. 4 chars to float

As we might expect, converting a float to 4 characters and back again can introduce a small error.

   p =: 3.14159265358979323
   

   p
3.14159
   $ z =: cf4 p
4
   q =: c4f z
   q
3.14159
   p - q
_8.74228e_8

On the 32-bit PC assumed here, a J integer needs 4 bytes of storage. There are functions to convert between J integers and 4-character strings.

   ci4 =:  2 & (3!:4)  NB. integer to 4 char
   c4i =: _2 & (3!:4)  NB. 4 char  to integer
   

   i =: 1 _100
   i
1 _100
   $ s =: ci4 i
8
   c4i s
1 _100

We see that the length of s is 8 because s represents two integers.

Suppose k is an integer and c is the conversion of k to 4 characters.

   k =: 256+65
   k
321
   $ c =: ci4 k
4

Since characters in c are mostly non-printable, we inspect them by viewing their locations in the ASCII alphabet. We see that the characters are the base-256 digits in the value of k, stored in c in the order least-significant first (on a PC).

   k
321
   a. i. c
65 1 0 0
   256 256 256 256 #: k
0 0 1 65
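Going the other way, we can reassemble the integer from the byte codes by reversing them (most-significant first) and evaluating in base 256. The sketch also shows that a negative integer is stored in two's-complement form. Like the text, it assumes a little-endian PC.

```j
ci4 =: 2 & (3!:4)
256 #. |. a. i. ci4 321     NB. 321
a. i. ci4 _1                NB. 255 255 255 255, two's complement
```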

Integers in the range _32768 to 32767 can be converted to 2-character strings and vice versa.

   ci2 =:  1 & (3!:4)  NB. integer to 2 char
   c2i =: _1 & (3!:4)  NB. 2 char  to int
   

   i
1 _100
   $ s =: ci2 i
4
   c2i s
1 _100

Integers in the range 0 to 65535 can be converted to 2-character strings and vice versa. Such strings are described as "16bit unsigned".

   ui2 =: ci2         NB. integer to 2-char,  unsigned  
   u2i =: 0 & (3!:4)  NB. 2 char  to integer, unsigned
   

   m =: 65535
   m
65535
   $ s =: ui2 m
2
   u2i s
65535
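The difference between the signed and unsigned conversions lies only in how the same two bytes are interpreted. In this sketch the two characters produced from _1 both have code 255; read back as signed they give _1, and read as unsigned they give 65535.

```j
ci2 =: 1 & (3!:4)    NB. integer to 2 char
c2i =: _1 & (3!:4)   NB. 2 char to integer, signed
u2i =: 0 & (3!:4)    NB. 2 char to integer, unsigned
s =: ci2 _1
a. i. s              NB. 255 255
c2i s                NB. _1
u2i s                NB. 65535
```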

27.6 Unicode

In this section we look at J support for Unicode.

There are three kinds of character data in J.

  • Ordinary 8-bit (ASCII) character data, which we have already seen.
  • 16-bit characters, called "wide characters" for Unicode.
  • Sequences of 8-bit characters, which represent Unicode characters, for the purpose of writing Unicode in files. This representation is called the UTF-8 encoding.

The following diagram shows the J functions available for converting character data from one kind to another. The functions are members of the u: family.

[Diagram: the u: functions for converting between char, wchar and UTF-8 data]

We have seen that J supports character data. For example

   C =: 'this is a string'

The built-in verb 3!:0 shows the type of a data value.

   3!:0  C
2

The result of 2 indicates that the data type of C is 8-bit characters, called "char".

J also provides another data type with 16-bit characters, called "wchar" ("wide character"). The built-in function monadic u: converts char data to wchar.

   ] W =: u: C
this is a string

wchar data is displayed as before, but its data type is shown as 131072:

   3!:0 W
131072

A 16-bit wchar character can be one of the many characters in the Unicode standard. The built-in function 4&u: produces a wchar character specified by the argument, which is an integer in the range 0 to 65535, called a Unicode "code point".

A code point is often given as 4 hex digits. For example, the code point for the Greek letter alpha is hex 03b1, which we can write as 16b03b1:

   ] alpha =: 4&u: 16b03b1
α

alpha is a wchar:

   3!:0  alpha
131072

We can build a wchar string including alpha :

   ] U =: (u: 'the Greek letter alpha looks like this:  ') , alpha
the Greek letter alpha looks like this:  α
    

Suppose now that our wchar data U is to be exported, say by writing it to a data file. We will need to encode our 16-bit wchar data as a sequence of 8-bit bytes, according to some recognised standard encoding scheme. The UTF-8 standard is suitable.

The built-in function 8&u: produces a character string which is a UTF-8 encoding of wchar data:

   ] Z =: 8&u: U
the Greek letter alpha looks like this:  α

We see that Z is of data type 2 (that is, 8-bit char) and that the number of bytes in Z is one more than the number of characters in U, because alpha is encoded as two bytes.

   3!:0 Z
2
   # U
42
   # Z
43

The inverse of 8&u: is the built-in function 7&u: which produces wchar characters from a UTF-8 string.

   ] A =: 7&u: Z
the Greek letter alpha looks like this:  α

We can view the Unicode code-points of the letters in A. The built-in function 3&u: produces code-point integers from wchar data. If we look at the last few characters of A, we see as we expect that the code-point integer of alpha is decimal 945, that is, hex 03b1.

   ] L =:  _6 {. A      NB. last few of A
is:  α
   
   3 & u: L
105 115 58 32 32 945
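As a final sketch, we can follow alpha alone through the conversions: its UTF-8 encoding is the two bytes with codes 206 and 177 (hex CE B1), and decoding those bytes with 7&u: and then applying 3&u: recovers the code point 945.

```j
alpha =: 4 u: 16b03b1
a. i. 8 u: , alpha            NB. 206 177, alpha takes two bytes in UTF-8
3 u: 7 u: 8 u: , alpha        NB. 945, back to the code point
```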
   

This is the end of Chapter 27




The examples in this chapter were executed using J version 802 beta. This chapter last updated 19 May 2015
Copyright © Roger Stokes 2014. This material may be freely reproduced, provided that acknowledgement is made.

