NYCJUG/2024-01-09
Beginner's regatta
We look at the dyadic case of %: which generalizes finding roots, like square roots, cube roots, and so on.
Generalizing Square Root
The monadic verb %: returns the square root of its argument(s).
%: 2 4 10 16 1.41421 2 3.16228 4
We can infer from the above example that this acts on scalars. We see that the dyadic case generalizes this to give us the xth root of y.
2 3 4 %: 10000 NB. Square root, cube root, fourth root of ten thousand 100 21.5443 10 2 3 4 %:/ 2 4 8 16 NB. Table of square, cube and fourth roots 1.41421 2 2.82843 4 1.25992 1.5874 2 2.51984 1.18921 1.41421 1.68179 2
The documentation for the dyadic case states that "Taking an integer root of an extended integer by an integer (extended or not) produces an extended integer if the quotient is an integer, or floating-point otherwise."
10 %: 10x^200 NB. Extended precision 100000000000000000000 10 %: 0 1+10x^200 NB. Both results floating point 1e20 1e20 25j2 ": 10 %: 0 1+10x^200 NB. Show floating point imprecision 100000000000000131072.00 100000000000000131072.00
We might wonder what happens if we try to find a fractional root. Unsurprisingly, J handles this case consistently.
1r2 1r3 1r4 %: 2 NB. Find the 1/2, 1/3, and 1/4 root of 2. 4 8 16
Taking the 1/n root of a number is the same as raising the number to the n power.
1r2 1r3 1r4 %: 1r2 1r3 1r4 NB. Starting to get silly but it works as expected. 1r4 1r27 1r256
Practical Example
We sometimes need to take higher-order roots for interest rate calculations. So, if I bought a house for $100,000 thirty years ago and it's now worth $400,000, what was my average annual rate of increase?
30 %: 4e5 % 1e5 1.04729
We see that it was about 4.73% per annum.
Show-and-tell
We look at a simple APL problem and some J equivalents of the APL solution, all done on a very old computer. We also see some initial progress on a project to re-write some Python code, using the Keras package, in J.
Drawing on an Old Computer
James Price is a fan of old computers and has developed games in APL for the Commodore Super Pet. He talked on the Arraycast podcast about this problem; the podcast is worth a listen. He does some of his retro-programming on real machines but also with emulators.
The Superpet, which debuted in about 1980, was the final entry in the Commodore PET line. It was
...known as the SuperPET or MicroMainframe. This machine was designed at the University of Waterloo for teaching programming. In addition to the basic CBM 8000 hardware, the 9000 added a second CPU in the form of the Motorola 6809, more RAM and included a number of programming languages including a BASIC in ROM for the 6502 and a separate ANSI Minimal BASIC-compatible BASIC for the 6809, along with APL, COBOL, FORTRAN, Pascal and a 6809 assembler on floppies.
James has a Medium essay on the following computing challenge to be done on a vintage machine.
The Challenge
Imagine the following on an old 25x80 low-res display:
**** VINTAGE COMPUTING **** CHRISTMAS CHALLENGE 2023 HVCCC2023 * * * READY. * * * * * * RUN * * * * * * * * * * * * * * * * JUST * * * * * * 4 FUN * * * * * * * * * * * * * * * ANY * * * * PLATFORM * * * * * * ANY * * * * * * LANGUAGE * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * HTTP://LOGIKER.COM/VCCC2023
The challenge is to produce a display of the lattice pattern drawn with asterisks as shown above.
An Aside: Display Characters Based on Boolean
Since the pattern can be represented by a Boolean matrix, the approach is to use the APL idiom ' *'[B] where B is a Boolean matrix with the points of the pattern indicated by the ones. (It's worth noting that Adám Brudzewsky extends this idea to display multiple characters using a matrix of integers in his seasonal talk where he uses APL to crudely simulate the illusion of snow falling.)
' *'[1+32 67⍴{~⊃⍵:⍵,∇(1⌽⍵)≠¯1⌽⍵⋄⍬}A] ⊣ A[34]←1 ⊣ A←67⍴0 * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Compare this to the similar J expression:
' *' {~ '1' = (- |."_1 [: ": 2 | !/~) i._16 * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
An Advantage of the J Version
One nice feature of the J expression is that it's easy to alter once you figure out that the final number controls the iterations and that number is a power of two.
' *' {~ '1' = (- |."_1 [: ": 2 | !/~) i._32 * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Solving the Challenge
It's worth taking a look at how James solves the problem in APL, step-by-step. In parallel, we'll also mimic the APL in J.
He starts by creating a small identity matrix, like this in APL (and more succinctly in J):
APL | J |
---|---|
A←⌽(⍳4)∘.=⍳4 A 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0 |
a=. |.=@i.4 a 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0 |
This matrix represents the corner of one of the diamonds. We use this to get the top of the diamond by concatenating it to the flip of itself:
APL | J |
---|---|
B←(0 ¯1↓A),⌽A ' *'[1+B] * * * * * * * |
b=. a,.1}."1 |.a b{' *' * * * * * * * |
He proceeds from this point by generating the tops of three diamonds:
APL | J |
---|---|
C←K,(K←0 ¯1↓B),B ' *'[1+C] * * * * * * * * * * * * * * * * * * * |
c=. k,.(k=. _1}."1 b),.b c{' *' * * * * * * * * * * * * * * * * * * * |
He continues to build the diamond pattern by concatenating this pattern to the flip of itself:
APL | J |
---|---|
D←C,[1]⊖¯1 0↓C ' *'[1+D] * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * |
d=. c,|.}:c d{' *' * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * |
Finally, he completes the pattern by concatenating this pattern to itself (with the final row dropped to make them fit together seamlessly):
APL | J |
---|---|
' *'[1+Z,[1](Z←6 ¯19↑D),[1]D] * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * |
(z,(z=. }:d),d){' *' * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * |
Some Extras
The essay continues from here at some length by glomming together the above code fragments into a single 75 byte line and then showing how the display also can be built by using the (old) SuperPET APL's ⎕POKE function to write directly to memory locations.
The final solution and its result look like this on the SuperPET (emulator):
He also includes links to the code and an emulator in which to run it, along with instructions for loading and running APL under the emulator.
In the comments, someone posts a very short APL version of a solution using "magic" numbers.
APL | J |
---|---|
' *'[1+A∨⌽A←19 19⍴4=⍳6] * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * |
' *'{~(]+.|."1)19 19$3=i.6 * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * |
Echo of a Similar Problem
In a talk I gave to a K group several years ago, I showed the solutions to a problem very similar to this in both APL and J.
Another Challenge
The slide showing the problem looks like this:
Numerous Solutions
Here, on one very crowded slide, we see solutions to this in APL, J, K, and q along with a detailed explanation of how the APL and J code works:
(A Start on) Re-writing Python Code in J
One of the basic problems solved with AI is the MNIST digits challenge. This uses a database of thousands of hand-written digits, from zero to nine, to train a model to recognize them.
The complete Python code to this is the following.
We import the tensorflow library with the designation tf and use it to read in an example of the MNIST dataset.
import tensorflow as tf data = tf.keras.datasets.fashion_mnist
We then break the dataset apart into training and testing groups where each group has labels designating the digit being shown and an image of the digit as a 28x28 numeric grayscale matrix; we scale the numbers to be between zero and one.
(training_images, training_labels), (test_images, test_labels) = data.load_data() training_images = training_images / 255.0 test_images = test_images / 255.0
The bulk of the work is in defining the model as shown here where each line defines a model layer. Each layer may be thought of as a sequential step through which each image passes.
model = tf.keras.models.Sequential([tf.keras.layers.Flatten(input_shape=(28,28)), tf.keras.layers.Dense(256, activation=tf.nn.relu), tf.keras.layers.Dropout(0.2), tf.keras.layers.Dense(128, activation=tf.nn.relu), tf.keras.layers.Dropout(0.2), tf.keras.layers.Dense(64, activation=tf.nn.relu), tf.keras.layers.Dropout(0.2), tf.keras.layers.Dense(10, activation=tf.nn.softmax)])
We will cover these layers in more detail shortly but it's worth noting that there are only three tensorflow functions we need to replicate so it's not as daunting as it may first appear.
For completeness, we show how we compile this model, run it to train and evaluate it, then display some of its results.
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy']) model.fit(training_images, training_labels, epochs=20) model.evaluate(test_images, test_labels) classifications = model.predict(test_images) print(classifications[0]) print(test_labels[0])
The Data
The first hurdle is how to read in the dataset since Python uses some magic that is too opaque to readily replicate. Our solution to this is to read the data in Python and write it out in some simple format we can easily read in J. A simple character file should suffice as all the values can be mapped to 0 to 255. However, this is easier said than done for someone not adept in Python.
The following code works but it took hours to figure out.
def writeLabeledImage( label, image, fl): txt = [x for x in image] ln1 = reduce(add , [str(n) for n in reduce(add , [[","+str(n) for n in txt1] for txt1 in txt])]) fl.write(str(label)+ln1.join(':\n'))
Here's how we use it, assuming we've already read in the data as shown above.
fl = fopen('testImages.txt', 'a') sink = [writeLabeledImage( test_labels[nn], test_images[nn], fl) for nn in range(len(test_images))] fl.close()
This gives us a files with sizes of about 183 MB for the test set and 100 MB for the training set.
Writing J Equivalents of Keras Modules
We now look at the layers of the model and what we need to do to implement each in J.
The first function is Flatten, used like this: tf.keras.layers.Flatten(input_shape=(28,28)). This "flattens" the 28 by 28 matrix, i.e. ravels it, so we could define it in J something like this: flatten=: , .
However, the data we'll read has already been flattened so this is a no-op.
The next one, Dense, is the most interesting; this is where much of the heavy lifting gets done in training the model. Its invocation looks like this: tf.keras.layers.Dense(256, activation=tf.nn.relu) . The first parameter tells us the dimensionality of the output space so we expect 256 outputs. The other parameter specifies the activation function, relu, short for "Rectified linear unit activation". This is defined as f(x) = max(0, x), or, in J relu=: 0&>. .
The Dense function itself is defined as
Dense implements the operation: output = activation(dot(input, kernel) + bias) where activation is the element-wise activation function passed as the activation argument, kernel is a weights matrix created by the layer, and bias is a bias vector created by the layer (only applicable if use_bias is True).
Note: If the input to the layer has a rank greater than 2, Dense computes the dot product between the inputs and the kernel along the last axis of the inputs and axis 0 of the kernel (using tf.tensordot).
So, how do we implement dot? Fortunately the Keras documentation on this is extensive and provides examples.
It takes a list of inputs of size 2, and the axes corresponding to each input along with the dot product is to be performed.
Let's say x and y are the two input tensors with shapes (2, 3, 5) and (2, 10, 3). The batch dimension should be of same size for both the inputs, and axes should correspond to the dimensions that have the same size in the corresponding inputs. e.g. with axes=(1, 2), the dot product of x, and y will result in a tensor with shape (2, 5, 10)
We can run the supplied Python example to see what the output should look like for a known input. Here are the inputs as specified in both Python and J.
Python | J |
---|---|
import numpy as np x = np.arange(10).reshape(1, 5, 2) y = np.arange(10, 20).reshape(1, 2, 5) x array([[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]]) y array([[[10, 11, 12, 13, 14], [15, 16, 17, 18, 19]]]) |
x=. 1 5 2$i.10 y=. 1 2 5$10+i.20 x 0 1 2 3 4 5 6 7 8 9 y 10 11 12 13 14 15 16 17 18 19 |
Dense Complications
It seems that Dense should simply be matrix multiply but the Python example expects three-dimensional inputs and has an axes specification.
tf.keras.layers.Dot(axes=(1, 2))([x, y]) <tf.Tensor: shape=(1, 2, 2), dtype=int32, numpy= array([[[260, 360], [320, 445]]])>
The arguments to axes are the axes which will get swallowed up by the inner product. In J, we would reshape the matrixes to ensure that the trailing axis of the left argument matches the leading axis of the right argument, shown here with two-dimensional arrays:
(|:5 2$i.10) +/ . * |:2 5$10+i.20 260 360 320 445
We could change the problem as we go along to make it more J-like but long experience has taught us not to do this when we are replicating code; it's better to mimic the source code as much as possible to help keep on track. So, we will figure out how to implement the axes function in J and proceed with the matrix multiply from there.
The documentation for dot explains how axes is used.
Dot class
keras.layers.Dot(axes, normalize=False, **kwargs)
Computes element-wise dot product of two tensors.
It takes a list of inputs of size 2, and the axes corresponding to each input along with the dot product is to be performed.
Let's say x and y are the two input tensors with shapes (2, 3, 5) and (2, 10, 3). The batch dimension should be of same size for both the inputs, and axes should correspond to the dimensions that have the same size in the corresponding inputs. e.g. with axes=(1, 2), the dot product of x, and y will result in a tensor with shape (2, 5, 10)
This is followed by the example Python code we have already seen above.
In J, we make two arrays congruent for matrix multiplication by re-arranging the axes. So, to work on axis 1 of our left argument x, we could use dyadic transpose like this:
$x 1 5 2 $1 0 2|:x 5 1 2
Similarly, to work on axis 2 of our right argument y:
$y 1 2 5 $2 0 1|:y 5 1 2
With some thought and experimentation, we came up with this for an axes verb:
axes=: ]|:~[({,-.~)[:i.[:#[:$] $&.>1 2 axes&.>x;y +-----+-----+ |5 1 2|5 1 2| +-----+-----+
This is not exactly how we want these to be shaped in J for our matrix multiply but we will take care of that within our dot cover function:
dot=: ] +/ .*~ [: |: [
We need to transpose the right argument its trailing axis matches the left arguments leading one.
Here's how it works:
(1 axes x) dot 2 axes y 260 360 320 445 $(1 axes x) dot 2 axes y 2 1 1 2
The leading axes of the three-dimensional inputs end up in the middle of our result, so we have to put some more thought into exactly how we want to implement this in J once we start playing around with actual data.
Further Work
This is all we've done so far but it puts us pretty close to being able to define and run a simple digits recognition model. We should see more about this in a future meeting.
Advanced topics
This article was sufficiently well-titled that I had to take a look at it: it's called The 4 Rules of Writing Code Only Mature Developers Know About (Steal These). [1].
The first point seemed germane to J programmers.
What this awkward phrasing means is that you should re-use code rather than putting it in a library and re-using the library. This seems to contradict conventional wisdom and is intended to do so.
In J, we often do re-use code that's simpler to write than to find in a library, something like "average" or "mean", for example.
The example given is writing a library with functions is_even and is_odd to distinguish between even and odd numbers. The author cautions against the library method where we define these two functions in a library then import that library to use them. The alternative the author proposes is to define these functions in the body of code that needs them, thus eliminating a dependency. The two versions of this example differ by only the line importing the functions from the library (in Python):
from odd_even_library import is_odd, is_even
It's difficult with such a simple example like this to decide the merit of the proposal. Of course the code written in Python is very verbose by J standards.
def is_odd(num): return num % 2 != 0 def is_even(num): return num % 2 == 0 # Main code number = 7 if is_odd(number): print(f"{number} is an odd number.") else: print(f"{number} is an even number.")
J Version
Leaving aside the gaffe of using a global to pass an argument, how might we render this code in J?
oddOrEven=: 3 : '(":y),'' is an '','' number.'',~;(2|y){''even'';''odd''' oddOrEven 7 7 is an odd number. oddOrEven 8 8 is an even number.
Testing this on a vector reveals a bug:
oddOrEven 7 8 7 8 is oddeven.
We see that this should be written to apply to scalars by adding "0 to the end of the definition.
oddOrEven=: 3 : '(":y),'' is an '','' number.'',~;(2|y){''even'';''odd''' oddOrEven 7 8 7 is an odd number. 8 is an even number.
However, further testing reveals this bug:
oddOrEven 7.8 |domain error in oddOrEven, executing dyad { |x is 1.8; not an integer | (":y),' is an ',' number.',~;(2|y) {'even';'odd'
A Hack
We could accommodate the non-integer case with a bit of lawyering: since an even number is one that is integrally divisible by two, an odd one is not.
oddOrEven=: 3 : '(":y),'' is an '','' number.'',~;(0=2|y){''odd'';''even'''"0 oddOrEven 7.8 7.8 is an odd number.
To be fair, the Python code fails on the vector (list) argument but gives an incorrect answer for a non-integer argument.
number = 7.5 if is_odd(number): print(f"{number} is an odd number.") else: print(f"{number} is an even number.") 7.5 is an odd number. number = [7,8] if is_odd(number): print(f"{number} is an odd number.") else: print(f"{number} is an even number.") ... ... ... ... Traceback (most recent call last): File "<stdin>", line 1, in <module> File "<stdin>", line 2, in is_odd TypeError: unsupported operand type(s) for %: 'list' and 'int'
So, to write these tests correctly, we should at least handle the non-integer case correctly which begs the question of whether it ought to be in a library after all, at least in the Python case.
However, the TL;DR is
If you are creating a library and you don’t know at that moment how you are going to use it in future. Don’t create it.
This seems like reasonable advice.
2. Using the domain language
This is a plea to avoid generic meaningless names in favor of names recognizable as part of the domain for which we are coding. This seems reasonable as well but the example given is somewhat bizarre.
The example using generic language looks like this:
def calculate(a, b): x = a + b y = x * 2 z = y / 3 return z result = calculate(5, 7) print("Result:", result)
Whereas the domain-specific language example looks like this:
def calculate_sum_and_average(first_number, second_number): sum_of_numbers = first_number + second_number double_sum = sum_of_numbers * 2 average = double_sum / 3 return average result = calculate_sum_and_average(5, 7) print("Average:", result)
The irony here is that the function does not return a sum or an average: the names are in fact mis-leading. This could lead us to discuss the value of avoiding names as much as possible but we will continue with the essay.
3. Write a function with one level of abstraction
This section appears to contradict the first section given this example of failing to add a level of abstraction:
def calculate_rectangle_area(length, width): return length * width # Function to ask the user for dimensions def get_and_calculate_rectangle_area(): length = float(input("Enter the length of the rectangle: ")) width = float(input("Enter the width of the rectangle: ")) area = calculate_rectangle_area(length, width) print(f"The area of the rectangle is: {area}") def main(): print("Welcome to the Rectangle Area Calculator") get_and_calculate_rectangle_area()
The criticism of this code is that the calculation and getting the input are part of the same function. The suggested improvement looks like this:
# Function to calculate the area def calculate_rectangle_area(length, width): return length * width def main(): print("Welcome to the Rectangle Area Calculator") length = float(input("Enter the length of the rectangle: ")) width = float(input("Enter the width of the rectangle: ")) area = calculate_rectangle_area(length, width) print(f"The area of the rectangle is: {area}")
This suggestion seems to be edging us toward a more functional style of programming but, as we mention above, it somewhat contradicts the initial suggestion about putting code in libraries.
As it turns out, this apparent contradiction is by design.
4. There is no fixed answer in programming
The author points out the tension between principle 1 and principle 3 as an example of there being no hard and fast rules of programming.
Learning and Teaching J
Think Like a Programmer
There is a recent effort in the J community to introduce people to how a J programmer thinks when approaching a problem. It's called Share My Screen and has several entries so far. There are entries for some of the Project Euler problems as well as Advent of Code. Since Project Euler has a competitive element to it, the problem being solved is not shown though a solution is offered.
This project is along the same lines as our Show and Tell section in these meetings so we may be able to contribute from the NYCJUG meeting notes.
In any case, each of you should think about contributing to this effort.
End Notes
The 4 Rules of Writing Code Only Mature Developers Know About
Steal These
By Sanjay Priyadarshi / Level Up Coding / Nov 7, 2023