NYCJUG/2013-01-08

program readability, presenting code, spline, distance between points, analyze sentence, function composition, IDE versus command-line

Location:: BEST

Meeting Agenda for NYCJUG 20130108

1. Beginner's regatta: code readability and flexibility - an example of
finding the distance between two points: see "Distance Between Two Points
in J.pdf".


2. Show-and-tell: the usefulness of examples - see "Example Code for
Spline-fitting.doc".

Dissect: A graphical, 2-D J sentence analyzer; see "Introduction to Dissect
 - 2-D J Sentence Analyzer".


3. Advanced topics: See "The Benefits of Function Composition".

Using QT as the GUI for J?  Discuss.

Also, see "sketching a user interface with Denim.doc".


4. Learning, teaching and promoting J, et al.: who needs an IDE?  See
"The IDE As a Bad Programming Language Smell.doc".

Beginner's regatta

Code Readability: Distance Between Two Points in emacs Lisp

[From http://www.mitchellsoftwareengineering.com/ProgrammingWithGNUEmacsLisp.pdf]

"Consider a function linelen that computes the distance between two points represented as dotted-pairs:

ELISP> (linelen '(0 . 0) '(1 . 1))
1.4142135623730951

Definition:

(defun linelen (p1 p2)
(setq x1 (car p1))
(setq y1 (cdr p1))
(setq x2 (car p2))
(setq y2 (cdr p2))
(setq xdiff (- x2 x1))
(setq ydiff (- y2 y1))
(sqrt (+ (* xdiff xdiff) (* ydiff ydiff))) ; return value )

How would you rate the readability of the above code?"

Distance Between Two Points in J

Since we have arrays in J, there's no need to add to the complexity of the code by treating the elements of an (X,Y) pair as separate, unrelated scalars as is done above. Also, at the end of our exercise, we'll see how much more general and flexible the resulting array code can be.

Say we have two named point-sets like this:

   pts0=. 0 0
   pts1=. 1 1

The difference between the Xs and the Ys in the respective sets is simply

   pts0-pts1
 _1 _1

Squaring each of these:

   *:pts0-pts1
1 1

Summing the squares:

   +/*:pts0-pts1
2

Taking the square root of the sum of the squares:

   %:+/*:pts0-pts1
1.41421

So, if we want to encapsulate this logic in a dyadic, explicit verb:

   linelen=: 4 : '%:+/*:x-y'

Checking our work:

   pts0 linelen pts1
1.41421

How would you rate the readability of the above code?

Variations on a Theme

But, wait, is this the best way to code this? Consider first that "square" (*:) and "square-root" (%:) are inverses applied “under” summation (+/) and we have an adverb that does just this: "under" (&.). Also consider that we may want a tacit form of this verb and we may want it to be monadic. Taking these first two together and using the tacit translator “13 : “ as a crutch, we might try this:

   13 : '+/&.*:x-y'
[: +/&.*: -
   pts0 ([: +/&.*: -) pts1
1 1

It doesn’t work – why? Based on the shape of the result, we might suspect a problem with rank. Unfortunately, we can’t specify the rank of an adverb:

   pts0 ([: +/(&."_)*: -) pts1
|syntax error
|   pts0([:+/(    &."_)*:-)pts1

Abandoning this idea for the time being, let’s look instead at representing this tacitly as a monadic verb. An advantage of a monadic verb is that we can take a single argument – a 2 x n array of point pairs.

   ]pp=: pts0,:pts1
0 0
1 1
   ([:%:[:+/[:*:[:-/]) pp
1.41421

Notice how this requires us to use minus reduction (-/) instead of subtraction. Let’s extend this to more point pairs to see how it fares.

   ]pts=. (0 0,1 1,:2 2) ,: 1 1,2 2,:0 0
0 0
1 1
2 2

1 1
2 2
0 0

So we have three sets of point pairs in a 2 x 3 x 2 array. The pairs between which we'll find the distance are ((0,0),(1,1)), ((1,1),(2,2)), and ((2,2),(0,0)).

   ([:%:[:+/[:*:[:-/]) pts
2.44949 2.44949

This is obviously wrong because it's not the three answers we want – what went wrong? Let’s apply the verbs in an increasing train to see where we go astray.

   pts; (-/pts) ; (*:-/pts) ; +/*:-/pts
+---+-----+---+---+
|0 0|_1 _1|1 1|6 6|
|1 1|_1 _1|1 1|   |
|2 2| 2  2|4 4|   |
|   |     |   |   |
|1 1|     |   |   |
|2 2|     |   |   |
|0 0|     |   |   |
+---+-----+---+---+

Clearly we go wrong because the summation is down the first dimension. Fixing this gives us what we’re looking for:

   ([:%:[: (+/"1) [:*:[:-/]) pts
1.41421 1.41421 2.82843

We could move the reduction (/) outside the body of the function and make the minus (-) dyadic to make the resulting verb dyadic and apply it as a reduction:

   ([: %: [: +/"1 [: *: -) / pts
1.41421 1.41421 2.82843

Back to Basics and Beyond

Interestingly, when we search the J wiki for a solution to this problem, we find this (at http://www.jsoftware.com/jwiki/Phrases/Geometry#Distance_between_2_points) to calculate the distance between 2 points:

   NB.*dPP v distance between 2 points
   dPP=: [: +/ &.: *: -
   dPP/pp
1.41421

Notice that this succeeds with our early idea about applying summation "under" (&.:) a verb (*:), then applying that verb's inverse (%:) to the result. The key is this variant of "under" which is applied with infinite rank - &.: - instead of the more familiar version &..

However, this version suffers the same flaw as one of our earlier attempts when we attempt to extend it to multiple point pairs:

   dPP/pts
2.44949 2.44949

Fortunately, we can apply the same fix that worked before: specify the summation across 1-dimensional elements of the array:

   dPP=: [: (+/"1) &.: *: -

Applying this dyadic verb between the pair of points in pp:

   dPP/pts
1.41421 1.41421 2.82843

This also clues us in that the dyadic formulation may be more useful. In fact, it allows us the flexibility for all these useful variants and extensions:

   dPP"1 / / pts                   NB. All point pairs
1.41421 2.82843       0
   0 1.41421 1.41421
1.41421       0 2.82843

   2 dPP"1 / \ 0 0,1 1,2 2,:10 10  NB. Adjacent point pairs
1.41421 1.41421 11.3137

   dPP/0 0 0,:1 1 1                NB. 3-D points
1.73205
   dPP/0 0 0 0,:1 1 1 1            NB. 4-D points
2
   dPP/0 0 0 0 0,:1 1 1 1 1        NB. 5-D points
2.23607

How would you rate the generality and flexibility of this code?

Show-and-Tell

Code for Spline-fitting

This code, posted by J. Patrick Harrington, fits a curve to some points using spline methods. It’s slightly edited: the major change being to show the examples as “multi-line comments” using “0 : 0” instead of prefixing each comment line with “NB.” as this eases cut-and-paste of the examples.

NB.* ls_spline.ijs: find cubic spline fit by least squares.
NB. From http://www.astro.umd.edu/~jph/ls_spline.ijs by
NB. J. Patrick Harrington (jph@astro.umd.edu).
NB. Ref.: S K Lucas, Gazette Aust. Math. Soc., 30, 207 (2003).
NB. t= boundaries of cubic segments, (xi,yi) = data points.
NB. Usage: 't z zp'=. OUT=. t ls_spline xi;yi
NB. z and zp are values of spline fit f(x) and df/dx at t's.
NB. (z & zp are sufficient to evaluate the fit f(x) at any x.)

cos=: 2&o.
ls_spline=: dyad define
   'xi yi'=. /:~&.|: >y   NB. sort xi into an ascending sequence
   idx=. x IdX xi         NB. idx is t interval in which each xi resides
   NB. if first or last interval(s) are empty of data, delete them
   t=. ({. idx)}.((2-#x)+{:idx)}. x
   Np1=. 2+ Nm1=. <: N=. #h=. (}.-}:)t
   bx=. <"1 |: >(i. n=. #xi); t IdX xi
   v=. _1+ u=. bx{ h%"1~ xi -/ }:t
   uu=. *: u=. u bx}0$~n,N
   vv=. *: v=. v bx}0$~n,N
   aa=. +/ *:al=. (1+2*u)*vv
   ab=. +/ al*bt=. uu*(1-2*v)
   ag=. +/ al*gm=. h*"1 u*vv
   ad=. +/ al*dl=. h*"1 uu*v
   bd=. +/ bt*dl [bg=. +/ bt*gm [bb=. +/ *:bt
   dd=. +/ *:dl  [gd=. +/ gm*dl [gg=. +/ *:gm
   D=. +:(0,~ +/ yi*al)+(0, +/ yi*bt)
   G=. +: (0,~ +/ yi*gm)+(0, +/ yi*dl)
   bx11=. >: &.> bx00=. ;/ ,.~i. N
   bx01=. (0 1)&+ &.> bx00
   bx10=. (1 0)&+ &.> bx00
   A=. (bb bx11}E) + aa bx00}E=. 0$~,~Np1
   A=. +: (ab,ab) (bx01,bx10)}A
   B=. (ag bx00}E) + bd bx11}E
   B=. +: (bg,ad) (bx01,bx10)}B
   E=. (gg bx00}E) + dd bx11}E
   E=. +: (gd,gd) (bx01,bx10)}E
   bx00=. }: bx00 [ bx01=. }: bx01
   bx02=. (0 2)&+ &.> bx00
   hip=. }.h  [hi=. }:h
   C=. (3*hip%hi) bx00}F=. 0$~Nm1,Np1
   C=. (3*(hi%hip)-hip%hi) bx01}C
   C=. (_3*hi%hip) bx02}C
   F=. hip bx00}F
   F=. (+: hi+hip) bx01}F
   F=. hi bx02}F
NB. set up matrix of linear system to be solved:
   M=. (A,.(|:B),.|:C),(B,.E,.|:F),C,.F
   RHS=. (D,G,Nm1#0) NB. right hand side of system
   ZZ=. RHS %. M     NB. solve for f(x_i) & [df/dx]_i
   z=. (i. Np1){ZZ
   >t; z; zp=. (Np1+i. Np1){ZZ
)

We won't go over this code in any detail. We'll just note that it was copied, with little alteration, from a scalar language. In the highly-repetitive, adjacent lines above, we can hear the cries of an array struggling to get out.

The code continues with some elaboration.

NB. Thanks to Arie Groeneveld for improvements to this code.
NB. Note: equation (2) of Lucas should be u_i(x)=(x-t_i)/h_i
NB. and the 2nd B in the pseudocode matrix should not be B^T
NB. but just B as in equation (7).

NB. t IdX x0 --> index of where x0 fits in t-array:
NB.    0   1   2   3   4   5   6  <- t values
NB.    |   |   |   |   |   |   |
NB.  <-- 0 | 1 | 2 | 3 | 4 | 5 --->  IdX values
IdX=: [: <: (1"_ >. [: +/"1 ] >/ [) <. [: <: [: # [

NB. Least squares spline evaluation using output "OUT"
NB. of ls_spline. Usage: OUT eval_spline y --> f(y)
NB. The argument y may be an array of x-values.

eval_spline=: dyad define
   't z zp'=. x
NB. Array t holds the boundaries of the spline pieces, z is
NB. the array of the values of the cubic splines at the
NB. t points, and zp (z') the derivatives at those points.
   N=. # h=. (}.-}:) t
   v=. _1+ u=. h%~ |: y -/ }:t
   a=. (1+2*u)*v*v* }:z
   b=. u*u*(1-2*v)* }.z
   c=. h*u*v*v* }:zp
   d=. h*u*u*v* }.zp
   box=. <"1 |: >(i. #y); t IdX y
   box { ((#y),N)$,|: a+b+c+d
)

Since this code will be fitting a line to some points, the author defines a customized plot verb to show both the points and the line in the same graph.

NB. plot dots & line: (dots x;y) dot_and_line (line x;y)
require'plot'
dot_and_line=: dyad define
   pd 'output cairo 600 300' [ pd'reset'     NB. Size 300x600 pixels
   pd'type dot;pensize 4;color red'          NB. Distinguish dots as red
   pd x
   pd'type line;pensize 3; color blue'
   pd y
   pd'show'
)

I changed this slightly, adding the "output" statement to specify the plot size and distinguishing the lines from the dots by color differences.

The Unreasonable Effectiveness of Examples

Here's where J can really shine: if we're thoughtful enough to provide specific examples of using our code, it becomes trivially easy to do basic sanity tests and ensure that the code is doing what we expect.

Here we see an example formatted as a J noun (using "0 : 0") in order to keep the lines free of clutter. This simplifies cutting and pasting this example into a J session to watch it perform.

spline0_eguse_=: 0 : 0
   t=. 0, 0.2 0.5 0.7 1
   xx=. rand 35
   yy=. (cos 1p1*xx)+_0.05+0.1*rand 35
   'type symbol;pensize 3' plot xx;yy
   load 'ls_spline.ijs'
   ]OUT=. t ls_spline xx;yy
   uu=. int01 300
   vv=. OUT eval_spline uu
   (xx;yy) dot_and_line uu;vv
)

If we run this, we'll see a graph like this:

We see the curve fitted to the points (show as dots).

A "Classic" Example

Here, the author defines two utility verbs and provides some data that apparently has appeared frequently in the exposition of this algorithm.

rand=: [: ? ] # 0"_
int01=: i.@>:%]  NB. int01 n --> divides [0 ... 1] into n intervals.

NB. Test using classic "titanium heat data":
splineTi_eguse_=: 0 : 0
   ti=:    0.644 0.622 0.638 0.649 0.652 0.639 0.646 0.657 0.652 0.655
   ti=: ti,0.644 0.663 0.663 0.668 0.676 0.676 0.686 0.679 0.678 0.683
   ti=: ti,0.694 0.699 0.710 0.730 0.763 0.812 0.907 1.044 1.336 1.881
   ti=: ti,2.169 2.075 1.598 1.211 0.916 0.746 0.672 0.627 0.615 0.607
   ti=: ti,0.606 0.609 0.603 0.601 0.603 0.601 0.611 0.601 0.608

   xx=. 595+ 10*i.#ti
NB. 8 interval points "judiciously" chosen (for best fit!):
   t=. 595 820 850 870 900 915 980 1075
   OUT=. t ls_spline xx;ti
   uu=. 595+(1075-595)*int01 400
   vv=. OUT eval_spline uu
   (xx;ti) dot_and_line uu;vv  NB. this is the plot shown as "Fit ..."
)

If we run this example, we'll see the following:

An Alternative Method

The author presents an alternative algorithm to the above: a least-squares fit of points to "not quite a cubic spline".

NB. (see Ferguson and Staley, Comm. ACM 16, 380 (1973)).
NB. This routine is simpler than ls_spline above, since it doesn't enforce
NB. continuity of the second derivatives. The results usually look similar
NB. but are not so smooth. Given for comparison, but I don't recommend it.

Also useful in any code are explanations of the major inputs and outputs. We might also consider slightly more meaningful names, though there's no need to fall into the excess verbosity favored by some languages.

NB. t= frets, xi and yi data points.
NB. Usage: 't z zp'=. OUT2=. t ls_spline2 xi;yi
NB. z and zp are values of spline fit f(x) and df/dx at t's.

ls_spline2=: dyad define
   'xi yi'=. /:~&.|: >y   NB. sort xi into an ascending sequence
   idx=. x IdX xi
NB. if first or last interval(s) are empty of data, delete them
   t=. ({. idx)}.((2-#x)+{:idx)}. x
   Np1=. >: N=. #h=. (}.-}:)t
   bx=. <"1 |: >(i. n=. #xi); t IdX xi
   v=. _1+ u=. bx{ h%"1~ xi -/ }:t
   uu=. *: u=. u bx}0$~n,N
   vv=. *: v=. v bx}0$~n,N
   aa=. +/ *:al=. (1+2*u)*vv
   ab=. +/ al*bt=. uu*(1-2*v)
   ag=. +/ al*gm=. h*"1 u*vv
   ad=. +/ al*dl=. h*"1 uu*v
   bd=. +/ bt*dl [bg=. +/ bt*gm [bb=. +/ *:bt
   dd=. +/ *:dl  [gd=. +/ gm*dl [gg=. +/ *:gm
   D=. +:(0,~ +/ yi*al)+(0, +/ yi*bt)
   G=. +: (0,~ +/ yi*gm)+(0, +/ yi*dl)
   bx11=. >: &.> bx00=. ;/ ,.~i. N
   bx01=. (0 1)&+ &.> bx00
   bx10=. (1 0)&+ &.> bx00
   A=. (bb bx11}E) + aa bx00}E=. 0$~,~Np1
   A=. +: (ab,ab) (bx01,bx10)}A
   B=. (ag bx00}E) + bd bx11}E
   B=. +: (bg,ad) (bx01,bx10)}B
   E=. (gg bx00}E) + dd bx11}E
   E=. +: (gd,gd) (bx01,bx10)}E
   M=. (A,.|:B),(B,.E)
   ZZ=. (D,G)%. M  NB. solve for f(x_i) & [df/dx]_i
   z=. (i. Np1){ZZ
   >t; z; zp=. (Np1+i. Np1){ZZ
)

Again, an example or two is worth pages of exposition or code.

NB. Example:
spline2_eguse_=: 0 : 0
   t=. 0, 0.2 0.5 0.7 1
   xx=. rand 35
   yy=. (cos 1.5p1*xx)+_0.05+0.1*rand 35
   'type symbol;pensize 3' plot xx;yy
   load'ls_spline2.ijs'
   ]OUT2=. t ls_spline2 xx;yy
   uu=. int01 300
   vv2=. OUT2 eval_spline uu
   (xx;yy) dot_and_line uu;vv2
  NB. Compare to ls_spline from above:
   (xx;yy) dot_and_line uu;>vv;vv2
)

The results of the example of the alternative implementation look like this:

Comparing this alternative with the original version (by putting the data for both lines (>vv;vv2) in the last cell of the right argument) shows us that it's hard to distinguish the resulting lines. The alternative version overlays the original version nearly exactly except for the visible difference in the upper-left corner of this graph.

"Dissect" Tool for Analyzing J Sentences

See this page for information on this tool that breaks down a J expression into pieces as they are executed. We can take a look to see how this could have helped us with our exercise above where we were developing the code to find the distance between point pairs.

Advanced Topics

Qt

We talked a little about the recent switch by the J developers to using Qt as the basis for J graphics instead of Gtk. Thomas said, based on his experiences with both environments, that Qt seemed more comprehensive and more well-designed than Gtk. Based on messages on the forums, there still seems to be some teething pains with getting the Qt-based versions of the J GUI to work but development is proceeding rapidly.

Higher-level Functions

Unlike most other functional languages, J does not simply, vaguely refer to "higher-level" functions: we have a specific vocabulary for distinguishing between different kinds of "higher-level" functions. There are adverbs for combining verbs and nouns, and there are conjunctions for combining verbs. We also use the more general term "function composition" to refer to these different things together, as well as "hooks" and "forks" for different modes of composition.

Graham Parkhouse had this to say on the J-Programming Forum about the benefits of function composition in J.

The Benefits of Function Composition

from:	 Graham Parkhouse <graham.parkhouse@ntlworld.com>
to:	 programming@jsoftware.com
date:	 Thu, Jan 3, 2013 at 3:25 AM
subject: Re: [Jprogramming] Atop continues to puzzle me

Subject: Re: Atop continues to puzzle me

This post was initially titled 'The benefits of function composition' - see my P.S. There are things you can achieve with function composition that cannot be achieved so elegantly any other way:

   3 4 5<@#"0 i.3
+-----+-------+---------+
|0 0 0|1 1 1 1|2 2 2 2 2|
+-----+-------+---------+

This is what I want - a set of 3 0s, a set of 4 1s and a set of 5 2s.

This doesn't give me what I want:

   3 4 5#"0 i.3
0 0 0 0 0
1 1 1 1 0
2 2 2 2 2

Nor does this:

   3 4 5 ([: <"1 #"0) i.3
+---------+---------+---------+
|0 0 0 0 0|1 1 1 1 0|2 2 2 2 2|
+---------+---------+---------+

But this does:

   each
+--+-+
|&.|>|
+--+-+
   3 4 5#"0 each i.3
+-----+-------+---------+
|0 0 0|1 1 1 1|2 2 2 2 2|
+-----+-------+---------+

... but then &. is function composition.

The unavoidable problem with not using function composition in such instances is that J turns the result from every execution of a verb into a rectangular array. When a series of verbs is executed in succession without composition, J has to produce a succession of 'properly finished' results along the way. With composition, this finishing process is confined to just once at the end of the process.

P.S. I wrote this before reading Linda's recent post on 'atop continues to puzzle me' (Wed, 02 Jan 2013 03:02:54 -0500):

[At NYCJUG, Thomas pointed out that the "rank 0" ("0) in the last example above is unnecessary because "each" already works at this rank.]

Learning, teaching, and problem-solving

We looked at an essay about problems with IDEs (Interactive Development Environments) - The IDE As a Bad Programming Language Smell - compared with simpler, editor-based methods often preferred by more experienced programmers. Sometimes it seems as though IDEs exist to address common shortcomings in a language - think of the advantage of auto-complete in a verbose language like Java - but they potentially provide benefit for organizing large bodies of code.

Materials

-- Devon McCormick <<DateTime(2013-01-15T14:22:46-0200)>>