User:Ian Clark/credo
Artificial Catholicism: a case study in translating from BASIC to J
In the following thread on the J programming forum:
http://www.jsoftware.com/pipermail/programming/2012-November/030164.html
Bo Jacoby challenges us to make sense of a heritage BASIC program which generates doctrinally sound prayers from a database of the Nicene Creed in Latin:
http://en.wikipedia.org/wiki/Nicene_Creed#Latin_liturgical_version
The database stores each word of the Creed as the node of a tree. "Doctrinally sound prayers" correspond to traversals of this tree, or a subtree.
This article is not concerned with the devotional questions surrounding automatic prayer generation, but with offering a case-study of analysing a given BASIC program in order to implement it intelligently in J.
The BASIC program in question is interesting for its technique of defining a tree on a set of records by attaching a single integer to each record. A given traversal of the tree or a subtree has an associated integer, and conversely any positive integer defines a traversal (albeit often a trivial one). The program loops indefinitely, accepting integers typed-in at the keyboard, and outputting the corresponding prayer on the same line.
A typical session looks like this
1:CREDO AMEN 12:CREDO IN JESUM AMEN 10:CREDO IN DEUM ET IN JESUM ET IN SPIRITUM ET ECCLESIAM AMEN 2:CONFITEOR AMEN 20:CONFITEOR BAPTISMA AMEN 200:CONFITEOR UNUM BAPTISMA IN REMISSIONEM AMEN 201:CONFITEOR UNUM BAPTISMA AMEN 210:CONFITEOR UNUM BAPTISMA IN REMISSIONEM AMEN 211:CONFITEOR UNUM BAPTISMA AMEN
the substring before the colon (:) being the integer typed-in.
The data file "CREDO"
We reproduce Bo's database of fundamental Catholic doctrine below (you can copy/paste it into an empty IJS window):
1 CREDO 11 IN 111 UNUM 11 DEUM 112 PATREM 1121 OMNIPOTENTEM 113 FACTOREM 1131 CÆLI 1139 ET 1132 TERRÆ 11331 VISIBILIUM 1133 OMNIUM 11339 ET 11332 INVISIBILIUM 19 ET 12 IN 1211 UNUM 1211 DOMINUM 12 JESUM 1211 CHRISTUM 1212 FILIUM 1212 DEI 12121 UNIGENITUM 1219 ET 1213 EX 1213 PATRE 1213 NATUM 12131 ANTE 121311 OMNIA 12131 SÆCULA 1221 DEUM 12211 DE 12211 DEO 1222 LUMEN 12221 DE 12221 LUMINE 1223 DEUM 12231 VERUM 12232 DE 12232 DEO 122321 VERO 1231 GENITUM 12311 NON 12311 FACTUM 1232 CONSUBSTANTIALEM 1232 PATRI 12321 PER 12321 QUEM 12321 OMNIA 12321 FACTA 12321 SUNT 124 QUI 124101 PROPTER 124101 NOS 12410101 HOMINES 124109 ET 124102 PROPTER 12410201 NOSTRAM 124102 SALUTEM 12411 DESCENDIT 1241101 DE 1241101 CÆLIS 12419 ET 12412 INCARNATUS 12412 EST 1241201 DE 1241201 SPIRITU 124120101 SANCTO 1241202 EX 1241202 MARIA 124120201 VIRGINE 12419 ET 1241301 HOMO 12413 FACTUS 12413 EST 124211 CRUCIFIXUS 1242101 ETIAM 1242101 PRO 1242101 NOBIS 1242102 SUB 1242102 PONTIO 1242102 PILATO 124212 PASSUS 124219 ET 124213 SEPULTUS 12421 EST 12429 ET 12422 RESURREXIT 124221 TERTIA 124221 DIE 124222 SECUNDUM 124222 SCRIPTURAS 12429 ET 12423 ASCENDIT 124231 IN 124231 CÆLUM 12424 SEDET 124241 AD 124241 DEXTERAM 124241 PATRIS 12429 ET 124251 ITERUM 12425 VENTURUS 12425 EST 124252 CUM 124252 GLORIA 124253 JUDICARE 1242531 VIVOS 1242539 ET 1242532 MORTUOS 125 CUJUS 125 REGNI 125 NON 125 ERIT 125 FINIS 19 ET 13 IN 13 SPIRITUM 131 SANCTUM 132 DOMINUM 139 ET 133 VIVIFICANTEM 134 QUI 134 EX 1341 PATRE 1342 FILIO 1349 QUE 134 PROCEDIT 135 QUI 135 CUM 13501 PATRE 13509 ET 13502 FILIO 13509 SIMUL 1351 ADORATUR 1359 ET 1352 GLORIFICATUR 136 QUI 136 LOCUTUS EST 1361 PER 1361 PROPHETAS 19 ET 141 UNAM 142 SANCTAM 143 CATHOLICAM 149 ET 144 APOSTOLICAM 14 ECCLESIAM 2 CONFITEOR 211 UNUM 21 BAPTISMA 212 IN 212 REMISSIONEM 2121 PECCATORUM 9 ET 3 EXPECTO 31 RESURRECTIONEM 311 MORTUORUM 39 ET 32 VITAM 3211 VENTURI 321 SÆCULI AMEN
To facilitate loading the data, let's make the above into a J script by prepending a single header line, like so
CREDO=: (<;._2) 0 : 0 1 CREDO 11 IN 111 UNUM 11 DEUM ... 39 ET 32 VITAM 3211 VENTURI 321 SÆCULI AMEN )
Note that the final line consisting of a bare right-parenthesis is redundant if it's the last line of the script.
We can either save the data as a separate J script, or embed it into the application script which is going to use it.
When loaded, the above script will create a noun CREDO - a boxed list of records like this:
CREDO ┌───────┬─────┬────────┬───────┬──────────┬─────────────────┬────────────┬... │1 CREDO│11 IN│111 UNUM│11 DEUM│112 PATREM│1121 OMNIPOTENTEM│113 FACTOREM│... └───────┴─────┴────────┴───────┴──────────┴─────────────────┴────────────┴... #CREDO 163
The program to generate doctrinally sound prayers
Bo offers a heritage BASIC program which reads the data in its original bare form (saved as a text file named "CREDO") and composes a "prayer" (my term for it) when the user types in a number. The program loops, accepting numbers from the keyboard until the user enters a null input.
1 INPUT;C$: IF C$="" THEN END 2 OPEN "CREDO" FOR INPUT AS 1: PRINT":"; 3 IF EOF(1) THEN CLOSE: PRINT: GOTO 1 4 LINE INPUT#1,A$: B$=C$ 5 IF A$=""THEN A%=-1 ELSE A%=ASC(A$)-48: A$=MID$(A$,2) 6 IF B$=""THEN B%=-1 ELSE B%=ASC(B$)-48: B$=MID$(B$,2) 7 IF A%<0 THEN PRINT" ";A$;: GOTO 3 8 IF A%=0 OR B%=0 OR A%=B% THEN 5 ELSE 3
The program makes no attempt to separate its code into input, processing and output. Even in the 1960s when BASIC was invented (Kemeny & Kurtz, 1964) this was recognised to be a bad thing. It makes a puzzle out of something which was originally intended to be easy to read and understand, viz. BASIC itself.
Now there's little point in translating a puzzle written in BASIC into the same puzzle written in J. Far better to purge the algorithm of its BASIC idiosyncracies, and express it in a way which makes it convenient to program "intelligently" in J. By which we mean in a way which shows some grasp of what the original program did, and how it did it, making it possible to choose "the J way" of doing things as opposed to "the BASIC way".
We won't strive for the most efficient J code, preferring code which clarifies the algorithm, whilst leaving the way open to tighten it up at our leisure (though maybe at the cost of clarity). But when we've finished, there will be little or no sign of the BASIC origins of the program.
Addressed in this way, Bo's sample program makes an appealing case-study for re-implementing an application in J.
The traditional input/processing/output cycle
BASIC programs were originally run on a shared mainframe and accessed by the user via a teletype. If the flow-of control drops off the end, the program stops and so does your session. You'd probably have to phone the computer operator to mount the tape again and schedule a new job for you, charging your department accordingly. You don't want that to happen. So if the program is meant to take a number as input and generate a Latin sentence as output, you write it to cycle indefinitely, accepting input until you call a halt, conventionally done by entering an empty line.
This is the purpose of line 1:
1 INPUT;C$: IF C$="" THEN END
The computer puts the characters you enter into a variable called C$. (The final $ denotes a string variable). If C$ is empty, the computer terminates the session by executing END.
If C$ is not empty, the computer executes line 2, which, we may guess, starts reading the data file "CREDO" from the beginning. It also prints a colon in preparation for eventual output. To save paper this is going to be on the same line as your input. This is how teleprinters were used before computers came along.
And it's not hard to guess the next line (3) specifies what's to happen when the end-of-file (EOF) of "CREDO" is reached. The file is closed, PRINT is executed to start a new line, and flow-of-control goes back to the start of the program (line 1) to ask you for your next number. But a lot happens before that takes place. Typically the flow-of-control drops through to line 4 without doing anything, and line 4 starts the real work by reading the next record from the data file.
Let's represent this behaviour with a J verb like this:
go=: 3 : 0 while. 1 do. if. 0=#C=: 1!:1 (1) do. '...exits' return. end. smoutput pray C end. )
The heart of the app is the verb pray, which accepts a number and returns a string (the Latin sentence). Nowadays you have exclusive use of your computer and your session is not going to stop when execution stops. So, writing an app for your own use you wouldn't bother to write a verb like go. You'd run pray directly like this:
pray 21 CONFITEOR BAPTISMA AMEN pray 22 CONFITEOR AMEN pray 31 EXPECTO RESURRECTIONEM AMEN
The "processing" part of the program
What does the BASIC program actually do?
It reads the file "CREDO" in sequence, record by record. Each record consists of an ID (a number) and a Latin word. It matches the ID with the number you typed-in (let's call it the key), and if they match it prints the word. Otherwise it goes on to the next word. That's all it does.
How would you naturally do it in J?
Suppose you had a dyad verb called match. The x-arg is a given ID, the y-arg is the key, and it returns 1 if they match, 0 if they don't. If you gave this verb Rank 1, it would automatically generalise to accepting a list of IDs, and the result would be a list of 0s and 1s, one for each corresponding word. You could then use Copy (#) with this Boolean vector to extract those words you want to include in the sentence from the list of all words in the data file. In this way you'd separate processing from output.
Firstly, let's split the data in CREDO into a list of IDs and a corresponding list of words:
CS=: > CREDO -.each <' 0123456789' NB. words only CN=: > CREDO -.each <'ABCDEFGHIJKLMNOPQRSTUVWXYZÆ' NB. keys only CN ; CS ┌──────────┬────────────────┐ │1 │CREDO │ │11 │IN │ │111 │UNUM │ │11 │DEUM │ ... ... ... │39 │ET │ │32 │VITAM │ │3211 │VENTURI │ │321 │SÆCULI │ │ │AMEN │ └──────────┴────────────────┘
Now we can define our verb: pray like this:
sel=: 3 : 'CN match"1 ({:$CN){.":y' sel 11 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... pray=: 3 : 'deb , CS #~ sel y'
Verb: sel is basically just CN match y -- but this assumes match has rank 1, and y is in character form and is the same width as CN. The full definition, as shown above, ensures all of these things.
Verb: pray is basically just (sel y)#CS -- but that's a matrix having the same number of columns as CS. The full definition, as shown above, turns it into a string of words and removes duplicated spaces.
Intuiting and verifying the ID "match" algorithm
All that remains is to define our verb: match.
This is not so easy. We can guess that lines 4 to 8 of the BASIC program implements the matching algorithm, and it does a digit-by-digit comparison. But it may work by side-effects, so here's a case where we'll have to attempt a faithful transliteration of the BASIC code into J. Then, using this to verify what the algorithm actually does, we can write an alternative definition of match in efficient J code.
We'll call these two versions of match: matchX and matchY, and plug them in turn into: sel using one or other of the sentences:
match=: matchX match=: matchY
Here's the attempted transliteration...
matchX=: 4 : 0 NB. test database line x against searchkey y NB. a transliteration of original algorithm NB. 1 INPUT;C$: IF C$="" THEN END NB. 2 OPEN "CREDO" FOR INPUT AS 1: PRINT":"; NB. 3 IF EOF(1) THEN CLOSE: PRINT: GOTO 1 NB. ...implemented by word: go (below) NB. 4 LINE INPUT#1,A$: B$=C$ As=. dtb x NB. A$ - word read-in from data base Bs=. dtb y NB. B$ - test key while. 1 do. NB. 5 IF A$=""THEN A%=-1 ELSE A%=ASC(A$)-48: A$=MID$(A$,2) if. As-:'' do. An=._1 NB. An is original A% else. An=. ".{.As NB. ASC(A$) only translates the first char of A$ As=. }.As NB. MID$() without 3rd arg is like RIGHT$() end. NB. 6 IF B$=""THEN B%=-1 ELSE B%=ASC(B$)-48: B$=MID$(B$,2) if. Bs-:'' do. Bn=._1 NB. Bn is original B% else. Bn=. ".{.Bs Bs=. }.Bs end. NB. 7 IF A%<0 THEN PRINT" ";A$;: GOTO 3 if. An<0 do. 1 return. end. NB. i.e. accept the word NB. 8 IF A%=0 OR B%=0 OR A%=B% THEN 5 ELSE 3 if. (An=0) or (Bn=0) or (An=Bn) do. NB. (no-op) loop to test next letter else. 0 return. NB. i.e. reject the word end. end. )
and here's an equivalent version in somewhat more efficient J:
matchY=: [: *./ = +. ('0' = [) +. ('0' = ]) +. ' ' = [
An "explication" of matchY (an equivalent explicit definition) is as follows:
ZE=: '0' SP=: ' ' or=: +. all=: *./ matchY=: 4 : 'all ((x=y) or (x=ZE) or (y=ZE) or (x=SP))'
The tacit form of matchY can probably be made more efficient still with a better use of forks. We leave it as it is here, because it corresponds to what you get if you use: 13 : in place of: 4 : in the above explication.
Conclusion
Here's a script combining all the things discussed:
(Contributed by Ian Clark.)