NYCJUG/2023-07-11
Beginner's regatta
Some time ago, I learned that the most frequently-used letters in the English language are "etaoinshrdlu", in that order. More recently, I decided to check this by analyzing some large bodies of text. Here we look at how to tabulate letter frequencies for various bodies of text.
Input Files
Here is a list of the files used in this study. They consist of various files from the internet, including lists of quotations and sayings.
d:/amisc/txt/LargeTxt:
total used in directory 45185 available 706.8 GiB
drwxrwxrwx 1 devon devon    65536 07-03 00:15 ..
drwxrwxrwx 1 devon devon    28672 07-03 00:16 .
-rw-rw-rw- 1 devon devon   777176 06-28 23:44 AnObliqueApproach.txt
-rw-rw-rw- 1 devon devon   212946 2006-10-05 Badge of Infamy by Lester del Ray.txt
-rw-rw-rw- 1 devon devon   426056 06-28 23:44 BerserkerThrone.txt
-rw-rw-rw- 1 devon devon     4573 2020-09-01 clipsToSinkLyinDonny.txt
-rw-rw-rw- 1 devon devon   122049 06-29 02:43 CommonSense.txt
-rw-rw-rw- 1 devon devon    23181 2014-12-24 DaveBarryNYC.txt
-rw-rw-rw- 1 devon devon   218565 2007-12-28 Flatland by Abbot_flat10a.txt
-rw-rw-rw- 1 devon devon   766690 2002-03-13 fortunes2.txt
-rw-rw-rw- 1 devon devon   173831 2007-03-04 fortunesNew.txt
-rw-rw-rw- 1 devon devon   343627 2007-12-28 Four-Day Planet by H. Beam Piper-19478-8.txt
-rw-rw-rw- 1 devon devon  1053964 06-28 23:48 Freehold.txt
-rw-rw-rw- 1 devon devon  1530492 2003-01-23 freethought.txt
-rw-rw-rw- 1 devon devon    65336 2007-12-28 Gambler's World by John Keith Laumer_21627.txt
-rw-rw-rw- 1 devon devon   133035 2007-12-28 Greylorn by John Keith Laumer_23028.txt
-rw-rw-rw- 1 devon devon    18223 2015-09-25 herbsInfo.txt
-rw-rw-rw- 1 devon devon    11543 2008-05-30 internetSlang.txt
-rw-rw-rw- 1 devon devon     9031 2022-09-07 Jokes2.txt
-rw-rw-rw- 1 devon devon    35822 2020-10-06 ListOfWhatHeDid.txt
-rw-rw-rw- 1 devon devon    59082 2002-04-25 mathFortune.txt
-rw-rw-rw- 1 devon devon    10871 2019-03-27 Names Notable of Obscure and Imaginary.txt
-rw-rw-rw- 1 devon devon   118304 2008-01-04 Omnilingual by H Beam Piper_19445-8.txt
-rw-rw-rw- 1 devon devon   827394 06-28 23:56 Original_Edition_of_edited_Schmitz_Stories.txt
-rw-rw-rw- 1 devon devon   220921 2002-04-25 plan9fortunes.txt
-rw-rw-rw- 1 devon devon   339819 2008-01-04 Planet of the Damned by Harry Harrison_21873-8.txt
-rw-rw-rw- 1 devon devon   124502 2012-01-06 programmingQuotes.txt
-rw-rw-rw- 1 devon devon   640987 06-28 23:58 Pyramid_Scheme.txt
-rw-rw-rw- 1 devon devon   347739 2022-12-03 quotes.bbs
-rw-rw-rw- 1 devon devon  1169116 06-28 23:59 Retief.txt
-rw-rw-rw- 1 devon devon   398836 2008-01-04 The Devolutionist and the Emancipatrix by Homer Eon Flint_thdvl10.txt
-rw-rw-rw- 1 devon devon   582469 06-28 23:54 TheGrantvilleGazetteVol1.txt
-rw-rw-rw- 1 devon devon   153471 06-29 02:46 The Mountains of Mourning.txt
-rw-rw-rw- 1 devon devon    44363 2007-12-28 The Stoker and the Stars by Algirdas Jonas Budrys_22967.txt
-rw-rw-rw- 1 devon devon   204350 2007-12-28 The Ultimate Weapon by John Wood Campbell_23790-8.txt
-rw-rw-rw- 1 devon devon    10040 2009-02-13 theWayOfTheKook.txt
-rw-rw-rw- 1 devon devon    52970 2007-12-28 The Yillian Way by John Keith Laumer_21782.txt
-rw-rw-rw- 1 devon devon    50401 2008-01-04 Traders Risk by (pseud) Roger Dee_23103.txt
-rw-rw-rw- 1 devon devon   344327 2007-12-28 Uller Uprising by H. Beam Piper_19474-8.txt
-rw-rw-rw- 1 devon devon   525368 2018-03-15 wlist_match10.txt
-rw-rw-rw- 1 devon devon   375030 2018-03-15 wlist_match11.txt
-rw-rw-rw- 1 devon devon   231516 2018-03-15 wlist_match12.txt
-rw-rw-rw- 1 devon devon 14278764 2018-03-15 wlist_match1.txt
-rw-rw-rw- 1 devon devon  5163818 2018-03-15 wlist_match2.txt
-rw-rw-rw- 1 devon devon  3319961 2018-03-15 wlist_match3.txt
-rw-rw-rw- 1 devon devon  2479169 2018-03-15 wlist_match4.txt
-rw-rw-rw- 1 devon devon  1950487 2018-03-15 wlist_match5.txt
-rw-rw-rw- 1 devon devon  1548879 2018-03-15 wlist_match6.txt
-rw-rw-rw- 1 devon devon  1230525 2018-03-15 wlist_match7.txt
-rw-rw-rw- 1 devon devon   956272 2018-03-15 wlist_match8.txt
-rw-rw-rw- 1 devon devon   733180 2018-03-15 wlist_match9.txt
-rw-rw-rw- 1 devon devon  1749989 2002-02-04 WORD.LST
-rw-rw-rw- 1 devon devon     5966 06-29 23:59 WorksAndDays.txt
Most of the last group of files, from "wlist_match10.txt" to "WORD.LST", are comprehensive lists of words in alphabetical order. All the other files are texts of various novels from Project Gutenberg or various lists of jokes and sayings.
Reading Data
Here we assign a variable dd to be the common directory prefix of all the files, assuming everything in that directory is a text file we want to read. We use the standard library verb tolower to ensure we work with only the lower-case version of each letter.
   dd=. 'D:\amisc\txt\LargeTxt\'
   #txt=. tolower ;fread&.>(<dd),&.>0{"1 dir dd,'*.*'
46175026
So txt is a vector of more than 46 million characters from all these files, folded to lower case. The lower-case letters we want to count are the last 26 items of the standard J global variable Alpha_j_.
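For anyone who does not have the dir cover verb loaded, here is a roughly equivalent sketch (my assumption about an alternative, not the session actually used) relying only on the 1!:0 file-directory foreign and the standard-library fread:

   dd=. 'D:\amisc\txt\LargeTxt\'
   fls=. (0 {"1 (1!:0) <dd,'*.*') -. '.';'..'   NB. boxed file names, minus the . and .. entries
   txt=. tolower ; fread@(dd&,)&.> fls          NB. read each file, then run them all together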
Build Frequency Table
First, set our alphabet.
   ]letts=. 26}.Alpha_j_
abcdefghijklmnopqrstuvwxyz
Now build the frequency table.
frq=. <:+/"1 letts ([ =/ ,) txt
Breaking down this phrase, we see that the rightmost expression gives us a large table of Boolean values based on comparing the alphabet to the alphabet concatenated to the text. We precede the text by the alphabet to ensure we get results in the order corresponding to the alphabet, i.e. the first row corresponds to "a", the next to "b", and so on.
   $letts ([ =/ ,) txt
26 46175052
   10 40{.letts ([ =/ ,) txt
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Notice the diagonal of ones at the start of this table. This reflects the prepending of the alphabet.
We see that the shape of the table is #letts by #letts+#txt, so it's rather large. Summing each row gives us the number of occurrences of each letter, but we have to subtract one to account for the extra copy of each letter in the prepended alphabet.
Efficiency
Note that this method does not scale well as it produces a large intermediate result before the summation. How long does this take?
   (10) 6!:2 '<:+/"1 letts ([ =/ ,) txt'
0.337766
   10{.<:+/"1 letts ([ =/ ,) txt
3522382 790968 1438743 1397585 4278072 595216 991186 1370755 3086360 123338
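For a sense of how big that intermediate result is, here is a rough back-of-the-envelope figure, added for scale (J stores each Boolean as one byte):

   26 * 26+46175026   NB. items in the 26-row Boolean table
1200551352
NB. about 1.2 GB of intermediate data, all to produce 26 sums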
However, we know a more efficient way to calculate these frequencies using J's key adverb (/.), as seen here.
   (10) 6!:2 '<:#/.~ letts,txt'
0.0479608
   10{. <:#/.~ letts,txt
3522382 790968 1438743 1397585 4278072 595216 991186 1370755 3086360 123338
This is much faster and should scale better for a larger body of text. As with the other method, we prepend the alphabet to ensure the order of the results corresponds to it, and we subtract one from each count to compensate for the extra set of prepended letters.
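For reuse, the key-based count packages neatly into a verb. This is a sketch added here, not part of the original session; it assumes the left argument has no repeated items:

   NB. x freqOf y - count occurrences of each item of x in y, in the order of x
   NB. (prepending x guarantees the result order; the <: removes that extra copy)
   freqOf=: 4 : '(#x) {. <: #/.~ x,y'

So letts freqOf txt should reproduce the counts above, and something like 'aeiou' freqOf txt would count just the vowels.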
Showing Results
Once we have calculated the frequencies of the letters,
frq=. <:#/.~ letts,txt
we want to display a table of results, something like this:
   $mat=. (<"0]letts),.<"0 frq
|length error, executing dyad ,.
|shapes 26 and 153 have different numbers of items
|       $mat=.(<"0]letts)    ,.<"0 frq
What's the problem? We see that we have many more than 26 frequencies.
   $frq
153
Looking past the first 26, we see many non-alphabetic characters.
   26}.~. letts,txt
.,©1984037:-65&2—;*'()"!?[]/ï#_$ %@ `=~|<>\+^ �{� ᧴└ 挎 ���├┬┐┌ Ͻ ɬ
These various characters reflect numerals, punctuation, symbols, and the residue of special characters, perhaps reflecting a sprinkling of Unicode characters. Let's restrict ourselves to only the first 26; we use the tally of letts to allow us to generalize this in the future if we want to count the non-alphabetic characters.
   frq=. (#letts){.<:#/.~ letts,txt
   $mat=. (<"0]letts),.<"0 frq
26 2
   3{.mat
+-+-------+
|a|3522382|
+-+-------+
|b|790968 |
+-+-------+
|c|1438743|
+-+-------+
   +/frq
39231892
   (#txt)-+/frq
6943134
So, we have a two-column table mat of letters and their respective frequencies. Let's format this for display. First, insert commas into the long numbers for ease of reading. Fortunately, we have a personal utility to do this.
   '*omma*' names 3
commaFmtNum

   commaFmtNum
3 : 0
0.01 commaFmtNum y
:
}.;(' ',' '-.~])&.>('c0.',":>.10^.%x) 8!:0 y
NB.EG commaFmtNum 1234 123456.78 12.1 111222333
NB.   1,234.00 123,456.78 12.10 111,222,330.00
NB.EG 0 commaFmtNum 1234 123456.78 12.1 111222333
NB.   1,234 123,457 12 111,222,330
)
   1 commaFmtNum 5{.frq
3,522,382 790,968 1,438,743 1,397,585 4,278,072
   $mat=. (<"0]letts),.1 commaFmtNum &.><"0 frq
26 2
   3{.mat
+-+---------+
|a|3,522,382|
+-+---------+
|b|790,968  |
+-+---------+
|c|1,438,743|
+-+---------+
Now we need to fix the alignment of the numbers to ensure all are right-aligned.
   rightAlign=: 13 : '(->./#&>y){.&.>y'  NB. Negative overtake by longest number
   $mat=. (<"0]letts),.rightAlign 1 commaFmtNum &.><"0 frq
26 2
   3{.mat
+-+---------+
|a|3,522,382|
+-+---------+
|b|  790,968|
+-+---------+
|c|1,438,743|
+-+---------+
Final Result
Rather than display this table in alphabetical order, it's more useful to order by the letter frequency, hence the \: frq below.
Here's what we found:
(;"1 ' ',&.>":&.>mat) \: frq e 4,278,072 a 3,522,382 i 3,086,360 s 2,913,377 n 2,738,458 o 2,709,298 r 2,676,211 t 2,654,758 l 2,003,495 c 1,438,743 d 1,397,585 h 1,370,755 u 1,269,965 m 1,174,851 g 991,186 p 961,640 b 790,968 y 702,045 f 595,216 k 559,218 w 489,051 v 417,154 z 185,436 j 123,338 x 115,249 q 67,081
Looking at these numbers, it seems there is a relatively big drop-off between m and g. Comparing the more popular letters from this sample against the "etaoinshrdlu" from long ago, we find they differ primarily by the letter c moving up in the rankings.
   #'etaoinshrdlu'
12
   #'eaisnortlcdhu'
13
   'eaisnortlcdhu'-.'etaoinshrdlu'
c
Also, t has moved down in the ranking and o and i have swapped popularity; this sample also appears to be heavy on vowels.
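Incidentally, the ranking itself can be read straight off the data by sorting the alphabet in descending order of frequency. This one-liner was not part of the original session, but given the frequencies tabulated above it should produce:

   letts \: frq
eaisnortlcdhumgpbyfkwvzjxq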
Show-and-tell
We look at J equivalents of some of the common Python TensorFlow library functions used to construct neural networks.
Edge Detection
One of the techniques for image processing used by neural networks is edge detection.
Here is a description of detecting vertical edges using a reducing filter (window):
The leftmost table represents pixel values; the middle 3x3 table is a filter that accentuates vertical lines. The result of this convolution is shown on the right.
This filter essentially looks at the horizontal change in pixel values - a discrete first derivative in the horizontal direction - so it responds most strongly where values shift abruptly from left to right.
Below, we replicate this specific example in J.
   ]m0=. 3 0 1 2 7 4,1 5 8 9 3 1,2 7 2 5 1 3,0 1 3 1 7 8,4 2 1 6 2 8,:2 4 5 2 3 9
3 0 1 2 7 4
1 5 8 9 3 1
2 7 2 5 1 3
0 1 3 1 7 8
4 2 1 6 2 8
2 4 5 2 3 9
   ]cmED=. |:_3]\3#1 0 _1  NB. convolution matrix for (vertical) Edge Detection
1 0 _1
1 0 _1
1 0 _1
   convolute=: 13 : '+/,x * y'
   (1 1,:3 3)(cmED&convolute;._3) m0
 _5 _4  0   8
_10 _2  2   3
  0 _2 _4  _7
 _3 _2 _3 _16
   equivs=. 0 : 0
Python:     conv_forward
tensorflow: tf.nn.conv2d
keras:      Conv2D
)
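As an aside (not part of the original example), transposing the kernel turns the same machinery into a horizontal edge detector, one that responds to vertical changes in pixel values; applied to m0 it should give:

   (1 1,:3 3)((|:cmED)&convolute;._3) m0
_7 _11  2  4
10  17  9 _3
 4   5 _1 _7
_7  _6  1  2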
Advanced topics
We take a brief look at the new sorting method discovered by AI and explore the wonderful array of options Python has for the simple problem of adding together numeric arrays.
AI Sorting
Recently there was news about a new faster sorting algorithm "discovered" by AI but there is some reason to think this may be a bit over-hyped.
According to this article, researchers using Google's DeepMind AI have discovered a new sorting algorithm that is up to 70% faster for small sequences - say five numbers - and 1.7% faster for large sequences, say 250,000 numbers. The researchers claim that this “will transform the foundations of computing.”
Caveats
First of all, this is not an "algorithm" in the traditional sense: DeepMind was given a set of assembly language instructions which it combined in novel ways to come up with a set of instructions to sort more quickly; it apparently takes advantage of a CPU's internal branch prediction shortcuts. So, there is apparently no sequence of steps one could express in a high-level language or in pseudo-code to implement this new "algorithm" in an arbitrary language.
The result of this exercise is a set of assembly language instructions which have been incorporated into Google's Abseil library of C++ functions, available here.
However, there is at least one dissenting view from our old friend Arthur Whitney.
From: Arthur Whitney <a@shakti.com>
Date: Sun, Jul 2, 2023 at 3:56 PM
Subject: [shakti] AI - we may be safe for a few more days
To: k <k@k.topicbox.com>, shaktidb <shaktidb@googlegroups.com>
https://thenewstack.io/googles-deepmind-extends-ai-with-faster-sort-algorithms/
> Google DeepMind searched for a faster sorting algorithm using an
> AI system and [claim] "will transform the foundations of computing."

really? (see below)

> Google emphasized that sorting algorithms affect billions of people every day

yes

> further improvements on the efficiency of these routines has proved challenging

apparently

> sequences with over 250,000 elements, the results were still 1.7% faster

mazeltov?

> And this isnt just an abstract exercise. Google has already made the code open source,
> uploading it into LLVMs main library for standard C++ functions

mazeltov?

they added 10,000,000 bytes of code to the garbage pile (llvm12-13)

-rw-r--r-- 1 root root 47554640 Feb 4 2022 llvm-12/lib/libclang-cpp.so.12
-rw-r--r-- 1 root root 57844128 Mar 15 2022 llvm-13/lib/libclang-cpp.so.13

> Google proudly points out that millions of developers and companies around the world now use it on AI applications

morons
nanoseconds per element to sort 250000 random uint32 (skylake/zen2/m1)

numpy.sort  62
libc++-12   60.7
libc++-13   60.3

and the A-team

k            2

i.e. x:250000__1e9 \t:4 ^x

p.s. this is why i never use libraries
How Fast is J's Sort?
Using Arthur's "per item" measure above, here's what I see for J on my machine (Intel i9-10900F CPU @ 2.80GHz):
   #rand=. 250000?@$<:2^31
250000
   3!:0]rand   NB. Ensure we have integers
4
   (100) 6!:2 '/:~rand'   NB. Run 100 times for good measure
0.00439556
   0.00439556%25e4        NB. s/element
1.75822e_8               NB. About 18 nanoseconds/element
Of course this is not really comparable to Arthur's timings because he is running on a different machine, but J still looks pretty good running on a two-year-old, consumer-grade Windows PC.
Also, I could not come up with comparable timings using the publicly available C++ libraries because I could not put in the time to make that stuff work. It's just too hard, especially compared to something like the above.
An Edge Condition
During our meeting discussion about the pitfalls of sorting, John brought up that Java implements sorting using HeapSort which, while not necessarily the fastest, has good behavior in the sense that it takes a similar amount of time for any input. This contrasts with QuickSort, often thought to be one of the fastest algorithms but one whose worst-case behavior is much worse than its average. In fact, naive implementations of QuickSort do very poorly on data that is almost sorted.
With this in mind, I thought to test J's sort performance on a vector that is almost sorted.
   #rand=. /:~250000?@$<:2^31
250000
So, rand is completely sorted but let's scramble 100 items to be out of order:
   ixs=. 100?#rand
   rand=. (|.ixs{rand) ixs}rand   NB. almost sorted
Now rand is sorted except for 100 elements that are out of order. How well does J perform on this?
   (100) 6!:2 '/:~rand'
0.00240286
   0.00240286%25e4
9.61144e_9
It actually runs nearly twice as fast as our earlier test with completely randomly-distributed numbers, taking about 10 nanoseconds per item.
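Other shapes of input are just as easy to try; here is a sketch (no timings shown, since they will vary from machine to machine):

   rev=. |. /:~ 250000?@$<:2^31   NB. reverse-sorted data
   eq=. 250000 # 42               NB. all-equal data
   (100) 6!:2 '/:~rev'            NB. time these the same way as above
   (100) 6!:2 '/:~eq'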
Silly Python
I have been taking some machine-learning courses which use Python libraries like keras to implement neural networks. Along the way, I've learned a few of the more obscure things about Python. Many of these features seem odd and unnecessarily complex when we are used to the simplicity of J's arrays and syntax, but we assume they have their own logic which will become clearer as we get better acquainted with them.
Using an Assigned Variable on the Same Line
When I was doing one of the assignments, I had to create two variables, one of which was simply a constant added to the other variable, something we might write this way in J:
a=. 1+b=.99
Trying to do a similar thing in Python gives us this:
a=1+b=99
  File "<stdin>", line 1
    a=1+b=99
      ^^^
SyntaxError: cannot assign to expression
Later, while reading this interesting list of Python tricks, I discovered the walrus operator (:=).
a=1+(b:=99)
a
100
b
99
So, we have a special type of assignment only for this particular case: the in-line use of the assigned variable. WTF? Why is there all this syntactic baggage to accomplish what J's simple right-to-left execution handles seamlessly? The prescribed use for this oddity seems to be to allow assignment within an if statement:
if first_prize := get_something():
    ...  # Do something with first_prize
elif second_prize := get_something_else():
    ...  # Do something with second_prize
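For comparison, here is a rough J rendering of the same pattern. This is a sketch using hypothetical stand-ins for the functions above, not code from the article; in J an assignment is just an expression, so its value can be tested in place without any special operator:

   get_something=: 3 : '?2'        NB. hypothetical stand-in: returns 0 or 1
   get_something_else=: 3 : '?2'   NB. hypothetical stand-in
   pickPrize=: 3 : 0
if. first_prize=. get_something'' do.
  'do something with first_prize ',":first_prize
elseif. second_prize=. get_something_else'' do.
  'do something with second_prize ',":second_prize
elseif. 1 do.
  'no prize'
end.
)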
Many Different Ways to Add Together Two Sets of Numbers
If you know much about Python, you probably know that the language has made the unfortunate syntactic choice of overloading the plus operator (+) to mean concatenation for sequences such as lists, tuples, and strings. This makes it impossible to have the nice, simple, consistent syntax of a language like J for performing arithmetic on arrays.
So we can do this:
2+3
5
(2)+(3)
5
This latter example looks like it is adding together two single-element tuples, but (2) is not a tuple at all: parentheses alone don't make a tuple in Python (a one-element tuple needs a trailing comma, as in (2,)), so these are interpreted as plain integers in spite of the apparent tuple syntax.
But if we try this, we get an error:
(2,10)+(3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can only concatenate tuple (not "int") to tuple
Lest we imagine the above error is hiding a length error, we see this is not the case:
(2,10)+(3,20)
(2, 10, 3, 20)
The other major Python datatype, the list, does allow single item lists but does not let us add their items together.
[2]+[3]
[2, 3]
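For contrast, here is the same ground in J (an aside added for comparison): + is always arithmetic and , is always concatenation, so there is nothing to overload and nothing to memorize:

   2 10 + 3 20
5 30
   2 10 , 3 20
2 10 3 20
   2 10 + 3     NB. a scalar extends to match the list
5 13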
The Inescapable NumPy Library
However, most Pythonistas know that if you actually want to use arrays, you have to load the numpy library. This lets us do this:
import numpy as np
t0=np.reshape((2,3,5,7),(2,2))
type(t0)
<class 'numpy.ndarray'>
t0
array([[2, 3],
       [5, 7]])
t1=np.reshape((11,13,17,19),(2,2))
t0+t1
array([[13, 16],
       [22, 26]])
So, this actually works simply. However, as I discovered in my ML classes, it's not always that simple. I was a bit astounded to find that for some arrays we have to use an Add() function instead. In the example above, we can simply add together objects of type numpy.ndarray. The Add function also works for this type, though it promotes the result to type Tensor.
Add()([t0,t1])
<tf.Tensor: shape=(2, 2), dtype=int32, numpy=
array([[13, 16],
       [22, 26]])>
However, for the Keras Tensor type, we can only use Add(). To be fair, the tensor is a more complex object than a simple ndarray. For instance, the shape of a tensor will often include None, like this:
X.shape
TensorShape([None, 15, 15, 256])
This is interpreted as an array whose full shape is not yet known: the None dimension, typically the batch size, is left unspecified.
Other Ways to Add
However, there are also other ways to add together arrays, such as by importing the operator library like this (compared to the numpy near-equivalent):
import operator
v0=np.array([1,2,3])
v1=np.array([10,20,30])
v0+v1
array([11, 22, 33])
list(map(operator.add, v0, v1))
[11, 22, 33]
OK, so we have to know the name of the function, like add for +, but at least we can add together a couple of simple arrays - though the result is a list, not another array. However, the consistency of this method for arrays of higher dimension leaves something to be desired. For instance, if we use our two 2x2 tables from above, t0 and t1:
list(map(operator.add, t0, t1))
[array([13, 16]), array([22, 26])]
We have to use list above to display our result because, if we don't, we get an unhelpful result like this:
map(operator.add, t0, t1)
<map object at 0x0000022AB83DE4A0>
np.shape(map(operator.add, t0, t1))
()
type(map(operator.add, t0, t1))
<class 'map'>
In this case, instead of adding together the scalar elements of these two-dimensional arrays, operator.add adds together each corresponding row and returns a map; so, it accomplishes what we want - kind of - but puts us into another different data type.
StackOverflow to the Rescue(?)
Looking up how to add two lists in Python gives us these answers on StackOverflow.
l0=[1, 2, 3]
l1=[10, 20, 30]
[x + y for x, y in zip(l0, l1)]
[11, 22, 33]
[l0[i]+l1[i] for i in range(len(l0))]
[11, 22, 33]
list(map(lambda x,y: x+y, l0, l1))
[11, 22, 33]
If we are willing to import an external library, we can do it either of these ways:
import numpy as np
np.add(l0, l1)
array([11, 22, 33])
np.array(l0) + np.array(l1)
array([11, 22, 33])
Of course, all of these require us to embed the names of our lists in each statement; none is a general, functional method for adding together lists, except for this one:
def addLists(*args):
    return list(map(sum, zip(*args)))

addLists(l0, l1)
[11, 22, 33]
addLists(l0, l1, [700,800,900])
[711, 822, 933]
It looks like we have all these ways because of all the different but similar datatypes in which we can store arrays. Isn't it great to have so many ways to do this basic operation?
In the spirit of mockery, we will end with the clear and simple, not to mention almost-too-terse, object-oriented way to achieve this objective:
class SumList(object):
    def __init__(self, this_list):
        self.mylist = this_list
    def __add__(self, other):
        new_list = []
        zipped_list = zip(self.mylist, other.mylist)
        for item in zipped_list:
            new_list.append(item[0] + item[1])
        return SumList(new_list)
    def __repr__(self):
        return str(self.mylist)

SumList(l0) + SumList(l1)
[11, 22, 33]
SumList(l0) + SumList(l1) + SumList([700,800,900])
[711, 822, 933]
Oh yeah, not awkward at all. Never mind the new zip function we just introduced, which is something like J's ,. (stitch) or ,: (laminate), though it's not clear which, since Python has no clear default direction on array operations because, because, because, oh yeah, it does not know what an array is.
How to Add 2 Lists of Numbers in J
First of all, compare how we enter two lists of numbers: the clumsy Python way,
l0=[1, 2, 3]
l1=[10, 20, 30]
or the much simpler J way:
   l0=. 1 2 3
   l1=. 10 20 30
Finally,
   l0+l1
11 22 33
Quite a bit simpler than every other way above.
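Two footnotes to this comparison, added here as a sketch: the multi-list case handled by addLists above needs nothing new in J, and the earlier question of whether zip is more like ,. or ,: is settled just by looking at them:

   l0 + l1 + 700 800 900   NB. the addLists(l0, l1, [700,800,900]) case
711 822 933
   l0 ,. l1                NB. stitch: pairs corresponding items, like zip
1 10
2 20
3 30
   l0 ,: l1                NB. laminate: stacks the two lists as rows
 1  2  3
10 20 30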
Learning and Teaching J
We look at an alternative coding technique that begins with a rejection of object-oriented design and moves in a direction that might seem familiar to an array programmer.
Semantic Compression
Looking at this article about "Semantic Compression" by Casey Muratori, we find an argument against object-oriented programming as commonly practiced. Instead, the author offers more of a "code-first" practice of successive refinement which he claims is more productive. This method should be familiar to any experienced array-language programmer. In fact, the Beginner's Regatta section above illustrates this method.
The author starts off with an example of beginning the object-oriented design of a payroll system. However, it soon becomes evident that the naive approach of building classes for different types of employees breaks down under its own complexity.
First he looks at some relevant plural nouns - employees and managers - and decides "...the first thing you need to do is make classes for each of these nouns. There should be an employee class and a manager class, at least."
However, on reflection, both these classes have the commonality that "both of those are just people. So we probably need a base class called “person”...."
Then there's the complication that a manager is also an employee, so "manager should probably inherit from employee, and then employee can inherit from person. Now we’re really getting somewhere! We haven’t actually thought about how to write any code, sure, but we’re modeling the objects that are involved, and once we have those solid, the code is just going to write itself."
The complexity continues to grow when we consider contractors as well. This muddies the inheritance hierarchy when we decide that the "...contractor class could inherit from the person class..." then ask the question "But then what does the manager class inherit from? If it inherits from the employee class, then we can’t have managers who work on contract. If it inherits from the contractor class, then we can’t have full-time managers. This is turning out to be a really hard programming problem...."
We could perhaps "...have manager inherit from both classes, and then just not use one of them. But that’s not type-safe enough..." so "...we templatize the manager class on its base class, and then everything that works with manager classes is templatized on that as well!"
He wraps this up by saying
It’d be great if everything I just wrote had been farcical, but sadly, there’s actually a lot of programmers in the world who think like this. I’m not talking about “Bob the Intern” — I’m talking about all kinds of programmers, including famous programmers who give lectures and write books. I am also sad to say that there was a time in my life when I thought this way, too. I was introduced to “object oriented programming” when I was 18, and it took me until I was about 24 to realize it was all a load of horseshit (and the realization was thanks in no small part to my taking a job with RAD Game Tools, which thankfully never bought into the whole OOP nightmare).