NYCJUG/2023-07-11

From J Wiki
Jump to navigation Jump to search

Beginner's regatta

Some time ago, I learned that the most frequently-used letters in the English language are "etaoinshrdlu", in that order. More recently, I decided to check this by analyzing some large bodies of text. Here we look at how to tabulate letter frequencies for various bodies of text.

Input Files

Here is a list of the files used in this study. They consist of various files from the internet, including lists of quotations and sayings.

  d:/amisc/txt/LargeTxt:
  total used in directory 45185 available 706.8 GiB
  drwxrwxrwx  1 devon devon    65536 07-03 00:15 ..
  drwxrwxrwx  1 devon devon    28672 07-03 00:16 .
  -rw-rw-rw-  1 devon devon   777176 06-28 23:44 AnObliqueApproach.txt
  -rw-rw-rw-  1 devon devon   212946 2006-10-05  Badge of Infamy by Lester del Ray.txt
  -rw-rw-rw-  1 devon devon   426056 06-28 23:44 BerserkerThrone.txt
  -rw-rw-rw-  1 devon devon     4573 2020-09-01  clipsToSinkLyinDonny.txt
  -rw-rw-rw-  1 devon devon   122049 06-29 02:43 CommonSense.txt
  -rw-rw-rw-  1 devon devon    23181 2014-12-24  DaveBarryNYC.txt
  -rw-rw-rw-  1 devon devon   218565 2007-12-28  Flatland by Abbot_flat10a.txt
  -rw-rw-rw-  1 devon devon   766690 2002-03-13  fortunes2.txt
  -rw-rw-rw-  1 devon devon   173831 2007-03-04  fortunesNew.txt
  -rw-rw-rw-  1 devon devon   343627 2007-12-28  Four-Day Planet by H. Beam Piper-19478-8.txt
  -rw-rw-rw-  1 devon devon  1053964 06-28 23:48 Freehold.txt
  -rw-rw-rw-  1 devon devon  1530492 2003-01-23  freethought.txt
  -rw-rw-rw-  1 devon devon    65336 2007-12-28  Gambler's World by John Keith Laumer_21627.txt
  -rw-rw-rw-  1 devon devon   133035 2007-12-28  Greylorn by John Keith Laumer_23028.txt
  -rw-rw-rw-  1 devon devon    18223 2015-09-25  herbsInfo.txt
  -rw-rw-rw-  1 devon devon    11543 2008-05-30  internetSlang.txt
  -rw-rw-rw-  1 devon devon     9031 2022-09-07  Jokes2.txt
  -rw-rw-rw-  1 devon devon    35822 2020-10-06  ListOfWhatHeDid.txt
  -rw-rw-rw-  1 devon devon    59082 2002-04-25  mathFortune.txt
  -rw-rw-rw-  1 devon devon    10871 2019-03-27  Names Notable of Obscure and Imaginary.txt
  -rw-rw-rw-  1 devon devon   118304 2008-01-04  Omnilingual by H Beam Piper_19445-8.txt
  -rw-rw-rw-  1 devon devon   827394 06-28 23:56 Original_Edition_of_edited_Schmitz_Stories.txt
  -rw-rw-rw-  1 devon devon   220921 2002-04-25  plan9fortunes.txt
  -rw-rw-rw-  1 devon devon   339819 2008-01-04  Planet of the Damned by Harry Harrison_21873-8.txt
  -rw-rw-rw-  1 devon devon   124502 2012-01-06  programmingQuotes.txt
  -rw-rw-rw-  1 devon devon   640987 06-28 23:58 Pyramid_Scheme.txt
  -rw-rw-rw-  1 devon devon   347739 2022-12-03  quotes.bbs
  -rw-rw-rw-  1 devon devon  1169116 06-28 23:59 Retief.txt
  -rw-rw-rw-  1 devon devon   398836 2008-01-04  The Devolutionist and the Emancipatrix by Homer Eon Flint_thdvl10.txt
  -rw-rw-rw-  1 devon devon   582469 06-28 23:54 TheGrantvilleGazetteVol1.txt
  -rw-rw-rw-  1 devon devon   153471 06-29 02:46 The Mountains of Mourning.txt
  -rw-rw-rw-  1 devon devon    44363 2007-12-28  The Stoker and the Stars by Algirdas Jonas Budrys_22967.txt
  -rw-rw-rw-  1 devon devon   204350 2007-12-28  The Ultimate Weapon by John Wood Campbell_23790-8.txt
  -rw-rw-rw-  1 devon devon    10040 2009-02-13  theWayOfTheKook.txt
  -rw-rw-rw-  1 devon devon    52970 2007-12-28  The Yillian Way by John Keith Laumer_21782.txt
  -rw-rw-rw-  1 devon devon    50401 2008-01-04  Traders Risk by (pseud) Roger Dee_23103.txt
  -rw-rw-rw-  1 devon devon   344327 2007-12-28  Uller Uprising by H. Beam Piper_19474-8.txt
  -rw-rw-rw-  1 devon devon   525368 2018-03-15  wlist_match10.txt
  -rw-rw-rw-  1 devon devon   375030 2018-03-15  wlist_match11.txt
  -rw-rw-rw-  1 devon devon   231516 2018-03-15  wlist_match12.txt
  -rw-rw-rw-  1 devon devon 14278764 2018-03-15  wlist_match1.txt
  -rw-rw-rw-  1 devon devon  5163818 2018-03-15  wlist_match2.txt
  -rw-rw-rw-  1 devon devon  3319961 2018-03-15  wlist_match3.txt
  -rw-rw-rw-  1 devon devon  2479169 2018-03-15  wlist_match4.txt
  -rw-rw-rw-  1 devon devon  1950487 2018-03-15  wlist_match5.txt
  -rw-rw-rw-  1 devon devon  1548879 2018-03-15  wlist_match6.txt
  -rw-rw-rw-  1 devon devon  1230525 2018-03-15  wlist_match7.txt
  -rw-rw-rw-  1 devon devon   956272 2018-03-15  wlist_match8.txt
  -rw-rw-rw-  1 devon devon   733180 2018-03-15  wlist_match9.txt
  -rw-rw-rw-  1 devon devon  1749989 2002-02-04  WORD.LST
  -rw-rw-rw-  1 devon devon     5966 06-29 23:59 WorksAndDays.txt

Most of the last group of files, from "wlist_match10.txt" to "WORD.LST", are comprehensive lists of words in alphabetical order. All the other files are texts of various novels from Project Gutenberg or various lists of jokes and sayings.

Reading Data

Here we assign the variable dd to be the directory prefix of all the files, then read and concatenate every file under it. We use the standard library function tolower to fold everything to lower case so that both cases of each letter are counted together.

   dd=. 'D:\amisc\txt\LargeTxt\'
   #txt=. tolower ;fread&.>(<dd),&.>0{"1 dir dd,'*.*'
46175026

So txt is a vector of more than 46 million characters, with all letters folded to lower case. These lower-case letters correspond to the last 26 items of the standard J global variable Alpha_j_.

Build Frequency Table

First, set our alphabet.

   ]letts=. 26}.Alpha_j_
abcdefghijklmnopqrstuvwxyz

Now build the frequency table.

   frq=. <:+/"1 letts ([ =/ ,) txt

Breaking down this phrase, we see that the rightmost expression gives us a large table of Boolean values based on comparing the alphabet to the alphabet concatenated to the text. We precede the text by the alphabet to ensure we get results in the order corresponding to the alphabet, i.e. the first row corresponds to "a", the next to "b", and so on.

   $letts ([ =/ ,) txt
26 46175052
   10 40{.letts ([ =/ ,) txt
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Notice the diagonal of ones at the start of this table. This reflects the prepending of the alphabet.

We see that the shape of the table is #letts by #txt, so it's rather large. Summing each row gives us the number of occurrences of each letter but we have to subtract one to account for the extra letters of the prepended alphabet.
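The same outer-comparison trick translates directly into NumPy broadcasting. Here is a sketch on a toy string rather than the 46-million-character corpus; the variable names mirror the J session:

```python
import numpy as np

letts = np.frombuffer(b"abcdefghijklmnopqrstuvwxyz", dtype=np.uint8)
txt = np.frombuffer(b"hello world", dtype=np.uint8)

# Prepend the alphabet and compare every letter against every character,
# giving a 26 x (26+len(txt)) Boolean table; summing the rows and
# subtracting the prepended copy mirrors <:+/"1 letts ([ =/ ,) txt
table = letts[:, None] == np.concatenate([letts, txt])[None, :]
freq = table.sum(axis=1) - 1
```

As in J, this builds the full Boolean table in memory, so it has the same poor scaling behavior discussed below.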

Efficiency

Note that this method does not scale well as it produces a large intermediate result before the summation. How long does this take?

   (10) 6!:2 '<:+/"1 letts ([ =/ ,) txt'
0.337766
   10{.<:+/"1 letts ([ =/ ,) txt
3522382 790968 1438743 1397585 4278072 595216 991186 1370755 3086360 123338

However, we know a more efficient way to calculate these frequencies using J's key adverb /. as seen here.

   (10) 6!:2 '<:#/.~ letts,txt'
0.0479608
   10{. <:#/.~ letts,txt
3522382 790968 1438743 1397585 4278072 595216 991186 1370755 3086360 123338

This is much faster and should scale better for a larger base of text. As with the other method, we prepend the alphabet to ensure the order of the results corresponds to it, and we subtract one from each count to compensate for the extra set of prepended letters.
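In Python, the closest analogue to key is a hash-based counter; collections.Counter does the grouping in a single pass, with no need to prepend the alphabet or subtract one. A minimal sketch (the function name is illustrative):

```python
from collections import Counter
import string

def letter_frequencies(text):
    """Return counts of a-z in alphabetical order, counting upper-
    and lower-case letters together; non-letters are ignored."""
    counts = Counter(text.lower())
    return [counts[c] for c in string.ascii_lowercase]
```

Counter(...).most_common() would also give the frequency-ordered listing shown in the Final Result section directly.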

Showing Results

Once we have calculated the frequencies of the letters,

   frq=. <:#/.~ letts,txt

we want to display a table of results, something like this:

   $mat=. (<"0]letts),.<"0 frq
|length error, executing dyad ,.
|shapes 26 and 153 have different numbers of items
|   $mat=.(<"0]letts)    ,.<"0 frq

What's the problem here? We see that we have many more than 26 frequencies.

   $frq
153

Looking past the first 26, we see many non-alphabetic characters.

   26}.~. letts,txt
 
.,©1984037:-65&2—;*'()"!?[]/ï#_$ %@   `=~|<>\+^	�{�     ᧴└ 挎           ���├┬┐┌         Ͻ                     ɬ  

These various characters reflect numerals, punctuation, symbols, and the residue of special characters, perhaps a sprinkling of mis-decoded Unicode. Let's restrict ourselves to only the first 26 counts; we use the tally of letts rather than hard-coding 26 so this generalizes if we later decide to count the non-alphabetic characters as well.

   frq=. (#letts){.<:#/.~ letts,txt
   $mat=. (<"0]letts),.<"0 frq
26 2
   3{.mat
+-+-------+
|a|3522382|
+-+-------+
|b|790968 |
+-+-------+
|c|1438743|
+-+-------+
   +/frq
39231892
   (#txt)-+/frq
6943134 

So, we have a two-column table mat of letters and their respective frequencies. Let's format this for display. First, insert commas into the long numbers for ease of reading. Fortunately, we have a personal utility to do this.

   '*omma*' names 3
commaFmtNum 
   commaFmtNum 
3 : 0
   0.01 commaFmtNum y
:
   }.;(' ',' '-.~])&.>('c0.',":>.10^.%x) 8!:0 y
NB.EG    commaFmtNum 1234 123456.78 12.1 111222333
NB. 1,234.00 123,456.78 12.10 111,222,330.00
NB.EG    0 commaFmtNum 1234 123456.78 12.1 111222333
NB. 1,234 123,457 12 111,222,330
)
   1 commaFmtNum 5{.frq
3,522,382 790,968 1,438,743 1,397,585 4,278,072
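Python builds thousands separators into its format mini-language, so an analogue of commaFmtNum is a one-liner; this sketch mimics the utility's two modes (decimals=2 corresponds to the 0.01 default):

```python
def comma_fmt(nums, decimals=0):
    """Format each number with comma thousands separators,
    roughly like the commaFmtNum utility above."""
    return [f"{n:,.{decimals}f}" for n in nums]
```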
   $mat=. (<"0]letts),.1 commaFmtNum &.><"0 frq
26 2
   3{.mat
+-+---------+
|a|3,522,382|
+-+---------+
|b|790,968  |
+-+---------+
|c|1,438,743|
+-+---------+

Now we need to fix the alignment of the numbers to ensure all are right-aligned.

   rightAlign=: 13 : '(->./#&>y){.&.>y'     NB. Negative overtake by longest number
   $mat=. (<"0]letts),.rightAlign 1 commaFmtNum &.><"0 frq
26 2
   3{.mat
+-+---------+
|a|3,522,382|
+-+---------+
|b|  790,968|
+-+---------+
|c|1,438,743|
+-+---------+
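The same right-alignment idea in Python pads each string on the left to the width of the longest, which is what the negative overtake in rightAlign accomplishes:

```python
def right_align(strings):
    """Left-pad every string to the length of the longest one,
    analogous to (->./#&>y){.&.>y in the J rightAlign verb."""
    width = max(len(s) for s in strings)
    return [s.rjust(width) for s in strings]
```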

Final Result

Rather than display this table in alphabetical order, it's more useful to order by the letter frequency, hence the \: frq below.

Here's what we found:


   (;"1 ' ',&.>":&.>mat) \: frq
 e 4,278,072
 a 3,522,382
 i 3,086,360
 s 2,913,377
 n 2,738,458
 o 2,709,298
 r 2,676,211
 t 2,654,758
 l 2,003,495
 c 1,438,743
 d 1,397,585
 h 1,370,755
 u 1,269,965
 m 1,174,851
 g   991,186
 p   961,640
 b   790,968
 y   702,045
 f   595,216
 k   559,218
 w   489,051
 v   417,154
 z   185,436
 j   123,338
 x   115,249
 q    67,081

Looking at these numbers, there seems to be a relatively big drop-off between m and g. Comparing the more popular letters from this sample against the 'etaoinshrdlu' from long ago, we find they differ primarily by the letter c moving up in the rankings.

   #'etaoinshrdlu'
12
   #'eaisnortlcdhu'
13 
   'eaisnortlcdhu'-.'etaoinshrdlu'
c

Also, t has moved down in the ranking and o and i have swapped popularity; this sample also appears to be heavy on vowels.

Show-and-tell

We look at some J equivalents of common Python TensorFlow library functions which are used to construct neural networks.

Edge Detection

One of the techniques for image processing used by neural networks is edge detection.

Here is a description of detecting vertical edges using a reducing filter (window):

[Image: Edge detection by convolution2.JPG]

The leftmost table represents pixel values; the middle 3x3 table is a filter that accentuates vertical lines. The result of this convolution is shown on the right.

This filter essentially measures horizontal changes in brightness, approximating a first derivative in the horizontal direction.

Below, we replicate this specific example in J.

   ]m0=. 3 0 1 2 7 4,1 5 8 9 3 1,2 7 2 5 1 3,0 1 3 1 7 8,4 2 1 6 2 8,:2 4 5 2 3 9
3 0 1 2 7 4
1 5 8 9 3 1
2 7 2 5 1 3
0 1 3 1 7 8
4 2 1 6 2 8
2 4 5 2 3 9
   ]cmED=. |:_3]\3#1 0 _1   NB. convolution matrix for (vertical) Edge Detection
1 0 _1
1 0 _1
1 0 _1
   convolute=: 13 : '+/,x * y'
   (1 1,:3 3)(cmED&convolute;._3) m0
 _5 _4  0   8
_10 _2  2   3
  0 _2 _4  _7
 _3 _2 _3 _16
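A plain-NumPy sketch of the same computation follows; strictly speaking this is a "valid" cross-correlation (the filter is not flipped), matching what the J cut ;._3 expression does above. The function name is our own.

```python
import numpy as np

def convolve2d_valid(img, kernel):
    """'Valid' 2-D cross-correlation: slide kernel over img,
    multiply elementwise, and sum each window."""
    kh, kw = kernel.shape
    oh = img.shape[0] - kh + 1
    ow = img.shape[1] - kw + 1
    out = np.empty((oh, ow), dtype=img.dtype)
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

m0 = np.array([[3,0,1,2,7,4],
               [1,5,8,9,3,1],
               [2,7,2,5,1,3],
               [0,1,3,1,7,8],
               [4,2,1,6,2,8],
               [2,4,5,2,3,9]])
cmED = np.array([[1,0,-1]]*3)   # vertical edge-detection filter
```

Running convolve2d_valid(m0, cmED) reproduces the 4x4 result of the J session above.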

equivs=. 0 : 0
Python: conv_forward
tensorflow: tf.nn.conv2d
keras: Conv2D
)

Advanced topics

We take a brief look at the new sorting method discovered by AI and explore the wonderful array of options Python has for the simple problem of adding together numeric arrays.

AI Sorting

Recently there was news about a new faster sorting algorithm "discovered" by AI but there is some reason to think this may be a bit over-hyped.

According to this article, researchers using Google's DeepMind AI have discovered a new sorting algorithm that is up to 70% faster for small sequences - say five numbers - and 1.7% faster for large sequences, say 250,000 numbers. The researchers claim that this “will transform the foundations of computing.”

Caveats

First of all, this is not an "algorithm" in the traditional sense: DeepMind was given a set of assembly language instructions which it combined in novel ways to come up with a set of instructions to sort more quickly; it apparently takes advantage of a CPU's internal branch prediction shortcuts. So, there is apparently no sequence of steps one could express in a high-level language or in pseudo-code to implement this new "algorithm" in an arbitrary language.

The result of this exercise is a set of assembly language instructions which have been incorporated into Google's Abseil library of C++ functions, available here.

However, there is at least one dissenting view from our old friend Arthur Whitney.

From: Arthur Whitney <a@shakti.com>
Date: Sun, Jul 2, 2023 at 3:56 PM
Subject: [shakti] AI - we may be safe for a few more days
To: k <k@k.topicbox.com>, shaktidb <shaktidb@googlegroups.com>

https://thenewstack.io/googles-deepmind-extends-ai-with-faster-sort-algorithms/

> Google DeepMind searched for a faster sorting algorithm using an AI system and [claim it] "will transform the foundations of computing."

really? (see below)

> Google emphasized that sorting algorithms affect billions of people every day

yes

> further improvements on the efficiency of these routines has proved challenging

apparently

> sequences with over 250,000 elements, the results were still 1.7% faster

mazeltov?

> And this isn't just an abstract exercise. Google has already made the code open source, uploading it into LLVM's main library for standard C++ functions

mazeltov?

they added 10,000,000 bytes of code to the garbage pile (llvm12-13)

-rw-r--r-- 1 root root 47554640 Feb  4  2022 llvm-12/lib/libclang-cpp.so.12
-rw-r--r-- 1 root root 57844128 Mar 15  2022 llvm-13/lib/libclang-cpp.so.13

> Google proudly points out that millions of developers and companies around the world now use it on AI applications

morons

nanoseconds per element to sort 250000 random uint32 (skylake/zen2/m1)

numpy.sort 62
libc++-12  60.7
libc++-13  60.3

and the A-team: k 2

i.e. x:250000__1e9 \t:4 ^x

p.s. this is why i never use libraries

How Fast is J's Sort?

Using Arthur's "per item" measure above, here's what I see for J on my machine (Intel i9-10900F CPU @ 2.80GHz):

   #rand=. 250000?@$<:2^31
250000
   3!:0]rand              NB. Ensure we have integers
4
   (100) 6!:2 '/:~rand'   NB. Run 100 times for good measure
0.00439556
   0.00439556%25e4  NB. s/element
1.75822e_8
   NB. About 18 nanoseconds/element

Of course this is not directly comparable to Arthur's timings because he is running on different hardware, but J still looks pretty good running on a two-year-old, consumer-grade Windows PC.

Also, I could not come up with comparable timings using the publicly available C++ libraries because I could not put in the time to make that stuff work. It's just too hard, especially compared to something like the above.

An Edge Condition

During our meeting discussion about the pitfalls of sorting, John brought up the point that Java implements sorting using HeapSort which, while not necessarily the fastest, has good behavior in the sense that it takes a similar amount of time for any input. This contrasts with QuickSort, which is often thought to be one of the fastest algorithms but has worst-case behavior much worse than its average. In fact, QuickSort does very poorly on data that is almost sorted.

With this in mind, I thought to test J's sort performance on a vector that is almost sorted.

   #rand=. /:~250000?@$<:2^31
250000

So, rand is completely sorted but let's scramble 100 items to be out of order:

   ixs=. 100?#rand
   rand=. (|.ixs{rand) ixs}rand  NB. almost sorted

Now rand is sorted except for 100 elements that are out of order. How well does J perform on this?

   (100) 6!:2 '/:~rand'
0.00240286
   0.00240286%25e4
9.61144e_9

It actually runs nearly twice as fast as our earlier test with completely randomly-distributed numbers, taking about 10 nanoseconds per item.
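To try the same experiment in Python (whose built-in sort, Timsort, is explicitly designed to exploit existing sorted runs), a sketch like this sets up an almost-sorted vector the same way; the timing machinery is shown but actual numbers will of course vary by machine:

```python
import random
import time

n = 250_000
data = sorted(random.randrange(2**31 - 1) for _ in range(n))

# Scramble 100 items, mirroring the J amend (|.ixs{rand) ixs}rand:
# take the values at 100 random indices and write them back reversed.
ixs = random.sample(range(n), 100)
vals = [data[i] for i in reversed(ixs)]
for i, v in zip(ixs, vals):
    data[i] = v

t0 = time.perf_counter()
result = sorted(data)
elapsed = time.perf_counter() - t0   # divide by n for seconds/element
```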

Silly Python

I have been taking some machine-learning courses which use Python libraries like keras to implement neural networks. Along the way, I've learned a few of the more obscure things about Python. Many of these features seem odd and unnecessarily complex when we are used to the simplicity of J's arrays and syntax, but we assume they have their own logic, which will become clearer as we get more acquainted with them.

Using an Assigned Variable on the Same Line

When I was doing one of the assignments, I had to create two variables, one of which was simply a constant added to the other variable, something we might write this way in J:

   a=. 1+b=.99

Trying to do a similar thing in Python gives us this:

    a=1+b=99
  File "<stdin>", line 1
    a=1+b=99
      ^^^
SyntaxError: cannot assign to expression

Later, while reading this interesting list of Python tricks, I discovered the walrus operator (:=).

    a=1+(b:=99)
    a
100
    b
99  

So, we have a special type of assignment only for this particular case: the in-line use of the assigned variable. WTF? Why is there all this syntactic baggage to accomplish what J's simple right-to-left execution handles seamlessly? The prescribed use for this oddity seems to be to allow assignment within an if statement:

if first_prize := get_something():
    ...  # Do something with first_prize
elif second_prize := get_something_else():
    ...  # Do something with second_prize

Many Different Ways to Add Together Two Sets of Numbers

If you know much about Python, you probably know that the language has made the unfortunate syntactic choice to overload the plus operator (+) to mean concatenation for sequence types such as tuples, lists, and strings. This makes it impossible to have the nice, simple, consistent syntax of a language like J for performing arithmetic on arrays.

So we can do this:

    2+3
5
    (2)+(3)
5

This latter example looks like adding two single-element tuples, but parentheses alone don't make a tuple in Python (a one-element tuple needs a trailing comma, as in (2,)); these are interpreted as plain integers in spite of the tuple-like syntax.

But if we try this, we get an error:

    (2,10)+(3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can only concatenate tuple (not "int") to tuple

Lest we imagine the above error is hiding a length error, we see this is not the case:

    (2,10)+(3,20)
(2, 10, 3, 20)

The other major Python datatype, the list, does allow single-item lists, but + still does not add their items together.

    [2]+[3]
[2, 3]  

The Inescapable NumPy Library

However, most Pythonistas know that if you actually want to use arrays, you have to load the numpy library. This lets us do this:

    import numpy as np

    t0=np.reshape((2,3,5,7),(2,2))
    type(t0)
<class 'numpy.ndarray'>
    t0
array([[2, 3],
       [5, 7]])
    t1=np.reshape((11,13,17,19),(2,2))
    t0+t1
array([[13, 16],
       [22, 26]])

So, this actually works simply. However, as I discovered in my ML classes, it's not always that simple: for some array types we have to use the Add() function from keras.layers instead. In the example above, we can simply add together objects of type numpy.ndarray; Add() also works for this type, though it promotes the result to type Tensor.

    Add()([t0,t1])
<tf.Tensor: shape=(2, 2), dtype=int32, numpy=
array([[13, 16],
       [22, 26]])>

However, for the Keras Tensor type, we can only use Add(). To be fair, the tensor is a more complex object than a simple ndarray. For instance, the shape of a tensor will often include None, like this:

    X.shape
TensorShape([None, 15, 15, 256])

This is interpreted as an array whose leading dimension (typically the batch size) is unknown until run time.

Other Ways to Add

However, there are also other ways to add together arrays, such as by importing the operator library like this (compared to the numpy near-equivalent):

    import operator
    v0=np.array([1,2,3])
    v1=np.array([10,20,30])
    v0+v1
array([11, 22, 33])
    list(map(operator.add, v0, v1))
[11, 22, 33]

OK, so we have to know the name of the function, like add for +, but at least we can add together a couple of simple arrays resulting in a list, not another array. However, the consistency of this method for arrays of higher dimension leaves something to be desired. For instance, if we use our two 2x2 tables from above, t0 and t1:

    list(map(operator.add, t0, t1))
[array([13, 16]), array([22, 26])]

We have to use list above to display our result because, if we don't, we get an unhelpful result like this:

    map(operator.add, t0, t1)
<map object at 0x0000022AB83DE4A0>

    np.shape(map(operator.add, t0, t1))
()
    
    type(map(operator.add, t0, t1))
<class 'map'>

In this case, instead of adding together the scalar elements of these two-dimensional arrays, operator.add adds together each corresponding row and returns a map; so, it accomplishes what we want - kind of - but puts us into another different data type.

StackOverflow to the Rescue(?)

Looking up how to add two lists in Python gives us these answers on StackOverflow.

    l0=[1, 2, 3]
    l1=[10, 20, 30]
    [x + y for x, y in zip(l0, l1)]
[11, 22, 33]
    [l0[i]+l1[i] for i in range(len(l0))]
[11, 22, 33]
    list(map(lambda x,y: x+y, l0, l1))
[11, 22, 33]

If we are willing to import an external library, we can do it either of these ways:

    import numpy as np

    np.add(l0, l1)
array([11, 22, 33])

    np.array(l0) + np.array(l1)
array([11, 22, 33])

Of course, all of these require us to embed the names of our lists in each statement; none is a general, functional method for adding together lists, except for this one:

    def addLists(*args): return list(map(sum, zip(*args)))

    addLists(l0, l1)
[11, 22, 33]  
    addLists(l0, l1, [700,800,900])
[711, 822, 933]

It looks like we have all these ways because of all the different but similar datatypes in which we can store arrays. Isn't it great to have so many ways to do this basic operation?

In the spirit of mockery, we will end with the clear and simple, not to mention almost-too-terse, object-oriented way to achieve this objective:

class SumList(object):
    def __init__(self, this_list):
        self.mylist = this_list

    def __add__(self, other):
        new_list = []
        zipped_list = zip(self.mylist, other.mylist)
        for item in zipped_list:
            new_list.append(item[0] + item[1])
        return SumList(new_list)

    def __repr__(self):
        return str(self.mylist)

    SumList(l0) + SumList(l1)
[11, 22, 33]

    SumList(l0) + SumList(l1) + SumList([700,800,900])
[711, 822, 933]

Oh yeah, not awkward at all. Never mind the new zip function we just introduced, which is something like J's ,. (stitch) or ,: (laminate), though it's not clear which, since Python has no clear default direction on array operations because, because, because, oh yeah, it does not know what an array is.

How to Add 2 Lists of Numbers in J

First of all, compare how we enter two lists of numbers, the older clumsy way,

l0=[1, 2, 3]
l1=[10, 20, 30]

or the much simpler J way:

l0=. 1 2 3
l1=. 10 20 30

Finally,

   l0+l1
11 22 33

Quite a bit simpler than every other way above.

Learning and Teaching J

We look at an alternative coding technique that begins with a rejection of object-oriented design and moves in a direction that might seem familiar to an array programmer.

Semantic Compression

Looking at this article about "Semantic Compression" by Casey Muratori, we find an argument against object-oriented programming as commonly practiced. Instead, the author offers more of a "code-first" practice of successive refinement which he claims is more productive. This method should be familiar to any experienced array-language programmer. In fact, the Beginner's Regatta section above illustrates this method.

The author starts off with an example of beginning the object-oriented design of a payroll system. However, it quickly becomes evident that the naive approach of building classes for different types of employees quickly breaks down under its own complexity.

First he looks at some relevant plural nouns - employees and managers - and decides "...the first thing you need to do is make classes for each of these nouns. There should be an employee class and a manager class, at least."

However, on reflection, both theses classes have the commonality that "both of those are just people. So we probably need a base class called “person”...."

Then there's the complication that a manager is also an employee, so "manager should probably inherit from employee, and then employee can inherit from person. Now we’re really getting somewhere! We haven’t actually thought about how to write any code, sure, but we’re modeling the objects that are involved, and once we have those solid, the code is just going to write itself."

The complexity continues to grow when we consider contractors as well. This muddies the inheritance hierarchy when we decide that the "...contractor class could inherit from the person class..." then ask the question "But then what does the manager class inherit from? If it inherits from the employee class, then we can’t have managers who work on contract. If it inherits from the contractor class, then we can’t have full-time managers. This is turning out to be a really hard programming problem...."

We could perhaps "...have manager inherit from both classes, and then just not use one of them. But that’s not type-safe enough..." so "...we templatize the manager class on its base class, and then everything that works with manager classes is templatized on that as well!"

He wraps this up by saying

It’d be great if everything I just wrote had been farcical, but sadly, there’s actually a lot of programmers in the world who think like this. I’m not talking about “Bob the Intern” — I’m talking about all kinds of programmers, including famous programmers who give lectures and write books. I am also sad to say that there was a time in my life when I thought this way, too. I was introduced to “object oriented programming” when I was 18, and it took me until I was about 24 to realize it was all a load of horseshit (and the realization was thanks in no small part to my taking a job with RAD Game Tools, which thankfully never bought into the whole OOP nightmare).