NYCJUG/2024-06-11
Beginner's regatta
We had an interesting discussion of multi-threading on the J forum this month. It reminded us that J can do its own multithreading, even on fairly basic operations, simply by starting a number of worker threads, e.g. by running 0&T.@''^:4 '' prior to a compute-intensive task like matrix multiplication.
Speeding Up Matrix Multiply
This email started a discussion on different ways to speed up a large matrix multiplication.
From: Thomas McGuire <tmcguirefl@gmail.com> Date: Wed, May 29, 2024 at 2:16 AM Subject: [Jforum] J matrix multiplication performance To: <forum@jsoftware.com>
I had forked J9.6 and added the apple M1/M2 patch Elijah Stone had placed in the old forum. I did see about a 33% decrease in the time it took to do the following multiply:
   A =: 3000 4000?.@$0
   B =: 4000 2000?.@$0
   C =: A +/ . * B
I compiled with USE_OPENMP=1 and used brew to download libomp. I just thought I would see a more dramatic change. Does anyone have any experience with this? This takes about 4.7 seconds when I time it.
Also on the old forum archive was a discussion about using threading to speed this up. But on my regular J9.6 beta 8 (downloaded from the install directory at the jsoftware site), I see no appreciable performance enhancement. I do see that it engages more than 1 core.
The threading discussion indicated that using:
0&T.@''^:4 ''
should provide a noticeable speedup in the calculation. But it takes about 6.3 seconds on my M2 Max MacBook, threads or no threads.
Any insight would be appreciated.
---
From: Elijah Stone <elronnd@elronnd.net> Date: Wed, May 29, 2024 at 2:29 AM
openmp should not affect anything in this case
33% was in the same ballpark as the speedup i saw from using the apple matrix multiply
---
From: Thomas McGuire <tmcguirefl@gmail.com> Date: Wed, May 29, 2024 at 5:29 AM
A correction: it is PyTorch that has incorporated access to the Apple silicon GPUs, not Numpy.
Tom McGuire
---
From: Thomas McGuire <tmcguirefl@gmail.com> Date: 5/29/24 00:56 (GMT-07:00)
My understanding of the Accelerate framework that Apple provides is that this operation is taking place on a matrix multiply coprocessor available to certain cores on the M2 chip. Not quite what I was hoping for.
In researching this, it looks as though Numpy has taken some dedicated shaders Apple made for the Metal API and is able to force some matrix multiplies onto the GPU. Do you know anything about doing that?
I’m not sure what I would do with it yet but it seems pretty cool that my high level J code could run on the GPU without me having to use separate API calls and go through the tedium of setting up a DLL interface.
Thanks for your late night reply, btw (at least in my timezone)
Tom McGuire
---
From: Ak <akinboo@gmail.com> Date: Wed, May 29, 2024 at 10:00 AM
How many threads have you spun up? (1T.)
Are you using threadpools other than threadpool '0'?
How many cores are available? (8 T.)
What instruction are you using to run the command?
Ak
---
My Results
I ran the above exercise on my (non-Apple) machine which boasts an Intel(R) Core(TM) i9-10900F CPU running at 2.80GHz; it has 10 cores, which gives me 20 logical processors. Interestingly, J shows that I have 20 logical processors and up to 63 threads:
   8 T.''
20 63
Setting up the fairly large inputs and running the matrix multiply gives us a base timing for only a single thread.
   A =: 3000 4000?.@$0
   B =: 4000 2000?.@$0
   1 T. ''                  NB. How many worker threads do we have?
0
   (10) 6!:2 'A +/ . * B'   NB. Average time over 10 iterations
1.34811
Now I start four threads and time it again:
   0&T.@''^:4 ''
   1 T. ''                  NB. Check how many threads we have.
4
   (10) 6!:2 'A +/ . * B'
0.303385
This improves performance by more than a factor of four, which is pretty impressive. I think this reflects running on five threads: the original base thread and four additional worker threads.
Adding more threads gives even faster execution but with diminishing speedups.
   0&T.@''^:8 ''            NB. Add 8 more threads
12
   1 T. ''                  NB. Giving a total of 12 threads
12
   (10) 6!:2 'A +/ . * B'   NB. Even faster
0.201836
This shows a speedup from the original by a factor of almost seven, so it's not proportionally as much of a speedup as we saw with the first four threads added.
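To see where the returns diminish on a given machine, we can sweep the thread count. Here is a minimal sketch (the verb timings is our own name; it assumes A and B are defined as above and that we start with no worker threads):

   timings=: 3 : 0
     r=. ''
     for. i. y do.
       0 T. ''                          NB. Add one worker thread to pool 0
       r=. r , (10) 6!:2 'A +/ . * B'   NB. Average of 10 runs at this count
     end.
     r
   )
   timings 12   NB. Timings for 1 through 12 worker threads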
Even More Speed
This discussion evolved into one about working with compiled modules, but that is outside the scope of a beginner's section, so we cover the rest of the discussion in the Advanced Topics section below.
Show-and-tell
We look at some preliminary work demonstrating J to an audience of hackers. This is part of a talk I plan to give at the Hackers On Planet Earth conference - HOPE XV - this July in New York.
Adding Text to an Image
Recently I found an image on the web that I wanted to save, but the explanation for the image was fairly involved, so I wanted to save the text of the explanation with the image as a single file.
Let's say it was this image of a painting by Pieter Bruegel the Elder:
There is some text associated with this image, like this:
   #text=. 0 : 0
The Peasant Wedding Banquet, 1568 by Peter Bruegel the Elder
The bride is sitting under her bridal crown; it is unclear which of the others is the bridegroom.
The feast is taking place in the barn, the wall behind the guests consisting of stacked-up straw or corn.
Two ears of corn with a rake call to mind the work that harvesting involves.
The plates are being carried around on a door taken off its hinges.
The principal form of nourishment in those days consisted of bread, porridge, and soup.
)
502
How do we keep the text together with the image? We could add an area below the picture and paste the text in there as an image, but this looks clunky. As a test, I simply appended the text to the file like this:
   text fappend 'The Peasant Wedding Banquet.jpg'
502
As it turns out, this works just fine. When I double-click on the file name, it opens as an image but the text remains appended to it:
   text -: _502{.fread 'The Peasant Wedding Banquet.jpg'
1
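The retrieval above relies on knowing that the text is exactly 502 characters long. If we also append a fixed-width length field, the file becomes self-describing; here is a minimal sketch, where the 12-digit width is an arbitrary choice of ours (assume we start from a fresh copy of the image file):

   (text , _12 {. ": # text) fappend 'The Peasant Wedding Banquet.jpg'
   f=. fread 'The Peasant Wedding Banquet.jpg'
   len=. 0 ". _12 {. f          NB. The last 12 bytes hold the text length
   text -: (- len) {. _12 }. f  NB. Drop the tag, take that many bytes from the end
1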
Appending Images
Seeing how well this worked, and how easy it was to do in J, I experimented with doing other things with image files, like appending one image to another.
   (fread 'The Little Tower of Babel by Bruegel.jpg') fappend 'The Peasant Wedding Banquet.jpg'
351773
However, when I double-click on the resulting file, only the initial image shows up. The second image is in the file but hidden by the first one.
This got me thinking about steganography, which is the hiding of data within a file.
Initial Attempts at Steganography
In a sense, my initial appending of the text to the image file qualifies as steganography because, unless we know where to look, the text is hidden within that file. However, looking at the file in any sort of text editor, not as an image, quickly reveals the hidden text, so it's not very well hidden.
Here we see a view of the image file within the emacs editor:
The "hidden" text is plainly visible at the end of the file.
This got me thinking about better ways to hide text in an image file. We could encode the plain text before appending it to the file so it's not so evident, but this complicates things because we have to know where the text is and how to decode it.
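For example, even a throwaway byte rotation by a secret key makes the appended text illegible in an editor. A minimal sketch, where the key 137 is arbitrary:

   key=. 137
   enc=. a. {~ 256 | key + a. i. text      NB. Rotate each byte by the key
   dec=. a. {~ 256 | (- key) + a. i. enc   NB. Rotating back recovers the text
   text -: dec
1

Of course, this only defeats casual inspection; anyone who suspects the scheme can break it.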
Some Embedding Experiments
It should be easy to hide a very small amount of text undetectably in a very large image, but it would be harder to hide a large amount of text in a smaller image. With this in mind, I started with a fairly small image as shown here:
This file is fairly small, about 180k for the 226 by 330 image.
   #fread 'GreatlyReducedGWoK.png'
180914
   $read_image 'GreatlyReducedGWoK.png'
226 330
For our text, let's use this version of the Gettysburg address:
   text=. 0 : 0
Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal.
Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived and so dedicated, can long endure. We are met on a great battle-field of that war. We have come to dedicate a portion of that field, as a final resting place for those who here gave their lives that that nation might live. It is altogether fitting and proper that we should do this.
But, in a larger sense, we can not dedicate -- we can not consecrate -- we can not hallow -- this ground. The brave men, living and dead, who struggled here, have consecrated it, far above our poor power to add or detract. The world will little note, nor long remember what we say here, but it can never forget what they did here. It is for us the living, rather, to be dedicated here to the unfinished work which they who fought here have thus far so nobly advanced. It is rather for us to be here dedicated to the great task remaining before us -- that from these honored dead we take increased devotion to that cause for which they gave the last full measure of devotion -- that we here highly resolve that these dead shall not have died in vain -- that this nation, under God, shall have a new birth of freedom -- and that government of the people, by the people, for the people, shall not perish from the earth.
Abraham Lincoln
November 19, 1863
)
   $text
1512
Another thing to consider is that jpeg files are lossy whereas other formats are not. If we hope to retrieve text we hide in a file, it may be safer to use a lossless format like .bmp or .png.
Here we look at the same file in three different formats to see that the lossless ones are the same but the lossy one differs.
   $gwokb=. read_image 'GreatlyReducedGWoK.bmp'
226 330
   $gwokj=. read_image 'GreatlyReducedGWoK.jpg'
226 330
   $gwokp=. read_image 'GreatlyReducedGWoK.png'
226 330
   gwokb -: gwokj   NB. BMP differs from JPG
0
   gwokb -: gwokp   NB. But the two lossless ones are the same
1
Put Text at Beginning
Let's try the very simplest thing first: simply put the text at the start of the image.
   gwokp1337=. ($gwokp)$(a. i. text),(#text)}.,gwokp   NB. Turn text into numbers, add to beginning of image.
   gwokp1337 write_image 'GreatlyReducedGWoK1337.png'
298320
   gwokp1337=. read_image 'GreatlyReducedGWoK1337.png'
   text -: a.{~(#text){.,gwokp1337   NB. Ensure we can retrieve the text
1
The problem with this becomes evident if we compare the new image with the original because they look different:
We see evidence of the "hidden" text as the absence of the top of the picture.
Distributing Text Evenly
Let's instead evenly distribute the text throughout the image to see what that looks like.
   #,gwokp
74580
   74580%#text   NB. Distribute it evenly -> how many pixels per character?
49.3254
   NB. So we could distribute it evenly, with rounding, every 49.3254 pixels.
   >./ixs=. <.0.5+](74580%#text)*i.#text   NB. Check highest index number
74531
   $ixs
1512
   gwokp13372=. ($gwokp)$(a. i. text) ixs},gwokp   NB. Insert text evenly
   gwokp13372 write_image 'gwokp13372.png'
298320
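Since the spacing is deterministic, extraction just recomputes the same indices and indexes into the saved image; a quick check, assuming the variables above are still defined:

   text -: a. {~ ixs { , read_image 'gwokp13372.png'   NB. Recover the embedded text
1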
This is a little better but there is still a noticeable distortion of the image:
We see the text showing up as regularly-spaced white dots.
Distributing Text Randomly
Let's try to hide it better by putting the text in random locations. We will have to deal with setting the random seed to a known value to subsequently extract the text but let's first just see how this would look.
   $ixs=. (#text)?@$#,gwokp
1512
   usus ixs   NB. Min, max, mean, standard deviation
38 74469 37925.2 21683.3
   gwokp13373=. ($gwokp)$(a. i. text) ixs},gwokp   NB. Insert text randomly
   gwokp13373 write_image 'gwokp13373.png'
298320
This is a little better but the hidden text still distorts the image:
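As for recovering the randomly placed text later: setting the seed with 9!:1 before generating the indices makes them reproducible, and using deal (dyadic ?) instead of roll guarantees the indices are distinct, so no character overwrites another. A sketch, with 16807 as an arbitrary agreed-upon seed:

   9!:1 ] 16807                NB. Set the random seed to an agreed value
   $ixs=. (#text) ? #,gwokp    NB. Deal: 1512 distinct indices
1512
   9!:1 ] 16807                NB. The same seed...
   ixs -: (#text) ? #,gwokp    NB. ...regenerates the same indices for extraction
1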
Getting More Sophisticated: Hiding in 3D
One thing to consider is that a color image may also be seen as a three-dimensional array if we separate out the RGB planes. Fortunately, we have a utility to convert between the two and three-dimensional representations.
rgb=: 4 : '(x$256)&#:`(256&#.)@.((x = [: {: $) *. 3 = [: # $) y'
This converts between the two differently dimensioned versions and is its own inverse.
   $3 rgb gwokp
226 330 3
   $3 rgb 3 rgb gwokp
226 330
This changes the shape of our array and triples its number of elements, potentially giving us more hiding places. Let's go back to evenly distributing the text, but in the 3D image.
   #,3 rgb gwokp
223740
   (#text)%~#,3 rgb gwokp
147.976
   ixs=. <.0.5+((#text)%~#,3 rgb gwokp)*i.#text
   gwokp13374=. ($3 rgb gwokp)$(a. i. text) ixs},3 rgb gwokp   NB. Insert text evenly
   gwokp13374 write_image 'gwokp13374.png'
This gives a slight improvement but is still noticeable in the image:
We still have regularly-spaced dots but at least they vary in color so are not quite as noticeable.
Future Directions
It seems we should develop more sophisticated hiding methods. One interesting possibility would be to combine more sophistication with the "text appended to the end of the file" method to append J code to retrieve arbitrarily complex embeddings of text in an image.
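One classic step up in sophistication is least-significant-bit embedding: spread the text across the low-order bits of many channel values, where a change of at most 1 in a value of 0-255 is imperceptible. Here is a minimal sketch of the idea using the 3D form (our own illustration, not a tested utility; the output filename is hypothetical):

   bits=. , (8#2) #: a. i. text    NB. 8 bits per character: 12096 bits in all
   v=. , 3 rgb gwokp               NB. 223740 flattened channel values
   n=. #bits
   front=. n {. v
   hid=. ($3 rgb gwokp) $ (bits + front - 2 | front) , n }. v  NB. Overwrite LSBs only
   text -: a. {~ #. _8 ]\ 2 | n {. , hid   NB. Low bits, regrouped 8 per character
1
   (3 rgb hid) write_image 'gwokpLSB.png'  NB. Back to the 2D packed form for writing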
Advanced topics
We continue the discussion of multi-threading, started in the Beginner's Regatta above, and move on to using compiled modules to achieve better performance.
Pooling Threads
We look at the improvement gained by pooling threads. Notice that the first input array is ten times the size used in the previous examples.
From: Ak <akinboo@gmail.com> Date: Wed, May 29, 2024 at 3:53 PM
An idea.
(0 T. ])"0 (2# 1 2 3 4) NB. Spinup 8 threads in 4 pools. aa=: 30000 4000?.@$0 NB. Larger array bb=: 4000 2000?.@$0 5 (6!:2,7!:2@])' cc=: aa +/ . * bb' 5 (6!:2,7!:2@])' cc=: ; aa t.'''' +/ . * bb' 5 (6!:2,7!:2@])' cc=: ; aa +/ . * t. (<''worker'';1) bb'
With a larger set I notice a reported (~5%) time improvement and a dramatic space improvement, 5.5e8 to 3 KB, for 5 runs each.
Ak
---
The 5% time improvement is nothing special; it has been remarked on the J forum that a speedup of less than, say, a factor of two is unlikely to be very significant. The space improvement does appear to be substantial, but I could not replicate these results on my PC.
   aa=: 30000 4000?.@$0   NB. Larger array
   bb=: 4000 2000?.@$0
   1 T. ''
0
   10 (6!:2,7!:2@])'aa +/ . * bb'   NB. No worker threads
13.0457 5.36873e8
   (0 T. ])"0 (2# 1 2 3 4)   NB. Spin up 8 threads in 4 pools.
1 2 3 4 5 6 7 8
   10 (6!:2,7!:2@])'aa +/ . * bb'   NB. 8 worker threads in 4 pools
13.287 5.36873e8
Henry Rich responded to the pooling improvement (which I did not see) with this explanation:
From: Henry Rich <henryhrich@gmail.com> Date: Wed, May 29, 2024 at 4:00 PM
Interesting. If you have N threads in threadpool 0, x +/ . * y automatically gives each thread (#x)%N rows of x to calculate the product of. This is pretty good until the number of rows gets 'too small', probably ~128 or so.
Henry Rich
---
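By Henry's rule of thumb, the example above is nowhere near the 'too small' regime, as a quick check shows. Note also that the automatic splitting he describes applies to threadpool 0, whereas my replication above put its eight threads in pools 1 through 4, which may explain why it showed no speedup.

   30000 % 8   NB. Rows of aa per thread with 8 workers in pool 0: far above ~128
3750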
Compiled Code Improvements
I did not try to replicate any of the following reported results as they are a bit involved.
Objective C on Apple Using ChatGPT
Thomas McGuire started us down this particular avenue for performance improvement.
From: Thomas McGuire <tmcguirefl@gmail.com> Date: Fri, Jun 7, 2024 at 8:42 AM
Taking Raul’s statement to heart I did some more digging to see if I could use Apple’s Metal Performance Shader (MPS) API within J. As proof of concept I used the FFI available in the ‘dll’ library.
Not being an Apple developer, the first problem I ran into with Apple's Metal API is that Apple likes to steer you toward Swift. There are some Metal APIs that can be used directly from C/C++, but not the MPS API. It is available to Objective-C, which, it turns out, means that if you write a typical C-style function call, it is accessible to J via the FFI facility in J.
There are some limitations. The MPS API only allows an 8000 x 8000 maximum for matrix multiplication. It also only uses 4-byte floats (aka float32). This is different from J, which uses double as the underlying floating-point representation. The Apple BLAS library also uses doubles, which makes it easier to integrate right into the source code, as Elijah Stone had done.
I have a confession to make. Not being a Swift or Objective-C programmer, I took an example Swift program, pulled out the salient features I needed, and then asked ChatGPT to translate that into Objective-C. It didn't take much hand modification from there to get the MPS version of matrix product up and running and tested in a C calling program. I asked ChatGPT to make some J test code and write the FFI, but it failed miserably and I had to do that myself.
If you have an M2 machine and are interested, I made it installable into J via the 'install' command from GitHub.
   install 'github:tmcguirefl/JmpsMP@main'
   load 'tmcguirefl/mpsMP/mpsmp.ijs'
then run:
   testMP 0
Here are my results, with impressive timing on the MPS multiply:
Regular J matrix product
1018.25 1013.55 999.638 1000.76 1015.25 1032.3 1006.25 1022.34 1019.53 1014.35
Now with Apple Metal Performance Shader API
1018.25 1013.55 999.638 1000.76 1015.25 1032.3 1006.25 1022.33 1019.53 1014.35
Standard J MP : 6.08373 6.71108e7
MPS MP : 0.066262 5.45267e8
Yes, that appears to be about a 100x speedup. So I guess we know why GPU processing is favored for neural network/AI stuff.
All the code is available on Github so I won’t reproduce it here except for the links.
Another Version of Objective C on Apple
The compilation route was extended with this version from LdBeth.
From: LdBeth <andpuke@foxmail.com> Date: Sat, Jun 8, 2024 at 12:17 PM
Actually, I think I have enough experience with ObjC to make this work.
I don't have an M2, only an Intel MacBook that runs macOS Monterey, but the Metal API is still available for the version of the OS I have installed. And I just managed to write an ObjC matrix multiplication example that works.
> It also only uses 4 byte floats (aka float32).
Supported data types are listed at
https://developer.apple.com/documentation/metalperformanceshaders/mpsdatatype?language=objc
Yes, there are complex16, complex32, float16, float32, uint32, uint64, but no float64.
ldbeth
More Details on Later Version
This was followed by some more detail, including some code listings.
From: LdBeth <andpuke@foxmail.com> Date: Sat, Jun 8, 2024 at 9:16 PM
Since the linked repo does not contain the source code of the shared lib for some reason, I just rolled my own, and, not a surprise, the performance gain is noticeable on an Intel Mac.
Here is the full code: File:Libmps.ijs. The command to compile the shared library is

   clang -framework Foundation -framework CoreGraphics -framework Metal -framework MetalPerformanceShaders -fobjc-arc -shared -o libmps.dylib libmps.m
---
The code above is a J wrapper around the Objective-C code; when run, it should write out the file libmps.m. The compiled library can be used from this J code: File:Mps.ijs.
Further Discussion on Using Compiled Code
From: Thomas McGuire <tmcguirefl@gmail.com> Date: Sun, Jun 9, 2024 at 7:23 AM Subject: Re: [Jforum] J matrix multiplication performance To: <forum@jsoftware.com>
Yeah, I had a brain freeze using my mail reader and ended up sending a good email to Raul personally instead of to the forum. When I fixed that, I forgot the links to the code. It took me a couple of days to figure that out and send the links in, sorry. But I like your Objective-C better than mine.
So the next question is how to integrate it under the covers. Elijah Stone's Apple libblas.dylib usage can be added at compile time by checking the architecture being built. This is nice in that the runtime only has the code for the particular architecture.
The harder part is adding MPSMatrixMultiplication to the J source code and deciding when to use MPS vs libblas for a particular matrix product. I was thinking of some combination of matrix size and print precision: under a certain size the interpreter would use the libblas or legacy approach; above that size, the mpsMatrixMult code. Then, if you wanted to force a libblas (or legacy) approach so you could gain precision over speed, the interpreter could look for a certain threshold in the print precision. With something like that I can program in J and let the interpreter decide how to carry out the calculation.
I have written some J code to overcome the 8000x8000 limitation MPS has by breaking the multiplication into pieces, using an algorithm I found on Wikipedia (see the divide-and-conquer section): https://en.wikipedia.org/wiki/Matrix_multiplication_algorithm
My J implementation is attached: File:RMP.ijs. It's called rMP for recursive Matrix Product. A 12000x14000 by 14000x13000 matrix product took 4.26 seconds with my version of the mpsMP. When I get a chance I will use it with LdBeth's MPSMatrixMul code.
Tom McGuire
---
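For readers who do not want to fetch the attachment, here is a minimal sketch of the divide-and-conquer idea (our own illustration, not Tom's RMP.ijs): split the largest dimension in half until both operands fit under the size limit, then fall through to the base multiply, which in Tom's setting would be the MPS call.

   LIMIT=: 8000
   rmp=: dyad define
     'm k'=. $ x
     n=. {: $ y
     if. LIMIT >: m >. k >. n do. x +/ . * y return. end.  NB. Base case: both fit
     if. m >: k >. n do.              NB. m largest: split the rows of x
       h=. <. -: m
       ((h {. x) rmp y) , (h }. x) rmp y return.
     end.
     if. n >: k do.                   NB. n largest: split the columns of y
       h=. <. -: n
       (x rmp h {."1 y) ,. x rmp h }."1 y return.
     end.
     h=. <. -: k                      NB. k largest: sum two partial products
     ((h {."1 x) rmp h {. y) + (h }."1 x) rmp h }. y
   )

With LIMIT set to the MPS bound, (12000 14000 ?.@$ 0) rmp 14000 13000 ?.@$ 0 would decompose Tom's example into blocks that each fit under 8000 on a side.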
Using BLAS without Compiled Code
Finally, Bill Lam shows us how to avoid the compilation by invoking BLAS directly from within J.
From: bill lam <bbill.lam@gmail.com> Date: Sun, Jun 9, 2024 at 5:14 AM
For USE_OPENMP binaries, you can test with BLAS enabled. Save this to a script gm1.ijs:
   A=: 3000 4000?.@$0
   B=: 4000 2000?.@$0
   NP=: */ 3000 4000 2000
   NB. using blas
   0 (9!:58)"0 i.3   NB. +/ .* always use blas
   t1=: 6!:2'c2=: A+/ .*B'
   echo ' sec',~ ":t1
   echo 'blas ',' GFlop ',~ 0j3": 2*NP%(t1)*1e9
Start jconsole with the OMP_NUM_THREADS environment variable set:
bill:~% OMP_NUM_THREADS=1 jc ~/gm1.ijs
13.1143 sec
blas 3.660 GFlop
^D
bill:~% OMP_NUM_THREADS=4 jc ~/gm1.ijs
3.96833 sec
blas 12.096 GFlop
^D
bill:~% OMP_NUM_THREADS=8 jc ~/gm1.ijs
2.85146 sec
blas 16.833 GFlop
---
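As a sanity check on the reported numbers: a matrix product of shapes m by k and k by n costs about 2*m*k*n floating-point operations, which is where the 2*NP term in Bill's script comes from. The single-thread figure follows directly:

   NP=: */ 3000 4000 2000
   0j3 ": 2 * NP % 13.1143 * 1e9   NB. flops % (seconds * 1e9) = GFlop/s
3.660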
We see how increasing the number of threads improves performance. Unfortunately, Bill is using Linux, and I was not able to get this to work under Windows.
Learning and Teaching J
We look at some thoughts on how quickly development ought to be possible, and at a study of the parts of the brain used when coding.
Development Velocity
In this blog article, the writer surveyed 100 software engineers from some of the most well-known tech companies like Meta, Amazon, and so on. The single question he posed to them was "What’s stopping you from shipping faster?"
He was surprised to find these two common responses:
1. Almost no one had to think about it. People shot back with an immediate answer, as though I had just asked about a pain that was always on their mind, but never surfaced by another person.
2. There were a surprisingly high number of people for whom the reason had to do with build, compile, and deployment times. Devs really hate waiting for things.
Those of us in the interpreted languages community are probably not surprised by this since absence of compilation is a reason many of us prefer our dynamic environments.
Breaking down the responses and categorizing them brought up these common complaints.
Dependency bugs
Software is a delicate house of cards, with open-source libraries all the way down. In a lot of software, the source code consists mostly of dependencies, and very often your most frustrating bug is not even your fault. Jack, ex-Microsoft and ex-Bridgewater, lists his #1 barrier to shipping faster:
>Hitting random mysterious bugs with libraries that require reading tons of old stack overflow links and Github issues
Complicated codebase
This one is closest to my heart. In the last decade, as dev speed has evolved into a necessary competitive advantage, nearly every software team has come to suffer from code sprawl. The logic is sound:
Growing startup → must ship faster → no time to write docs → early engineers churn as company grows → new engineers need to navigate the mess.
Maria, an SDE at Amazon, says this about her team's codebase:
>There is so much undocumented in our service, including poor records of new features, nonexistent or outdated info on our dependencies, and even essential things like best practices for testing, that a lot of time is wasted in syncs trying to find the right information
Interestingly, this problem only gets worse with time -
>Nobody has time to update the documentation, which creates a vicious cycle where more features are shipped, causing the docs to be more outdated and useless, meaning no one updates or uses them, and so on. A lot of our docs haven’t been updated in years
There were also common complaints about the delays introduced by waiting for specifications to be finalized, waiting for stakeholder approval, and writing tests.
Another set of complaints, having to do with the development process, may also resonate with those of us used to developing in an interpreted environment:
Deployment/build speed
This one was quite surprising for me because I did not realize how many people suffered from this. I have a whole theory around the toxicity of idle time: time that is not a break but not productive either. Waiting for deployments is exactly that.
Deploy times can often run into hours. Here's what Aryan, a developer at Arista Networks, had to say about what slows him down:
> So for me at work it takes like 3-4 hours for a full build to complete that I can run all the tests for a PR on, and those tests can take anywhere from 1d to 3d to get done, even with a lot of optimization
So even a small change, like modifying a filename, can get tedious.
The article is worth reading in full because it brings to light some very common things that slow down development, not all of which we are saved from by using more dynamic environments.
Reading Code Differs from Reading Language
This article summarizes results of brain scans done on people who were working on code while being scanned. The conclusion is that "interpreting code activates a general-purpose brain network, but not language-processing centers."
This is interesting to J developers because of our fascination with language in general, especially as shown by the language-oriented vocabulary adopted by J where we talk about "nouns", "verbs", and "adverbs".
As the article states, this result is somewhat surprising because
In some ways, learning to program a computer is similar to learning a new language. It requires learning new symbols and terms, which must be organized correctly to instruct the computer what to do. The computer code must also be clear enough that other programmers can read and understand it.
In spite of those similarities, MIT neuroscientists have found that reading computer code does not activate the regions of the brain that are involved in language processing. Instead, it activates a distributed network called the multiple demand network, which is also recruited for complex cognitive tasks such as solving math problems or crossword puzzles.
However, although reading computer code activates the multiple demand network, it appears to rely more on different parts of the network than math or logic problems do, suggesting that coding does not precisely replicate the cognitive demands of mathematics either.
“Understanding computer code seems to be its own thing. It’s not the same as language, and it’s not the same as math and logic,” says Anna Ivanova, an MIT graduate student and the lead author of the study.
One possible limitation of the study was that it was done on relatively inexperienced programmers.
The researchers say that while they didn’t identify any regions that appear to be exclusively devoted to programming, such specialized brain activity might develop in people who have much more coding experience.
“It’s possible that if you take people who are professional programmers, who have spent 30 or 40 years coding in a particular language, you may start seeing some specialization, or some crystallization of parts of the multiple demand system,” Fedorenko says. “In people who are familiar with coding and can efficiently do these tasks, but have had relatively limited experience, it just doesn’t seem like you see any specialization yet.”
It would be interesting to repeat this study with people who are more experienced to see if the results differ.