NYCJUG/2009-09-08/ArrayHabitsHarmful
We pondered the following lesson from Joey Tuttle:
How Array Habits Can Be Harmful
from: Joey K Tuttle <jkt@qued.com>
to: Programming forum <programming@jsoftware.com>
date: Tue, Mar 10, 2009 at 2:34 AM
subject: [Jprogramming] Habits can be harmful
I had a recent experience that others might find useful. I have some J code in a Linux #! script (jwork) that takes pairs of files in a sendmail queue (header and body files) and puts them together into a usable email object. Long years of habit had me starting with an empty result and doing something like:
result =: result, grind files
inside a for. loop, and then, after the work was done:
stdout result
By moving the stdout bit inside the for. loop and removing the catenation from the loop, each iteration writes its partial result straight to the standard output pipe, so the result file is built up one iteration at a time as the script runs. That is, something like
# ls q* | jwork > output
causes the file output to be built as jwork iterates through the files from the list.
This change gave a speedup of 4 times or more, simply by eliminating the copies of the catenated temporary result variable. It made things work a whole lot more reasonably and take a lot less memory, and it has the interesting side effect of generating a perfectly usable partial result if the run is interrupted.
The point of this post is that sometimes what seems like "natural array thinking" is counterproductive. Maybe others know this instinctively, but it was an eye-opener for me.
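The two loop shapes Joey describes can be sketched in J as follows. This is a minimal sketch, not his jwork script: `grind` here is a hypothetical stand-in that just reads one file and appends LF, and `accumulate` and `stream` are illustrative names. `accumulate` keeps every piece in memory until the loop ends, and each `,` may copy the growing result; `stream` hands each piece to `stdout` as soon as it exists, leaving the shell redirect to build the output file.

```j
NB. Sketch only: grind is a hypothetical stand-in for the verb in
NB. Joey's jwork script. Here it reads one file and appends LF.
grind =: 3 : '(1!:1 < y) , LF'

NB. Old habit: build the whole result in memory, write it afterwards.
NB. Each , may copy everything accumulated so far.
accumulate =: 3 : 0
  result =. ''
  for_f. y do.
    result =. result , grind > f
  end.
  result
)

NB. Streaming: write each piece to standard output as soon as it is ready.
NB. Only one piece is held in memory; the shell redirect collects the output.
stream =: 3 : 0
  for_f. y do.
    stdout grind > f
  end.
  i. 0 0
)
```

Run from a script as `stream 'a.txt';'b.txt'` (hypothetical file names), the contents of both files reach standard output one piece at a time, so an interrupted run still leaves behind the pieces already written.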
We discussed the trade-offs between an array-based approach, which often requires us to bring an entire object into memory before we can do anything with it, and a streaming approach, which holds only a modest part of an object in memory at a time.
Sometimes in J we find that code which ran well on a small amount of test data struggles with a realistic amount of actual data. In a case like this, we are often forced to wrap our nice, small, elegant piece of code in an unpretty loop so that the data can be processed in pieces. It would be nice if there were a general, transparent way to make this transition.
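One way to keep the small verb unchanged and hide the loop is to move the loop into an adverb. The sketch below assumes the data is a file and that the verb gives a meaningful, small result on any byte range; `chunked` is a hypothetical name, while `1!:4` (file size) and `1!:11` (indexed read) are J foreigns. It is not the general, transparent mechanism wished for above: the per-chunk results still have to be combined by the caller, and data whose records straddle chunk boundaries would need extra handling.

```j
NB. Sketch only: chunked is a hypothetical adverb, not a library verb.
NB. Usage:  x u chunked y
NB.   x  chunk size in bytes
NB.   y  file name
NB.   u  verb applied to each chunk; assumed to return a small scalar
NB. Only one chunk of the file is in memory at a time.
chunked =: 1 : 0
:
  size =. 1!:4 < y                NB. file size in bytes
  starts =. x * i. >. size % x    NB. byte offset of each chunk
  r =. 0 $ 0                      NB. per-chunk results (small scalars)
  for_s. starts do.
    NB. indexed read: file ; offset , length (the last chunk may be short)
    r =. r , u 1!:11 y ; s , x <. size - s
  end.
  r                               NB. one result per chunk
)
```

For example, `+/ 1000000 (+/@(LF&=)) chunked 'big.log'` would count the line feeds in a (hypothetical) large log file while reading at most a million bytes at a time. Counting LF gives the right total however the file is split, which is why it suits this sketch.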