User:Devon McCormick/DLA01
Introduction to Phase Two of DLA Development
In this exercise, we use the lessons learned from the earlier DLA (diffusion-limited aggregation) code, itself a re-vamp of an initial version, to explore how to better use the strengths of J to implement this algorithm, considering more substantive issues of design and efficiency.
Lessons Learned
The original code gives us a framework from which to start but also embodies some techniques we may want to avoid. In general, the algorithm probably does not require us to release only a single point at a time, and it may not be necessary to simulate every step of the random walk. However, before we get to these more general considerations, let's consider some more immediate ones.
One striking feature of the original code is that it uses complex numbers to handle two-dimensional point pairs as scalars. Changing this to a more general way of handling coordinates opens the possibility of extending the algorithm to higher dimensions. Also, examination of the code reveals a lot of back-and-forth between the complex and integer-pair representations, so eliminating that may save us some effort as well.
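For reference, here is how the two representations convert, using the standard primitives j. (to join a pair into a complex scalar) and +. (to split it back apart); this is the conversion the original code performs repeatedly:

```j
   j./ 3 4          NB. Join a coordinate pair into one complex scalar
3j4
   +. 3j4           NB. Split a complex scalar back into a coordinate pair
3 4
   +. 3j4 0j2 5     NB. On a vector: one (real,imaginary) row per atom
3 4
0 2
5 0
```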
Another area where there seems to be a lot of potentially unnecessary work is the manipulation to ensure points are within the fixed area of the boolean matrix used to represent the cluster. The whole notion of a fixed matrix seems very un-J-like as well.
Timings
Timing the code does not reveal anything terribly surprising. To do this, first we load the J-profiler, along with our revised code:
   load 'jpm'
   load '\amisc\J\nycjug\200910\cdla6Revised0.ijs'
We start profiling, then run a few thousand iterations of the DLA code to accumulate some timings.
   start_jpm_ ''
357142
   6!:2 'grow_dla_"0 ] 5e3$1e3'
412.3039
We look at the timing details:
   showdetail_jpm_ 'grow_dla_'
 Time (seconds)
+--------+--------+-----+----------------------------------------------------+
|all     |here    |rep  |grow_dla_                                           |
+--------+--------+-----+----------------------------------------------------+
|0.000047|0.000047|   13|monad                                               |
|0.000039|0.000039|   13|[0] c=.0                                            |
|0.000297|0.000080|   13|[1] point=:SZ release CTR                           |
|0.069787|0.069787|   13|[2] while. y>c do.                                  |
|0.203687|0.073027|13000|[3] point=:walk point                               |
|0.539116|0.166117|13000|[4] if. (point outOfBounds CTR)+.check_oob point do.|
|0.000795|0.000237|   38|[5] point=:SZ release CTR                           |
|0.342444|0.121551|   38|[6] else. if. check_collision point do.             |
|0.000000|0.000000|    0|[7] add_to_cluster point                            |
|0.000000|0.000000|    0|[8] SZ=:CTR dist point                              |
|0.000000|0.000000|    0|[9] break.                                          |
|0.000000|0.000000|    0|[10] end.                                           |
|0.025543|0.025543|12962|[11] end.                                           |
|0.051510|0.051510|13000|[12] c=.>:c                                         |
|0.025514|0.025514|13000|[13] end.                                           |
|0.000104|0.000104|   13|[14] c                                              |
|1.258882|0.533556|   13|total monad                                         |
+--------+--------+-----+----------------------------------------------------+
We see that most of the time is spent on the lines checking whether a point is in bounds or has collided with the cluster. However, "check_collision", invoked on line [6], deserves further scrutiny, as it accounts for a large part of the time (0.342%1.259, or about 27%) despite relatively few (38) invocations.
   showdetail_jpm_ 'check_collision_dla_'
 Time (seconds)
+--------+--------+-----+--------------------------------------+
|all     |here    |rep  |check_collision_dla_                  |
+--------+--------+-----+--------------------------------------+
|0.049207|0.049207|14739|monad                                 |
|0.201579|0.201579|14739|[0] 1 e.,(<([:<"1[:]_1 0 1+/~+.)y){SPC|
|0.250786|0.250786|14739|total monad                           |
+--------+--------+-----+--------------------------------------+
This is a drawback of single-line functions: we can't distinguish where most of the time is being spent. However, since a small amount of code is easy to change, we can break the function into three lines to help figure this out.
   cc0_dla_=: check_collision_dla_ f.   NB. Save old version (fix with f.)
   check_collision_dla_=: 3 : 0
NB. New, multi-line version.
y=. +.y
ix=. <"1 y+/_1 0 1
1 e.,(<ix){SPC
)
   NB. Test with thousands of random points around the center of the space.
   $pts=. ~.j./450+?2 10000$100
6326
   +/0 1=/~cc0_dla_"0 pts   NB. Ensure we have a mixture of results
5192 1134
   NB. Check that old and new version return the same results.
   (cc0_dla_"0-:check_collision_dla_"0) pts
1
   NB. Time again:
   start_jpm_ ''
357142
   6!:2 'grow_dla_"0 ] 1e3$1e3'
94.525435
   showdetail_jpm_ 'check_collision_dla_'
 Time (seconds)
+--------+--------+-----+---------------------+
|all     |here    |rep  |check_collision_dla_ |
+--------+--------+-----+---------------------+
|0.045990|0.045990|13694|monad                |
|0.044542|0.044542|13694|[0] y=.+.y           |
|0.081516|0.081516|13694|[1] ix=.<"1 y+/_1 0 1|
|0.123070|0.123070|13694|[2] 1 e.,(<ix){SPC   |
|0.295119|0.295119|13694|total monad          |
+--------+--------+-----+---------------------+
Here we see that about half the time is spent computing the index and half using it to extract and check the points. Experimenting with this a little further gives us the following:
+--------+--------+-----+------------------------+
|all     |here    |rep  |check_collision_dla_    |
+--------+--------+-----+------------------------+
|0.046248|0.046248|13688|monad                   |
|0.085743|0.085743|13688|[0] ix=.<"1(+.y)+/_1 0 1|
|0.076942|0.076942|13688|[1] pts=.(<ix){SPC      |
|0.089570|0.089570|13688|[2] 1 e.,pts            |
|0.298504|0.298504|13688|total monad             |
+--------+--------+-----+------------------------+
which shows us two things: the simple re-assignment of "y" is relatively costly compared to using "+.y" in place, and the time to extract the points is slightly less than the time to check them, though the two are roughly equal.
It's amusing to play with minor variations to see the timing effects but it's an exercise that quickly passes the point of diminishing returns. One important point of all this is that much of the time is spent on the bounds checking: eliminating or changing this would provide a large efficiency increase.
A New Version
So, why not try storing the cluster as a matrix of point coordinates, with one row per point and D columns (one per dimension), translating to the boolean-matrix form only when we need to display it? We'll proceed with this as our working data structure because it has the advantages of flexibility and generality.
The other major change we'll make initially is to re-think how we release a new point. How about if we choose a release point arbitrarily close to the existing structure? Since a fair amount of our time (0.204%1.259 or about 16% in our first timing above) is spent randomly walking without encountering the cluster, it would make sense to limit this.
As one way to do this, our first cut at "release" will pick a point some arbitrary distance, in a random direction, away from an existing point and check if that's an empty neighborhood; if it is, make that the new release point; if not, try another random point from the set. If this works at all well, there are some simple improvements we could make as well.
So, our initial new code looks like this:
init=: 3 : 0
 D=: y              NB. How many dimensions?
 RNI=: D dimnbr 1   NB. Relative neighbor indexes
 PTS=: ,:y$0        NB. List of points
 PT=: ,0            NB. Point tracker: list of point indexes in order added.
)
NB.EG init 2        NB. Initialize 2-dimensional space.

NB.* aggStruc: make max rand walk y tries to aggregate point to structure,
NB. (>0{x) away from random point in >1{x.
aggStruc=: 4 : 0"(_ 0)
 point=. (>0{x) release >1{x [ ctr=. _1
 while. y>ctr=. >:ctr do.
  point=. walk point
  if. PTS check_collision point do. y=. ctr [ addPt point end.
 end.
 ctr
NB.EG (5;PTS) aggStruc 1e2$1e2
)
The "release policy" - exactly how we inject a new particle into the neighborhood of the cluster - is a crucial determinant of both the characteristics of the resulting cluster as well as how well the overall algorithm performs. The following code is explained in detail and developed more fully in a later essay.
NB.* release: release new particle x away to find open neighborhood.
release=: 4 : 0
 while. 1 e. PTS e. RNI+"1 newpt=. ((]{~[:?#)y)+x*_1 1{~?2 2 do. end.
 newpt
)

NB.* check_collision: 1 iff x in neighborhood of points y.
check_collision=: 4 : '1 e. x e. RNI+"1 y'

NB.* dimnbr: x-dimensional y-nearest neighbor offsets.
dimnbr=: 13 : '(x$0)-.~,/>{x$<y-~i.>:+:y'

NB.* addPt: add point to cluster, track indexes of points added.
addPt=: 3 : 'PT=: PT,<:#PTS=: PTS,y'

NB.* walk: walk one random step.
walk=: 3 : 'y+RNI{~?#RNI'
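To see what the neighbor table looks like, here is what "dimnbr" should produce for the two-dimensional, nearest-neighbor case used by "init" above: all 3^2-1=8 offsets around a point, with the zero offset removed. (This is a hand-traced session sketch; verify it against your own session.)

```j
   2 dimnbr 1       NB. Relative nearest-neighbor offsets in 2 dimensions
_1 _1
_1  0
_1  1
 0 _1
 0  1
 1 _1
 1  0
 1  1
```

These eight rows, stored globally as RNI, are what both "walk" (pick one at random to step by) and "check_collision" (add all of them to a point and test membership in PTS) rely on.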
Let's try it out. We have to experiment with the arguments to get some that seem to work well initially.
   tm=. 6!:2 'nn=: (5;PTS_dla_) aggStruc_dla_ 1e2$25'
   ]stats=: ,:tm,($PTS_dla_),(usus nn),nn+/ . = >./nn
0.043843002 18 2 3 25 22.77 5.6548093 84
Trying this a few more times and taking a few pictures along the way gives us these results.
   2-~/\0,1{"1 stats   NB. Number of points added each time
18 46 61 56 61 64 65 67 47 70 55 53 54 65 69 65
   0.01 roundNums tms=. 0{"1 stats   NB. Seconds per run
0.04 0.06 0.06 0.06 0.07 0.07 0.07 0.07 0.1 0.08 0.1 0.11 0.11 0.1 0.1 0.12
   0.1 roundNums tms%~2-~/\0,1{"1 stats   NB. Pts/sec
410.6 762.5 1055.2 865.2 884.3 982.2 910.2 911.1 491.8 847.9 544.7 499.4 481.9 624.4 697.4 535.2
This seems relatively efficient compared to our previous method - how do the results look? We display them this way:
viewmat mm=. 1 bordFill_dla_ PTS_dla_
where
NB.* bordFill: fill 0-mat w/1 according to y, having border of x cells.
bordFill=: 4 : '(1)(<"1 y-"1 x-~<./y)}0$~2$(>:+:x)+(>./y)-<./y'
NB.EG viewmat 1 bordFill PTS
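As a small worked example of "bordFill" (hand-traced from the definition above, so worth double-checking in a session): two points spanning a 3-by-4 extent, with a one-cell border, should give a 5-by-6 matrix with the points translated so the border surrounds them.

```j
   1 bordFill 0 0 ,: 2 3   NB. Two points, one-cell border
0 0 0 0 0 0
0 1 0 0 0 0
0 0 0 0 0 0
0 0 0 0 1 0
0 0 0 0 0 0
```

The verb subtracts the per-column minimum from the points (shifting them to the origin), adds the border width, and amends ones into a zero matrix just big enough to hold the result; this is how the point-list representation is converted to a boolean matrix only at display time.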
Here's what we get for this batch:
If we use some different parameters, the growth starts more slowly but soon picks up.
   tm=. 6!:2 'nn=: (10;PTS_dla_) aggStruc_dla_ 1e2$100'
   ]stats=: ,:tm,(#PTS_dla_),(usus nn),nn+/ . = >./nn
0.24648272 2 83 100 99.83 1.7 99
   tm=. 6!:2 'nn=: (10;PTS_dla_) aggStruc_dla_ 1e2$100'
   _1{stats=: stats,tm,(#PTS_dla_),(usus nn),nn+/ . = >./nn
0.24002911 7 34 100 97.8 10.321107 95
Until we get to this:
   2-~/\0,1{"1 stats   NB. Number of points added each time.
2 5 5 19 34 43 58 58 71 57 53 56 62 59 63 50
   0.01 roundNums tms=. 0{"1 stats   NB. Number of seconds per run.
0.25 0.24 0.24 0.23 0.22 0.2 0.18 0.19 0.17 0.2 0.21 0.22 0.21 0.22 0.22 0.27
   roundNums tms%~2-~/\0,1{"1 stats   NB. Pts/sec
8 21 20 84 156 218 320 307 421 287 248 257 289 273 284 188
Similarly to the first set of parameters, the number of points per second increases to a maximum, then starts to decline. The reason for this is apparent: as the cluster accumulates more points, we spend more time re-trying our guess at a release point, because the points in the middle of the cluster are too crowded to give us a suitable, empty neighborhood.
Here's what the clusters generated from these parameters look like:
Where to Go From Here
This leaves us with some interesting tools with which to experiment. Fortunately, opting for flexibility and generality has also given us good performance while avoiding the "eternal lengthening" problem of the earlier code. However, we see that our implementation of "release" also gives fairly dense clusters, since it will release points into interior regions of the cluster if they are sufficiently open. Also, we can anticipate substantial performance penalties as we continue to enlarge the cluster.
This leads us to further consideration of alternate methods of releasing a new point. This is the subject of two somewhat redundant essays: one in which we start with the older version of the code which uses a fixed-size boolean matrix to represent the DLA, and another essay which summarizes and further develops the code explained above. The attached code here is the most current version which includes experiments and advances that have not been covered above.