User:Devon McCormick/Interactive3DScatterplots
Extending Diffusion-Limited Aggregation to Multiple Dimensions
Initializing a Point-set
The initialization code is a logical place to start, after we load the script and put its namespace in our environment:
load '~Code/DiffLimAgg.ijs' coinsert 'dla' NB. Path in the namespace for easier typing...
The code is defined in the namespace “dla”, so we insert this into our class object path after loading the script so we don’t have to suffix all the names with “_dla_” in the following exposition. This means we can simply enter the name “init” of the initialization function instead of “init_dla_”, as shown here, to see its definition.
init=: 3 : 0 D=: y NB. D-dimensions RNI=: D dimnbr 1 NB. Relative neighbor indexes: 1 cell around center. neigh2=: [ -.~ [: ~. [: ,/ RNI +"1/~ ] NB. Empty neighbors of these PTS=: ,:y$0 NB. Start point list w/Origin. NB.EG init 2 NB. Initialize 2-dimensional space. )
We see that running init with an argument of “2” initializes a two-dimensional list of points, starting at the origin, in the global PTS.
init 2 0 0 $PTS 1 2
Next we’ll look at the cover function created during the development of the code to aggregate points to the list according to a random walk. This cover function returns a lot of timing information as this is an important consideration in the development of this code. The original version of this process used a fixed-size Boolean matrix to represent the aggregation. This had the advantage of simplicity and is easy to handle in non-J languages as well as in J.
However, this data structure introduces hard limits at the outset to how large our result can be. On the other hand, a fixed matrix helps the run-time efficiency as it also limits how far from the central aggregate we can release a random point and the distance from a possible “sticking” point is the primary determinant of how long it will take a particle to add the structure.
By changing the data structure to represent the points explicitly, we allow more than two dimensions but we also open the release space to be infinitely large. So, we need a way to limit the release space to be near the existing cluster. We’ve worked out a way to do this by building a perimeter around the existing cluster which we’ll illustrate shortly. First, we’ll continue the logical progression of our code exposition by looking at the next verb we’d run in the sequence.
do1=: 3 : 'tm,(#PTS),(sv-~#PTS),(tm%~sv-~#PTS),(usus nn),(#nn)%~nn+/ . =>./nn [ tm=. 6!:2 ''nn=. growMany y''[sv=. #PTS'
This rather long line buries the core work done by the growMany verb in a mess of statistics about the run-time performance. It’s easier to show before we explain, so we’ll do so now by giving an example of running this cover function.
do1 25;20;5 2 0.0298599 5 4 133.959 7 25 22.25 6.02517 0.8
The arguments to do1 are number of steps (25, here) per random walk (before abandoning an attempt), number of points (20) to release, then a pair of numbers denoting how far out from the cluster to we draw the release perimeter (5 cells) and how wide to make this perimeter (2 cells). Note that the number of steps should be greater than or equal to the perimeter distance squared (5^2); “squared” because it’s a two-dimensional space and raising to this power reflects how many steps a random walk should take, on average, to travel that distance. For the higher-dimensional case, we would limit the step count by (5^D) where "D" is the dimensionality.
The results of this run are: number of seconds to complete (about 0.03), total number of resulting points (5), number of points added this time (4), number of points added per second (about 134); the five numbers following these first four give statistics on the number of random steps: the minimum (7), maximum (25), mean (22.25), standard deviation (6), and portion of walks reaching the maximum (80%).
Running this again with the same arguments results in about the same efficiency.
do1 25;20;5 2 0.00197011 10 5 2537.93 5 25 22.55 5.2463 0.75
The time spent is much shorter but the little importance should be attached to differences between such small numbers. The more important numbers in these initial accretions are the statistics of the number of random steps which are essentially the same as for the earlier invocation. Before we look at the resulting set of points, we’ll run this one more time, doubling the number of points we try (to 40) for a slight increase in efficiency.
do1 25;40;5 2 0.00315529 22 12 3803.14 2 25 20.975 7.48498 0.7
We can use the standard viewmat verb to visualize the set of points – shown here transposed for efficiency of display - we have so far:
|:PTS 0 _1 _1 _2 _3 _2 0 _4 _5 _5 _6 _1 _6 1 0 _6 1 _7 0 1 _8 0 0 _1 _2 _2 _1 _3 _3 _2 _1 _3 0 1 _1 0 2 _4 3 _2 4 4 _2 _4 viewmat 1 bordFill PTS
Setting the “Release” Perimeter
This small cluster of points we’ve generated will better help us illustrate how we determine the cluster perimeter into which we release potential new points on their random walks. As mentioned before, the do1 verb is an informative wrapper for the core growMany verb, shown here.
growMany=: 3 : 0 'maxwalk numpts pp'=. y pts=. numpts (]{~ ([<.[:#]) ?[:#]) calcPerim pp ctr=. _1 [ rr=. i.0 while. maxwalk>ctr=. >:ctr *. (0<#pts) do. wcc=. PTS chkCollision pts=. walk pts if. 1 e. wcc do. addPt wcc#pts pts=. pts#~-.wcc [ rr=. rr,ctr$~+/wcc end. end. rr,maxwalk$~#pts )
The three arguments to growMany are exactly those to do1; they are broken into named groups on the first line of this verb. The second line, where the release perimeter is calculated, is again best illustrated by first showing the sub-function calcPerim in action by itself. Its definition is thus:
calcPerim=: 3 : 0 NP=: PTS neigh2^:(0{y)]PTS edges=. ~.PTS-.~NP -.~ ,/NP+"1/RNI edges=. ~.PTS neigh2^:(1{y)]edges NB.EG PPTS_dla_=: calcPerim 5 2 )
We see that it directly references the global PTS noun in the first line, along with the first of its two inputs (0{y), as inputs to theneigh2 verb. We have already seen this verb as it was defined in the init code because it incorporates a global that depends on the dimensionality of our points. It was necessary to define this in the init function because it is an explicit verb, meaning, among other things, it resolves the global at the point of definition.
Here we see the value of neigh2, as well as its definition, along with the value of the global RNI'' that it incorporates.
neigh2=: [ -.~ [: ~. [: ,/ RNI +"1/~ ] neigh2 [ -.~ [: ~. [: ,/ (8 2$_1 _1 _1 0 _1 1 0 _1 0 1 1 _1 1 0 1 1) +"1/~ ] |:RNI _1 _1 _1 0 0 1 1 1 _1 0 1 _1 1 _1 0 1
This noun RNI, (which stands for Relative Neighbor Indexes) comprises the relative indexes of all the neighbors of a point in two-space, illustrated graphically here:
3 3$1 2 3 4 0 5 6 7 8{(<''),<"1 RNI +-----+----+----+ |_1 _1|_1 0|_1 1| +-----+----+----+ |0 _1 | |0 1 | +-----+----+----+ |1 _1 |1 0 |1 1 | +-----+----+----+
We can run the first line of calcPerim in isolation to see what it does.
y=. 5 2 NB. What y would have been… $NP=: PTS neigh2^:(0{y)]PTS NB. Array of 2D points 315 2 viewmat 1 bordFill NP NB. See what it looks like.
$1 bordFill edges,e2,NP NB. Determine which set of points 28 27 $1 bordFill edges,e2 NB. encompasses the largest bounding box 28 27 $1 bordFill edges NB. and sum the different sets to get all 24 23 $1 bordFill e2 NB. in one picture with distinctive colors. 28 27 viewmat (1 bordFill e2)+(1 bordFill edges,e2)+1 bordFill edges,e2,NP
Looking at the next few lines after this,
ctr=. _1 [ rr=. i.0 while. maxwalk>ctr=. >:ctr *. (0<#pts) do. wcc=. PTS chkCollision pts=. walk pts
we see a counter ctr and result set rr being initialized, the start of a while loop, and the check for a collision of any of the points after they’ve been walked one random step. The last two lines within the while loop take care of adding any collided points to the global PTS and removing them from the set being randomly walked. The result set rr keeps track of how many steps were required to add any new points by repeating the current value of the counter by the number of newly-added points at any step.
if. 1 e. wcc do. addPt wcc#pts pts=. pts#~-.wcc [ rr=. rr,ctr$~+/wcc
The final line of growMany returns the resulting new points to be added to the set as well as a count of how many released points reached the random walk limit without being added to the set.
rr,maxwalk$~#pts
The two sub-functions referenced above are simply this:
addPt=: 3 : 'PTS=: PTS,y' chkCollision=: 4 : '+./"1 (y+"1/RNI) e. x'"_
Looking again at the wrapper do1 for growMany, we can now better appreciate how it works:
do1 3 : 'tm,(#PTS),(sv-~#PTS),(tm%~sv-~#PTS),(usus nn),(#nn)%~nn+/ . =>./nn [ tm=. 6!:2 ''nn=. growMany y''[sv=. #PTS'
We see that we start, going from right to left, by saving the number of starting points, then we time how long growMany takes to give us the result nn which is the vector of random walk lengths. The major part of the line uses these results to produce statistics to give us an idea of how efficiently we’re adding points to the cluster.
A Problem with Three Dimensions
Notice how useful we found the viewmat verb in the preceding exposition. This was equally true during the development of this code. Being able to look at a picture of the points being generated helped guide development of these algorithms and provided assurance that we were doing sensible things. However, when run this code to produce a three-dimensional DLA, as shown here, we face the problem of visualizing these new results.
init 3 0 0 0 do1 125;20;5 2 0.0243368 3 2 82.18 17 125 114.25 33.0882 0.9 do1 125;20;5 2 0.0298845 5 2 66.9244 60 125 120.85 14.8759 0.9 … do1 150;100;5 2 0.14899 120 30 201.355 4 150 117.93 51.3275 0.7 $PTS 120 3
So, it looks like we have 120 three-dimensional points. What do they look like?
Searching the web for a tool with which to visualize this point set yielded little satisfaction. Ideally, we’d like not only to be able to see the points but to move the display in real-time to look at the DLA from different angles. There is some functionality of this type in the R environment, but it appeared to be somewhat clumsy to use and it requires moving the data into that environment as well.
I did find a set of three-dimensional visualization scripts called “Dex” – a “data explorer” – that is based on the popular Javascript graphing package “d3.js”. It didn’t do exactly what I wanted but appeared to be close. So, in just a couple of hours, I was able to put together something that worked like this for our sample set of points.
buildView PTS;(0$~#PTS);'sample3D120pts';'Sample of 120 three-dimensional points' 3072
This shows our points in a browser in a form easily rotated by moving the cursor around it, looking something like this:
Looking Ahead
The code is currently much cruder than I’d like, but it works fine for this case. In the future, I’d like to better integrate it into the JHS environment and produce the HTML more neatly but it’s fine for now. The J routines look like this:
buildView=: 3 : 0 'pts clrs flpfx title'=. 4{.y if. 0=#title do. title=. flpfx end. tmplt=. fread 'ScatterPlot3D.tmplt' tmplt=. (tmplt rplc '{title}';title) rplc '{thisScat3D}';flpfx,'.js' (tmplt rplc '{points}';cvt2d3Array pts) fwrite flpfx,'.htm' set0=. (<cSets) rplc &.> (<'{i.#pts}');&.>":&.>i.#pts set1=. ;set0 rplc&.> (<'{colorIx}');&.>":&.>clrs mystmplt=. fread 'ScatterPlot3D1_files/myScatterPlot3DJS.tmplt' subfns=. CR-.~mystmplt rplc '{colorsets}';set1 subfns fwrite 'ScatterPlot3D1_files/',flpfx,'.js' ) cvt2d3Array=: 3 : '}:_1|.;}.&.>;&.><"1 (<''],[''),.~'','',&.>j2n&.>y' rplc=: stringreplace NB. From stdlib j2n=: 'S'&J2StdCvtNums NB. Convert numbers between J J2StdCvtNums=: 3 : 0 NB. and "standard" representations. NB.* J2StdCvtNums: convert char rep of num from J to "Standard", or NB. vice-versa if left arg is 'J' or '2J'; optional 2x2 left argument allows NB. 2 arbitrary conversions: col 0 is "from", col 1 is "to" char. NB. Monadic case changes Standard numeric representation to J representation. (2 2$'-_Ee') J2StdCvtNums y : if. 'S'=x do. if. ' '={.,y do. y return. end. end. if. 0=#y do. '' return. end. pw16=. 0j16 NB. Precision width: 16 digits>. diffChars=. 2 2$'-_Ee' NB. Convert '-'->'_' & 'E'->'e' toStd=. -.'J'-:''$'2'-.~,x NB. Flag conversion J->Standard if. 2 2-:$x do. diffChars=x NB. if explicit conversion. elseif. toStd do. diffChars=. |."1 diffChars end. NB. Convert other way if. 0=1{.0$y do. NB. Numeric to character fmts=. (8=>(3!:0)&.>y){0,pw16 NB. Full-precision floats only y=. fmts":y NB. If this is too slow, go back end. y=. y-.'+' NB. EG 1.23e+11 is ill-formed & the wh=. y=0{0{diffChars NB. '+' is unnecessary. cn=. (wh#1{0{diffChars) (wh#i. $y)}y NB. Translate chars that need it wh=. y=0{1{diffChars NB. but leave others alone. cn=. (wh#1{1{diffChars) (wh#i. $cn)}cn if. -.toStd do. NB. Special handling -> J nums if. '%'e. cn do. NB. Convert nn% -> 0.nn cn=. pw16":0.01*".cn-. '%' end. cn=. cn-.',' NB. No ',' in J numbers end. cn NB.EG 'S' J2StdCvtNums _3.14 6.02e_23 NB. Convert J numbers to std rep )
The main buildView routine simply customizes a pair of HTML templates by inserting the points, converted into a form suitable for the Javascript routines, and some title information into the templates. The main template, named ScatterPlot3D.tmplt, is assumed to be in the current directory. This is where the data points and the display title are inserted.
The second argument to buildView is a vector of colors corresponding to the matrix of points to be shown. These colors, currently 17 of them, are defined in the other template, called myScatterPlot3DJS.tmplt and assumed to be in the subdirectory ScatterPlot3D1_files. The Javascript code to set the colors is more elaborate than desired and is generated by buildView to be inserted in this latter template. This second template is named to match the main HTML file being created and written to the sub-directory. The name of these files, rather the file name prefix, is designated by the third element of the argument to buildView.
The fourth and final argument to buildView is text to title the resulting display.
These two templates are about 101, and 341 lines long, respectively.