User:Brian Schott/Stemplot
Stemplots
For many data sets (with modest cardinality), a stemplot is a simple and clear alternative to a histogram (see Essays/Histogram).
WikiPedia:Stemplot
File:Stemleaf2011.ijs download scripts here
   stem =: <.@(%&10) :. (10&*)   NB. generalized in stemGen below
   sort=: /:~
   sortleaf=: |@/:~   NB. sort leaf
   stem =: <.@:(%&10) :(<.@: %~ )
   leaf=: (* * 10&|@|)@]
   stemNub=: (10 * ~.@:stem) : ([ * ~.@:stem)
   SLtab=: stemNub ;"0 stem sortleaf each@</. leaf
   ]sample =: 20?.@$ 20
6 15 19 12 14 19 0 17 0 14 6 18 13 18 11 12 18 0 10 2
   SLtab sortleaf sample
+--+---------------------------+
|0 |0 0 0 2 6 6                |
+--+---------------------------+
|10|0 1 2 2 3 4 4 5 7 8 8 8 9 9|
+--+---------------------------+
The stemplot produced by SLtab shows that 6 of the 20 random integers are between 0 and 9, and the rest are between 10 and 19.
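As a minimal sketch of the split that underlies such a table, each non-negative value can be divided into a stem (its value rounded down to a multiple of 10) and a leaf (its last digit), and the leaves can then be grouped by stem. The names `tens`, `ones`, `v`, `split`, and `groups` are illustrative assumptions and are not part of the script.

```j
NB. illustrative names only; not taken from Stemleaf2011.ijs
tens =: 10 * <.@(%&10)      NB. stem: value rounded down to a multiple of 10
ones =: 10&|                NB. leaf: last digit of a non-negative integer
v =: 6 15 19 12 0
split =: (tens ,. ones) v   NB. one row per value: stem, leaf
groups =: (tens </. ones) v NB. leaves boxed by stem, in order of first appearance
```

Here `split` has the rows 0 6, 10 5, 10 9, 10 2, and 0 0, and `groups` holds the two boxes 6 0 and 5 9 2, one per distinct stem.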
A slightly more attractive readout is achieved by the verb pretty. Also notice that SLtab is dyadic and can take 2, 5, or 10 as its left argument.
   pretty =: (_5&{.@":@[,' | ',(1j0&":)@])&>/"1
   pretty SLtab sort sample
    0 | 000266
   10 | 01223445788899
   pretty 5 SLtab sort sample
    0 | 0002
    5 | 66
   10 | 0122344
   15 | 5788899
But as the next data set, taken from Wikipedia, shows, gaps in the data can yield gaps in the stemplot (the stems for 50 and 90 are missing).
   pretty SLtab 44 46 47 49 63 64 66 68 68 72 72 75 76 81 84 88 106
   40 | 4679
   60 | 34688
   70 | 2256
   80 | 148
  100 | 6
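The missing stems can also be found directly, as in the following sketch. It assumes a stem width of 10 and non-negative data, and the names `data`, `present`, `lo`, `hi`, `full`, and `missing` are illustrative rather than part of the script.

```j
NB. illustrative sketch; names are not from the script
data =: 44 46 47 49 63 64 66 68 68 72 72 75 76 81 84 88 106
present =: ~. 10 * <. data % 10              NB. stems that occur in the data
lo =: <./ present                            NB. smallest stem
hi =: >./ present                            NB. largest stem
full =: lo + 10 * i. >: <. (hi - lo) % 10    NB. every stem from lo to hi
missing =: full -. present                   NB. stems with no leaves
```

For this data set `present` is 40 60 70 80 100 and `missing` is 50 90, which are exactly the rows absent from the plot.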
Additional features in SL fill the gaps, deal with negative values, and distribute 0s in the data between stems when needed.
   d =: {: - {.
   rmonad=: 10 * ({: |.@:- i.@>:@d) @stem
   rdyad=: [([ * ({: |.@:- i.@>:@d@])@stem)]
   r =: rmonad : rdyad   NB. range of stems
   tf =: >@{."1@]   NB. take first and open
   df =: }."1@]   NB. drop first
   fsg=: <"0@([r tf) ,. ,.@(([(r e. ])tf) expand&, df)   NB. fill stem gaps
   SLgapless=: 10&$: : ([ fsg sort @ SLtab)
   neg0=: [(":@]`('_'&,@":@|@:+)@.(0>])"0) tf   NB. recalc neg stems
   stemClean=: 10&$: : (<"1@neg0,.df)
   SL=: 10&$: :( [ stemClean balance0s@SLgapless)
   pretty SL 44 46 47 49 63 64 66 68 68 72 72 75 76 81 84 88 106
   40 | 4679
   50 | 
   60 | 34688
   70 | 2256
   80 | 148
   90 | 
  100 | 6
Also notice that SL is dyadic and can take 2, 5, or 10 as its left argument. Zeroes are problematic in data that contains both positive and negative values, because two zero stems are required: one positive and one negative (yes, _0). Furthermore, a decision must be made about which stem each zero value is assigned to. Here an even number of zeroes is distributed equally between the two stems, and the positive stem is favored when the number of zeroes is odd.
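The arithmetic of that rule can be sketched on its own: of n zeroes, the floor of n%2 go to the _0 stem and the rest stay on the 0 stem. The verb name `split0` is a hypothetical helper for illustration and is not part of the script.

```j
NB. illustrative sketch; split0 is a hypothetical name, not from the script
split0 =: <.@-: , >.@-:   NB. (zeroes moved to _0) , (zeroes kept on 0)
even =: split0 4          NB. 2 2 : equal shares
odd =: split0 5           NB. 2 3 : the extra zero stays on the positive stem
one =: split0 1           NB. 0 1 : a single zero is not moved
```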
NB.* balance0s v monad
NB. When there are negative data and multiple values of 0
NB. in the data, then the 0's need to be distributed
NB. between the two stems 0 and _0. This verb does
NB. that distribution, giving a slight bias to 0.
NB. The argument is a stemplot in boxes containing integers
balance0s=: monad define
  if. 0<:<./ tf y do. y return. end.
  if. 0>>./ tf y do. y return. end.
  z=. 0 i.~ tf y
  if. 1>:n=.+/0=k=.>(<z,1){y do. y return. end.
  t=. y
  m=.<.-:n   NB. number of zeros to move
  t=. (<m}.k) (<z,1)}t
  t=. ((<m#0) ,~&.>(<(z-1),1){t) (<(z-1),1)}t
)
To demonstrate, we use data from Wikipedia, which is rounded off before it is plotted.
   round =: <.@(0.5&+)
   rpl3=: 4 : 0   NB. 'replace' from jforum
'x0 x1'=. x
((x1,a.) {~ (x0,a.) i. ]) y
)
   NB. examples taken from Wikipedia entry for stemplot
   cleanup =: [:round('-_'& rpl3)&.":
   ]wikidata =: cleanup '-23.678758, -12.45, -3.4, 4.43, 5.5, 5.678, 16.87, 24.7, 56.8'
_24 _12 _3 4 6 6 17 25 57
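A short check of the rounding rule may help when reading the result above. The verb `round` adds 0.5 and takes the floor, so halves round toward positive infinity, for negative values as well. The name `rounded` is illustrative only.

```j
round =: <.@(0.5&+)                        NB. same definition as in the script
rounded =: round 5.5 _3.4 _2.5 _23.678758  NB. gives 6 _3 _2 _24
```

Note that _2.5 rounds to _2 rather than _3 under this rule.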
   pretty SL wikidata, 0 0
  _20 | 4
  _10 | 2
   _0 | 30
    0 | 0466
   10 | 7
   20 | 5
   30 | 
   40 | 
   50 | 7
NB. sample data sets
a =: 25 64 31 26 20
b =: 1 5 2 3 9 10 3
c =: _3 _1 5 3 9 10 2 19
Note 'demos'
SL a
SL b
SL c
SL c, 0 0
SL c, 0 0 0
5 SL c
2 SL c
SL wikidata
)
I want to acknowledge Keith Smillie's fine work on stem-and-leaf plots from which I
have borrowed extensively.