NYCJUG/2018-05-08
Beginner's Regatta
Converting Numbers To and From Character Representation
We explore a basic function that converts numeric values to their character representation and translates that representation between J's notation and the standard one.
J2StdCvtNums=: 3 : 0
NB.* J2StdCvtNums: convert char rep of num from J to "Standard", or
NB. vice-versa if left arg is 'J' or '2J'; optional 2x2 left argument allows
NB. 2 arbitrary conversions: col 0 is "from", col 1 is "to" char.
NB. Monadic case changes Standard numeric representation to J representation.
   (2 2$'-_Ee') J2StdCvtNums y
:
   if. 'S'=x do. if. ' '={.,y do. y return. end. end. NB. Already done?
   if. 0=#y do. '' return. end.
   pw16=. 0j16 NB. Precision width: 16 digits>.
   diffChars=. 2 2$'-_Ee' NB. Convert '-'->'_' & 'E'->'e'
   toStd=. -.'J'-:''$'2'-.~,x NB. Flag conversion J->Standard
   if. 2 2-:$x do. diffChars=. x NB. if explicit conversion.
   elseif. toStd do. diffChars=. |."1 diffChars end. NB. Convert other way
   if. 0=1{.0$y do. NB. Numeric to character
       fmts=. (8=>(3!:0)&.>y){0,pw16 NB. Full-precision floats only
       y=. fmts":y NB. If this is too slow, go back
   end.
   y=. y-.'+' NB. EG 1.23e+11 is ill-formed & the
   wh=. y=0{0{diffChars NB. '+' is unnecessary.
   cn=. (wh#1{0{diffChars) (wh#i. $y)}y NB. Translate chars that need it
   wh=. y=0{1{diffChars NB. but leave others alone.
   cn=. (wh#1{1{diffChars) (wh#i. $cn)}cn
   if. -.toStd do. NB. Special handling -> J nums
       if. '%'e. cn do. NB. Convert nn% -> 0.nn
           cn=. pw16":0.01*".cn-. '%'
       end.
       cn=. cn-.',' NB. No ',' in J numbers
   end.
   cn
NB.EG 'S' J2StdCvtNums _3.14 6.02e_23 NB. Convert J numbers to std rep
)
With a little bit of testing this looks OK but with more examples we see some issues.
   'S' J2StdCvtNums _3.14 6.02e_23 NB. Convert J numbers to std rep
-3.1400000000000001 0.0000000000000000 NB. Didn't stop - why?
It looks like we need to insert some breakpoints and look at intermediate values to track down what's causing this behavior.
   13!:3 'J2StdCvtNums : 0 9 17 21 22 24 26 28' NB. Dyadic stops
   13!:0]1 NB. Debug on
   'S' J2StdCvtNums _3.14 6.02e_23 NB. Convert J numbers to std rep
|stop: J2StdCvtNums
|   'S'=x
|J2StdCvtNums[:0]
      y
_3.14 6.02e_23
      x
S
This looks OK so far, so proceed to the next stop.
      13!:4''
|stop
|   ' '={.,y
|J2StdCvtNums[:0] NB. Same line, multiple statements
      13!:4''
|stop
|   fmts=.(8=>(3!:0)&.>y){0,pw16
|J2StdCvtNums[:9]
      y
_3.14 6.02e_23
Still looking good, proceed.
      13!:4''
|stop
|   y=.y-.'+'
|J2StdCvtNums[:17]
      y
_3.1400000000000001 0.0000000000000000
This is where the problem starts: the fixed 16-decimal format (0j16) has padded _3.14 out to a spurious trailing 1 and reduced 6.02e_23 to all zeros. If we proceed from here, we see that these extra digits don't get fixed.
      13!:4''
|stop
|   cn=.(wh#1{1{diffChars)(wh#i.$cn)}cn
|J2StdCvtNums[:21]
      cn
-3.1400000000000001 0.0000000000000000
      13!:4''
|stop
|   cn
|J2StdCvtNums[:28]
      cn
-3.1400000000000001 0.0000000000000000
      13!:4''
-3.1400000000000001 0.0000000000000000
Let's define a couple of functions: one to remove trailing zeros after the decimal point and another to remove any trailing decimal point.
   rmTrailing0sAfterPoint=: ] #~ [: -. ([: *./\&.|. '0' = ]) *. [: +./\ '.' = ]
   rmTrailingPoint=: ] }.~ [: - '.' = {:
   13!:0]0 NB. Turn off debugging
   'S' J2StdCvtNums _3.14 6.02e_23 NB. Convert J numbers to std rep
-3.1400000000000001 0.0000000000000000
Now try our new utilities on this result.
   rmTrailing0sAfterPoint 'S' J2StdCvtNums _3.14 6.02e_23
-3.1400000000000001 0.
   rmTrailingPoint rmTrailing0sAfterPoint 'S' J2StdCvtNums _3.14 6.02e_23
-3.1400000000000001 0
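Both utilities act on the string as a whole, so only the last number in a list gets tidied. A minimal sketch of one way to apply them to each number separately follows; the name tidyEach is hypothetical, and it assumes the numbers are separated by single blanks.

```j
rmTrailing0sAfterPoint=: ] #~ [: -. ([: *./\&.|. '0' = ]) *. [: +./\ '.' = ]
rmTrailingPoint=: ] }.~ [: - '.' = {:

NB. Hypothetical wrapper: cut on blanks, tidy each number, rejoin with blanks.
NB. Assumes exactly one blank between numbers.
tidyEach=: 3 : 0
  nums=. <;._1 ' ' , y
  }. ; ' ' ,&.> rmTrailingPoint@rmTrailing0sAfterPoint&.> nums
)

tidyEach '1.500 2.000'      NB. gives '1.5 2'
```

This does nothing about the "…0001" tail discussed next; it only makes the two existing utilities work per number.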
It's still not clear exactly what to do with the trailing almost-zeros "00000000000001". We could drop a final digit if there are more than a certain number of digits past the decimal but there is a corresponding problem of numbers ending in strings like "9999999999999" for which this does not work.
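One way to sidestep both the "…0001" and the "…9999" cases may be to ask for fewer significant digits in the first place rather than trimming afterwards. The sketch below is not part of J2StdCvtNums; the name fmt15 is hypothetical, and it assumes 15 significant digits are enough, which will not round-trip every floating-point value exactly (that can take 17).

```j
NB. Hypothetical helper: format with 15 significant digits using the fit
NB. conjunction on default format, then swap J's '_' for the standard '-'.
fmt15=: 3 : 0
  s=. ":!.15 y
  '-' (I. s='_')} s
)

fmt15 _3.14 6.02e_23        NB. gives '-3.14 6.02e-23'
```

Because default format drops trailing zeros and switches to exponential notation for very small values, 6.02e_23 survives instead of collapsing to zero.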
Show and Tell
Words with Letters Typed by Alternating Hands
This exercise was inspired by the idea that there are some words potentially faster to type because adjacent letters are typed with the fingers on opposite hands. For instance, the word sigh is typed left (s), right (i), left (g), right (h). These would seem like good words to form the basis of something like a password, making it easier to type more quickly with fewer errors.
Getting a List of Words
To do this, first we need a large list of words to evaluate. [Since the following URL is no longer valid, you could look here for a good, long list of English words.]
The following site used to have the lists arbitrarily broken into separate files, so let’s get all of them.
   urlTemplate=. 'http://www.manythings.org/vocabulary/lists/l/words.php?f=noll{n}'
   ]nn=. 2 lead0s&.>>:i.15
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|01|02|03|04|05|06|07|08|09|10|11|12|13|14|15|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
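The utility lead0s is not defined in these notes. A minimal stand-in, assuming it simply left-pads a non-negative integer with zeros to the width given as its left argument, might look like this (the definition is a guess at the behavior, not the original):

```j
NB. Hypothetical stand-in for lead0s: x is the field width, y a non-negative integer.
NB. Numbers with more than x digits lose their leading digits.
lead0s=: 4 : '(-x) {. (x # ''0'') , ": y'

2 lead0s&.> >: i. 3         NB. boxes containing '01', '02', '03'
```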
There are 15 separate files. We can build a list of wget commands to retrieve the URLs we generate from this template appended with the 15 suffixes.
cmds=. (<'wget -O '),&.>(nn,~&.><'TopAmericanEnglish'),&.>(<'.htm '),&.>(<urlTemplate) rplc&.><"1 (<'{n}'),.nn
Check two of the commands to see if they look right:
   >2{.cmds
wget -O TopAmericanEnglish01.htm http://www.manythings.org/vocabulary/lists/l/words.php?f=noll01
wget -O TopAmericanEnglish02.htm http://www.manythings.org/vocabulary/lists/l/words.php?f=noll02
Now run them and check how long it takes to retrieve them all.
   6!:2 'shell&.>cmds'
5.25957
Check that we have files we're expecting in our local directory.
   $dir 'Top*.htm'
15 5
   0{dir 'Top*.htm'
+------------------------+-----------------+----+---+------+
|TopAmericanEnglish01.htm|2018 3 9 10 15 51|8416|rw-|-----a|
+------------------------+-----------------+----+---+------+
Now extract the words we want from among the HTML tags by keying on strings preceding and following the payload:
   locStr=. '<div class="wrapco"><div class="co"><ul>'
   endStr=. '</ul></div><br />'
   flnms=. 0{"1 dir 'Top*.htm'
First get our extraction working with a single file:
   +/locStr E. fread >0{flnms
1
   +/endStr E. fread >0{flnms
1
   I. ((locStr E. ])+.endStr E. ])fread >0{flnms
4974 7518
Now that we have the start and end locations, we can extract the text between them:
   ss=. I. ((locStr E. ])+.endStr E. ])fread >0{flnms
   -~/ss+(#locStr),0
2504
   len=. -~/ss+(#locStr),0
   $words=. (len{.({.ss+#locStr)}.]) fread >0{flnms
2504
   50{.words
<li>about</li><li>after</li><li>again</li><li>air<
The words we want are wrapped in list tags, so remove those.
   +/'</li><li>' E. '</li>',words,'<li>'
185
   words=. '</li>',words,'<li>'
   ptnStr=. '</li><li>'
   (ptnStr E. words)<;.1 words
+--------------+--------------+--------------+------------+------------+---...
|</li><li>about|</li><li>after|</li><li>again|</li><li>air|</li><li>all|</l...
+--------------+--------------+--------------+------------+------------+---...
   _3{.(ptnStr E. words)<;.1 words
+-------------+------------+---------+
|</li><li>your|</li><li>was|</li><li>|
+-------------+------------+---------+
   (#ptnStr)}.&.>}:(ptnStr E. words)<;.1 words
+-----+-----+-----+---+---+-----+----+--+---+-------+---+---+------+--+--+-...
|about|after|again|air|all|along|also|an|and|another|any|are|around|as|at|a...
+-----+-----+-----+---+---+-----+----+--+---+-------+---+---+------+--+--+-...
   $words rplc '</li></ul></div><div class="co"><ul><li>';ptnStr
2420
   words=. words rplc '</li></ul></div><div class="co"><ul><li>';ptnStr
   (#ptnStr)}.&.>}:(ptnStr E. words)<;.1 words
+-----+-----+-----+---+---+-----+----+--+---+-------+---+---+------+--+--+-...
|about|after|again|air|all|along|also|an|and|another|any|are|around|as|at|a...
+-----+-----+-----+---+---+-----+----+--+---+-------+---+---+------+--+--+-...
   #(#ptnStr)}.&.>}:(ptnStr E. words)<;.1 words
187
Now that we have some working code, we can combine the good lines from above to write our function:
extractWords=: 3 : 0
   locStr=. '<div class="wrapco"><div class="co"><ul>'
   endStr=. '</ul></div><br />'
   flnms=. 0{"1 dir 'Top*.htm'
   fl=. fread y
   ss=. I. ((locStr E. ])+.endStr E. ]) fl
   len=. -~/ss+(#locStr),0
   words=. (len{.({.ss+#locStr)}.]) fl
   words=. '</li>',words,'<li>'
   ptnStr=. '</li><li>'
   words=. words rplc '</li></ul></div><div class="co"><ul><li>';ptnStr
   (#ptnStr)}.&.>}:(ptnStr E. words)<;.1 words
)
   $allww=. ;extractWords &.> flnms
2123
Let's take a look at some of the words.
   10{.allww
+-----+-----+-----+---+---+-----+----+--+---+-------+
|about|after|again|air|all|along|also|an|and|another|
+-----+-----+-----+---+---+-----+----+--+---+-------+
   _10{.allww
+---+----+-------+----+----------+-----+-------+------+----+---+
|toy|trap|treated|tune|University|vapor|vessels|wealth|wolf|zoo|
+---+----+-------+----+----------+-----+-------+------+----+---+
Take some statistical measure of the word lengths.
   load 'mystats'
   allww=. tolower&.>allww
   allww=. ,&.>allww NB. Don't want single characters to be scalars.
   usus szs=. #&>allww
1 14 5.5822 1.87302
   I. szs = 14
1735 1769
The longest is 14 letters and there are two of these. What are they?
   allww{~I. szs = 14
+--------------+--------------+
|transportation|characteristic|
+--------------+--------------+
Categorize Letters by the Hand with Which Each is Typed
Write and test our indicator that a word is composed of letters typed with alternating hands.
   13 : '*./0 1*./ . =&>({.&> /:~ ]) @: (] (# ; ] #~ [: -. [)~ 1 0 $~ #) y'
[: *./ 0 1 *./ .=&> ({.&> /:~ ])@:(] (# ; ] #~ [: -. [)~ 1 0 $~ #)
   isAlternating=: [: *./ 0 1 *./ .=&> ({.&> /:~ ])@:(] (# ; ] #~ [: -. [)~ 1 0 $~ #)
   isAlternating&.>(1 0 1 0);(0 1 0 1);(1 0 1);(0 1 0);(0 1 1 0);(1 0 0 1) NB. First 4 are good, last 2 are bad.
+-+-+-+-+-+-+
|1|1|1|1|0|0|
+-+-+-+-+-+-+
   isAlternating&>(1 0 1 0);(0 1 0 1);(1 0 1);(0 1 0);(0 1 1 0);(1 0 0 1) NB. First 4 are good, last 2 are bad.
1 1 1 1 0 0
So isAlternating seems to indicate when a vector consists only of alternating ones and zeros. Next, let's build the list of letters typed by each hand.
   LRsets=. 'qwertasdfgzxcvb';'yuiophjklnm'
   +/whAlt=. isAlternating&>(list=. <;._2 CR-.~fread 'wordsEn.txt') e.&.>{.LRsets
1530
   new1s=. whAlt#list
   ]whAlt=. isAlternating&>(list=. <;._2 CR-.~fread 'corncob_lowercase.txt') e.&.>{.LRsets
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
   +/whAlt
909
   new1s=. new1s,whAlt#list
   $new1s=. /:~~.new1s
1578
   new1s#~2=#&>new1s NB. Look at two-letter words: worth keeping?
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+-...
|ah|ai|al|am|an|ay|by|cl|co|do|dp|eh|el|em|en|go|ha|he|hr|ic|id|ie|if|is|i...
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+-...
   #new1s=. new1s#~2<#&>new1s NB. No.
1506
   usus szs=. #&>lra
2 13 5.1105 1.7776
   lra#~szs=13
+-------------+-------------+
|dismantlement|ichthyosiform|
+-------------+-------------+
   lra#~szs=10
+----------+----------+----------+----------+----------+----------+-------...
|antisocial|auditorial|cochairman|cochairmen|dickensian|enamelwork|ichthyi...
+----------+----------+----------+----------+----------+----------+-------...
   usus szs=. #&>new1s
3 13 5.249 1.76259
   new1s#~szs=13
+-------------+-------------+
|dismantlement|neurotoxicity|
+-------------+-------------+
   new1s#~szs=10
+----------+----------+----------+----------+----------+----------+-------...
|antisocial|auditorial|cochairman|cochairmen|dickensian|enamelwork|malaysi...
+----------+----------+----------+----------+----------+----------+-------...
   LRsets=. 'qwertasdfgzxcvb~`12345!@#$%';'yuiophjklnm67890-_=+^&*();:''",./<>?'
We have extended the two sets to include numerals and punctuation. Our vector LRsets has two elements: the characters typed with the left hand and those typed with the right hand, respectively.
   ]whAlt=. isAlternating&>list e.&.>{.LRsets
0 0 0 1 0 0 0 0 1 0 1 1
   +/whAlt
4
   whAlt#list
+------+-----+----+------+
|biform|boric|both|bushel|
+------+-----+----+------+
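The same check can also be phrased as a comparison of adjacent pairs: a mask alternates exactly when every neighboring pair of items differs. The following is a minimal sketch with hypothetical names isAlt2 and left; it is not the definition used to build the lists above, though it agrees with isAlternating on the six test vectors shown earlier.

```j
left=: 'qwertasdfgzxcvb'          NB. Left-hand letters, as in the first LRsets
isAlt2=: [: *./ 2 ~:/\ ]          NB. Hypothetical: 1 if all adjacent items differ

isAlt2&> (1 0 1 0);(0 1 0 1);(1 0 1);(0 1 0);(0 1 1 0);(1 0 0 1)   NB. 1 1 1 1 0 0
isAlt2&> (<left) e.~&.> 'sigh';'both';'tree'                       NB. 1 1 0
```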
Take a look at the larger list new1s from above.
   #new1s
1506
   3{.new1s
+---+----+-----+
|aha|ahem|ahems|
+---+----+-----+
   _3{.new1s
+-----+------+-------+
|zowie|zurich|zygotic|
+-----+------+-------+
Sort primarily by length.
   byLen=. new1s /: #&>new1s
   5{.byLen
+---+---+---+---+---+
|aha|ahs|aid|air|ala|
+---+---+---+---+---+
   _5{.byLen
+-----------+------------+------------+-------------+-------------+
|proficiency|authenticity|proamendment|dismantlement|neurotoxicity|
+-----------+------------+------------+-------------+-------------+
   hdr=. 'Words whose letters are entered by alternating use of the left and right hands on a QWERTY keyboard. All have more than two letters and are sorted alphbetically within ascending word length.'
   (hdr,LF,;LF,~&.>byLen) fwrite '../txt/LRalternatingHandsWordsBySize.txt'
9602
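Sorting by length keeps the words alphabetical within each length because J's grade is stable: items with equal keys stay in their original order. A small sketch with a made-up word list (the name w is only for illustration):

```j
w=: /:~ 'cab';'be';'ad';'abc'     NB. Alphabetical first: abc ad be cab
w /: #&> w                        NB. By length, ties stay alphabetical: ad be abc cab
```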
An Enhancement
Since this restriction of alternating hands is a very stringent filter, giving us fewer than 2,000 words, we subsequently loosened the requirement to allow words with duplicate adjacent letters, giving us a list of more than 3,000 words. The complete code is attached in the Materials section below.
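One plausible way to loosen the test is to collapse runs of a repeated letter before checking for alternation. The sketch below is an assumption about how this could be done, with hypothetical names dedup, isAlt2 and left; the attached LRAltWords.ijs may well do it differently.

```j
left=: 'qwertasdfgzxcvb'          NB. Left-hand letters, as in the first LRsets
isAlt2=: [: *./ 2 ~:/\ ]          NB. Hypothetical: 1 if all adjacent items differ
dedup=: #~ 1 , 2 ~:/\ ]           NB. Hypothetical: drop a letter that repeats its
                                  NB. predecessor (assumes a non-empty word)

isAlt2 'worry' e. left            NB. Strict test: 0, since "rr" stays on one hand
isAlt2 (dedup 'worry') e. left    NB. Loosened test: 1, since 'wory' alternates
```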
Advanced Topics
Calculating and Showing Maximum Drawdown
Maximum drawdown is a concept in money management that quantifies the history of an investment by looking at the worst loss one would have sustained over an historical period.
As defined by Investopedia (http://www.investopedia.com/terms/m/maximum-drawdown-mdd.asp):
DEFINITION of 'Maximum Drawdown (MDD)'
The maximum loss from a peak to a trough of a portfolio, before a new peak is attained. Maximum Drawdown (MDD) is an indicator of downside risk over a specified time period. It can be used both as a stand-alone measure or as an input into other metrics such as "Return over Maximum Drawdown" and Calmar Ratio. Maximum Drawdown is expressed in percentage terms and computed as: (Trough Value – Peak Value) ÷ Peak Value
BREAKING DOWN 'Maximum Drawdown (MDD)'
Consider an example to understand the concept of maximum drawdown.
Assume an investment portfolio has an initial value of $500,000. The portfolio increases to $750,000 over a period of time, before plunging to $400,000 in a ferocious bear market. It then rebounds to $600,000, before dropping again to $350,000. Subsequently, it more than doubles to $800,000. What is the maximum drawdown?
The maximum drawdown in this case is = ($350,000 – 750,000) / $750,000 = –53.33%
Note the following points:
• The initial peak of $750,000 is used in the MDD calculation. The interim peak of $600,000 is not used, since it does not represent a new high. The new peak of $800,000 is also not used since the original drawdown began from the $750,000 peak.
• The MDD calculation takes into consideration the lowest portfolio value ($350,000 in this case) before a new peak is made, and not just the first drop to $400,000.
MDD should be used in the right perspective to derive the maximum benefit from it. In this regard, particular attention should be paid to the time period being considered. For instance, a hypothetical long-only U.S. fund Gamma has been in existence since 2000, and had a maximum drawdown of -30% in the period ending 2010. While this may seem like a huge loss, note that the S&P 500 had plunged more than 55% from its peak in October 2007 to its trough in March 2009. While other metrics would need to be considered to assess Gamma fund's overall performance, from the viewpoint of MDD, it has outperformed its benchmark by a huge margin.
Calculating Maximum Drawdown in J
How might we calculate this in J in an array-oriented fashion? Let’s start in the middle: assume we have the peak and trough values:
   ((>./-<./)%>./) 2108.63 1867.61 NB. Max drawdown
0.114302
That’s the easy part. The harder part is finding the correct peak and trough for a given set of numbers.
Taking the numbers from the example above:
   vals=. 10000* 50 75 40 60 35 80
   (<./,>./) vals
350000 800000
We know these aren’t the right numbers. We need to find the peak at each point in the series. This will get us close:
   >./\vals
500000 750000 750000 750000 750000 800000
Now we can find the location of each new peak:
   ]whNewPeak=. 2</\>./\vals
1 0 0 0 1
   #whNewPeak
5
   #vals
6
There’s a problem: because we compare adjacent pairs, the result is always one item shorter than our starting vector, but we’d like the two to line up, so we have to make a decision. Fortunately, the choice seems clear: we’ll assume that the first value counts as a peak, on the reasoning that it is the highest value “so far”. So,
   #whNewPeak=. 1,whNewPeak
6
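An alternative that avoids the length mismatch is to compare each value with the running maximum of everything before it, seeding that with negative infinity so the first value always counts as a peak. This is only a sketch; the names runMax, viaPairs and viaShift are hypothetical.

```j
vals=: 10000 * 50 75 40 60 35 80
runMax=: >./\ vals                    NB. Running maximum, as above
viaPairs=: 1 , 2 </\ runMax           NB. The approach taken in the text
viaShift=: vals > __ , }: runMax      NB. Each value against the prior running max
viaPairs -: viaShift                  NB. 1: the two masks match
```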
It seems natural, with an eye to the next step of finding the minimum in each intra-peak section, to use whNewPeak as a partition vector:
   whNewPeak <;.1 vals
+------+---------------------------+------+
|500000|750000 400000 600000 350000|800000|
+------+---------------------------+------+
This allows us to find the minimum of each interval along with its corresponding peak:
   <./&>whNewPeak<;.1 vals
500000 350000 800000
   whNewPeak#vals
500000 750000 800000
So, finding the maximum drawdown should be relatively straightforward: we simply find the maximum difference:
   >./(whNewPeak#vals) - <./&>whNewPeak<;.1 vals
400000
The Code
However, there is a complication: we have to associate this number with its corresponding peak. Also, upon reflection, it would be useful to keep track of the date associated with each of these values so we can show our work.
maxDrawdown=: 3 : 0
   whNewPeak=. 1,2</\>./\y
   md=. (whNewPeak#y) - <./&>whNewPeak<;.1 ] y
   nfp=. 0={:whNewPeak NB. Not final peak (no peak at end)
   md0=. (-nfp)}.md%whNewPeak#y NB. Drop last if didn't end on new peak
   wsp=. md0 i. >./md0 NB. Where's starting peak of max drawdown?
   spix=. (wsp+0 1){I. whNewPeak NB. Start, end index of max drawdown
   span=. y{~(<./ + [: i. [: >: [: | -/) spix
   wmin=. (<./spix)+span i. <./span NB. Where minimum was in span
   md=. (>./md0);spix;wmin
NB.EG 'md whsp whmin'=. maxDrawdown 500 750 400 600 350 800
NB.EG 'md whsp whmin'=. maxDrawdown vals=. 100 150 90 120 80 200
)
Checking our work:
   ]'md whsp whmin'=. maxDrawdown 500 750 400 600 350 800
+--------+---+-+
|0.533333|1 5|4|
+--------+---+-+
   vals{~/:~whsp, whmin
750000 350000 800000
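As a cross-check on the drawdown figure alone, the largest fractional drop below the running maximum can be computed in one line. This is a sketch with a hypothetical name, mddRunning; unlike maxDrawdown it returns no indices, and it also counts a drawdown that has not yet been followed by a new peak. It assumes all values are positive.

```j
NB. Hypothetical cross-check: largest fractional drop below the running maximum.
mddRunning=: 3 : '>./ (pk - y) % pk=. >./\ y'

mddRunning 500 750 400 600 350 800      NB. 0.533333, i.e. (750-350)%750
```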
For a fuller example of usage, let's pull in about a year's worth of price data for the S&P 500 index.
   'tit1 sp500ix'=: split <;._1&>TAB,&.><;._2 ] LF (] , [ #~ [ ~: [: {: ]) CR-.~0 : 0
Date S&P 500 Index - Index Levels - Index Value - USD
01/06/2015 2002.613587
01/07/2015 2025.90105
01/08/2015 2062.143554
01/09/2015 2044.8099
…
01/04/2016 2012.659371
01/05/2016 2016.714426
01/06/2016 1990.262292
)
   tit1
+----+------------------------------------------------+
|Date|S&P 500 Index - Index Levels - Index Value - USD|
+----+------------------------------------------------+
   'dts vals'=. <"1 |:sp500ix
   vals=. ".&>vals
   ]'md whsp whmin'=. maxDrawdown vals
+---------+-----+--+
|0.0364402|37 75|44|
+---------+-----+--+
   sp500ix{~/:whsp,whmin
+----------+-----------+
|01/06/2015|2002.613587|
+----------+-----------+
|01/08/2015|2062.143554|
+----------+-----------+
|01/07/2015|2025.90105 |
+----------+-----------+
Note that /: here returns the grade (0 2 1) of the three indices rather than the sorted indices themselves, so the rows shown are rows 0, 2 and 1 of the table; /:~ as used earlier would select rows 37, 44 and 75.
Graphing Drawdown
This is a concept that lends itself well to graphical display.
First, let’s get even more real-life data.
   load 'dsv' NB. Delimiter-Separated Values
   $googPxs=. (',';'') readdsv 'GOOG.csv' NB. Downloaded from Yahoo Finance...
3411 7
   3{.googPxs
+----------+---------+---------+---------+---------+---------+--------+
|Date      |Open     |High     |Low      |Close    |Adj Close|Volume  |
+----------+---------+---------+---------+---------+---------+--------+
|2004-08-19|49.676899|51.693783|47.669952|49.845802|49.845802|44994500|
+----------+---------+---------+---------+---------+---------+--------+
|2004-08-20|50.178635|54.187561|49.925285|53.805050|53.805050|23005800|
+----------+---------+---------+---------+---------+---------+--------+
   'tit gpxs'=. split googPxs
   tit
+----+----+----+---+-----+---------+------+
|Date|Open|High|Low|Close|Adj Close|Volume|
+----+----+----+---+-----+---------+------+
   clsPxs=. ".&>gpxs{"1~tit i. <'Close'
   $clsPxs
3410
   clsPxs
49.8458 53.8051 54.3465 52.0962 52.6575 53.6063 52.732 50.6754 50.8542 49.8011 50.427 49.6819 50.4618 50.8195 50.8244 52.3247 53.4027 55.3848 55.6381 56.6168 58.3654 59.2943 58.5393 58.8075 60.0196 59.5278 58.7479 63.0201 65.1165 64.3813 65.8616 67.0936 68...
Now that we have these thousands of closing prices, let's apply the drawdown calculation and graph it in a useful fashion.
   load 'plot'
   'tit gpxs'=. split (',';'') readdsv 'GOOG.csv' NB. Downloaded from Yahoo! finance...
   clsPxs=. ".&>gpxs{"1~tit i. <'Close'
   'md whsp whmin'=. maxDrawdown clsPxs
   md;whsp;whmin
+--------+--------+----+
|0.652948|810 2040|1075|
+--------+--------+----+
   tit,gpxs{~/:~whmin,whsp
+----------+----------+----------+----------+----------+----------+--------+
|Date      |Open      |High      |Low       |Close     |Adj Close |Volume  |
+----------+----------+----------+----------+----------+----------+--------+
|2007-11-06|366.396942|368.498260|360.157532|368.498260|368.498260|16982200|
+----------+----------+----------+----------+----------+----------+--------+
|2008-11-24|133.760025|134.102783|123.700447|127.888214|127.888214|20240100|
+----------+----------+----------+----------+----------+----------+--------+
|2012-09-24|363.138123|372.596619|362.765564|372.268738|372.268738|7173800 |
+----------+----------+----------+----------+----------+----------+--------+
   127.888214%368.498260 NB. Lowest in drawdown as portion of starting point.
0.347052
   -.127.888214%368.498260 NB. Drawdown is how far down this took us at that point.
0.652948
   'pensize 2' plot clsPxs,:>./\clsPxs
After drawing and writing on the result of this plot, we get the following.
Learning and Teaching J
The Dirichlet Linkage
We took a look at this paper which is only available in PDF (see Materials below for a copy).
The math is fairly advanced but J code is included, along with interesting pictures, to soften the rigor.
The preface looks like this:
At the Open Day on 25–Nov–2017, Jane Ball asked if I might use my maths skills to create art. Here is a sketch of one possibility, taking ‘an active line on a (mathematical) walk’ [4]. The pictures start again on page 10!
Materials
- LRAltWords.ijs: J code for the alternating-hands words, as referenced above
- Dirlink.pdf: The Dirichlet Linkage by Ewart Shaw