NYCJUG/2012-03-14/QuickAndDirtyAnalysiswJ
[This example formed the basis of a discussion at NYCJUG about teaching J by presenting examples of simple, practical things we can do with the language.]
In this example of using J for simple analysis of financial data, we start with a newspaper article, get data about its subject (the VIX volatility index) in a form amenable to analysis, and then check some of the article's assertions.
Quick and Dirty Data Analysis with J
WSJ Article
March 13, 2012, 10:36 AM
VIX Flirts With Nearly Five-Year Low
By Steven Russolillo
Yesterday we pointed out the CBOE’s volatility index, VIX, had slumped to its lowest close since April. Today, its free-fall continues and is now flirting with its lowest level in nearly five years. VIX ..was recently down 6.2% at 14.67 and earlier dropped as low as 13.99. Any move below 14.62 would mark the lowest level since June 2007. The VIX moved back above 20 — roughly its long-term average — as recently as early last week when stocks notched their biggest one-day loss of the year. But the pickup didn’t last long; VIX is down 30% since last Tuesday.
...
Discussion
Finding data on the VIX and related, tradable instruments, we see, on the left, the symbol for the index as well as numerous ETFs by which we can achieve exposure to it. Since all these ETFs are related to the index, let’s look at its data first.
Clicking on “^VIX” shown above, then on the “Historical Prices” selection on the left pane (below) brings us to a screen like the one below.
The examples that follow show how we might use J to look at the data in this table after we’ve downloaded it to a .csv file from the Yahoo Finance site. First, we’ll read it from the file and assign the columns in which we’re interested to variable names. Then we’ll examine a few items of interest.
Starting Analysis in J
   load 'tables/csv'         NB. Utilities for reading in delimited files…
   'vxt vxp'=. split readcsv 'pxVIX19900102-20120312.csv'
   $vxp
5592 7
   vxt                       NB. Titles label columns
+----+----+----+---+-----+------+---------+
|Date|Open|High|Low|Close|Volume|Adj Close|
+----+----+----+---+-----+------+---------+
   vxt=. -.&' ' &.> vxt      NB. drop spaces from Titles
   (vxt)=. <"1 |: vxp        NB. each column assigned to column label as variable name
   datatype &.> AdjClose     NB. Check that closing price is character
+-------+-------+-------+-------+-------+-------+-------...
|literal|literal|literal|literal|literal|literal|literal...
+-------+-------+-------+-------+-------+-------+-------...
   $AdjClose=. _ ". > AdjClose   NB. Turn into simple numeric vector.
5592
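The header-to-variable step can be tried without the downloaded file. The following is a minimal sketch on a made-up two-column table: the name demo and its values are illustrative assumptions, not the Yahoo data. It separates header from body with {. and }. instead of the split utility, and it uses global assignment (=:) so the names survive if the lines are run from a script.

```j
NB. Made-up table in the shape a CSV reader returns: boxed strings, header row first.
demo=: 3 2 $ 'Date';'Adj Close';'2012-03-12';'15.64';'2012-03-09';'17.11'
hdr=: {. demo                 NB. header row: boxed column titles
body=: }. demo                NB. remaining rows
hdr=: -.&' ' &.> hdr          NB. drop spaces so each title is a valid J name
(hdr)=: <"1 |: body           NB. assign each column to the name held in hdr
AdjClose=: _ ". > AdjClose    NB. boxed character column to numeric values
```

After these lines, Date holds the boxed date strings and AdjClose holds numbers, which is the state the rest of the analysis assumes.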
We can verify that these first few prices match those in the "Adj Close" column of the table shown above.
   10{.AdjClose    NB. Look at some values.
15.64 17.11 18.02 19.07 20.84 18.05 17.29 17.26 18.43 17.95
We use "grade-up" (/:) to get the indexes into our price vector ordered from the lowest value to the highest, so we can see the lowest this price has ever been and when that was.
   /:AdjClose    NB. Indexes of lowest prices
4586 4585 4584 4583 1293 1334 4560 1318 1335 1276 1317 1336 1286 4587...
These indexes seem to fall roughly into three groups: about 4580, 1290, and 1330. Take a sample from each of these three and see to which dates they correspond.
   0 4 5 { /:AdjClose              NB. Pick a few from different groups.
4586 1293 1334
   Date {~ 0 4 5 { /:AdjClose      NB. See dates for these low points.
+----------+----------+----------+
|1993-12-22|2007-01-24|2006-11-21|
+----------+----------+----------+
   AdjClose {~ 0 4 5 { /:AdjClose  NB. Prices on those dates
9.31 9.89 9.9
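The grade-up steps above can be followed on a vector small enough to check by eye. This is only a sketch: px and dt are made-up stand-ins for AdjClose and Date, and ord and lows are names introduced here for illustration.

```j
NB. Made-up prices and dates, most recent first; not the VIX data.
px=: 15.64 9.31 20.84 9.89 17.11
dt=: '2012-03-12';'2012-03-09';'2012-03-08';'2012-03-07';'2012-03-06'
ord=: /: px          NB. 1 3 0 4 2 : indexes from lowest to highest price
lows=: 2 {. ord      NB. 1 3 : positions of the two lowest prices
lows { px            NB. 9.31 9.89
lows { dt            NB. boxed dates 2012-03-09 and 2012-03-07
```

Because grade-up returns positions rather than sorted values, the same index list selects from both the price vector and the date vector.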
Let's compare the most recent price to all the others to see how many it exceeds and how many exceed it.
   ({.AdjClose)+/ . < }.AdjClose         NB. Most recent is less than how many?
3892
   ({.AdjClose)+/ . >: }.AdjClose        NB. Most recent is greater than (or =) how many?
1699
   14.59 ((+/ . <) , +/ . >) AdjClose    NB. Same comparisons for price today
4212 1374
Side-tracking for a moment from the actual analysis, we look at a few ways to re-write the last J expression here to remove the apparent redundancy of the repeated summations (+/).
   +/ &> 14.59 (< ; >) AdjClose    NB. Examples of removing redundancy
4212 1374                          NB. from the preceding expression.
   +/ 14.59 (< ,. >) AdjClose      NB. Nicer because shorter, does not enclose
4212 1374
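It may help to see the intermediate results these rewrites rely on. The sketch below uses a made-up five-item vector px in place of AdjClose; tbl is a name introduced only for this illustration.

```j
px=: 13 14.59 15 16 20                 NB. made-up prices, not the VIX data
(14.59 +/ . < px) -: +/ 14.59 < px     NB. 1 : here the inner product is a sum over the comparison
tbl=: 14.59 (< ,. >) px                NB. stitch the two Boolean vectors into a two-column table
$ tbl                                  NB. 5 2
+/ tbl                                 NB. 3 1 : summing a table adds down each column
+/ &> 14.59 (< ; >) px                 NB. 3 1 : same counts, summing inside each of two boxes
```

The item equal to 14.59 counts toward neither column, which is why the two counts need not add up to the length of the vector.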
A tacit version is one that does not refer to its arguments by name: it is composed only of verbs and the adverbs and conjunctions that modify them.
   13 : '+/ x (< ,. >) y'           NB. Have J generate the tacit equivalent.
[: +/ < ,. >
   14.59 ([: +/ < ,. >) AdjClose
4212 1374
   14.59 (< ,&(+/) >) AdjClose      NB. Another tacit alternative
4212 1374
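Both tacit forms are forks, and a dyadic fork x (f g h) y is evaluated as (x f y) g (x h y). As a hedged check of that reading, the sketch below names the two trains f1 and f2 (names chosen here for illustration) and compares them against the spelled-out expression on made-up data.

```j
px=: 13 14.59 15 16 20          NB. made-up prices, not the VIX data
f1=: [: +/ < ,. >               NB. the form produced by 13 :
f2=: < ,&(+/) >                 NB. the alternative: sum each comparison, then join
14.59 f1 px                     NB. 3 1
(14.59 f1 px) -: 14.59 f2 px    NB. 1 : the two trains agree
(14.59 f2 px) -: (+/ 14.59 < px) , +/ 14.59 > px   NB. 1 : matches the explicit version
```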
Now, check some of the claims in the article. First, we see how many times the adjusted closing price has exactly equaled the value of 14.62 mentioned in the article.
   AdjClose +/ . = 14.62    NB. Check the article’s assertion about a price
6                           NB. below 14.62 marking the lowest level since
                            NB. June 2007.
Since this has happened six times, we can't find the first point less than this number by looking for the number itself: we need a slightly more complex set of instructions.
   >Date{~AdjClose i. 14.62         NB. Date on which price last equalled 14.62.
2011-04-28                          NB. This is when it was last equal to 14.62 but
   >Date{~ 1 i.~ AdjClose<14.62     NB. he said “below” – so when is the most
2007-06-21                          NB. recent time it was less than 14.62?
This last expression, used to find the first instance in our series less than 14.62, deserves a more detailed explanation.
We want the first case in which the price was less than 14.62 because our prices are in date order with the most recent date first. By searching the vector of prices from start to end (the usual direction implicit in array operations in J), we start at the present and move into the past.
The expression AdjClose<14.62 generates a Boolean vector with zeros where the comparison is false and ones where it's true. The part of the expression 1 i.~ looks up (i.) the first occurrence of a one. The tilde reverses the order of the arguments, so we avoid the parentheses we would otherwise need in (AdjClose<14.62) i. 1. Similarly, using the location of the first one returned by this part of the expression, we extract the corresponding member of the date vector with Date {~, again using the tilde to avoid parentheses.
Finally, we disclose (>) the contents of the boxed date to show it more simply, without the box drawn around it.
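The lookup idiom explained above can be traced step by step on a short series. This is a sketch under stated assumptions: px and dt are made-up values ordered most recent first, and idx is a name introduced here.

```j
NB. Made-up prices and dates, most recent first; not the VIX data.
px=: 15.64 14.62 14.1 14.62
dt=: '2012-03-12';'2012-03-09';'2012-03-08';'2012-03-07'
px i. 14.62            NB. 1 : position of the first (most recent) exact match
px < 14.62             NB. 0 0 1 0 : Boolean vector of "below" comparisons
idx=: 1 i.~ px < 14.62 NB. 2 : position of the first 1
> idx { dt             NB. 2012-03-08
1 i.~ px < 5           NB. 4 : no match, so i. returns the length of the vector
```

The last line shows a case the original expression does not guard against: when nothing satisfies the comparison, the result equals the number of items, and using it to index the dates would signal an index error. Checking idx < # px first is one way to handle that.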
   mean AdjClose    NB. Claimed long-term average “about” 20
20.5552
   mean             NB. Entering a name with no argument shows its
+/ % #              NB. definition, here, a classic tacit expression
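The definition shown for mean is a monadic fork, which applies the outer verbs to the argument and combines the results with the middle one: (+/ y) % (# y). A minimal sketch, using the hypothetical name avg so as not to redefine mean:

```j
avg=: +/ % #                     NB. sum divided by count, as a fork
avg 15 20 25                     NB. 20
(avg 15 20 25) -: (+/ 15 20 25) % # 15 20 25   NB. 1 : fork equals the spelled-out form
```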
-- Devon McCormick