NYCJUG/2012-06-12
weighted median, box cut example, simple shape, canonical projection, a-periodic tiles, Javascript as target language, bilingualism's effect on brain, language affects thought, R programming, examples of introduction to programming, game-based learning
Location:: Heartland
Agenda
Meeting Agenda for NYCJUG 20120612
----------------------------------
1. Beginner's regatta: comparison and experiment with weighted median - see "Elegant SQL - weighted median.pdf" & follow-up from last month - see "Using box cut data.pdf". Why do we have to continue to put up with new languages that do such a crappy job of array-handling? See "A Conversation About Shape.pdf".

2. Show-and-tell: More on JHS: see "Hello World demo for JHS.pdf". Aperiodic tiles - see excerpt from "JournalOfJ_April2012.pdf" and "canonicalProjectionIJS.pdf".

3. Advanced topics: See "Teaching code.pdf" and "Javascript Performance Compared.pdf". Effect of language on thinking: see "Bilingual brain boost.pdf" and "who-dunnit-Crosslinguistic Differences in Memory.pdf". See "Why does one language succeed and another one fail.pdf".

4. Learning, teaching and promoting J, et al.: See "First Thoughts on R.pdf", "Example Prelim Intros to Programming Languages.pdf", "The State of Games in the Classroom.pdf", and "Programming language board game.pdf".
Beginner's regatta
Weighted Median - Calculation and Pitfalls
We looked at an example of elegant SQL code for computing a weighted median, then at some J code to compute a median. The details of a follow-on to the original (SQL) request for a weighted median calculation closely echoed a request I was looking at for work, so we explored it in a little detail.
The problem was not simply to find the median of a series of values weighted by another series but to use the division created by the median value to partition another related dataset. For example, we might be finding the median of market-cap - which is a company's number of shares outstanding weighted by its market price - then comparing the companies with below-median market-cap to those with above-median market-cap on some other measure like an earnings-to-price ratio.
In the J example of working out this problem, we saw how we might simply modify the median calculation to return the index (or indexes, in the case of an even number of values) of the median item (or items). So, we revamped this definition
median=: -:@(+/)@((<. , >.)@midpt { /:~)
to this one:
medianPosn=: [:~.(<. , >.)@midpt { /:
This latter definition omits the step that averages the two (possibly distinct) midpoints, "-:@(+/)@", and works on the grade vector, "/:", rather than the sorted list, "/:~". However, some testing of this new definition revealed a conceptual shortcoming:
   (wts*vals);(wts=. >:10 ?@$ 9);vals=. >:10 ?@$ 5
+------------------------+-------------------+-------------------+
|9 15 10 5 20 5 8 6 16 32|3 5 2 5 4 5 8 2 4 8|3 3 5 1 5 1 1 3 4 4|
+------------------------+-------------------+-------------------+
   median wts * vals
9.5
   medianPosn wts * vals
0 2
Note that the two median positions are not adjacent: the median values sit at positions 0 and 2 of the original, unsorted list. This points up an implicit condition of this notion of applying a median on one set of values to another set of related values: the items are implicitly ordered by the weighted values on which the median is calculated.
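To see why the two positions are not adjacent, it may help to look at the grade vector directly. What follows is only a sketch: it restates the definitions used above and assumes that "midpt" is defined as in the stats addon (these notes use it without showing it). The name "wv" is introduced here purely for illustration.

```j
NB. midpt is an assumption: the notes use it but do not define it.
midpt=: -:@<:@#
median=: -:@(+/)@((<. , >.)@midpt { /:~)
medianPosn=: [: ~. (<. , >.)@midpt { /:

wv=. 9 15 10 5 20 5 8 6 16 32   NB. wts * vals from the session above

/: wv                  NB. grade: indexes of wv in ascending order of value
4 5 { /: wv            NB. the two middle entries of the grade are the median positions
(medianPosn wv) { wv   NB. the values found at those positions
median wv              NB. their average
medianPosn 30 10 20    NB. odd count: ~. collapses the repeated index to a single one
```

The middle two entries of the grade are indexes into the unsorted list, so there is no reason for them to be neighbors there.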
So, to use this concept properly, in J we might do something like this:
   gv=. /: wts * vals
   medpt=. -:+/medianPosn gv { wts * vals
   belowMedian=. (/:gv){medpt>i.#gv   NB. Boolean to select items below weighted measure
   belowMedian
1 0 0 1 0 1 1 1 0 0
   belowMedian#wts*vals   NB. Verify that we get the right weighted values
9 5 5 8 6
   (-.belowMedian) # wts*vals
15 10 20 16 32
An interesting twist to this way of generating a boolean to select the below-median values is that we "unsort" the simple boolean generated on the sorted set, "medpt>i.#gv", by indexing it with the grade of the grade vector, "/: gv". The grade of a permutation is its inverse, so this puts each boolean back at the position its item had in the original order. This assumes that the other items of interest are in the same order as our weights and values, perhaps as different columns from the same table.
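As a rough illustration of applying the resulting boolean to a related column, here is a sketch using the weighted values from the session above. The column "ep" is made-up data standing in for something like an earnings-to-price ratio, "wv" is a name introduced for this example, and "midpt" is again assumed to be the stats-addon definition.

```j
midpt=: -:@<:@#                            NB. assumed definition, not shown in the notes
medianPosn=: [: ~. (<. , >.)@midpt { /:

wv=. 9 15 10 5 20 5 8 6 16 32              NB. wts * vals from the session above
NB. Hypothetical related measure, in the same row order as wv.
ep=. 0.05 0.08 0.03 0.1 0.04 0.07 0.02 0.06 0.09 0.01

gv=. /: wv                                 NB. grade of the weighted values
medpt=. -: +/ medianPosn gv { wv           NB. midpoint position within the sorted list
belowMedian=. (/: gv) { medpt > i. # gv    NB. unsort the boolean back to original order

(/: gv) { gv             NB. grade of the grade undoes the sort: gives i. # gv
belowMedian # ep         NB. related measure for the below-median items
(-. belowMedian) # ep    NB. related measure for the above-median items
```

If the related column were stored in some other order, it would have to be aligned with the weights and values before the boolean could be applied to it.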
Follow-up to Explanation of “Box Cut”
Last month we spent two pages explaining the first line of J here.
'ontit ontap'=: split <;._1&>TAB,&.><;._2 ] 0 : 0
Beer Name	Served In	ABV	Price
Allagash White	16oz. Draft	5.5	$7.00
Bear Republic Roggenbier	14oz. Draft	4.5	$7.00
…
This used the "cut" conjunction ";." with the box verb "<" and two of cut's arbitrary numeric qualifiers, "_1" and "_2", to format tab-delimited lines of text into a useful matrix. This combination of the "box" verb with the "cut" conjunction is what I call "box cut".
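For anyone who missed last month, a miniature version of the same expression may help. This is a sketch on made-up text (the noun "txt" and its contents are invented for illustration), with "split" restated under the assumption that it matches the standard-library definition.

```j
split=: {. ,&< }.     NB. assumed to match the standard-library verb of the same name

NB. Made-up tab- and LF-delimited text standing in for the 0 : 0 noun.
txt=. 'Name',TAB,'ABV',LF,'Ale',TAB,'5.5',LF,'Stout',TAB,'4.5',LF

rows=. <;._2 txt                 NB. _2: the last character (LF) ends each piece and is dropped
cells=. <;._1&> TAB ,&.> rows    NB. _1: the first character (TAB) starts each piece and is dropped
'hdr body'=. split cells         NB. header row and the remaining rows

$ cells     NB. one row per line, one column per field
hdr
body
```

Prefixing each line with TAB is what lets "_1" treat the tab as the leading delimiter of every field, including the first.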
Here's a follow-up to that explanation, showing one way this form is useful. In this exercise, we apply the same expression we saw last month to create two tables from tab- and LF-delimited text, then use the resulting variables to track down differences between two moderately large sets of data: lists of the members of the S&P 400 index.
   load 'c:/amisc/Clarifi/THB/sp400s.ijs'   NB. S&P400 members on different dates.
   $&.>on501;<on518       NB. Check the size of each: should be the same.
+-----+-----+
|400 7|400 7|
+-----+-----+
   on518 -: on501         NB. Is data the same? No.
0
   'on518 on501'=. /:~&.>on518;<on501   NB. Sort them both to be sure...
   on518 -: on501         NB. Still different, but how?
0
   $on518 -. on501        NB. How many items on new date not on old one?
1 7
   on518 -. on501         NB. What is different one on 5/18 not in 5/1?
+---+---------------------+--------+---------+------+--+------------------+
|TPX|TEMPUR PEDIC INTL INC|15686101|88023U101|156861|01|Household Durables|
+---+---------------------+--------+---------+------+--+------------------+
   on518 -.~ on501        NB. What is in 5/1 but not 5/18?
+---+-------------------+--------+---------+------+--+--------------------+
|TNB|THOMAS & BETTS CORP|01054001|884315102|010540|01|Electrical Equipment|
+---+-------------------+--------+---------+------+--+--------------------+
Details of the Script
The script "sp400s.ijs" has two entries. The first begins like this:
NB.* sp400s.ijs: constituents of S&P400 Midcap index on different dates in 2012.

'tit on518'=: split <;._1&>TAB,&.><;._2 ] 0 : 0
Ticker	Company Name	Issue ID	CUSIP	gvkey	iid	Industry
AAN	AARON'S INC	00107601	002535300	001076	01	Specialty Retail
ALK	ALASKA AIR GROUP INC	00123001	011659109	001230	01	Airlines
ALEX	ALEXANDER & BALDWIN INC	00125401	014482103	001254	01	Marine
Y	ALLEGHANY CORP	00127401	017175100	001274	01	Insurance
SWKS	SKYWORKS SOLUTIONS INC	00132701	83088M102	001327	01	Semicond...
...
The other global is assigned very similarly but with a different name, starting like this:
'tit on501'=: split <;._1&>TAB,&.><;._2 ] 0 : 0
...
The value of the title vector "tit" is the same in both cases, which is why we re-used the name.
Now when we want to compare another date’s index composition, we have the boilerplate into which we can insert its data.
So, if we’re on a phone call with a client who claims the index did not change between 4/30 and 5/1, we can verify this while we’re talking to him by adding the data for 4/30 into our script and doing the following.
   load 'c:/amisc/Clarifi/THB/sp400s.ijs'
   'on518 on501 on430'=. /:~&.>on518;on501;<on430
   -: / /:~ &.> on430 ; <on501   NB. Sorted tables the same? No.
0
   $on501 -. on430
1 7
   on501 -. on430    NB. What are the differences?
+---+-------------+--------+---------+------+--+------------------------+
|SVU|SUPERVALU INC|01019001|868536103|010190|01|Food & Staples Retailing|
+---+-------------+--------+---------+------+--+------------------------+
   on501 -.~ on430
+--+-------------------------+--------+---------+------+--+------------------+
|AM|AMERICAN GREETINGS -CL A|00146801|026375105|001468|01|Household Durables|
+--+-------------------------+--------+---------+------+--+------------------+
Now we can be sure that the index did change and we can specify what the changes were. We see that "American Greetings" was in the index on 4/30 but was replaced by "Supervalu" on 5/1.
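The pair of set differences used in these comparisons could be wrapped up for reuse. The sketch below is one possible way to do it, shown on two tiny made-up tables: the verb name "changes" and the nouns "old" and "new" are hypothetical and do not appear in the script.

```j
NB. Hypothetical helper: x is the earlier table, y the later one.
NB. Result is two boxes: rows added in y, then rows dropped from x.
changes=: 4 : '(y -. x) ; < x -. y'

NB. Made-up membership tables, one boxed row per member.
old=. 3 2 $ 'AAA';'Alpha';'BBB';'Beta';'CCC';'Gamma'
new=. 3 2 $ 'AAA';'Alpha';'CCC';'Gamma';'DDD';'Delta'

new -: old                      NB. same table? no
'added dropped'=. old changes new
added                           NB. row in new but not in old
dropped                         NB. row in old but not in new
```

Because "-." compares whole rows when both arguments are tables, a member whose row changed in any column would show up as both added and dropped.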
Show-and-Tell
Advanced Topics
Learning, teaching, and problem-solving
Materials
- File:A Conversation About Shape.pdf
- File:Bilingual brain boost.pdf
- File:CanonicalProjectionIJS.pdf
- File:Elegant SQL - weighted median.pdf
- File:Example Prelim Intros to Programming Languages.pdf
- File:First Thoughts on R.pdf
- File:Hello World demo for JHS.pdf
- File:Javascript Performance Compared.pdf
- File:JournalOfJ April2012.pdf
- File:Programming language board game.pdf
- File:Teaching code.pdf
- File:The State of Games in the Classroom.pdf
- File:Using box cut data.pdf
- File:Who-dunnit-Crosslinguistic Differences in Memory.pdf
- File:Why does one language succeed and another one fail.pdf
-- Devon McCormick, 2012-06-13