14. Stumping the Rocket Scientist
By Eugene McDonnell. First published in Vector, 13, 4 (April 1997), 123-129.
A. The Abstract Problem
This column concerns a statistical application, having to do with a rating problem involving 5 integer variables, related as follows:
a >: 0
c <: a
d < 100 * c
t <: c
i <: a - c
My interest in this application arose because the rating process is usually stated in quite a pedestrian way, yet has the reputation of being arcane and involved in the extreme. I'll give the pedestrian statement first, then an analysis of the statistical boundaries of the problem, next a J program following the statement as closely as possible, and lastly a J verb which is more concise and more efficient. In Section B I'll describe the physical situation giving rise to the statistical application.
To obtain the rating of a given system of these five variables proceed as follows:
Step 1: c divided by a. Subtract 0.3, then divide by 0.2.
Step 2: d divided by a. Subtract 3, then divide by 4.
Step 3: t divided by a, then divide by 0.05.
Step 4: Start with 0.095, and subtract i divided by a. Divide the product by 0.04.
The sum of each step cannot be greater than 2.375 or less than zero. Add the sum of steps 1 through 4, multiply by 100 and divide by 6. This is the rating.
We form the argument to the program as a five-item list:
a, c, d, t, i
I'll write the program in J. The first line shows the change brought by Release 3.03 (January 1997) in the way of doing indirect assignment: one-letter names are now treated in the same way as multiple-letter names, that is, with a space separating names.
Rating =: verb define
'a c d t i' =. y
step1 =. ((c % a) - 0.3) % 0.2
step2 =. ((d % a) - 3) % 4
step3 =. (t % a) % 0.05
step4 =. (0.095 - i % a) % 0.04
(100*+/2.375<.0>.step1,step2,step3,step4)%6
)
The number of tokens in this program is easily found:
   #;:5!:5<'Rating'
79
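The counting expression is a composition of three steps: 5!:5 gives the linear representation of a named entity as text, ;: cuts that text into boxed J words, and # tallies them. A minimal sketch of the same idea on a much smaller verb follows; the name mean is hypothetical and is not part of this column.

```j
mean =: +/ % #          NB. hypothetical three-verb train, for illustration only
5!:5 <'mean'            NB. linear representation as text: +/ % #
;: 5!:5 <'mean'         NB. the same text cut into boxed J words
# ;: 5!:5 <'mean'       NB. tally of the words: 4
```

The four words are + / % and #, so adverbs and each primitive count as separate tokens.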
The time required by Rating is 0.024.
The four steps are roughly, but not exactly, the same. My impulse is to see whether I can make them exactly similar, for if we can, we can take advantage of the array processing abilities of J.
I take Step1 as the pattern. It has the form:
((v % a) - w) % z
Step2 follows the pattern exactly. Step 3 lacks the - w part, but that is easily fixed using the identity:
x - 0   ↔   x
Using this, we'll rewrite Step3 as:
step3 =. ((t % a) - 0) % 0.05
Step4 is only slightly more complicated. It reverses the minuend and subtrahend.
step4 =. (0.095 - i % a) % 0.04
We can switch the two around by using the identity:
(s - t) % u   ↔   (t - s) % - u
To give us:
step4 =. ((i % a) - 0.095) % _0.04
What I had in mind by putting them in the same form was to be able to take advantage of J's array processing abilities to get rid of the four local variables by writing something like:
x =. (c, d, t, i) % a
or,
x =. (}. % {.) y NB. behead divided by head
If we now form two lists, one of minuends and another of divisors, we can then replace the four Step statements by:
m =. 0.3 3 0 0.095
n =. 0.2 4 0.05 _0.04
(x - m) % n
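As a sanity check on the rewriting so far, the four separate step expressions can be compared with the single array expression. This is only a sketch: the sample system y is made up for illustration, and s1 to s4 are hypothetical names standing in for the four steps.

```j
y =: 800 500 6000 40 20            NB. hypothetical sample: a, c, d, t, i
'a c d t i' =: y
s1 =: ((c % a) - 0.3) % 0.2        NB. the four steps, written out one by one
s2 =: ((d % a) - 3) % 4
s3 =: (t % a) % 0.05
s4 =: (0.095 - i % a) % 0.04
x =: (}. % {.) y                   NB. c, d, t, i each divided by a
m =: 0.3 3 0 0.095                 NB. minuends
n =: 0.2 4 0.05 _0.04              NB. divisors
(s1 , s2 , s3 , s4) -: (x - m) % n NB. 1 if the two forms agree
```

Match (-:) compares tolerantly, so the result should be 1 even if the two routes differ in the last bit of a floating-point value.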
Next, reciprocate n and replace division by multiplication:
   ] b =. % n
5 0.25 20 _25
   b * (x - m)
If now we distribute the multiplication within the parentheses we get:
(b * x) - (b * m)
And, since the right limb is the product of constants, we can replace it by its product:
   ] q =. b * m
1.5 0.75 0 _2.375
   (b * x) - q
I'm trying to arrive at an expression involving a linear polynomial, and am almost there. I have in mind using J's polynomial primitive (p.). For that I'll have to form a as the negate of q and reverse the order of the terms:
   ] a =. - q
_1.5 _0.75 0 2.375
   a + (b * x)
Whew! We've got our linear polynomial (actually, four of them). This has been tedious, although eventually interesting. We now can replace all of the steps of Rating by:
(100 * +/ 2.375 <. 0 >. a + (b * x)) % 6
or, using the polynomial primitive,
(100 * +/ 2.375 <. 0 >. (a ,. b) p. x) % 6
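For readers who have not used p. with a table, here is a small sketch of how it behaves. The left argument of dyadic p. is taken a row at a time and the right argument an item at a time, so stitching (,.) the constants to the slopes gives four two-coefficient polynomials, each applied to its own ratio. The sample system is hypothetical.

```j
1 2 p. 3                             NB. the polynomial 1 + 2x at x = 3, that is, 7
m =: 0.3 3 0 0.095
b =: % 0.2 4 0.05 _0.04              NB. slopes
a =: - b * m                         NB. constant terms
x =: (}. % {.) 800 500 6000 40 20    NB. hypothetical sample ratios
(a ,. b) p. x                        NB. row k of the table is applied to item k of x
(a + b * x) -: (a ,. b) p. x         NB. 1 if both forms agree
```

Note that appending the two lists with , instead of stitching them would produce a single eight-coefficient polynomial, which is not what is wanted here.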
Looking at this, we get irritated by that 100 * and that % 6 . We can use two identities:
u * +/ v      ↔   +/ u * v
(+/ v) % w    ↔   +/ v % w
And arrive, after a bit of algebra, at:
   ] e =: 100r6 * a
_25 _12.5 0 39.5833
   ] f =: 100r6 * b
83.3333 4.16667 333.333 _416.667
   ] g =: e ,. f
    _25  83.3333
  _12.5  4.16667
      0  333.333
39.5833 _416.667
   ] h =: 100r6 * 2.375   NB. 39.5833 is 475r12
39.5833
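The algebra skipped over here includes one step worth spelling out: multiplying by a positive constant can be moved inside the clamping as long as the bounds are scaled too, which is why the cap 2.375 turns into h while the floor stays at 0. A minimal sketch, using made-up step values v and hypothetical names k, old and new:

```j
k =: 100r6
v =: _1 0.5 3                            NB. hypothetical unclamped step values
old =: (100 * +/ 2.375 <. 0 >. v) % 6    NB. clamp, sum, then scale
new =: +/ (k * 2.375) <. 0 >. k * v      NB. scale, clamp at the scaled cap, then sum
old , new                                NB. both 47.9167
```

The sample deliberately includes one value below the floor and one above the cap, so both bounds are exercised.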
Table g lists in its leading column the constant coefficients, and in the last column the linear coefficients for each of the four linear polynomials.
Rtg=: [: +/ 0 >. h <. g p. }. % {.
In this verb, the trailing four items are divided by the leading item, and used as the right argument to the polynomial primitive, with the left argument table g . The four evaluations are constrained to lie in the interval from 0 to 475r12, inclusive, and the constrained values are summed to give the rating.
The verb Rtg has 16 tokens and takes 0.007 units of time: about one-fifth of the size, and less than one-third the time of the program Rating .
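A comparison of this kind can be repeated along the following lines. This is a sketch rather than the measurement reported above: it assumes a J session with the standard library loaded (for verb define), it rebuilds g and h directly from the minuends and divisors, and the sample system s is hypothetical. The timings depend on the machine and the J version, so none are shown.

```j
Rating =: verb define
'a c d t i' =. y
step1 =. ((c % a) - 0.3) % 0.2
step2 =. ((d % a) - 3) % 4
step3 =. (t % a) % 0.05
step4 =. (0.095 - i % a) % 0.04
(100*+/2.375<.0>.step1,step2,step3,step4)%6
)

m =: 0.3 3 0 0.095
n =: 0.2 4 0.05 _0.04
g =: 100r6 * (- m % n) ,. % n      NB. constant terms stitched to slopes
h =: 100r6 * 2.375
Rtg =: [: +/ 0 >. h <. g p. }. % {.

s =: 800 500 6000 40 20            NB. hypothetical sample system
(Rating s) , Rtg s                 NB. the two results agree: 91.6667 91.6667
1000 (6!:2) 'Rating s'             NB. average seconds per run over 1000 runs
1000 (6!:2) 'Rtg s'
```

Averaging over many runs with the left argument of 6!:2 smooths out timer noise, which matters for sentences this short.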
Having the four linear polynomial coefficients allows us to determine the meaningful boundaries of all systems.
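One way to obtain those boundaries, sketched under the assumption that we work from the minuends m and divisors n rather than from g: each step (x - m) % n is 0 when the ratio x equals m, and reaches the cap 2.375 when x equals m + 2.375 * n. Exact rationals are used so that no rounding residue appears; the names zero and cap are hypothetical.

```j
m =: 3r10 3 0 19r200             NB. 0.3 3 0 0.095 as exact rationals
n =: 1r5 4 1r20 _1r25            NB. 0.2 4 0.05 _0.04 as exact rationals
zero =: m                        NB. ratio at which (x - m) % n is 0
cap =: m + 19r8 * n              NB. ratio at which it reaches 2.375 (19r8)
zero ,. cap                      NB. one row per event, in exact form
(zero ,. cap) -: 4 2 $ 0.3 0.775 3 12.5 0 0.11875 0.095 0   NB. 1: same values as Table A
```

For the last event the divisor is negative, so the roles are reversed: the larger ratio gives 0 and the smaller gives the cap.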
Table A
event    min      max
c % a    0.3      0.775
d % a    3        12.5
t % a    0        0.11875
i % a    0.095    0
Here's how to read this table: If, for example, the result of c%a is 0.3 or less, the rating will be 0 for the c%a event. If it is 0.775 or greater, the rating will be 475r12 . Similarly for the next two rows. For the last row, a result for i%a of 0.095 or greater will give a rating of 0 for that event. A result of 0 (it can't be less) will give a rating of 475r12 for that event.
Here are some numerical examples. The maximum rating can be obtained by the system of values:
   mxr =. 800 620 10000 95 0
   Rtg mxr
158.333
Recall that the ratings depend on the ratio of the trailing values to the leading value. When the leading value is 800, the list mxr produces the maximum rating of 158.333, since
   620 10000 95 0 % 800
0.775 12.5 0.11875 0
gives the values in the column headed max in Table A.
Changing the system to give the maximum values possible given the constraints listed at the beginning of this section does not give a greater result:
   Rtg 800 800 80000 800 0
158.333
Conversely, the minimum rating (zero) is obtained with the system:
   Rtg 800 240 2400 0 76   NB. result really 0
3.55271e_15
And similarly, we can say that changing the system to:
   Rtg 800 0 0 0 800
0
will produce the same zero rating.
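The tiny nonzero result reported above for the system 800 240 2400 0 76 is a floating-point residue, not a property of the formula. A sketch in exact rational arithmetic bears this out. It writes the four polynomials as e + f * r instead of using p., and the rational forms of e, f and h, as well as the name r, are my own restatement of the values shown earlier.

```j
e =: _25 _25r2 0 475r12               NB. constant terms of g, as exact rationals
f =: 250r3 25r6 1000r3 _1250r3        NB. linear terms of g, as exact rationals
h =: 475r12
r =: (}. % {.) x: 800 240 2400 0 76   NB. x: makes the integers extended, so % yields rationals
+/ 0 >. h <. e + f * r                NB. 0
```

Each of the four ratios sits exactly on the min boundary of Table A, so every polynomial evaluates to exactly 0 before clamping.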
B. The Physical Problem
Now I have to apologize to readers outside of the United States of America for imposing on your good nature for so long, when what I was describing derives from the parochial form of football popular in the USA but (I believe) not well-known outside that country. In that game there is a preeminent hero called the quarterback. He stands behind a line of seven myrmidons, the central one of which (called the center) hands the ball between his legs to the quarterback while in a crouching stance and facing away from the quarterback. The quarterback can hand the ball in turn to one of the people behind the line like himself, or can run with the ball, or he can throw it forward, aiming it in the direction of one of his running teammates. This is called a forward pass, and it is his ability to deliver forward passes so that they are caught by a teammate before hitting the ground that is measured by the rating system described so laboriously above. The five variables so artfully abbreviated above are now made plain to you:
- a is the number of forward passes attempted.
- c is the number of passes caught by an eligible teammate.
- d is the distance traversed from the line to the point of completion of the play, for all pass plays.
- t is the number of completed passes which result in a goal, or touchdown.
- i is the number of attempted passes which are ingloriously caught by a member of the opposing team - an interception.
As a sample piece of data I'll use the lifetime data of the quarterback George Blanda, who played professional football in the USA for a number of teams from 1949 through 1975. Before showing you this data, I'll interject some personal history. George Blanda and I were in the graduating class of 1949 at the University of Kentucky. George had been the successful quarterback of the college football team. He became a professional player immediately, and played for many years. When my job moved my family and me to Palo Alto, California, in the fall of 1974, I became aware that my old classmate George was still playing football for a living, and not only that, but he was a stellar performer. Week after week it was he who saved the day for his team, the Oakland Raiders. Oakland is a large city across the bay from San Francisco, and about thirty-five miles north of Palo Alto. I was 48, but felt a resurgence of youth in seeing what my coeval Blanda was still doing on the football field. He played through the seasons of 1974 and 1975 before finally retiring (actually he was forced out by his management, who wanted to bring in younger players). George holds the record for the total number of points scored by a football player, 2,002. The nearest player to him has scored 1,699 points.
Let us see then what George Blanda's lifetime statistics are:
attempts:      4007
completions:   1911
yards:         26920
touchdowns:    236
interceptions: 277
Applying our Rtg program gives us his career rating:
   Rtg gb =. 4007 1911 26920 236 277
60.6475
Blanda doesn't have a particularly good rating largely because of the great number of interceptions he threw. Quarterbacks with high ratings usually have many more touchdown passes than interceptions. The quarterback Joe Montana, for example, while playing for the San Francisco football team compiled a record of:
   jm =. 4600 2929 35124 244 123
   Rtg jm
93.5
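To see where the two ratings come from, the final sum in Rtg can be left off so that the four clamped components are visible separately. This is a sketch only: Parts is a hypothetical helper, not a verb from this column, and g and h are rebuilt here from the minuends and divisors so the block is self-contained.

```j
m =: 0.3 3 0 0.095
n =: 0.2 4 0.05 _0.04
g =: 100r6 * (- m % n) ,. % n        NB. constant terms stitched to slopes
h =: 100r6 * 2.375
Parts =: 0 >. h <. g p. }. % {.      NB. hypothetical helper: Rtg without the final sum
gb =: 4007 1911 26920 236 277
jm =: 4600 2929 35124 244 123
Parts"1 gb ,: jm                     NB. rows: Blanda, Montana; columns: c, d, t, i components
+/"1 Parts"1 gb ,: jm                NB. row sums reproduce the ratings: 60.6475 93.5
```

Each component can be at most 475r12, about 39.58, and the last column is the interception component, which is where the two careers differ most.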
This was the highest career rating for any quarterback to have played the professional game. Ratings are also compiled during the football season, as well as for entire careers. Has anyone ever achieved the maximum rating? No one has ever done it for a career, or even for a season, but for a single game it has been done. The player John Taylor of the San Francisco team was called on in one game to throw the ball (he had never done this in a game before). It went for twenty yards, was completed, and scored a touchdown. So Taylor's rating for that game was:
   Rtg 1 1 20 1 0
158.333
I got the title for this column from the fact that American sportswriters and broadcasters are confident that the formula is so arcane it baffles even rocket scientists. We know better, of course. It really only baffles sportswriters and broadcasters.