Essays/Attribute-Value Processing
KenLettow posed the following problem to the J Programming Forum on 2008-11-11:
Each line of a file contains strings of the form attribute=value separated by the & character (see below). Generate a table of the values for each line of the file for a specified list of attributes.
y=: 0 : 0
att0=4010&att7=2457&att2=439
att3=902&att2=413&att5=4262&att4=4967
att5=4040&att1=465

att4=2733
att3=2397&att2=1104&att6=2625
)
A Solution
The problem can be solved by using cut (;.) as follows. The solution works on all the lines at once rather than a line at a time.
NB. y: lines of att=value&att=value&att=value& ... terminated by LF
NB. x: required attributes

tab=: 4 : 0
 av=. a: -.~ (y e. LF,'=&') <;._2 y   NB. attribute-value pairs
 a=. av #~ (#av)$1 0                  NB. attributes
 v=. av #~ (#av)$0 1                  NB. values
 n=. (y=LF) +/;._2 y='='              NB. # attributes in each line
 }:"1 v (<"1 (I.n),.x i. a)} ((#n),1+#x)$a:
)
For example:
   ] x=: <;._1 ' att0 att1 att2'
┌────┬────┬────┐
│att0│att1│att2│
└────┴────┴────┘
   x tab y
┌────┬───┬────┐
│4010│   │439 │
├────┼───┼────┤
│    │   │413 │
├────┼───┼────┤
│    │465│    │
├────┼───┼────┤
│    │   │    │
├────┼───┼────┤
│    │   │    │
├────┼───┼────┤
│    │   │1104│
└────┴───┴────┘
Program Logic
0. If y is cut at the LF, =, and & characters, with each such character ending a piece, and empty boxes are removed, the result is a boxed list of the form attribute, value, attribute, value, ...
   ] av=. a: -.~ (y e. LF,'=&') <;._2 y
┌────┬────┬────┬────┬────┬───┬────┬───┬────┬───┬────┬────┬────┬────┬────┬────┬────┬───┬────┬────┬
│att0│4010│att7│2457│att2│439│att3│902│att2│413│att5│4262│att4│4967│att5│4040│att1│465│att4│2733│...
└────┴────┴────┴────┴────┴───┴────┴───┴────┴───┴────┴────┴────┴────┴────┴────┴────┴───┴────┴────┴
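The effect of the cut in step 0 can be seen on a smaller input. The following is a minimal sketch; the string s and the name raw are made up for illustration and are not part of the forum data. It shows that an empty line produces an empty box, which is why the program removes empty boxes before pairing attributes with values.

```j
s=: 'aa=10',LF,LF,'bb=20',LF       NB. made-up input: three lines, the middle one empty
raw=: (s e. LF,'=&') <;._2 s        NB. box the pieces ending at each delimiter, dropping the delimiter
# raw                               NB. 5: aa, 10, an empty box, bb, 20
av=: a: -.~ raw                     NB. remove the empty box
av -: 'aa';'10';'bb';'20'           NB. 1
```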
1. The even-numbered entries (counting from 0) are the attribute names.
   ] a=. av #~ (#av)$1 0
┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
│att0│att7│att2│att3│att2│att5│att4│att5│att1│att4│att3│att2│att6│
└────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘
2. The odd-numbered entries are the corresponding values.
   ] v=. av #~ (#av)$0 1
┌────┬────┬───┬───┬───┬────┬────┬────┬───┬────┬────┬────┬────┐
│4010│2457│439│902│413│4262│4967│4040│465│2733│2397│1104│2625│
└────┴────┴───┴───┴───┴────┴────┴────┴───┴────┴────┴────┴────┘
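As a small illustration of steps 1 and 2, reshaping 1 0 or 0 1 to the length of av gives a selection mask for copy (#). The four-item list below is made up for the example.

```j
av=: 'aa';'10';'bb';'20'            NB. made-up alternating attribute/value list
(#av)$1 0                           NB. 1 0 1 0: mask selecting entries 0 and 2
(av #~ (#av)$1 0) -: 'aa';'bb'      NB. 1: the attribute names
(av #~ (#av)$0 1) -: '10';'20'      NB. 1: the values
```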
3. The number of a-v pairs on each line is obtained by a partitioned sum of the = characters, with each partition ending at an LF.
   ] n=. (y=LF) +/;._2 y='='
3 4 2 0 1 3
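The partitioned sum can be checked on a short made-up string (s below is an assumption for illustration only). The sketch also shows how I. turns the per-line counts into the line number of each a-v pair, which is how the counts are used in the next step.

```j
s=: 'aa=10&bb=20',LF,LF,'cc=30',LF  NB. made-up input: three lines, the middle one empty
n=: (s=LF) +/;._2 s='='              NB. sum the = flags within each LF-terminated piece
n -: 2 0 1                           NB. 1: the empty line contributes 0
(I. n) -: 0 0 2                      NB. 1: line number of each a-v pair
```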
4. The overall result has shape (# lines),(# attributes of interest). The value part of each a-v pair amends entry i,j, where i is the line number and j is x i. a. The program temporarily works with a table that has one extra column; the values for attributes not in x amend that extra column, which }:"1 drops at the end.
   (#n),1+#x
6 4
   ] i=. (I.n) ,. x i. a
0 0
0 3
0 2
1 3
1 2
1 3
1 3
2 3
2 1
4 3
5 3
5 2
5 3
   v (<"1 i)} 6 4$a:
┌────┬───┬────┬────┐
│4010│   │439 │2457│
├────┼───┼────┼────┤
│    │   │413 │4967│
├────┼───┼────┼────┤
│    │465│    │4040│
├────┼───┼────┼────┤
│    │   │    │    │
├────┼───┼────┼────┤
│    │   │    │2733│
├────┼───┼────┼────┤
│    │   │1104│2625│
└────┴───┴────┴────┘
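The scatter and the final trim can be traced on a two-line case. This is a sketch under made-up data: the nouns x, a, v, and n below are chosen for illustration, and t is a hypothetical name for the intermediate table. An attribute that is not in x gets column index #x, lands in the extra column, and disappears when the last column is dropped.

```j
x=: 'aa';'bb'                        NB. attributes of interest (made up)
a=: 'aa';'zz';'bb'                   NB. attributes found; zz is not in x
v=: '10';'99';'20'                   NB. corresponding values
n=: 2 1                              NB. two pairs on line 0, one on line 1
(x i. a) -: 0 2 1                    NB. 1: zz maps to #x, the extra column
i=: (I. n) ,. x i. a                 NB. row,column index pairs: 0 0 then 0 2 then 1 1
t=: v (<"1 i)} ((#n),1+#x)$a:        NB. scatter the values into a 2 by 3 table
(}:"1 t) -: 2 2$'10';'';'';'20'      NB. 1: dropping the last column leaves the result
```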
Line-at-a-Time
line=: 4 : 0
 av=. a: -.~ (y e. LF,'=&') <;._2 y
 a=. av #~ (#av)$1 0
 v=. av #~ (#av)$0 1
 (a i. x) { v,a:
)

tab2=: 4 : 'x&line;.2 y'

   x (tab -: tab2) y
1
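The last line of the per-line verb looks up each requested attribute among those present on the line and selects from the values with one empty box appended. A minimal sketch with made-up nouns (a, v, and x are illustrative, not the forum data):

```j
a=: 'aa';'bb'                        NB. attributes present on one made-up line
v=: '10';'20'                        NB. their values
x=: 'bb';'cc';'aa'                   NB. attributes requested; cc is absent from the line
(a i. x) -: 1 2 0                    NB. 1: cc maps to #a
((a i. x) { v,a:) -: '20';'';'10'    NB. 1: index #a picks the appended empty box
```

One behavioral difference may be worth checking on real data: if an attribute repeats within a line, i. selects its first occurrence here, whereas the amend in the all-at-once version appears to keep the last value written (as the 4967 entry in the extra column of the step 4 display suggests).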
Notes
- If lines of y are terminated by CRLF rather than by LF, the CR characters must first be removed: x tab y-.CR
- If lines of y are separated by LF rather than terminated by LF, append an LF to y: x tab y,LF
- The program does not handle cases where a value contains = or & (or even LF), nor a value enclosed in quotes.
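The first two notes can be combined into a single preprocessing step. The verb below is a sketch only: normalize is a hypothetical helper that does not appear in the original solution, and it assumes that CR never occurs inside a value.

```j
NB. normalize is a hypothetical helper, not part of the original essay
normalize=: 3 : 0
 t=. y -. CR                         NB. drop every CR, so CRLF becomes LF
 t , (LF ~: {: t) # LF               NB. append LF only if the text does not already end with one
)

(normalize 'aa=10',CR,LF,'bb=20') -: 'aa=10',LF,'bb=20',LF   NB. 1
(normalize 'aa=10',LF) -: 'aa=10',LF                          NB. 1
```

With such a helper, the call would read x tab normalize y.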
Contributed by Roger Hui.