Community/Conference2014/Talks/UsingDataAsCode
The following is a small bit of code, performed in (I think) a rather elegant way. The task is to read and write a Microsoft wave file, and as programmers we naturally want to implement the task in a way that reduces code duplication and is easily maintained. To accomplish this we factor out the structure of a wave file entirely, placing it in a structured string (thus replacing code with data). The code is now imminently maintainable and modifiable for tasks within a certain scope, including parsing other RIFF file types such as AVI. To treat such a file we do not have to modify the parsing code at all; we merely have to change the string that gives the header definitions and the sections of code which manipulate the actual file data. One can also get a clear idea of the structure of a wave file and our particular assumptions about the file by glancing over this data structure without reading any code at all. For instance, we require AudioFormat to be 1, meaning that the data in the wave file is uncompressed.
NB. Header definitions for a Microsoft wave file, as described here: NB. https://ccrma.stanford.edu/courses/422/projects/WaveFormat/ NB. The four columns are NB. Field length in bytes NB. Field type (i for integer or c for character) NB. Field name NB. Field definition (preceded by | for easier parsing) NB. The specific content of these headers is mostly unimportant-- NB. we are interested only in those fields for which the definition NB. is left blank. WAVE_HEADER =: ((;:@{. , >:@[<@}.])~ i.&'|');._2 ] 0 : 0 4 c ChunkID |'RIFF' 4 i ChunkSize |20 + Subchunk1Size + Subchunk2Size 4 c Format |'WAVE' 4 c Subchunk1ID |'fmt ' 4 i Subchunk1Size |16 2 i AudioFormat |1 2 i NumChannels | 4 i SampleRate |44100 4 i ByteRate |SampleRate * NumChannels * BitsPerSample%8 2 i BlockAlign |NumChannels * BitsPerSample%8 2 i BitsPerSample | 4 c Subchunk2ID |'data' 4 i Subchunk2Size | ) NB. We assign each of the columns to names, and reduce the LEN and NB. TYP columns from strings to numbers and bytes, respectively. 'LEN TYP NAME DEF' =: <"_1|: WAVE_HEADER LEN =: ".@> LEN TYP =: ; TYP NB. Some of the given definitions refer to elements that come later NB. in the header. This will cause problems when writing a wave file, NB. so we define a permutation ORDER which orders the columns to NB. remove dependencies. NB. This method first generates a dependency matrix by testing for NB. each NAME whether it is included as a string in each DEF. tsort NB. is applied to the resulting matrix, and repeatedly identifies NB. the first element with no dependencies to add its index to the NB. final output. tsort =. (] , 1 i.~ ] (0"0@[)`[`]} (*./@:e.&> <))^:(>&#)^:_ & ($0) ORDER =: tsort (+./@:E.)&>~&NAME(I.@:)(<@)"0 DEF NB. For convenience, we now fill in the blank definitions with their NB. own names (e.g. NumChannels is defined to be NumChannels). DEF =: =&a:`(,:&NAME)} DEF NB. toint will be used to convert the numeric fields from lists of bytes NB. to (unsigned) integers. Note that the bytes are reversed as the NB. integer headers are little-endian. toint =: 256 #. a.i.|. NB. y is the path to a wave file. NB. readwav returns the PCM data (which is just a list of numbers) NB. from that file. The output has shape (n,l) for n-channel sound. readwav =: 3 : 0 NB. Read the file. y =. 1!:1 boxopen y NB. Separate the headers from the PCM data after the headers. 'hdr y' =. (+/LEN) ({. ; }.) y NB. First use (</.~ I.) to group the headers by their length. NB. Then, for each header where TYP is 'i', convert that header's NB. data from a list of bytes to integers. NB. Finally, perform a multiple assignment to NAME. (NAME) =. hdr =. ('i'=TYP) toint&.>@]^:["0 hdr (</.~ I.) LEN NB. The cool part! NB. Once we assign the names to their values, each of the definitions NB. also evaluates to those values if and only if the input file meets NB. our specifications. assert. hdr -: ".&.> DEF NB. Since toint outputs an unsigned integer, we will use s to convert NB. back to a signed integer. M =. 2 <.@^ <:BitsPerSample s =. (+:M)&|&.(M&+) NB. Now we make use of the fields we are interested in (that is, those NB. that were left blank above). NB. We take the first Subchunk2Size elements of the PCM data given by y NB. (in theory, this is not necessary, but in practice, it sometimes NB. matters). NB. Then we convert the data to integers in groups given by the number NB. of bytes per sample. If BitsPerSample is 16, this could be NB. optimized to the faster _1&(3!:4) , although it is not here. NB. Finally, we turn this into an array with a row for each channel. |: (-NumChannels) ]\ s (-BitsPerSample%8) toint\ Subchunk2Size {. y ) NB. x is PCM data as output by readwav, and y is the file to write to. NB. Perform the write. NB. Note that writewav parallels the structure of readwav, but backwards. NB. This is more easily seen in some places (1!:1 versus 1!:2 ; the NB. calls to toint) than others. writewav =: 4 : 0 NB. Define our fields of interest using x. We assume our input data NB. fits in 16-bit signed ints, or equivalently ranges from (-2^15) to NB. (<:2^15) inclusive. This allows us to take a shorcut and convert NB. to character data using 1&(3!:4) rather than the slower toint^:_1 . NB. After these definition, x is the PCM data as a string of bytes. NumChannels =. #x Subchunk2Size =. #x =. 1 (3!:4) ,|: x BitsPerSample =. 16 NB. The cool part, backwards! In the order given by our topological NB. sort, we assigned each name to its evaluated definition. for_i. ORDER do. (i{::NAME) =. ". i{::DEF end. NB. We take the list of names and execute them. Then we convert those NB. that are integers to lists of bytes (identical to the above but NB. for one application of ^:_1 !) and ensure that each is the NB. appropriate length by padding with zero bytes ({.a.) . hdr =. ; LEN {.!.({.a.)&.> ('i'=TYP) toint^:_1&.>@]^:["0 ".&.> NAME NB. We place the PDM data after the headers and write to the file. (hdr, x) 1!:2 boxopen y )