User:Matthew Brand/RAMDisk
A problem I ran into when a program reads and writes tens of thousands of files is that the system spent most of its time running the Unix pdflush and kjournald processes and not much time actually doing the calculations! I do not think that is a J issue; it is an OS issue. Apparently I could have tuned the OS, but who in their right mind wants to do that, or ask a client to do it on each new machine!! I wrote this RAMDisk utility to compress and cache file reads and writes. It compresses the data before writing and decompresses it on reading.
It works for me. When my program runs with the RAMDisk, the CPU is at 100% and the program takes a relatively short time to complete. Without it, the CPU is at around 10%, the disk is crunching itself into oblivion, and the program does not complete in a reasonable amount of time.
Because the RAMDisk writes many files in one block at the end of the program (or in chunks along the way, depending on maxsize_RAMDisk_), with no reads in between, flushing the files is much quicker than writing them one at a time during program execution.
To compare the two methods you can set:
comp_RAMDisk_ =: comp_utils_
ucomp_RAMDisk_ =: ucomp_utils_
which will bypass the caching and do the I/O directly to the disk.
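A minimal sketch of how such a comparison could be timed, assuming the RAMDisk script further down this page has just been loaded so that comp_RAMDisk_ is still the caching version. The helper name writeMany, the file count, and the /tmp/rdtest_* paths are made up for illustration and are not part of the utility; 6!:2 is the standard J foreign that returns the number of seconds a sentence takes to run. No timings are claimed here; the numbers depend entirely on the machine and file system.

```j
NB. Sketch only: writeMany and the /tmp/rdtest_* paths are hypothetical.
NB. Writes y small files through whichever comp_RAMDisk_ is currently defined.
writeMany =: 3 : 0
for_i. i. y do.
  ('/tmp/rdtest_' , (": i) , '.cmp') comp_RAMDisk_ i. 100
end.
y
)

NB. 1. cached: buffer the writes, then force a single block flush
init_RAMDisk_ ''
tWrite =: 6!:2 'writeMany 1000'
maxsize_RAMDisk_ =: 0              NB. same idiom as the example: force a flush
tFlush =: 6!:2 'flush_RAMDisk_ 0'  NB. flush ignores its argument
tCached =: tWrite + tFlush

NB. 2. direct: by-pass the cache and repeat the same writes
comp_RAMDisk_ =: comp_utils_
ucomp_RAMDisk_ =: ucomp_utils_
tDirect =: 6!:2 'writeMany 1000'

tCached , tDirect                  NB. seconds taken by each method
```

The cached run goes first because, once comp_RAMDisk_ has been reassigned, the script has to be reloaded to get the caching version back.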
There is a one-off setup step: you need to run this code in a fresh J session. It creates a file which contains an empty symbol table:
load 'arc/zip/zbuffer'
typecheck =: =3!:0
isboxed =: 32&typecheck
boxifopen =: <^:(-.@isboxed)
compressData =: 3!:1@(#;1&zput)@(3!:1)
comp =: compressData : ( comp@] (1!:2) (boxifopen@[) ) NB. compress to disk
(jpath,'~user/classes/emptySymbolTable.cmp') comp 0 s: 10
The RAMDisk program:
NB. Some utilities
cocurrent 'utils'
load 'arc/zip/zbuffer'
typecheck =: =3!:0
isboxed =: 32&typecheck
boxifopen =: <^:(-.@isboxed_utils_)
NB. compress anything. Use level 1 for speed (level 9 is similar compression ratio.)
compressData =: 3!:1@(#;1&zput)@(3!:1)
unCompressData =: 3!:2@(0&{:: zget (1&{::))@(3!:2)`(ucomp@])@.(32&=@(3!:0))
comp =: compressData : ( comp@] (1!:2) (boxifopen@[) ) NB. compress to disk
ucomp =: (unCompressData@(1!:1)@boxifopen) : [: NB. decompress from disk
exists =: 0&<@#@(1!:0)@boxifopen NB. does file or directory y exist
newdir =: 1!:5@boxifopen NB. create new directory y
createPath =: newdir_utils_ ^:(-.@:exists_utils_@:(_1&}.)) NB. create a path if it does not exist
NB. Create entire tree if required without bitching. :: 0: required because
NB. it does what it should then outputs an error ... :: 0: ignores that.
createPathTree =: ( (createPath_utils_ f.)@ ,&'/' @ ; @ ]\ @:(_1&}.) @: (<;.1) ) :: 0:

NB. The RAMDisk
cocurrent 'RAMDisk'
NB. <User parameters>
SOLESYMUSER =: 0
maxsize =: 2^27
NB. <\User parameters>
instructions =: 0 : 0
You need to create an "emptySymbolTable" file.
Start a fresh J session and execute these lines:
load 'arc/zip/zbuffer'
typecheck =: =3!:0
isboxed =: 32&typecheck
boxifopen =: <^:(-.@isboxed)
compressData =: 3!:1@(#;1&zput)@(3!:1)
comp =: compressData : ( comp@] (1!:2) (boxifopen@[) ) NB. compress to disk
(jpath,'~user/classes/emptySymbolTable.cmp') comp 0 s: 10
)
ace =: a: "_
fromsym =: 5&s:
init =: 3 : 0
data =: '' [ y
keys =: ''
keylu =: keys&i.
resetSymbolTable ''
size =: 0
)
resetSymbolTable =: 3 : 0
if. SOLESYMUSER do.
  try.
    10 s: ucomp_utils_ jpath,'~user/classes/emptySymbolTable.cmp'
  catch.
    smoutput instructions
  end.
end.
)
ucomp =: 3 : 0
try.
  data =. (unCompressData_utils_@:>@:fromsym@:({&data)@:keylu@:s:)@:boxifopen_utils_ y
catch.
  try. NB. if it is not on disk then throw
    data =. ucomp_utils_@:boxifopen_utils_ y
  catch.
    throw.
  end.
end.
)
comp =: 4 : 0
ii =. keylu key =. s: boxifopen_utils_ x
'dsym dlen' =. ((s:@:<);#) compressData_utils_ y
if. ii = # keys do.
  data =: data, dsym
  keys =: keys , key
  keylu =: keys&i.
else.
  data =: dsym ii} data
end.
size =: size + dlen
flush ''
size
)
flush =: 3 : 0
if. maxsize <: size do.
  ks =. fromsym keys
  for_i. ks do.
    createPathTree_utils_ > fpath =. i
    fpath 1!:2~ > fromsym i_index { data
  end.
  init ''
end.
)
report =: 3 : 0
( <"0 keys) ,. $&.> (5&s:) data [ y
)
cocurrent 'base'
A simple example:
NB. Simple EXAMPLE:
NB. set parameters
SOLESYMUSER_RAMDisk_ =: 1 NB. allow program to clear the symbol table
maxsize_RAMDisk_ =: 10000 NB. will flush to disk when size reaches 10000 bytes ( use larger value in practice).
init_RAMDisk_ '' NB. clear the RAMDisk
'/tmp/fileA.cmp' comp_RAMDisk_ i.10 NB. cache i.10 in file '/tmp/fileA.cmp', output is size of the buffer
report_RAMDisk_ ''
'/tmp/fileB.cmp' comp_RAMDisk_ 5 4 3 NB. store 5 4 3 in '/tmp/fileB.cmp'
'/tmp/fileC.cmp' comp_RAMDisk_ 'some text' NB. more data...
'/tmp/e1/e2/e4/fileB.cmp' comp_RAMDisk_ <"0 i. 10 NB. more data...
report_RAMDisk_ ''
ucomp_RAMDisk_ '/tmp/fileC.cmp' NB. retrieve data (either from cache, or disk ... if not found then a:)
'/tmp/e1/e2/fileA.cmp' comp_RAMDisk_ 'some of this is text';0;0;1 NB. more data
'/tmp/e1/e2/fileA.cmp' comp_RAMDisk_ i.10000 NB. enough data to trigger a flush to disk
report_RAMDisk_ ''
ucomp_RAMDisk_ '/tmp/e1/e2/fileA.cmp' NB. retrieve some data
maxsize_RAMDisk_ =: 0 NB. force any remaining data to the disks
flush_RAMDisk_ ''
init_RAMDisk_ '' NB. clear the symbol table (and RAMDisk)
<coming soon> An example with thousands of I/O operations and a comparison.