NYCJUG/2010-02-09/parallelProbSets.ijs
The following code supports the effort outlined here to create arbitrarily large data sets for realistic testing of parallel processing.
NB.* parallelProbSets.ijs: generate large random datasets for testing parallel programs.

load 'files dates'

NB. Handle TSV (Tab-separated values) files.
'TAB LF CR'=. 9 10 13{a.

NB.* readTSVFl: read tab-delimited file into variable.
readTSVFl=: ([:<;._1&> TAB ,&.> [:<;._2 [:(],LF#~LF~:_1{]) CR-.~fread)

NB.* getTSVInfo: apply arbitrary function to each .tsv var named.
getTSVInfo=: 1 : 'u readTSVFl y'
NB.EG lnkey=: (0&{"1) getTSVInfo&.>rrmlnms

NB.* getFlsInfo: apply arbitrary function u to each var read from file by v.
getFlsInfo=: 2 : 0
   if. nameExists 'SHOWGFI' do.
       if. SHOWGFI do. smoutput y,': ',":qts'' end.
   end.
   u v y
NB.EG lnkey=: ((0&{"1) getFlsInfo readTSVFl)&.>rrmlnms
)

appendTSVFl=: 4 : '(x,~readTSVFl y) writeTSVFl y'
writeTSVFl=: 4 : '(enc2TSV x) fwrite y'
enc2TSV=: 13 : ';(LF,~[:}:[:; TAB,&.>~])&.><"1 y'

NB. Case 0: present-value cashflows along different interest-rate paths.
genCFs=: 13 : '|:/:~"1]1000+100%~<.900000*(360,y)?@$0'
NB.EG cf0=. genCFs 1e4     NB. 10,000 30-year cashflows

elimNeg=: 3 : '(100%~>:?0)+y-<./y'"1
maxRng=: 3 : 'y*(0.10+10%~?0)%>./y'"1

genIRs=: 3 : 0
   irp=. ([:+/\1000%~[: <:[:+:0?@$~360,~]) y   NB. Rates change randomly
   irp=. maxRng elimNeg irp                    NB. Rates>0%, <:20%
   irp=. irp/:*/"1 >:irp                       NB. Order for neatness
NB.EG ir0=. genIRs 1e4     NB. 10,000 30-year paths
)

wrCFIRFls=: 4 : 0
   (":&.>genCFs x) writeTSVFl '.tsv',~'CF0_',":y
   (":&.>genIRs x) writeTSVFl '.tsv',~'IR0_',":y
   >:y
)
NB.EG 1e4 wrCFIRFls^:10]0  NB. Write 10 file sets w/10,000 records each

NB. Case 1: sort many records by date, movie, or user.
genDMURRecs=: 3 : '(100#.todate 70476+?y$6264),.(y,3)?@$20000 1e6 10'
NB.EG dmur0=. genDMURRecs 1e6

wrDMURFl=: 4 : '>:y[(":&.>genDMURRecs x) writeTSVFl ''.tsv'',~''DMUR0_'',":y'
NB.EG 1e6 wrDMURFl^:10]0   NB. Make 10 sets of 1 million records each
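
The TSV helpers above can be tried without touching the file system. The following is a minimal sketch, not part of the original script: `parseTSV` is a hypothetical name for the text-parsing half of `readTSVFl` with the `fread` and CR/LF clean-up steps removed, and `tbl` is made-up sample data. It assumes cells of at least two characters, since a one-character cell comes back as a one-item list rather than a scalar.

```j
NB. Sketch only: round-trip a small boxed table through the TSV encoding.
'TAB LF'=: 9 10{a.

NB. Same definition as in the script above.
enc2TSV=: 13 : ';(LF,~[:}:[:; TAB,&.>~])&.><"1 y'

NB. Hypothetical helper: the parsing part of readTSVFl, applied to text
NB. already in memory (assumes the text ends with LF and contains no CR).
parseTSV=: [: <;._1&> TAB ,&.> <;._2

tbl=: 2 2$'id';'rate';'10';'0.05'    NB. made-up sample: 2 rows, 2 columns
txt=: enc2TSV tbl                    NB. cells joined by TAB, one LF-terminated line per row

assert txt -: 'id',TAB,'rate',LF,'10',TAB,'0.05',LF
assert tbl -: parseTSV txt           NB. parsing recovers the original table
```

Because `writeTSVFl` simply passes the result of `enc2TSV` to `fwrite`, checking the encoding in memory like this is a cheap way to confirm the file layout before generating the full-size data sets.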