User:Daniel Gregoire/CSV by Hand
I had a large CSV to load into J. I reached for tables/csv and its readcsv verb. This worked, but it took a long time to run. I assume it's all the boxing, since readcsv puts every field of the file in its own box.
So I wrote these sentences to do things "by hand". I generated the source CSVs on a different system, so I had confidence they were both simple and uniform in structure (almost entirely numbers, no quotes, only comma field separators and newline row separators).
J Code
NB. === Verbs
timeIt=.(6!:2) NB. Run J code and return elapsed time
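parseHeader=.<;._1@:(','&,)@:(#~ ' '&~:) NB. Drop spaces, then cut header row into boxes
parseHeader=.','&splitstring@:(#~ ' '&~:) NB. Equivalent, using splitstring from the standard library
simpleCsv=.{.@:(".;._1)&(','&,) NB. Run each data row as a J sentence (',' acts as Append); {. removes the length-1 axis left by the cut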
NB. === Nouns
f=.'/tmp/big.csv'
raw=.'m'freads f NB. 'm' reads a character matrix, one row per line, with no boxing
]parsedTime=.timeIt'd=. simpleCsv }.raw' NB. Skip header row, parse CSV; capture elapsed time
$d NB. check shape of data table
]headers=. parseHeader {.raw NB. grab header row _from the raw_
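A tiny in-memory run may make the shapes easier to follow. This sketch is meant for a fresh session and replaces the file read with a made-up three-line string, cut on LF with ];._2, which (like 'm' freads) pads shorter lines with spaces. The last line shows that simpleCsv ends up executing each data row as a J sentence: applying ". to the rows directly gives the same table.

```j
NB. Made-up sample standing in for the file; run in a fresh session
sample=. 'id,started,completed',LF,'1,10,25',LF,'2,12,40',LF
raw=. ];._2 sample                 NB. cut on LF; short lines padded with spaces
$raw                               NB. 3 20
parseHeader=. <;._1@:(','&,)@:(#~ ' '&~:)
simpleCsv=. {.@:(".;._1)&(','&,)
]headers=. parseHeader {. raw      NB. boxes: id started completed
]d=. simpleCsv }. raw              NB. 2 3 table: 1 10 25 / 2 12 40
(". }. raw) -: d                   NB. 1: each row '1,10,25' runs as 1,10,25
```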
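The definition of parseHeader deserves a little attention.
The left argument 'm' to freads reads the file into a character matrix, one row per line. A matrix is rectangular, not ragged, so every line shorter than the longest one is padded with trailing spaces, J's fill for characters. That includes the header row, so without cleanup the last header name would keep the padding and a lookup with i. would not find it. The hook (#~ ' '&~:) handles this: applied to y it computes y #~ ' ' ~: y, keeping only the non-space characters. Strictly speaking, it removes every space, not just the trailing fill, which is safe here only because the header names contain no spaces.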
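To see why the filter matters, here is a made-up header row carrying three spaces of fill, cut with and without the hook. The row is illustrative only, not taken from the real file.

```j
NB. Made-up header row with three spaces of fill, as 'm' freads might leave it
row=. 'id,started,completed   '
(#~ ' '&~:) row                                NB. hook: row #~ ' ' ~: row
(<;._1 ',',row) i. <'completed'                NB. 3, not found: the box holds 'completed   '
(<;._1 ',',(#~ ' '&~:) row) i. <'completed'    NB. 2
```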
Note to myself and to other junior J programmers: a hook and # go really well together. I find this pattern recurring frequently.
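The same hook shape works with any verb that returns a boolean mask in place of ' '&~:. Two small made-up examples:

```j
NB. (#~ v) y means y #~ v y: keep the items where v y is 1
(#~ 0&<) 3 _1 4 _1 5       NB. keep positive items: 3 4 5
(#~ ~:&',') 'a,b,c'        NB. drop commas: abc
```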
Moving on.
Once I have my parsed CSV, I create simple accessor verbs for the columns I want to analyze, so that the analysis itself can be written as short, elegant verbs:
]started=. (headers i. <'started')&{"1 NB. fn: get started column
]completed=. (headers i. <'completed')&{"1 NB. fn: get completed column
cycleTime=. completed-started NB. fn: calculate cycle time
Stick a fork in it! (Sorry, I am a dad.)
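Here is a sketch of the accessors and the fork on a small table, again for a fresh session; the header names and values are invented for illustration.

```j
NB. Invented stand-ins for the parsed headers and data table
headers=. 'id';'started';'completed'
d=. 2 3 $ 1 10 25 2 12 40
started=. (headers i. <'started')&{"1      NB. index 1 is bound when the verb is defined
completed=. (headers i. <'completed')&{"1  NB. index 2
cycleTime=. completed-started              NB. fork: (completed y) - (started y)
started d                                  NB. 10 12
cycleTime d                                NB. 15 28
```

One consequence of this design: the column index is computed once, when each accessor is defined, so the accessors need redefining if headers changes.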