Scripts/OLEExcel
Abstract
Overview of a script that uses Microsoft OLE to read and write Excel spreadsheets as well as run macros. Also gives examples of adding a sheet to a workbook. These routines may rely on "xlutils.ijs" but this has been dropped from the more recent versions (>6?) of J, so can be found here.
Brief Introduction
This code for reading an Excel spreadsheet, e.g.
ssdata=. extractXLData_excel_ 'C:\amisc\spsheet.xls'
returns the contents of all worksheets as an N x 3 matrix where "N" is the number of worksheets. The first column contains names of worksheets; the second column gives the range of the data in Excel notation; the third column contains the data.
Code to write to an Excel spreadsheet, e.g.
writeLines2WB_excel_ ('11/15/2006';'IBM';93.11);18 1;'C:\amisc\spsheet.xls';1
writes the three items, starting at the 18th row (origin 1) and 1st column of the spreadsheet named, in worksheet number one. As the name suggests, this also allows us to write multiple lines at a time - you might even think of it as writing a matrix if you're inclined to that sort of perversion. So, you might write these three tables to the specified spreadsheet this way:
mat1=. i. 3 4 mat2=. 'Now is the time','for all good people','to come to the aid',:'of their world.' mat3=. (":i. 2 3);<3 2$'ehtoat' ssht=. 'C:\amisc\J\Virgin.xls' writeLines2WB_excel_ mat1;1 1;ssht;1 writeLines2WB_excel_ mat2;4 5;ssht;1 writeLines2WB_excel_ mat3;4 2;ssht;'Sheet1'
This second write gives a message like this:
- not sure why - but clicking "OK" allows it to proceed. Note the first two writes use the index of the sheet but the third uses the name. Once you've done the three inserts shown above, here's
what you might see in the spreadsheet on the first tab.
It's likely that we might write a table that mixes character strings and numbers, E.G.:
]pfol=: <;._1 &> TAB,&.>a:-.~<;._1 LF,0 : 0 Symbol Price Volume Shrs Price Paid Gain ALOG 61.5 51,773 -300 59.51 -$597.00 AMPH 17.25 2,200 -1 15.9 -$1.35 ARBX 6.48 42,371 1,200 5.13 $1,620.00 ^OEX 656.34 0 1 663.38 -$7.04 VIXEM.X 2.9 120 10 3.8 -$900.00 VIXEC.X 1.15 1,155 10 3.6 -$2,450.00 ) +-------+------+------+-----+----------+----------+ |Symbol |Price |Volume|Shrs |Price Paid|Gain | +-------+------+------+-----+----------+----------+ |ALOG |61.5 |51,773|-300 |59.51 |-$597.00 | +-------+------+------+-----+----------+----------+ |AMPH |17.25 |2,200 |-1 |15.9 |-$1.35 | +-------+------+------+-----+----------+----------+ |ARBX |6.48 |42,371|1,200|5.13 |$1,620.00 | +-------+------+------+-----+----------+----------+ |^OEX |656.34|0 |1 |663.38 |-$7.04 | +-------+------+------+-----+----------+----------+ |VIXEM.X|2.9 |120 |10 |3.8 |-$900.00 | +-------+------+------+-----+----------+----------+ |VIXEC.X|1.15 |1,155 |10 |3.6 |-$2,450.00| +-------+------+------+-----+----------+----------+ NB. Write portfolio, starting cell at row 1, column 2, in sheet 3 writeLine2WB_excel_ pfol;1 2;'C:\amisc\spsheet.xls';3
Note that this treats all the numbers as character strings, not as numbers. Read the following warnings (in "Caveats..." below about some potential pitfalls in dealing with numeric values.
These functions, and supporting ones, are in File:OLEExcel.ijs.
Caveats About Ugliness and Other Things
This code to read and write Excel spreadsheets using the OLE facility in J is kind of warty because Excel is warty. That is, there are a number of ugly, arbitrary things I have to do, like translate to and from Excel notation, account for how much data the clipboard can hold, ensure spreadsheet paths include a disk specification, and account for a difference in size between what was requested to be read and what was actually read from a worksheet.
There is a big potential problem with reading numbers from Excel: the OLE method returns numbers as formatted in the spreadsheet. This has a couple of unfortunate consequences: it limits precision and can miss negative numbers. This latter case could happen because Excel allows you to specify that negative numbers are indicated solely by color, e.g. all negative numbers are red. This means you will read them as a positive number using OLE - there is no way around this as far as I know.
Another, solvable problem has to do with how the representation of numbers differs between J and Excel, particularly in that J uses the underscore character "_" as a negative sign whereas Excel has a number of different conventions depending on the formatting of the cell. This means generalized code has to convert between "_" (J minus) and "-", or "(number)" for negatives.
Furthermore, Excel has some other conventions for representing numbers that have to be translated to J. For instance, Excel allows the plus sign "+" in an exponent whereas J does not. Also, miscellaneous decorators like "$" and embedded commas have to be removed from Excel numbers for J to be able to convert them to numbers.
One other caveat: these functions work only with implementations of J which support "wd". So, they will not work from J invoked by "jconsole.exe", only as "j.exe". A sign that your version of J might be something like the following:
xlopen '' |domain error: wd | wd'cc xl oleautomation:excel.application'
Pre-requisites
First, you need to run this code from within "j.exe" as "jconsole.exe" does not support "wd".
To run these routines, you should have a "jmacros.xls" file defined to hold the basic macros used by this code. If you don't, you may see a message like this:
These macros are VBA routines to perform basic I/O and such. They can be found here. In Excel 2010, you may need to invoke File/Options/Customize Ribbon and check the "Developer" box to give you easy access to the VB script editor in order to insert these macros.
Brief Explanation of Code
I'll assume the reader has already looked at the usual OLE/Excel intro.
I'll present the code with little explanation other than the comments embedded in it until I spend more time updating this page. Of course, anyone who finds this useful is also welcome to update the page with anything they've found.
One small exclusion: in order to avoid loading my own set of date/time functions, I've commented out the line assigning "XLDF" for now.
Code Introduction and Table-of-contents: global variables and functions in order
As a standard practice, I list the functions and global variables defined in my script file with a one-line description of the purpose of each at the top of the file.
NB.* OLEExcel.ijs: functions to work with MS Excel via OLE. NB.* XLDF: Excel date fudge to adjust Excel day number to Julian. NB.* CLIPMAX: Approx. max. FP nums clipboard holds NB.* getWkshtNms: get names of all worksheets in MS Excel file y.. NB.* writeLine2WB: open workbook and write row where specified. NB.* xl2num: attempt to convert Excel value to numeric value. NB.* addWS: add named worksheet to spreadsheet file. NB.* writeNewWkbook: write mat to worksheet wsnam in new workbook. NB.* createNewWkbk: create new workbook named wbname in dir wkbkdir. NB.* extractXLData: extract all data from Excel spreadsheet. NB.* fixXLFileNm: ensure we have properly qualified and formatted Excel file name. NB.* getWSName: get worksheet name given its index (item) number; assume book open. NB.* setupXLSheet2Load: get spreadsheet data, get only detail record lines, NB.* getExecutionData: select only full data records from mat of spreadsheet. NB.* get1Worksheet: get worksheet data; assume workbook already open. NB.* guessDriveOfFile: guess drive on which named file resides; return NB.* translateXLRange: translate between Excel range notation and numeric NB.* translateXLCellNotation: translate between origin-1 row-col coordinates NB.* quoteIfSp: surround name with '"'s if embedded spaces (stupid MS!). NB.* macroUsedRange: this is the text of a macro to be inserted into
Code Initialization - Set up namespace, load external modules
We establish a locale "excel" in which all this code resides. We include the base locale in order to have any of those functions available.
coclass 'excel' coinsert 'base' NB. load defaultPath 'DSLoad.ijs' load 'winlib xlutil' NB. load 'dt'
Code Body
Here is the main body of the code. The global XLDF is commented-out in this version because it requires a date function from another package but the global is not used in any of the following code, so I set it aside in order to make this script stand alone. The global would be useful for any work with dates in Excel.
The global CLIPMAX, determined empirically at one point for a particular machine and OS, is an estimate of the maximum number of floating point numbers we can hold in the clipboard at once. This is necessary because we use the clipboard to transfer data between Excel and the J session so we need to know if we have to do it in pieces for large arrays.
NB.* XLDF: Excel date fudge to adjust Excel day number to Julian. NB. XLDF=. toJulian 18991230 NB.* CLIPMAX: Approx. max. FP nums clipboard holds CLIPMAX=: 68000 NB.* runMacro: run macro (1{y) in spreadsheet 0{y. runMacro=: 3 : 'xlquit '''' [ xlcmd >1{y [ xlopen '''' [ JMACROS=. >0{y' NB.* getWkshtNms: get names of all worksheets in MS Excel file y . getWkshtNms=: 3 : 0 xlopen '' flnm=. fixXLFileNm y rc=. 0 [ nms=. 'File does not exist: ',flnm,'.' if. fexist flnm-.'"' do. NB. "fexist" doesn't want '"'s. xlcmd 'wb open ',flnm NB. Open workbook xlid 'ws' [ xlget 'temp worksheets' NB. Set and name worksheet wscount=. ".xlget 'ws count' NB. Count how many sheets nms=. getWSName&.>>:i.wscount rc=. 1 end. xlexit '' rc;nms-.&.>'"' NB.EG wsnms=. getWkshtNms 'C:\amisc\SP\LevelsOM\GermanRMBSDefaultModel.xls' ) NB.* writeLines2WB: open workbook and write rows where specified. writeLines2WB=: 3 : 0 'ln2wr whr updfl wsn'=. y NB. Lines to write, row&col, file name, xlid 'wb' [ xlget 'base workbooks' [ xlopen '' NB. worksheet name or num. xlcmd 'wb open ',fixXLFileNm updfl wbnm=. '"',(xlget 'temp name'),'"' NB. Workbook name xlid 'ws' [ xlget 'temp worksheets' NB. Set and worksheet name if. ' '={.0$wsn do. NB. Get number if name given wscount=. ".xlget 'ws count' NB. # of sheets in workbook... wsnms=. getWSName&.>>:i.wscount NB. No error checks here... wsnm=. qt,wsn,qt=.'"'#~(-.'"'e.wsn)*.'"'e.;":&.>wsnms wsn=. >:wsnms i. <wsnm NB. Quote if necessary and look up. end. xlget 'ws item ',":wsn NB. Sheet by number. wsnm=. '"',(xlget 'temp name'),'"' NB. Set where (Book, sheet, towhr=. wbnm,' ',wsnm,' ',":whr NB. row, col) to write line. ln2wr xlwriter towhr xlexit '' NB. Exit and save workbook. NB.EG writeLines2WB_excel_ ('11/15/2006';'IBM';93.11);18 2;'C:\amisc\spsheet.xls';1 ) NB.* xl2num: attempt to convert Excel value to numeric value. xl2num=: 3 : 0 if. +./'()' e. y do. y=. '-',y-.'()' end. (_&".@:(-.&'+')) y ) NB. Test the above function. coinsert_test_ 'base' coinsert_test_ 'excel' t_xl2num_test_=: 3 : 0 assert. _12.3 _123 12000-:xl2num &>'(12.3)';'-123';'12e3' assert. 0.012 12000 1 _2 10-:xl2num &>'12E-3';'12e+3';'01';'-02';'1e01' ) NB.* addWS: add named worksheet to spreadsheet file. addWS=: 3 : 0 'flnm wsnm'=. y xlopen '' xlcmd 'wb open ',flnm xlid 'ws' [ xlget 'temp worksheets' xlcmd 'ws add' xlset 'temp name ',wsnm xlexit '' NB.EG addWS 'C:\temp\test.xls';'newsheet' NB. Assumes test.xls exists. ) NB.* writeNewWkbook: write mat to worksheet wsnam in new workbook. writeNewWkbook=: 4 : 0 'fldir flnm wsname'=. y xlopen '' 'newnm wbname'=. createNewWkbk fldir;flnm xlid 'ws' [ xlget 'temp worksheets' xlcmd 'ws add' [ rc=. 1 if. 0~:#wsname do. xlset 'temp name ',wsname end. try. x xlwriter wbname,' ',wsname,' 1 1' catch. rc=. 0 end. xlexit '' ) NB.* createNewWkbk: create new workbook named wbname in dir wkbkdir. createNewWkbk=: 3 : 0 'wkbkdir wbname'=. y xlget 'base workbooks' NB. Assume sheet already open. xlid 'wb' newitem=. ":>:".xlget 'wb count' xlcmd 'wb add' xlget 'wb item ',newitem orignm=. xlget 'temp name' newnm=. wkbkdir,wbname if. fexist newnm do. ferase newnm end. NB. Overwrite any existing file. rc=. xlcmd 'temp SaveAs ',newnm newnm;wbname NB.EG 'newnm wbname'=. createNewWkbk STARTLOC;NEWTEAMSFILE ) NB.* extractXLData: extract all data from Excel spreadsheet. extractXLData=: 3 : 0 xlopen '' flnm=. quoteIfSp fixXLFileNm y if. fexist flnm-.'"' do. NB. "fexist" doesn't want '"'s. xlcmd 'wb open ',flnm NB. Open workbook xlid 'ws' [ xlget 'temp worksheets' NB. Set and name worksheet wscount=. ".xlget 'ws count' NB. Count how many sheets dat=. >get1Worksheet&.>>:i. wscount NB. Get all sheets, ranges, data rc=. wdclipwrite '' NB. Clear clipboard to avoid message. else. dat=. 0;<'File not found: ',flnm,'.' end. xlquit '' NB. Quit to avoid "Save?" message. NB. rc=. xlcmd 'wb close' NB. xlexit '' NB. wd 'reset' dat NB.EG dat=. extractXLData 'C:\EAFE\200211\nyl1112PassiveExecution.xls' ) NB.* fixXLFileNm: ensure we have properly qualified and formatted Excel file name. fixXLFileNm=: 3 : 0 flnm=. dltb y NB. Remove excess spaces. flnm=. flnm,>(-.(toupper _4{.flnm)-:suf)#suf=. '.XLS' if. -.':'e.flnm do. NB. OLE requires disk name: is one drv=. guessDriveOfFile flnm=. y NB. already specified? flnm=. drv,flnm NB. Drive:\path end. flnm=. quoteIfSp flnm NB. Quote name if there are spaces in it. NB.EG flnm=. fixXLFileNm y ) NB.* getWSName: get worksheet name given its index (item) number; assume book open. getWSName=: 3 : 0 xlget 'ws item ',":y NB. y is number of worksheet. wsname=. 1|.'""',xlget 'temp name' NB. Quote so spaces in name don't confuse ) NB.* setupXLSheet2Load: get spreadsheet data, get only detail record lines, NB. remove excess characters that will confuse data loading. setupXLSheet2Load=: 3 : 0 dat=. extractXLData y NB. Assume 1st datasheet is only one 'shps dat'=. getExecutionData >2{0{dat NB. we need to get data from. 'hdr dat'=. split dat dat=. |:dat hdr=. toupper&.>hdr-.&.>' ' numChars=. '0123456789 .-+eE' NB. Only characters found in numbers. xsChars=. <',$' NB. Excess characters to remove from whx=. >+./&.>xsChars e.&.>dat NB. numeric columns. wh=. b2i +./|:whx*.>*./&.>dat e.&.><numChars,>xsChars dat=. ((wh{dat)-.&.>xsChars) wh}dat hdr;<dat NB.EG setupXLSheet2Load >1{flnms ) NB.* getExecutionData: select only full data records from mat of spreadsheet. getExecutionData=: 3 : 0 shps=. >#&.>y fullrow=. (+/"1)0~:shps fullest=. >./fullrow dat=. (fullrow>:fullest-2)#y NB. Assume is data (or data title) row shps=. (fullrow>:fullest-2)#shps NB. if no more than 2 items short. shps;<dat ) NB.* get1Worksheet: get worksheet data; assume workbook already open. get1Worksheet=: 3 : 0 bkname=. quoteIfSp xlget 'temp name' [ xlget 'base activeworkbook' wsname=. getWSName y NB. y is number of worksheet. bw=. bkname,' ',wsname,' ' NB. initial arguments we pass to jreadr range=. xlusedrange bw NB. Which cells are populated? if. -.+./'ABCDEFGHIJKLMNOPQRSTUVWXYZ'e. toupper range do. NB. No letters means range=. ;('$',&.>'A';'IV'),&.>(range='$')<;.1 range NB. all columns. end. if. -.+./'0123456789'e. range do. NB. If no numerals in range, e.g. $A:$U, whc=. range i. ':' NB. it means all rows; however, we can't range=. (whc {. range),'1',(whc}.range),'10000' NB. read all rows (because end. NB. of limits of clipboard) so just pray someone will shoot me NB. if I ever work on a spreadsheet with more than 10000 rows. rc=. wdclipwrite '' NB. Clear clipboard to avoid message. if. ':'e. range do. NB. Is complete range? nsr=. translateXLRange range NB. Non-Stupid range notation, i.e. numeric. tsz=. */nrc=. >:|-/nsr NB. Total size of data; num rows, cols. rpi=. <.(0{nrc)*<./1,CLIPMAX%tsz NB. Rows per iter'n clipb'd can handle sro=. 0 NB. Starting row offset, column placeholder dat=. nrc$a: NB. Allocate to final size: faster than appends while. sro<0{nrc do. NB. Get groups of rows at a time. rpi=. rpi<.sro-~0{nrc NB. Don't go past end row rc=. (sro,0)+0{nsr NB. Row, col indexes for next chunk of data rc=. ":rc,rpi,(<1 1){nsr NB. Next batch of rows for all cols tmp=. xlreadr bw,rc NB. Read rows if. rpi~:#tmp do. NB. Don't know why this happens but 'corx rpi xv'=. fixSizeMismatch rpi;#tmp NB. it does. if. corx do. NB. Expand xv=. (sro$1),xv,1$~((#dat)++/0=xv)-sro+#xv dat=. xv #^:_1 dat else. NB. Compress xv=. (sro$1),xv,1$~(#dat)-sro+#xv dat=. xv#dat end. end. tmp=. ((#tmp),1{$dat){.tmp NB. Ensure widths match dat=. tmp (sro+i.rpi)}dat sro=. sro+rpi NB. Advance to next set of rows. end. dat=. dat#~+./\&.|.0 +./ . ~:|:>#&.>dat NB. Trim trailing empty rows. else. NB. Incomplete range means no data. dat=. '' end. wsname;range;<dat ) NB.* fixSizeMismatch: if 2 sizes differ, give expansion or compression vec NB. to fit 1st to 2nd. fixSizeMismatch=: 3 : 0 'rpi nrt'=. y NB. rows/iteration & number of rows in tmp corx=. xv=. 1 NB. Expansion vec if nothing to do. if. rpi~:nrt do. if. rpi<nrt do. NB. More rows of tmp than rpi... xv=. nrt{.rpi$1 else. NB. Fewer rows of tmp than rpi... xv=. rpi{.nrt$1 NB. Actually is compression vec, not expansion corx=. 0 NB. Signal compression, not expansion end. rpi=. nrt end. corx;rpi;xv NB.EG 'corx rpi xv'=. fixSizeMismatch rpi;#tmp ) NB.* guessDriveOfFile: guess drive on which named file resides; return NB. first valid match. guessDriveOfFile=: 3 : 0 'EC' guessDriveOfFile y NB. E: and C: are my 2 most likely drives. : fl=. y drives=. x ,&.>/ ':' if. ':' e. fl do. NB. If a drive was specified, look at wh=. >:fl i. ':' NB. it first but don't assume we'll drives=. ~.(<wh{.fl),drives NB. find file there. fl=. wh}.fl else. NB. If no drive specified, look drives=. ~.(<currDrive ''),drives NB. on current drive first. end. good=. '' NB. good=. ;fexist&.> drives ,&.>/ boxopen fl NB. good#drives NB. Above 2 lines find all occurrences but probably only want 1st - assuming NB. we search the drives in a rational order. for_i. i. #drives do. if. fexist (>i{drives),fl do. good=. >i{drives break. end. end. good ) NB.* translateXLRange: translate between Excel range notation and numeric NB. equivalent, e.g. '$A$1:$J$10' -> 1 1 10 10, or 27 2 43 26 -> translateXLRange=: 3 : 0 if. ' '={.0$y do. NB. Excel -> numeric >translateXLCellNotation&.><;._1 ':',y else. NB. numeric (row,col,row,col) -> Excel xlnot=. translateXLCellNotation&.>(<"1) 2 2$,y (>0{xlnot),':',>1{xlnot end. NB.EG (2 2$9 27 99 54) -: translateXLRange '$AA$9:$BB$99' NB.EG '$B$1:$Z$10' -: translateXLRange 1 2 10 26 ) NB.* translateXLCellNotation: translate between origin-1 row-col coordinates NB. and MS Excel spreadsheet cell designation, e.g. $A$1->1 1, or NB. $AA$3 -> 3 27; or 1 256 -> '$IV$1'. Does NOT account for Excel NB. limit of 256 columns and 65536 rows: will translate an unuseable NB. address like '$XL$99999' but fails after column 728 ('$ZZ') NB. because limited to 2 alpha digits. translateXLCellNotation=: 3 : 0 alph=. ' ABCDEFGHIJKLMNOPQRSTUVWXYZ' [ nums=. '0123456789' adrs=. y NB. NO validity checking on address. if. ' '={.0$adrs do. NB. Excel -> numeric adrs=. toupper adrs-.' ' NB. Only want ' ' if leading whlets=. adrs e. alph NB. intro'd by _2{. below. whnums=. adrs e. nums col=. ((<:#alph),1)+/ . * _2{.alph i. whlets#adrs (". whnums#adrs), col NB. Row, column order is more natural for us. else. NB. numeric (row, col) -> Excel col=. 0 1+(0,<:#alph)#:<:(1<.<:#,adrs){adrs col=. ' '-.~'$',col{alph NB. Excel stores in column-major if. 1~:#adrs do. col, '$',":0{adrs NB. order. else. col end. NB. Assume single # just column. end. NB.EG 1 27 -: translateXLCellNotation '$AA$1' NB.EG '$IV$1' -: translateXLCellNotation 1 256 NB. This function is own inverse, e.g. NB.EG cells=. '$A$1';'$Z$1';'$AA$2';'$AZ$2';<'$BA$3' NB.EG cells -: translateXLCellNotation&.>translateXLCellNotation&.>cells ) NB.* quoteIfSp: surround name with '"'s if embedded spaces (stupid MS!). quoteIfSp=: 3 : 0 ifsp=. '"'#~' ' e. y flnm=. ifsp,(y-.'"'),ifsp ) NB. This should be in xlutil.ijs. NB. xlusedrange 'book sheet' DM 2002 11 13 xlusedrange=: 3 : 0 3 xljmacro 'jusedrange ',y ) NB.* macroUsedRange: this is the text of a macro to be inserted into NB. jmacro.xls. macroUsedRange=: 0 : 0 ' jusedrange book sheet DM 2002 11 13 Function jusedrange(b, s) As String jusedrange = Workbooks(b).Worksheets(s).UsedRange.Address End Function ) coclass 'base'