JDB/Netflix
Here is further testing of User:Devon McCormick/JDBWithNetflixChallengeData.
Preliminary Tests
As shown below, we build a JDB database from scratch every time populate is run. Four methods are used:
- init drops and recreates a database
- step inserts 1,000,000 rows from makedata
- reset closes and reopens database keeping data
- close would close the database without dropping it
A strange things happens, after a certain number of iterations, the verb populate exits to immediate mode, without error (tested on 32-bit Windows XP). [{{#file: ""}} Download script: ]
require 'data/jdb' DBNAME=: 'nfbase' mcrd=: 0 : 0 movid int custno int rating byte rdate int ) makedata=: 3 : 0 movid =. ?y#1e6 custno =. ?y#1e6 rating =. '12345'{~?y#5 NB. Movies rated on scale of 1-5. rdate =. ?y#1e6 movid ; custno ; rating ; rdate ) NB. ========================================================= init=: 3 : 0 hf=: Open_jdb_ jpath'~temp' Drop__hf DBNAME hd=: Create__hf DBNAME Create__hd 'mcrd';mcrd ) step=: 3 : 0 Insert__hd 'mcrd';<makedata 1000000 ) open=: 3 : 0 hf=: Open_jdb_ jpath'~temp' hd=: Open__hf DBNAME ) close=: 3 : 0 Close_jdb_ hf ) reset=: open@close NB. ========================================================= populate=: 3 : 0 total=. y num=. fail=. stuck=. 0 last=. _1 init '' while. num < total do. NB. try. step '' num=. num+1 NB. catch. NB. if. last=num do. stuck=. 1 break. end. NB. fail=. fail+1 NB. last=. num NB. reset '' NB. end. smoutput 'Current ',(":num),' Failed ',(":fail) wd 'msgs' end. NB. close '' stuck , num ) NB. ========================================================= Note 'Test' populate 100 «queries» )
The result is
load 'D:\Math\j602-user\netflix.ijs' populate 100 Current 1 Failed 0 Current 2 Failed 0 ... Current 40 Failed 0 Current 41 Failed 0
at which point it just exits to immediate mode without error.
The file sizes show how large the database is.
load'dir' dir jpath'~temp/nfbase/mcrd' c0 168173284 22-Oct-08 02:23:27 c1 168173284 22-Oct-08 02:23:27 c2 168173284 22-Oct-08 02:23:27 c3 168173284 22-Oct-08 02:23:27 c4 42044284 22-Oct-08 02:23:27 c5 168173284 22-Oct-08 02:23:27 d0 168173284 22-Oct-08 02:23:27 d1 5284 22-Oct-08 02:22:31 d2 1296 22-Oct-08 02:22:31
Note that after that abrupt silent exit, it is possible to manually continue to insert until error, then reset, then insert more.
reset'' Insert__hd 'mcrd';<makedata 1000000 ... (about 10 times) Insert__hd 'mcrd';<makedata 1000000 |out of memory: insert1 | Tinsert=:Tinsert ,each y reset'' Insert__hd 'mcrd';<makedata 1000000 ... (about 10 times) Insert__hd 'mcrd';<makedata 1000000 |out of memory: insert1 | Tinsert=:Tinsert ,each y
As dir shows, the size of files grow.
dir jpath'~temp/nfbase/mcrd' c0 196201284 22-Oct-08 02:37:35 c1 196201284 22-Oct-08 02:37:35 c2 196201284 22-Oct-08 02:37:35 c3 196201284 22-Oct-08 02:37:35 c4 49051284 22-Oct-08 02:37:35 c5 196201284 22-Oct-08 02:37:35 d0 196201284 22-Oct-08 02:37:35 d1 5284 22-Oct-08 02:22:31 d2 1296 22-Oct-08 02:22:31
Modifying JDB Error Handling
With the help of J Forum it turned out the reason for immediate exit without error reporting is a possible use of "throw.".
As was discussed in the Verb abruptly ends without error thread, jdb.ijs was modified as follows:
1. defined at the top
rethrow=: 0 0 $ (13!:12@(''"_) , ]) (13!:8) 12"_
2. replaced definition
throw=: 0 0"_ $ ] (13!:8) 12"_
3. in commit removed the catcht. block and replaced remaining throw. with rethrow'commit failed'
1. because removal of throw. any catcht. block can be removed, such as in jdbserver.ijs etc.
2. what happens to normal diagnostic messaging is that it becomes simply errors instead of smoutput. It actually makes it better for programmatic error handling.
Read__hd'qq from zz' |101 Table not found: zz: throw | throw'101 Table not found: ',>y
Analyzing Revealed Errors
With these changes, we get immediate error feedback:
populate 100 Current 1 Failed 0 ... Current 39 Failed 0 Current 40 Failed 0 ||out of memory: commit | (id)=:id~ ,i pick Tinsert commit failed: rethrow | rethrow'commit failed'
Look how nested re-thrown exceptions are represented.
Now we can return to the original test, removing all the comments that masked try/catch logic:
populate 100 Current 1 Failed 0 Current 2 Failed 0 ... Current 40 Failed 0 Current 40 Failed 1 Current 41 Failed 1 Current 41 Failed 2 Current 42 Failed 2 ... Current 50 Failed 10 Current 50 Failed 11 Current 51 Failed 11 Current 51 Failed 12 Current 52 Failed 12 Current 52 Failed 13 1 52
so we can move a few steps further, until we are stuck.
We also see the resulting grown file sizes
dir jpath'~temp/nfbase/mcrd' c0 212217284 22-Oct-08 21:28:50 c1 212217284 22-Oct-08 21:28:50 c2 212217284 22-Oct-08 21:28:50 c3 212217284 22-Oct-08 21:28:50 c4 53055284 22-Oct-08 21:28:50 c5 212217284 22-Oct-08 21:28:50 d0 212217284 22-Oct-08 21:28:50 d1 5284 22-Oct-08 21:26:28 d2 1296 22-Oct-08 21:26:28
Further we can extend populate to accumulate error messages.
err=: '' ... catch. err=: err,<13!:12''
As a result we can see two types of errors:
;,&LF each~.err ||out of memory: commit | (id)=:id~ ,i pick Tinsert commit failed: rethrow | rethrow'commit failed' |bad view: assert | 'bad view' assert 0[free fh,mh,0
In order to capture the error in debugger, catch. is simply replaced with catchd.
In subsequent debugger stop we can observe the stack.
Testing Queries
Depending on physical and virtual memory (2-4Gb of RAM), the limit of populating the data (30-50m records) will also correspond to the limits of different queries. From the queries below some will cause memory errors and some will run, depending on how specific they are. [{{#file: ""}} Download script: ]
open '' Reads__hd 'count autoid from mcrd' Reads__hd 'count autoid from mcrd where rating in 1' Reads__hd 'count autoid from mcrd where rating in 123' Reads__hd 'count autoid by rating from mcrd where rating in 123' Reads__hd '* from mcrd where movid<500 and custno<500 order by movid,custno' close ''
To run all queries, the populate parameter needs to be reduced by about 20-30% from where it reached its limit. E.g. with 3Gb of RAM, populate limit is 40m and all queries run at 30m.
Conclusions
The total size of the table files
+/;2 {"1]1!:0 jpath'~temp/nfbase/mcrd/*' 1 326 365 568
shows that it is still far from the theoretical space about ~3Gb.
Possibly this is due to address space fragmentation (the system has 5Gb virtual memory).
Although these results may indicate that some optimization is possible, still at this early stage of JDB, it could not be expected to bring drastic results beyond 1.5-2x to warrant the effort, especially regarding the fact that this size would reach the limits of 32-bit address space and at the same time its usefulness.
So with databases exceeding 1Gb of data it is recommended to use 64-bit operating system.
On the part of error handling, these examples show that using assert/signal to raise an exception works better that using throw./catch. in the respects outlined in the forum message:
- you can pass the information (which is extracted in the 13!:11/12)
- such exceptions are first class same as other system errors
- they can be caught with the same catch
- they are caught with debugger as errors; does not require a separate catcht branch
- assert is tacit thus showing error exactly in place where it is raised, not in nested or outer call
So these examples can help evaluate this kind of error handling for possible adoption.
Addendum
Execution time of Netflix test runs, contributed by David Mitchell. Updated with results from 8 Gigabyte system.