JDB/Netflix

Here is further testing of User:Devon McCormick/JDBWithNetflixChallengeData.

Preliminary Tests

As shown below, we build a JDB database from scratch every time populate is run. Four methods are used:

init drops and recreates a database
step inserts 1,000,000 rows from makedata
reset closes and reopens database keeping data
close would close the database without dropping it

A strange things happens, after a certain number of iterations, the verb populate exits to immediate mode, without error (tested on 32-bit Windows XP). [{{#file: ""}} Download script: ]

require 'data/jdb'

DBNAME=: 'nfbase'

mcrd=: 0 : 0
  movid  int
  custno int
  rating byte
  rdate  int
)

makedata=: 3 : 0
  movid  =. ?y#1e6
  custno =. ?y#1e6
  rating =. '12345'{~?y#5  NB. Movies rated on scale of 1-5.
  rdate  =. ?y#1e6
  movid ; custno ; rating ; rdate
)

NB. =========================================================
init=: 3 : 0
  hf=: Open_jdb_ jpath'~temp'
  Drop__hf   DBNAME
  hd=: Create__hf DBNAME
  Create__hd 'mcrd';mcrd
)

step=: 3 : 0
  Insert__hd 'mcrd';<makedata 1000000
)

open=: 3 : 0
  hf=: Open_jdb_ jpath'~temp'
  hd=: Open__hf DBNAME
)

close=: 3 : 0
  Close_jdb_ hf
)

reset=: open@close

NB. =========================================================
populate=: 3 : 0
  total=. y
  num=. fail=. stuck=. 0
  last=. _1
  init ''
  while. num < total do.
NB.     try.
      step ''
      num=. num+1
NB.     catch.
NB.       if. last=num do. stuck=. 1 break. end.
NB.       fail=. fail+1
NB.       last=. num
NB.       reset ''
NB.     end.
    smoutput 'Current ',(":num),'  Failed  ',(":fail)
    wd 'msgs'
  end.
NB.   close ''
  stuck , num
)

NB. =========================================================
Note 'Test'
  populate 100
  «queries»
)

The result is

   load 'D:\Math\j602-user\netflix.ijs'

   populate 100
Current 1  Failed  0
Current 2  Failed  0
...
Current 40  Failed  0
Current 41  Failed  0

at which point it just exits to immediate mode without error.

The file sizes show how large the database is.

   load'dir'
   dir jpath'~temp/nfbase/mcrd'
c0   168173284 22-Oct-08 02:23:27
c1   168173284 22-Oct-08 02:23:27
c2   168173284 22-Oct-08 02:23:27
c3   168173284 22-Oct-08 02:23:27
c4    42044284 22-Oct-08 02:23:27
c5   168173284 22-Oct-08 02:23:27
d0   168173284 22-Oct-08 02:23:27
d1        5284 22-Oct-08 02:22:31
d2        1296 22-Oct-08 02:22:31

Note that after that abrupt silent exit, it is possible to manually continue to insert until error, then reset, then insert more.

   reset''

     Insert__hd 'mcrd';<makedata 1000000
... (about 10 times)
     Insert__hd 'mcrd';<makedata 1000000
|out of memory: insert1
|   Tinsert=:Tinsert    ,each y

   reset''

     Insert__hd 'mcrd';<makedata 1000000
... (about 10 times)
     Insert__hd 'mcrd';<makedata 1000000
|out of memory: insert1
|   Tinsert=:Tinsert    ,each y

As dir shows, the size of files grow.

   dir jpath'~temp/nfbase/mcrd'
c0   196201284 22-Oct-08 02:37:35
c1   196201284 22-Oct-08 02:37:35
c2   196201284 22-Oct-08 02:37:35
c3   196201284 22-Oct-08 02:37:35
c4    49051284 22-Oct-08 02:37:35
c5   196201284 22-Oct-08 02:37:35
d0   196201284 22-Oct-08 02:37:35
d1        5284 22-Oct-08 02:22:31
d2        1296 22-Oct-08 02:22:31

Modifying JDB Error Handling

With the help of J Forum it turned out the reason for immediate exit without error reporting is a possible use of "throw.".

As was discussed in the Verb abruptly ends without error thread, jdb.ijs was modified as follows:

1. defined at the top
rethrow=: 0 0 $ (13!:12@(''"_) , ]) (13!:8) 12"_ 2. replaced definition
throw=: 0 0"_ $ ] (13!:8) 12"_ 3. in commit removed the catcht. block and replaced remaining throw. with rethrow'commit failed'

1. because removal of throw. any catcht. block can be removed, such as in jdbserver.ijs etc.

2. what happens to normal diagnostic messaging is that it becomes simply errors instead of smoutput. It actually makes it better for programmatic error handling.

   Read__hd'qq from zz'
|101 Table not found: zz: throw
|       throw'101 Table not found: ',>y

Analyzing Revealed Errors

With these changes, we get immediate error feedback:

     populate 100
Current 1  Failed  0
...
Current 39  Failed  0
Current 40  Failed  0
||out of memory: commit
|   (id)=:id~    ,i pick Tinsert
commit failed: rethrow
|       rethrow'commit failed'

Look how nested re-thrown exceptions are represented.

Now we can return to the original test, removing all the comments that masked try/catch logic:

     populate 100
Current 1  Failed  0
Current 2  Failed  0
...
Current 40  Failed  0
Current 40  Failed  1
Current 41  Failed  1
Current 41  Failed  2
Current 42  Failed  2
...
Current 50  Failed  10
Current 50  Failed  11
Current 51  Failed  11
Current 51  Failed  12
Current 52  Failed  12
Current 52  Failed  13
1 52

so we can move a few steps further, until we are stuck.

We also see the resulting grown file sizes

   dir jpath'~temp/nfbase/mcrd'
c0   212217284 22-Oct-08 21:28:50
c1   212217284 22-Oct-08 21:28:50
c2   212217284 22-Oct-08 21:28:50
c3   212217284 22-Oct-08 21:28:50
c4    53055284 22-Oct-08 21:28:50
c5   212217284 22-Oct-08 21:28:50
d0   212217284 22-Oct-08 21:28:50
d1        5284 22-Oct-08 21:26:28
d2        1296 22-Oct-08 21:26:28

Further we can extend populate to accumulate error messages.

  err=: ''
...
  catch.
    err=: err,<13!:12''

As a result we can see two types of errors:

   ;,&LF each~.err
||out of memory: commit
|   (id)=:id~    ,i pick Tinsert
commit failed: rethrow
|       rethrow'commit failed'

|bad view: assert
|   'bad view'    assert 0[free fh,mh,0

In order to capture the error in debugger, catch. is simply replaced with catchd.

In subsequent debugger stop we can observe the stack.

Testing Queries

Depending on physical and virtual memory (2-4Gb of RAM), the limit of populating the data (30-50m records) will also correspond to the limits of different queries. From the queries below some will cause memory errors and some will run, depending on how specific they are. [{{#file: ""}} Download script: ]

open ''
Reads__hd 'count autoid from mcrd'
Reads__hd 'count autoid from mcrd where rating in 1'
Reads__hd 'count autoid from mcrd where rating in 123'
Reads__hd 'count autoid by rating from mcrd where rating in 123'
Reads__hd '* from mcrd where movid<500 and custno<500 order by movid,custno'
close ''

To run all queries, the populate parameter needs to be reduced by about 20-30% from where it reached its limit. E.g. with 3Gb of RAM, populate limit is 40m and all queries run at 30m.

Conclusions

The total size of the table files

   +/;2 {"1]1!:0 jpath'~temp/nfbase/mcrd/*'
1 326 365 568

shows that it is still far from the theoretical space about ~3Gb.

Possibly this is due to address space fragmentation (the system has 5Gb virtual memory).

Although these results may indicate that some optimization is possible, still at this early stage of JDB, it could not be expected to bring drastic results beyond 1.5-2x to warrant the effort, especially regarding the fact that this size would reach the limits of 32-bit address space and at the same time its usefulness.

So with databases exceeding 1Gb of data it is recommended to use 64-bit operating system.

On the part of error handling, these examples show that using assert/signal to raise an exception works better that using throw./catch. in the respects outlined in the forum message:

you can pass the information (which is extracted in the 13!:11/12)
such exceptions are first class same as other system errors
they can be caught with the same catch
they are caught with debugger as errors; does not require a separate catcht branch
assert is tacit thus showing error exactly in place where it is raised, not in nested or outer call

So these examples can help evaluate this kind of error handling for possible adoption.

Addendum

Execution time of Netflix test runs, contributed by David Mitchell. Updated with results from 8 Gigabyte system.