## Thursday, November 1, 2012

### Matrix Multiplication in SciDB - Chunk size requirement and approximate speed comparison to OracleXE

Having loaded some data into SciDB, I now get to the heart of the matter:  matrix multiplication.  Here is what I have for matrix dimensions:

Matrix A:  870,000 x 42000
Matrix B:  42000 x 100

I want to do
C = A x B

C will be
870,000 x 100

SciDB makes this easy:
AFL% multiply(A, B)

will produce the result, but we really want to store it, and not display it, so rather than run from within iquery, issue a command using iquery that suppresses output:

scidb@ubuntu:~$iquery -naq "store ( multiply (A, B), C)" (much easier than SQL equivalent involving insert ... select with joins and aggregations) ***BUT THIS DIDN'T WORK*** For matrix multiplication, the inner dimensions of the multiplied matrices have to match ( in the above example, the size of the second dimension of Matrix A has to match the size of the first dimension of Matrix B). However, in SciDB in addition to dimension size matching, the chunk sizes of these dimensions must match. When I loaded the data I did not know this, so I went with the convention that I would make the chunk size based on columns within my matrices: the chunks in A were 870,000 x 1. In order to get the matrix multiplication to work I needed to make the chunks in A 20 x 42000: scidb@ubuntu:~$  iquery -naq "create array A_new <value:double> [row=0:869000,20,0, col=0:41999,42000,0]"
scidb@ubuntu:~$iquery -naq "redimension_store ( A, A_new)"scidb@ubuntu:~$  iquery -naq "store ( multiply (A_new, B), C)"

Notes:
• I chose 20 for the chunk size of dimension 1 in A_new instead of 1 in order to bring the size of the chunk closer to 1,000,000 as recommended
• A and A_new have the same name for the attributes and dimensions
• In this case - see previous posts on using redimension_store to map attributes to dimensions
• If the dimensions are the same but the attributes are different, use nested attribute_rename:
• redimension_store ( attribute_rename(A, val, value), A_new)

#### Speed comparison

This ran in ~300 minutes on my system.  I estimated it would run in 150 minutes in OracleXE.  My thoughts on this:

• I couldn't run this under OracleXE b/c it caused me to cross the 11 GB limit
• OracleXE estimate is based on running the calculation Ci = A x Bi  where i indicates which column of B or C.  Ran this for ~3, took about the same amount of time for each, extrapolated
• Not sure if this estimate would hold up as the calculation progressed
• Assuming it did, is this evidence
• of how highly optimized OracleXE is?
• that I've setup SciDB incorrectly?
• that I'm observing effects of running in VMWare instead of natively?
• OracleXE was only using one core.  SciDB was also using only one core.  But it should be (almost) trivial to setup SciDB to use multiple cores, I believe it is well near impossible (or illegal) to do so for OracleXE
• with my VMWare limitation of 4 cores, I would still be able to run in ~75 minutes, which easily beats the OracleXE time
• would a commercial Oracle install on multiple cores scale?
• How do the Oracle and SciDB scale at the much larger range of # of cores - e.g. 100 cores?  (it may not be meaningful to be comparing at the low end of the scale like I am doing here)