Distractions: Matrix Multiplication in SciDB - Chunk size requirement and approximate speed comparison to OracleXE

Having loaded some data into SciDB, I now get to the heart of the matter: matrix multiplication. Here is what I have for matrix dimensions:

Matrix A: 870,000 x 42000
Matrix B: 42000 x 100

I want to do
C = A x B

C will be
870,000 x 100

SciDB makes this easy:
AFL% multiply(A, B)

will produce the result, but we really want to store it, and not display it, so rather than run from within iquery, issue a command using iquery that suppresses output:

scidb@ubuntu:~$ iquery -naq "store ( multiply (A, B), C)"

(much easier than SQL equivalent involving insert ... select with joins and aggregations)

***BUT THIS DIDN'T WORK***

For matrix multiplication, the inner dimensions of the multiplied matrices have to match ( in the above example, the size of the second dimension of Matrix A has to match the size of the first dimension of Matrix B). However, in SciDB in addition to dimension size matching, the chunk sizes of these dimensions must match. When I loaded the data I did not know this, so I went with the convention that I would make the chunk size based on columns within my matrices: the chunks in A were 870,000 x 1. In order to get the matrix multiplication to work I needed to make the chunks in A 20 x 42000:
scidb@ubuntu:~$ iquery -naq "create array A_new <value:double> [row=0:869000,20,0, col=0:41999,42000,0]"

scidb@ubuntu:~$  iquery -naq "redimension_store ( A, A_new)"
scidb@ubuntu:~$  iquery -naq "store ( multiply (A_new, B), C)"

Notes:

I chose 20 for the chunk size of dimension 1 in A_new instead of 1 in order to bring the size of the chunk closer to 1,000,000 as recommended
A and A_new have the same name for the attributes and dimensions

In this case - see previous posts on using redimension_store to map attributes to dimensions
If the dimensions are the same but the attributes are different, use nested attribute_rename:

redimension_store ( attribute_rename(A, val, value), A_new)

Speed comparison

This ran in ~300 minutes on my system. I estimated it would run in 150 minutes in OracleXE. My thoughts on this:

I couldn't run this under OracleXE b/c it caused me to cross the 11 GB limit
OracleXE estimate is based on running the calculation Ci = A x Bi where i indicates which column of B or C. Ran this for ~3, took about the same amount of time for each, extrapolated

Not sure if this estimate would hold up as the calculation progressed
Assuming it did, is this evidence

of how highly optimized OracleXE is?
that I've setup SciDB incorrectly?
that I'm observing effects of running in VMWare instead of natively?

OracleXE was only using one core. SciDB was also using only one core. But it should be (almost) trivial to setup SciDB to use multiple cores, I believe it is well near impossible (or illegal) to do so for OracleXE

with my VMWare limitation of 4 cores, I would still be able to run in ~75 minutes, which easily beats the OracleXE time
would a commercial Oracle install on multiple cores scale?
How do the Oracle and SciDB scale at the much larger range of # of cores - e.g. 100 cores? (it may not be meaningful to be comparing at the low end of the scale like I am doing here)

Distractions

Thursday, November 1, 2012

Matrix Multiplication in SciDB - Chunk size requirement and approximate speed comparison to OracleXE

Speed comparison

No comments:

Post a Comment

Contributors

Blog Archive

Subscribe To Distractions

Thursday, November 1, 2012

Matrix Multiplication in SciDB - Chunk size requirement and approximate speed comparison to OracleXE

Speed comparison

No comments:

Post a Comment

Contributors

Blog Archive