Matrix A: 870,000 x 42000
Matrix B: 42000 x 100
I want to do
C = A x B
C will be
870,000 x 100
SciDB makes this easy:
AFL% multiply(A, B)
will produce the result, but we really want to store it, and not display it, so rather than run from within iquery, issue a command using iquery that suppresses output:scidb@ubuntu:~$ iquery -naq "store ( multiply (A, B), C)"
(much easier than SQL equivalent involving insert ... select with joins and aggregations)
***BUT THIS DIDN'T WORK***
For matrix multiplication, the inner dimensions of the multiplied matrices have to match ( in the above example, the size of the second dimension of Matrix A has to match the size of the first dimension of Matrix B). However, in SciDB in addition to dimension size matching, the chunk sizes of these dimensions must match. When I loaded the data I did not know this, so I went with the convention that I would make the chunk size based on columns within my matrices: the chunks in A were 870,000 x 1. In order to get the matrix multiplication to work I needed to make the chunks in A 20 x 42000:
scidb@ubuntu:~$ iquery -naq "create array A_new <value:double> [row=0:869000,20,0, col=0:41999,42000,0]"
scidb@ubuntu:~$ iquery -naq "redimension_store ( A, A_new)"
scidb@ubuntu:~$ iquery -naq "store ( multiply (A_new, B), C)"
Notes:
- I chose 20 for the chunk size of dimension 1 in A_new instead of 1 in order to bring the size of the chunk closer to 1,000,000 as recommended
- A and A_new have the same name for the attributes and dimensions
- In this case - see previous posts on using redimension_store to map attributes to dimensions
- If the dimensions are the same but the attributes are different, use nested attribute_rename:
redimension_store ( attribute_rename(A, val, value), A_new)
Speed comparison
This ran in ~300 minutes on my system. I estimated it would run in 150 minutes in OracleXE. My thoughts on this:- I couldn't run this under OracleXE b/c it caused me to cross the 11 GB limit
- OracleXE estimate is based on running the calculation Ci = A x Bi where i indicates which column of B or C. Ran this for ~3, took about the same amount of time for each, extrapolated
- Not sure if this estimate would hold up as the calculation progressed
- Assuming it did, is this evidence
- of how highly optimized OracleXE is?
- that I've setup SciDB incorrectly?
- that I'm observing effects of running in VMWare instead of natively?
- OracleXE was only using one core. SciDB was also using only one core. But it should be (almost) trivial to setup SciDB to use multiple cores, I believe it is well near impossible (or illegal) to do so for OracleXE
- with my VMWare limitation of 4 cores, I would still be able to run in ~75 minutes, which easily beats the OracleXE time
- would a commercial Oracle install on multiple cores scale?
- How do the Oracle and SciDB scale at the much larger range of # of cores - e.g. 100 cores? (it may not be meaningful to be comparing at the low end of the scale like I am doing here)
No comments:
Post a Comment