Wednesday, October 31, 2012

Migrating from OracleXE to SciDB - the recommended way

This post follows from a previous post describing my initial data migraiton efforts:
http://dllahr.blogspot.com/2012/10/migrating-data-from-oraclexe-to-scidb.html

Thanks to Paul on SciDB forum for showing me another way to load the data without having to write my own script to generate the SciDB formatted data files.  This method is probably much safer in that if the SciDB format changes I don't need to worry about updating my script.
http://www.scidb.org/forum/viewtopic.php?f=11&t=598

The recommended way

  1. export the data such that the each row in the output file contains
    • row, column, value
  2. Import this into SciDB as an array with 1 dimension and 3 attributes.
    • create array raw <row:int64, col:int64, value:double> [fileRow=0:*,1000000,0]
      • uint32 is a data type of unsigned int, 32 bits
      • fileRow=0:*,1000000,0    indicates that the array has unspecified length with chunk size of 1 million
  3. Create array with correct structure to store the data
    • create array myData <value:double> [row=0:99999,100000,0, col=0:999,100,0]
  4. Convert this into a 2-D matrix in SciDB using the command:
    1. redimension_store(raw, myData)
    2. the attributes in raw are matched to dimensions in myData
Notes:
  • Did not require the raw data to be in any kind of sorted order
  • works for sparse data
  • load  and convert operations took on the order of minutes for my data which was 15 million rows

Edit:  Critical:  I had incorrectly specified the value types in the raw array for indexes as uint32. They should be int64.  For some reason the redimension_store process gets corrupted otherwise.

No comments:

Post a Comment