http://dllahr.blogspot.com/2012/10/migrating-data-from-oraclexe-to-scidb.html
Thanks to Paul on SciDB forum for showing me another way to load the data without having to write my own script to generate the SciDB formatted data files. This method is probably much safer in that if the SciDB format changes I don't need to worry about updating my script.
http://www.scidb.org/forum/viewtopic.php?f=11&t=598
The recommended way
- export the data such that the each row in the output file contains
- row, column, value
- Import this into SciDB as an array with 1 dimension and 3 attributes.
- create array raw <row:int64, col:int64, value:double> [fileRow=0:*,1000000,0]
- uint32 is a data type of unsigned int, 32 bits
- fileRow=0:*,1000000,0 indicates that the array has unspecified length with chunk size of 1 million
- Create array with correct structure to store the data
- create array myData <value:double> [row=0:99999,100000,0, col=0:999,100,0]
- Convert this into a 2-D matrix in SciDB using the command:
- redimension_store(raw, myData)
- the attributes in raw are matched to dimensions in myData
Notes:
- Did not require the raw data to be in any kind of sorted order
- works for sparse data
- load and convert operations took on the order of minutes for my data which was 15 million rows
No comments:
Post a Comment