Monday, October 29, 2012

Migrating from OracleXE - choosing a new database (SciDB vs. MySQL)

I hit the 11 GB limit on OracleXE for my home project, so I needed to find a new database.  I quickly narrowed the choice down to MySQL or SciDB.

MySQL is another relational database, but it is open source and there is a free community edition without a size limit.

SciDB is a new type of database designed for "big data" and doing advanced mathematics on that data.

Here are the basic pros and cons I came up with for each, based on my situation:

MySQL:

  • pros:
    • well established
    • easy to install
    • (probably) easy to switch my Hibernate code over to it from XE
    • would learn about a major Oracle alternative (haven't used it in ~5 years)
  • cons:
    • probably at best as fast as OracleXE for linear algebra operations, possibly slower
    • not sure whether (or how) it can run on multiple processors

SciDB:

  • pros:
    • designed / optimized for linear algebra
    • learn something completely new
    • designed to be scalable and run calculations in parallel
  • cons:
    • experimental / not well established
    • not sure if it works with Hibernate

Next step:  timebox SciDB investigation

I decided to set aside 4 hours of work to see if I could get SciDB up and running.  My guess was that if it took longer than that, I would continue to have chronic problems using it.  In the meantime, I installed MySQL on my system as a backup.

System:  Dell laptop running Win7.  8 GB RAM.  Pentium with 8 "threads" (~= cores).  I have VMware Player with an installation of Ubuntu 11.10.
SciDB runs on Linux, so all my work with SciDB was done on the Ubuntu installation running in VMware Player.

I attempted to install SciDB v12.3 from binaries and could not resolve all the dependencies.  They are specified for Ubuntu 11.04, so that may have been the issue.

Installing from source code worked fine - it was relatively easy and fast (beat the timebox by a lot).  I followed the instructions in the manual.  There was a minor problem with the config file in the documentation, as explained in this forum post:
http://www.scidb.org/forum/viewtopic.php?f=11&t=506&p=828&hilit=803#p828

You need to create a user called "scidb" with sudo privileges, and then log in as that user whenever you execute SciDB commands.

I also wrote it up on the scidb forum:

Decision:  SciDB

At this point I decided to go with SciDB.  It passed the install timebox test very well, and I would rather learn about the newer SciDB than MySQL.

Edit: Probably does not work with Hibernate

It appears that SciDB currently provides connectors only for Python and C/C++.  My plan therefore is to do a partial migration - the large, matrix-based data will be migrated to SciDB, while the relational information will stay in OracleXE.  It probably would make sense to migrate the relational information to the Postgres instance that is associated with the SciDB system, but for now I'm going to focus on just the first part.
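
As a rough illustration of the partial migration, here is a minimal Python sketch of how the split might look.  Everything in it is an assumption for illustration: the table name "matrix_cell" is hypothetical, and dumping the matrix data as CSV coordinate triples (i, j, value) is just a convenient intermediate format, not SciDB's documented load format.

```python
import csv
import io

# Hypothetical table name; the real schema is not described in this post.
MATRIX_TABLES = {"matrix_cell"}


def migration_target(table_name):
    """Return the store a table would live in under the partial-migration plan."""
    return "SciDB" if table_name in MATRIX_TABLES else "OracleXE"


def cells_to_csv(cells):
    """Serialize (i, j, value) matrix cells as CSV text, ordered by coordinate.

    This is an assumed intermediate dump format, not a SciDB-specific one.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["i", "j", "value"])
    for i, j, value in sorted(cells):
        writer.writerow([i, j, value])
    return buf.getvalue()


if __name__ == "__main__":
    print(migration_target("matrix_cell"))
    print(cells_to_csv([(1, 0, 2.5), (0, 1, 1.0)]))
```

The idea is only to keep the routing decision (which table goes where) separate from the export step, so the relational tables can stay untouched in OracleXE while the matrix data is dumped and reloaded elsewhere.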
