Thursday, November 1, 2012

Matrix Multiplication in SciDB - Chunk size requirement and approximate speed comparison to OracleXE

Having loaded some data into SciDB, I now get to the heart of the matter:  matrix multiplication.  Here is what I have for matrix dimensions:

Matrix A:  870,000 x 42,000
Matrix B:  42,000 x 100

I want to do
C = A x B

C will be
870,000 x 100
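
As a quick sanity check on the dimensions above, here is a minimal Python sketch that confirms the shape of C and counts the multiply-add operations a dense multiplication would need. The function names are my own (nothing here comes from SciDB), and the operation count assumes every cell of A and B takes part in the calculation, which overstates the work if the matrices are sparse.

```python
def product_shape(a_shape, b_shape):
    """Return the shape of A x B, or raise if the inner dimensions differ."""
    a_rows, a_cols = a_shape
    b_rows, b_cols = b_shape
    if a_cols != b_rows:
        raise ValueError(
            f"inner dimensions do not match: {a_cols} vs {b_rows}"
        )
    return (a_rows, b_cols)


def multiply_add_count(a_shape, b_shape):
    """Multiply-adds for a dense product: one per (row, column, inner index)."""
    rows, cols = product_shape(a_shape, b_shape)
    inner = a_shape[1]
    return rows * cols * inner


A_SHAPE = (870_000, 42_000)
B_SHAPE = (42_000, 100)

c_shape = product_shape(A_SHAPE, B_SHAPE)
ops = multiply_add_count(A_SHAPE, B_SHAPE)

print("C shape:", c_shape)
print("dense multiply-adds:", ops)
```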

SciDB makes this easy:
AFL% multiply(A, B)

will produce the result, but we really want to store it rather than display it.  So instead of running the query from within iquery, issue a command using iquery that suppresses output:

scidb@ubuntu:~$  iquery -naq "store ( multiply (A, B), C)"

(much easier than the SQL equivalent, which involves insert ... select with joins and aggregations)

***BUT THIS DIDN'T WORK***

Wednesday, October 31, 2012

Migrating from OracleXE to SciDB - the recommended way

This post follows from a previous post describing my initial data migration efforts:
http://dllahr.blogspot.com/2012/10/migrating-data-from-oraclexe-to-scidb.html

Thanks to Paul on the SciDB forum for showing me another way to load the data without having to write my own script to generate the SciDB formatted data files.  This method is probably much safer: if the SciDB format changes, I don't need to worry about updating my script.
http://www.scidb.org/forum/viewtopic.php?f=11&t=598

The recommended way

Monday, October 29, 2012

Migrating data from OracleXE to SciDB (outdated)

Edit:  Read about the easier, recommended method here: http://dllahr.blogspot.com/2012/10/migrating-from-oraclexe-to-scidb.html 


Migrating from OracleXE - choosing a new database (SciDB vs. MySQL)

I hit the 11 GB limit on OracleXE for my home project, so I needed to find a new database.  I quickly settled on either using MySQL or SciDB.

MySQL is another relational database, but it is open source, and there is a community edition available for free without a size limit*

SciDB is a new type of database designed for "big data" and doing advanced mathematics on that data.

Here are the basic pros and cons I came up with for each based on my situation

MySQL:

  • pros:
    • well established
    • easy to install
    • easy (probably) to switch my hibernate code to use instead of XE
    • would learn about a major Oracle alternative (haven't used it in ~5 years)
  • cons:
    • probably at best as fast as OracleXE for linear algebra operations, possibly slower
    • not sure how / if possible to run on multiple processors
SciDB
  • pros:
    • designed / optimized for linear algebra
    • learn something completely new
    • designed to be scalable and run calculations in parallel
  • cons:
    • experimental / not well established
    • not sure if it works with hibernate

Next step:  timebox SciDB investigation

I decided to set aside 4 hours of work to see if I could get SciDB up and running.  If it took longer than that, then I made the guess that I would continue to have chronic problems using it.  In the meantime, as a backup I installed MySQL on my system.

System:  Dell laptop running Win7.  8 GB RAM.  Pentium with 8 "threads" (~= cores).  I have VMWare player with an installation of Ubuntu 11.10.
SciDB runs on Linux, so all my work with SciDB was done on the Ubuntu running on the VMWare player.

I attempted to install SciDB v12.3 from binaries and could not resolve all the dependencies.  They are specified for 11.04, so that may have been the issue.

Install from source code worked fine - it was relatively easy and fast (beat the timebox by a lot).  I followed the instructions in the manual.  There was a minor problem with the config file in the documentation, as was explained in this forum post:
http://www.scidb.org/forum/viewtopic.php?f=11&t=506&p=828&hilit=803#p828

You need to create a user called "scidb" with sudo privileges, and then when you execute commands log in as the scidb user.  

I also wrote it up on the scidb forum:

Decision:  SciDB

At this point I decided to go with SciDB.  It passed the install timebox test very well and I decided I would rather learn about the newer SciDB than MySQL.

Edit: Probably does not work with hibernate

It appears that SciDB currently only provides connectors for Python and C/C++.  My plan therefore is to do a partial migration - the large, matrix-based data will be migrated to SciDB, while the relational information will stay in OracleXE.  It probably would make sense to migrate the relational information to the Postgres instance that is associated with the SciDB system, but I'm going to just focus on the first part for now.

Saturday, July 7, 2012

Messing with SWF (Flash files)

I wanted to extract the music from a flash file I was viewing, so I looked at the source code for the web page and found the link to the SWF file.  I downloaded this directly using curl
curl -O http://address.goes.here

I tried to play it in mplayer, no dice, and had similar results with ffmpeg.  Some googling later revealed that I could either attempt to convert the full thing to video, or I could just extract what I needed.  I went with the latter, using SwfTools

(thanks to Doesn't Not Compute for the pointers)

Google found this older FAQ for SwfTools which I used - I followed the instructions under (4) and installed freetype and jpeglib first (in case I want to do more advanced stuff with SwfTools later).  After I configured and compiled each of these, I configured and compiled SwfTools.  I was then ready for fun.

I followed (14) from the FAQ to extract the soundtrack:

  1. I listed everything in my SWF:
    • swfextract downloaded.swf
  2. From the list I saw the last entry was "[-m] 1 MP3 soundtrack".  I was able to extract it with:
    • swfextract -m downloaded.swf
    • (result sent to output.mp3)
  3. Success!
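
If I wanted to repeat these steps for several files, a small Python wrapper along these lines might do it. This is only a sketch: it uses the two swfextract invocations shown above and nothing else, the helper names are made up for illustration, and it assumes swfextract is on the PATH and writes to output.mp3 as it did for me.

```python
import shutil
import subprocess


def swfextract_commands(swf_path):
    """Build the two command lines used above (hypothetical helper)."""
    return [
        ["swfextract", swf_path],        # step 1: list everything in the SWF
        ["swfextract", "-m", swf_path],  # step 2: extract the MP3 soundtrack
    ]


def extract_soundtrack(swf_path):
    """Run both steps; the result is expected in output.mp3."""
    if shutil.which("swfextract") is None:
        raise RuntimeError("swfextract not found on PATH")
    for command in swfextract_commands(swf_path):
        # check=True raises CalledProcessError if swfextract fails
        subprocess.run(command, check=True)


# Show the commands without running them.
for command in swfextract_commands("downloaded.swf"):
    print(" ".join(command))
```

Calling extract_soundtrack("downloaded.swf") would actually run the commands; the loop at the bottom only prints them.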

Wednesday, April 25, 2012

Speed of Light in Vacuum and in Solids; Cherenkov radiation

I'll start near the end (speed of light), and then work my way backwards, then jump to the finish (Cherenkov radiation).  The speed of light in a vacuum is defined based on Maxwell's equations, which can be re-arranged to give a wave equation, yielding a velocity of the wave that is:
c = 1 / sqrt(εo * μo)

εo  is the permittivity of free space.  This constant is used to calculate the electric field at some distance (r) from a charge.  It basically says if you have this much charge (Q), you get this much electric field (E).

μo is permeability of free space.  Similar to εo, this describes how if you have this much current (I) you get this much magnetic field (B)
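
To see that these two constants really do give the familiar number, here is a minimal Python check of c = 1 / sqrt(εo * μo). The constant values are standard reference values typed in by hand and rounded, so treat the result as approximate; the function name is my own.

```python
import math

# Rounded reference values, entered by hand (illustrative inputs).
EPSILON_0 = 8.8541878128e-12   # permittivity of free space [F/m]
MU_0 = 4 * math.pi * 1e-7      # permeability of free space [H/m]


def wave_speed(permittivity, permeability):
    """Speed of an electromagnetic wave: 1 / sqrt(permittivity * permeability)."""
    return 1.0 / math.sqrt(permittivity * permeability)


c = wave_speed(EPSILON_0, MU_0)
print(f"c = {c:.6e} m/s")
```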

The speed of light in a linear dielectric material is determined by the permittivity of the material (ε) , and the permeability of the material (μ):
v = 1 / sqrt(ε * μ)

These have the same meanings as above, but apply within the material.  For example, ε tells you if you have this much charge (Q) within your material, you can find this much electric field (E).
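
Writing ε = εr * εo and μ = μr * μo (εr and μr being the relative values for the material), the formula above becomes v = c / sqrt(εr * μr). The sketch below applies that to water. The inputs are assumptions for illustration: a refractive index of about 1.33 for visible light, εr taken as the square of that, and μr taken as 1.

```python
import math

C_VACUUM = 299_792_458.0  # speed of light in vacuum [m/s]


def speed_in_material(rel_permittivity, rel_permeability=1.0):
    """v = c / sqrt(eps_r * mu_r), equivalent to 1 / sqrt(eps * mu)."""
    return C_VACUUM / math.sqrt(rel_permittivity * rel_permeability)


# Assumed illustrative values for water at optical frequencies.
N_WATER = 1.33                 # approximate refractive index
eps_r_water = N_WATER ** 2     # relative permittivity implied by n (mu_r = 1)

v_water = speed_in_material(eps_r_water)
print(f"v in water = {v_water:.3e} m/s ({v_water / C_VACUUM:.2f} c)")
```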

But the speed of light is not about charges or currents.  It is about electric and magnetic fields oscillating, and that oscillation propagating far away (and long after) the original charges' and currents' motions stopped.  So how do these constants come to define the speed of light?  The short answer is Maxwell's equations.  Basically, the infinitesimal story can be written as:

  1. electric charge undergoes a small acceleration
  2. perpendicular to the direction of motion, an electric field increases in magnitude
  3. Maxwell's third equation states that the change of the magnetic field in time is the negative curl of the electric field.  Leaving the mathematical details aside, the end result is that the increasing electric field from (2) causes an increasing magnetic field
  4. Maxwell's fourth equation states an analogous relationship to (3):  the change of the electric field in time is the curl of the magnetic field.  Again, the increasing magnetic field causes an increasing electric field.
  5. (3) and (4) then set up the propagation through empty space / vacuum.
In Maxwell's equations, the constant of proportionality that relates the change of the electric field in time (dE / dt) to the curl of the magnetic field (curl B) is εo * μo:  curl B = εo * μo * dE / dt.  So the story is that the time change / response of one type of field (electric or magnetic) to the other type determines the speed of propagation.  Now, this applies the same in materials - except that instead of having εo * μo describe that time response, we have ε * μ.
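
For reference, the mathematical details skipped in steps (3) to (5) can be sketched as follows. This is the standard textbook derivation, written here for empty space with no charges or currents (so div E = 0), not something specific to this post.

```latex
% Faraday's law and the Ampere-Maxwell law in vacuum (no charges or currents)
\nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t},
\qquad
\nabla \times \mathbf{B} = \mu_0 \varepsilon_0 \frac{\partial \mathbf{E}}{\partial t}

% Take the curl of Faraday's law and substitute the Ampere-Maxwell law
\nabla \times (\nabla \times \mathbf{E})
  = -\frac{\partial}{\partial t} (\nabla \times \mathbf{B})
  = -\mu_0 \varepsilon_0 \frac{\partial^2 \mathbf{E}}{\partial t^2}

% Use the identity curl(curl E) = grad(div E) - laplacian(E), with div E = 0
\nabla^2 \mathbf{E} = \mu_0 \varepsilon_0 \frac{\partial^2 \mathbf{E}}{\partial t^2}

% This is a wave equation with propagation speed
c = \frac{1}{\sqrt{\mu_0 \varepsilon_0}}
```

Replacing εo * μo with ε * μ in the same steps gives the speed in a linear material.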

What about these material constants?  Well, the short answer is that the microscopic charges / structures of the material determine how much electric field (E) you get for a given charge (Q).  Generally these numbers (ε, μ) are greater than their vacuum counterparts.  A way to think about this is to imagine a "test" charge within a material.  This test charge will cause the microscopic charges within the material to be attracted / repelled.  This rearrangement of charge partially screens the test charge, making it appear smaller than it is, so the electric field (E) is smaller than it would be in vacuum (E is proportional to Q / ε).

We can apply the same story to understand the slower speed of light within the material.  The adjusting electric field (of the wave) in the material now has to push on the microscopic charges and they have to re-arrange before the field can affect the magnetic field, and vice-versa, thus slowing down the propagation.

Cherenkov radiation

Cherenkov radiation occurs when a charged particle traveling near the speed of light (in a vacuum) enters a material.  Radiation is emitted as the particle slows down to a speed less than the speed of light in that material.  Note that a charged particle traveling with a constant velocity does not normally emit radiation; it does so only when its speed exceeds the speed of light in that material.

Thinking about the above and Cherenkov radiation leads to this story:  The Cherenkov particle is traveling near the speed of light in vacuum (c) and approaches a material where the speed of light is slower (v).  Zooming way in to look at the particle, so that it is well separated in our field of view from the microscopic charges of the material, we see electric and magnetic fields behaving as they do in a vacuum in the immediate vicinity.  However, further out, we see these electric and magnetic fields interacting with the microscopic charges of the material.  The electric and magnetic fields do not (approximately) propagate beyond the nearby microscopic charges until the microscopic charges have time to re-arrange / respond (this follows from the discussion above of the difference between ε and εo).  In fact, the rearrangement is so slow that once it is within the material the Cherenkov particle is going to catch up to the electric / magnetic fields within the material.  The Cherenkov particle is now "driving" in a concerted manner the electric / magnetic fields - leading the attack from the position of the van / wedge!

Consider in contrast a regular, non-Cherenkov particle, traveling at less than the speed of light in the material (v).  The electric / magnetic fields in the material from this particle's motion propagate faster than the particle is traveling, so they outpace the particle.  The microscopic charges have time to "equilibrate" / rearrange around the particle's motion.  The difference here is that as some microscopic charges get pushed one way, others fill in the opposite way.  There is no uniform motion of the microscopic charges, and hence no net emission of radiation.
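
The distinction between the two cases can be put into numbers. With β = (particle speed) / c and n = c / v the refractive index of the material, the particle outruns light in the material when β * n > 1. The sketch below checks that condition and also computes the emission angle from the standard textbook relation cos(θ) = 1 / (n * β), which is not derived in this post. The function names and the sample values (n = 1.33 for water) are my own illustrative choices.

```python
import math


def emits_cherenkov(beta, n):
    """True when the particle speed (beta * c) exceeds the light speed c / n."""
    return beta * n > 1.0


def cherenkov_angle_deg(beta, n):
    """Emission angle from cos(theta) = 1 / (n * beta), in degrees."""
    if not emits_cherenkov(beta, n):
        raise ValueError("particle is slower than light in this material")
    return math.degrees(math.acos(1.0 / (n * beta)))


N_WATER = 1.33  # assumed approximate refractive index of water

for beta in (0.5, 0.99):
    if emits_cherenkov(beta, N_WATER):
        angle = cherenkov_angle_deg(beta, N_WATER)
        print(f"beta = {beta}: Cherenkov radiation at about {angle:.1f} degrees")
    else:
        print(f"beta = {beta}: below threshold, no Cherenkov radiation")
```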

Application to faster than light motion in a vacuum

Thanks to Joel for pointing me towards this via discussion about the incident where, at the OPERA experiment, they thought they had observed neutrinos traveling faster than the speed of light.  A key paper (provided by Joel) was a discussion about how Cherenkov radiation should cause any neutrinos traveling faster than the speed of light to emit radiation (and/or particles, electron-positron pairs) and thus lose energy rapidly.  But that mechanism then allows for supra-luminal velocities - and leads me to imagine stories like the above applied to vacuum conditions.  A particle traveling faster than c has local, microscopic fields that propagate faster than c, and when they spread further out from the particle interact with the vacuum fields to cause Cherenkov radiation?  What would the scale of these microscopic fields be? Planck?  Some other characteristic wavelength of the vacuum radiation?  Interesting to think about.

Sunday, April 8, 2012

Photons and their relation to the waves in the electric / magnetic fields

When I was in college and I first heard about the concept of photons in physics, I initially guessed that a photon would correspond to the field (electric / magnetic) between 2 nodes of the wave:
I was told this was not correct, and I left it at that, but recently I figured I would investigate why it was wrong.  That is what this post is about.

The energy of a photon is given by:
E = h * ν

h is Planck's constant 6.626e-34 [J*s]
ν is frequency [Hz]
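
As a concrete number, here is a small Python sketch of E = h * ν for a red photon. The wavelength (632.8 nm, the usual He-Ne laser line) is my choice for illustration, and the frequency comes from ν = c / wavelength.

```python
PLANCK_H = 6.62607015e-34      # Planck's constant [J*s]
C_VACUUM = 299_792_458.0       # speed of light in vacuum [m/s]
JOULES_PER_EV = 1.602176634e-19


def photon_energy(frequency_hz):
    """Photon energy E = h * nu, in joules."""
    return PLANCK_H * frequency_hz


wavelength_m = 632.8e-9                    # assumed example: red He-Ne line
frequency_hz = C_VACUUM / wavelength_m     # nu = c / wavelength
energy_j = photon_energy(frequency_hz)

print(f"nu = {frequency_hz:.3e} Hz")
print(f"E  = {energy_j:.3e} J = {energy_j / JOULES_PER_EV:.2f} eV")
```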

I looked up the energy carried by electromagnetic waves in my go-to book:  "Introduction to Electrodynamics" by David J. Griffiths:
S = c * εo * Eo^2 * cos^2(k*z - ω*t + δ)

c is the speed of light
εo is the permittivity of free space
Eo is the amplitude (maximum magnitude) of the electric field
cos^2 is cosine squared
k is the wavenumber, related to the frequency and the speed of light by k = ω / c
ω is the angular frequency of the radiation (ω = 2 * π * ν)
δ is the phase

We can simplify this by assuming z = 0,  δ = 0 - this just says we are looking at what happens at z = 0, and there is no phase offset.
S = c * εo * Eo^2 * cos^2(ω*t)

This equation defines the energy per unit time, per unit area.  So for the above, we would choose as our unit of time one cycle, or one half cycle of the wave.  But that still leaves the problem of the area.  Also, there is no classical restriction on the magnitude of the electric field (Eo).  The question then becomes, for a given area, is it possible to have Eo be so low that the energy of the photon spans more than 1 cycle?  There is nothing in the equations to prevent this.  Is there experimental evidence of it?
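
To make the question concrete, here is a rough Python sketch of how many wave cycles it would take for the classical wave to deliver one photon's worth of energy through a given area. It uses the standard intensity expression c * εo * Eo^2 * cos^2(ω*t) with Eo the field amplitude, and the fact that cos^2 averages to 1/2 over a cycle. The field amplitude and area below are made-up values chosen only for illustration.

```python
PLANCK_H = 6.62607015e-34      # Planck's constant [J*s]
C_VACUUM = 299_792_458.0       # speed of light [m/s]
EPSILON_0 = 8.8541878128e-12   # permittivity of free space [F/m]


def energy_per_cycle(e0, area_m2, frequency_hz):
    """Classical energy through `area_m2` during one cycle.

    Time-averaged intensity is 0.5 * c * eps0 * E0^2; one cycle lasts 1 / nu.
    """
    mean_intensity = 0.5 * C_VACUUM * EPSILON_0 * e0 ** 2   # [W/m^2]
    return mean_intensity * area_m2 / frequency_hz          # [J]


def cycles_per_photon(e0, area_m2, frequency_hz):
    """Number of cycles needed to accumulate one photon energy h * nu."""
    return PLANCK_H * frequency_hz / energy_per_cycle(e0, area_m2, frequency_hz)


# Hypothetical inputs: a weak red beam through one square millimetre.
FREQUENCY_HZ = 4.74e14   # red light
AREA_M2 = 1e-6           # 1 mm^2
E0_V_PER_M = 1e-3        # field amplitude

cycles = cycles_per_photon(E0_V_PER_M, AREA_M2, FREQUENCY_HZ)
print(f"cycles per photon: {cycles:.2e}")
```

Because the result scales as 1 / Eo^2, halving the field amplitude quadruples the number of cycles, so nothing in the classical picture ties one photon to one cycle.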

This essentially comes down to "single photon" experiments - experiments in which photons are measured one at a time.  I start by reading the wikipedia entry on the double slit experiment:

and will post my thoughts of this and other reading separately.

Update:
Single photon experiments with red photons from a He-Ne laser are not too hard to do:

The separation between individual photons is 2 km, which is much longer than the wavelength of the radiation (~700 nm).  Therefore, given the above framework, the photon would be spanning billions of nodes!!
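
The "billions" figure is easy to check with the two numbers quoted above (2 km separation, ~700 nm wavelength). The only extra assumption in this sketch is that there are two nodes per wavelength.

```python
separation_m = 2000.0     # distance between successive photons (2 km)
wavelength_m = 700e-9     # approximate wavelength of the red light

wavelengths_spanned = separation_m / wavelength_m
nodes_spanned = 2 * wavelengths_spanned   # two nodes per wavelength

print(f"wavelengths between photons: {wavelengths_spanned:.2e}")
print(f"nodes between photons:       {nodes_spanned:.2e}")
```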

Other related links:
lowest measured forces (and by extension, electric fields):

review of some single photon experiments:

single photon and complementarity:

proof of single photon existence - single photon hitting beam splitter, arriving at only one detector: