Saturday, December 22, 2012

Python code to simplify loading data into SciDB

Summary:  I've written some Python code that simplifies loading data from a csv file into SciDB. For each column in the csv file, the programmer specifies whether it should be an attribute or a dimension of the SciDB array. The code then loads the file into a raw array, creates the destination array from those specifications and the data in the raw array, and finally transfers the data from the raw array to the destination array.
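To make the column-specification step concrete, here is a minimal sketch of how a destination array's schema might be derived from the csv data. It is not the code in the ScidbLoader directory. The function name create_array_statement, the column_roles mapping, typing every attribute as double, and the fixed chunk_interval are all assumptions for illustration. The sketch only builds the AQL CREATE ARRAY string, using the name=low:high,chunk_interval,overlap dimension syntax SciDB used at the time. It does not perform the raw load or the transfer into the destination array.

```python
import csv
import io


def create_array_statement(csv_text, column_roles, array_name, chunk_interval=1000):
    """Hypothetical helper: build a SciDB CREATE ARRAY statement from csv data.

    column_roles maps each csv column name to "dimension" or "attribute".
    Assumptions (not from the original loader): dimension columns hold
    integers, every attribute is stored as double, chunk overlap is 0, and
    each dimension's bounds are the min/max values found in the data.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    attr_specs, dim_specs = [], []
    for column, role in column_roles.items():
        if role == "attribute":
            attr_specs.append(f"{column}:double")
        elif role == "dimension":
            values = [int(row[column]) for row in rows]
            low, high = min(values), max(values)
            # Never make a chunk longer than the dimension itself.
            chunk = min(chunk_interval, high - low + 1)
            # Dimension syntax: name=low:high,chunk_interval,chunk_overlap
            dim_specs.append(f"{column}={low}:{high},{chunk},0")
        else:
            raise ValueError(f"unknown role {role!r} for column {column!r}")
    return (f"CREATE ARRAY {array_name} "
            f"<{', '.join(attr_specs)}> [{', '.join(dim_specs)}]")
```

For example, with columns x and y marked as dimensions, val marked as an attribute, and rows (0, 0, 1.5) and (1, 2, 2.5), it returns CREATE ARRAY dest <val:double> [x=0:1,2,0, y=0:2,3,0].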

I've added the code to this GitHub repository under the directory ScidbLoader:

TODO
  1. When calculating dimension chunk sizes, scale by the number of attributes; the code currently assumes a single attribute.
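One possible way to handle this TODO is sketched below. It assumes, as the TODO implies, that the chunk-size budget covers all attributes together. The function name and the target_values_per_chunk parameter are hypothetical, not part of the loader.

```python
def chunk_interval(target_values_per_chunk, n_attributes, n_dimensions):
    """Hypothetical per-dimension chunk interval that accounts for attributes.

    A chunk holds interval ** n_dimensions cells, and each cell stores
    n_attributes values, so the value budget is divided by the attribute
    count before taking the n_dimensions-th root.
    """
    cells_per_chunk = max(1, target_values_per_chunk // max(1, n_attributes))
    # round() absorbs floating-point error, e.g. 1e6 ** (1/3) = 99.99999...
    return max(1, round(cells_per_chunk ** (1.0 / n_dimensions)))
```

With a budget of 1,000,000 values and 2 dimensions, one attribute gives an interval of 1000, while four attributes give 500.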

MIT License

All code presented on this blog is Copyright (C) David L. Lahr in the year it was published and released under the MIT License:


Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in the
Software without restriction, including without limitation the rights to use, copy,
modify, merge, publish, distribute, sublicense, and/or sell copies of the Software,
and to permit persons to whom the Software is furnished to do so, subject to the
following conditions:

The above copyright notice and this permission notice shall be included in all copies
or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE
FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.


Sunday, December 16, 2012

Python code to simplify reading SciDB data

I've written some Python code that uses the SciDB Python connector to make accessing data more straightforward.  In summary, you submit a query and get back an iterator over the data.  It is currently very incomplete and needs:

  1. It currently returns only the attributes; it also needs to return the values of the dimensions.   Done!
  2. The ability to reset the iterator; currently it can be used only once.

The code is publicly available from this GitHub repository:
https://github.com/dllahr/scidb_python_utils
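The two open items above (returning dimension values, and an iterator that can be restarted) could both be handled by a re-iterable wrapper like the minimal sketch below. It deliberately avoids the real SciDB connector API. run_query is a hypothetical stand-in for whatever executes the query and walks its chunks, assumed to yield (coordinates, attribute values) pairs.

```python
from typing import Any, Callable, Iterable, Iterator, Tuple

# One cell: (dimension coordinates, attribute values)
Cell = Tuple[Tuple[int, ...], Tuple[Any, ...]]


class QueryResult:
    """Re-iterable view over the cells returned by a query.

    run_query is hypothetical: it executes the query and returns an
    iterable of (coordinates, attributes) pairs. Every call to iter()
    runs the query again, so the result can be traversed more than once.
    """

    def __init__(self, run_query: Callable[[str], Iterable[Cell]], query: str):
        self._run_query = run_query
        self._query = query

    def __iter__(self) -> Iterator[Cell]:
        # A fresh query per iteration is what "resets" the iterator.
        yield from self._run_query(self._query)
```

Re-running the query keeps memory use flat at the cost of a round trip per pass; caching the cells in a list would be the alternative for small results.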

Saturday, December 8, 2012

Size of a polymer from a random walk

Summary:  In statistical mechanics, the size of a polymer is generally estimated using the statistics of a random walk.  Here I investigate the assumption, as it is usually taught, that the size of the polymer is proportional to the distance between the start and end points of a random walk.

Review of random walk in 1 dimension

Start at the origin of the x-axis (x = 0).  At each step, there is a 50% chance of moving 1 unit to the right and a 50% chance of moving 1 unit to the left.
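The walk described above takes only a few lines to simulate. This is a minimal sketch; the function name random_walk and the optional rng argument (for reproducible runs) are my own choices.

```python
import random


def random_walk(n_steps, rng=None):
    """Return every position visited by a 1-D random walk starting at x = 0.

    Each step moves +1 or -1 with equal probability.
    """
    rng = rng or random.Random()
    positions = [0]
    for _ in range(n_steps):
        positions.append(positions[-1] + rng.choice((-1, 1)))
    return positions
```

Passing random.Random(seed) makes a walk repeatable. The final position always has the same parity as the number of steps.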

Here are some examples of random walks:

For N steps, the probability of having ended up at position x is given by the binomial distribution:
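
$$P_N(x) = \frac{N!}{\left(\frac{N+x}{2}\right)!\,\left(\frac{N-x}{2}\right)!}\left(\frac{1}{2}\right)^{N}, \qquad x \in \{-N, -N+2, \dots, N-2, N\}$$
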
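(from the above page at Wolfram). The width of this distribution grows as the square root of the number of steps: its standard deviation is exactly sqrt(N), and in the Gaussian approximation its full width at half max is 2 sqrt(2 ln 2) sqrt(N), about 2.35 sqrt(N).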
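The binomial distribution and its sqrt(N) width can be checked numerically. The sketch below computes the exact end-point probabilities: reaching x takes (N + x)/2 right steps out of N, so x must have the same parity as N. It then computes their standard deviation. The function names are my own.

```python
from math import comb, sqrt


def endpoint_distribution(n_steps):
    """Exact P(x) for the end position x of an n_steps 1-D random walk.

    Reaching x needs (N + x) / 2 right steps and (N - x) / 2 left steps,
    so only x with the same parity as N has nonzero probability.
    """
    return {
        x: comb(n_steps, (n_steps + x) // 2) / 2 ** n_steps
        for x in range(-n_steps, n_steps + 1, 2)
    }


def endpoint_std(n_steps):
    """Standard deviation of the end position; should equal sqrt(n_steps)."""
    dist = endpoint_distribution(n_steps)
    mean = sum(x * p for x, p in dist.items())
    variance = sum((x - mean) ** 2 * p for x, p in dist.items())
    return sqrt(variance)
```

For 2 steps this gives {-2: 0.25, 0: 0.5, 2: 0.25}, and endpoint_std(100) comes out at 10, matching sqrt(N).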