Wednesday, August 27, 2014

Clustering vs. Linear Regression - which to use?

I've got some data and I'm trying to make an informed decision whether it is better described by a linear regression or a set of clusters.

My goal here is to compare linear regression and clustering on cases that are obviously better suited to one or the other, using 2-dimensional data that is easy to visualize.  By comparing these two workhorse methods under these conditions I'm hoping to gain a better understanding of each and of how to decide when to use one or the other.

The 3 data sets I used were:
  1. "obviously" better described by linear regression
  2. "obviously" better described by clustering
  3. in between the above 2 extremes
I used R.  The scripts I used are present in this repository:

I got the code for clustering from:

Before we get started:  my friend Phil Montgomery, who kindly reviewed this post, made a good suggestion: in general, when you have 2 models and you are trying to decide which one to use, you want to compare the statistical likelihood of each.  Usually this is done by comparing different parameter values for a single mathematical model, but it is worth investigating whether it has been done for comparing these two methods.

Sunday, August 10, 2014

Raspberry Pi RAID array

I largely followed these very helpful instructions:

also useful:

  • 2 1-TB western digital drives
    • USB hub with power supply for these
      • Edit: make sure it has enough power to power the drives! My initial one (pictured below) did not and I suspect it was drawing power from the pi causing it to crash
  • raspberry pi
    • USB power supply for this (separate from above)
  • 8 GB sandisk microSD card
    • for use in raspberry pi
    • microSD card reader
  • cables to connect all

  1. download and install raspbian wheezy on the microSD card, using the microSD card reader attached to your PC
    • recommend using the torrent to download, it was very fast once it got going
  2. put the microSD card in the pi, attach a keyboard, a monitor, and an ethernet cable connected to your router, then plug in the power to the pi
    • should see the linux boot sequence on your monitor
  3. configure the raspberry pi - here are the options I used:
    1. boot to command line
    2. change password of default user (username = pi)
    3. advanced option:  make sure SSH is enabled
  4. reboot the pi, log in to confirm everything seems to be in order
  5. (at this point I switched to using my laptop / ssh instead of the keyboard attached to the pi.  I disconnected the monitor)
  6. connect drives to USB hub, connect hub to pi
  7. I mounted each of the drives to make sure they were working, and copied the readme pdf off of one out of curiosity / to actually test a disk operation
  8. (I partitioned and formatted each of the disks at this step, but realized later this was not necessary)
  9. became super user with sudo (e.g. sudo -i)
  10. install the tool to create / manage the RAID array:
    • apt-get install mdadm
  11. create the array:
    • mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
    • (note the double dashes for the options - the linked blog above only used single dashes.  Some man pages only use 1 dash, that's what I tried initially and it did not recognize the options)
  12. check the array:
    • cat /proc/mdstat
  13. partition the array
    • fdisk /dev/md0
  14. format the array
    • mkfs.ext4 /dev/md0p1
  15. mount & test the array
    1. mkdir temp
    2. mount /dev/md0p1 temp
    3. echo hello RAID > temp/test
    4. cat temp/test
  16. unmount the array
    1. umount temp
  17. declare victory, go home
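For reference, steps 10 through 16 above can be collected into one small script. This is only a sketch, not the exact procedure I ran: the `run` and `build_raid1` helpers and the `DRY_RUN` variable are names made up for illustration, and the device names are the ones from this post (check yours before running anything). By default it only prints the commands; it does no error handling.

```shell
#!/bin/sh
# Dry-run sketch of steps 10-16. With DRY_RUN=1 (the default here) commands are
# only printed; set DRY_RUN=0 and run as root to actually execute them.

DRY_RUN="${DRY_RUN:-1}"

# run: print a command, and execute it only when DRY_RUN is 0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# build_raid1 MEMBER1 MEMBER2 MOUNTPOINT
# (function and argument names are hypothetical, made up for this sketch)
build_raid1() {
    run apt-get install mdadm
    run mdadm --create /dev/md0 --level=1 --raid-devices=2 "$1" "$2"
    run cat /proc/mdstat
    run fdisk /dev/md0            # interactive: create the partition by hand
    run mkfs.ext4 /dev/md0p1
    run mkdir -p "$3"
    run mount /dev/md0p1 "$3"
    run sh -c "echo hello RAID > '$3/test'"
    run cat "$3/test"
    run umount "$3"
}
```

Calling `build_raid1 /dev/sda1 /dev/sdb1 temp` prints the ten commands in order so they can be checked first; setting `DRY_RUN=0` as root would execute them for real.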
I should point out that /proc/mdstat initially estimated the first sync would take ~16 hours, but with the array mounted the sync slowed down and the estimate grew to almost 4 days!  Based on reading this:

I unmounted the array, and then watched the sync speed climb up back to where it was before.
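To keep an eye on the sync without reading the whole of /proc/mdstat each time, a small filter like the one below can pull out just the progress figures. It is a sketch that assumes the usual progress line format (a percentage after `resync =` or `recovery =`, followed by `finish=…min` and `speed=…K/sec`); the `sync_status` name is made up for illustration.

```shell
# sync_status: read mdstat-style text on stdin and print a one-line summary
# for each resync/recovery progress line. Lines that mention resync/recovery
# but lack the expected fields are printed unchanged.
sync_status() {
    grep -E 'resync|recovery' |
        sed -E 's#.*= *([0-9.]+%).*finish=([0-9.]+min).*speed=([0-9]+K/sec).*#\1 done, \2 left, \3#'
}
```

Typical use would be `sync_status < /proc/mdstat`, repeated every so often while the array syncs.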

After the sync finishes I'll start copying files onto it...

Saturday, June 28, 2014

Matlab solar system trajectory simulation

I wrote some code to simulate the trajectory of an object through a grossly simplified version of the solar system.  It uses the ordinary differential equation solver that comes with Matlab, some Newtonian physics, only has the Sun, Earth and Moon, and only uses idealized, circular orbits for those.  The code is here in this github repository:

It's kind of fun to play with and someday I'd like to make an interactive web browser version of it, using perhaps this javascript library for numerical calculations:

Wednesday, May 21, 2014

Priceless description of thesis research

This applies more generally, unfortunately

Tuesday, May 6, 2014

Peanut butter porter - the bottling

I bottled the peanut butter porter today.  I used ~5/8 cup of dry malt extract for the sugar, boiled it in water for ~5 minutes, and added it to the beer as I was transferring from the carboy to the bottling bucket.  The material left over in the carboy was an oily mess; based on that and what I tasted, I'm hopeful that most or all of the oil was left behind and the beer in the bottles is not oily.  However, it didn't really taste like peanut butter!

Monday, April 28, 2014

Video about BARD

My co-workers made a video about the BARD project.  It is short but a great explanation of why BARD is needed and what it does.  Please consider checking it out and, if you like it, click the thumbs up button:

Sunday, April 6, 2014

Carob Porter - the conclusion

Had a bottle of the Carob Porter on 2014-03-02 - at room temperature - tasted pretty good.  Bottom of bottle was covered in sediment, but the pour into the glass was not cloudy.  It is a very dark beer.

I continued to drink it on an almost daily basis and found it quite enjoyable.