Wednesday, August 27, 2014

Clustering vs. Linear Regression - which to use?

I've got some data and I'm trying to make an informed decision about whether it is better described by a linear regression or by a set of clusters.

My goal here is to compare linear regression and clustering on cases that are obviously better suited to one or the other, using 2-dimensional data that is easy to visualize. By comparing these two workhorse methods under these conditions, I'm hoping to gain a better understanding of each and of how to decide when to use one or the other.

The 3 data sets I used were:
  1. "obviously" better described by linear regression
  2. "obviously" better described by clustering
  3. in between the above 2 extremes
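
To make the three cases concrete, here is a minimal sketch in R that simulates one data set of each kind and fits both methods to it. This is not the code from the post's scripts: the sample size, slopes, cluster centers, noise levels, and the choice of 3 clusters are all assumptions picked for illustration.

```r
# Illustrative simulation only: every number below is an assumed value,
# not taken from the original scripts.
set.seed(42)
n <- 150
idx <- rep(1:3, each = n / 3)  # which blob each point belongs to

# 1. Points scattered tightly around a straight line
x1 <- runif(n, 0, 10)
linear_data <- data.frame(x = x1, y = 2 * x1 + 1 + rnorm(n, sd = 0.5))

# 2. Three well-separated blobs with no overall linear trend
cluster_data <- data.frame(
  x = c(2, 5, 8)[idx] + rnorm(n, sd = 0.4),
  y = c(8, 2, 8)[idx] + rnorm(n, sd = 0.4)
)

# 3. In between: wider blobs whose centers happen to lie along a line
mixed_data <- data.frame(
  x = c(2, 5, 8)[idx] + rnorm(n, sd = 1.0),
  y = c(5, 11, 17)[idx] + rnorm(n, sd = 1.0)
)

# Fit a linear regression and a 3-cluster k-means to each data set
datasets <- list(linear = linear_data, clusters = cluster_data, mixed = mixed_data)
fits <- lapply(datasets, function(d) {
  list(
    lm = lm(y ~ x, data = d),
    km = kmeans(d[, c("x", "y")], centers = 3, nstart = 20)
  )
})

# R-squared for the line, and the share of total variance
# that lies between clusters (rather than within them)
sapply(fits, function(f) c(
  r_squared = summary(f$lm)$r.squared,
  km_between_share = f$km$betweenss / f$km$totss
))
```

The two summary numbers are not on a common scale, so they only give a rough feel for how well each method describes each data set; they are not a formal comparison.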
I used R. The scripts are in this repository:

I got the code for clustering from:

Before we get started: my friend Phil Montgomery, who kindly reviewed this post, made a good suggestion. In general, when you have 2 models and you are trying to decide which one to use, you want to compare the statistical likelihood of each. Usually this is done by comparing different parameter values within a single mathematical model, but it is worth investigating whether it has been done for comparing these two methods.
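
As a rough illustration of what such a likelihood comparison could look like, here is a hedged sketch in R. It is my own construction, not something from Phil or from the post's scripts. A regression on its own only models y given x, so to put both methods on the same footing the sketch assumes two models of the joint (x, y) data: a "line" model (x is normal, y is normal around a fitted line) and a "cluster" model (a mixture of equal-width round Gaussians centered on the k-means centers, which only approximates a properly fitted mixture). The simulated data, k = 3, and the use of BIC to penalize extra parameters are also assumptions.

```r
# Sketch under stated assumptions; simulated data, not the post's data.
set.seed(1)
n <- 150
idx <- rep(1:3, each = n / 3)
d <- data.frame(
  x = c(2, 5, 8)[idx] + rnorm(n, sd = 0.4),
  y = c(8, 2, 8)[idx] + rnorm(n, sd = 0.4)
)

# Model A (line): x ~ Normal, y | x ~ Normal around the regression line.
# Standard deviations use the maximum-likelihood form (divide by n).
fit_lm <- lm(y ~ x, data = d)
sd_x <- sqrt(mean((d$x - mean(d$x))^2))
sd_res <- sqrt(mean(residuals(fit_lm)^2))
loglik_line <- sum(dnorm(d$x, mean(d$x), sd_x, log = TRUE)) +
  sum(dnorm(d$y, fitted(fit_lm), sd_res, log = TRUE))
p_line <- 5  # mean and sd of x, intercept, slope, residual sd

# Model B (clusters): mixture of k round Gaussians at the k-means centers,
# with one shared sd and weights equal to the cluster proportions.
k <- 3
km <- kmeans(d[, c("x", "y")], centers = k, nstart = 20)
sd_k <- sqrt(km$tot.withinss / (2 * n))
w <- km$size / n
dens <- rowSums(sapply(seq_len(k), function(j) {
  w[j] * dnorm(d$x, km$centers[j, "x"], sd_k) *
    dnorm(d$y, km$centers[j, "y"], sd_k)
}))
loglik_clusters <- sum(log(dens))
p_clusters <- 3 * k  # 2k center coordinates, k - 1 weights, 1 shared sd

# BIC = -2 * log-likelihood + (number of parameters) * log(n); lower is better
bic <- function(loglik, p) -2 * loglik + p * log(n)
c(line = bic(loglik_line, p_line), clusters = bic(loglik_clusters, p_clusters))
```

On clearly clustered data like this, the cluster model should come out with the lower BIC; on data scattered around a line, the ordering would be expected to flip. The comparison is only as meaningful as the assumptions baked into the two joint models.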

Sunday, August 10, 2014

Raspberry Pi RAID array

I largely followed these very helpful instructions:

Also useful:

  • 2 × 1 TB Western Digital drives
    • USB hub with power supply for these
      • Edit: make sure it supplies enough power for the drives! My initial one (pictured below) did not, and I suspect it was drawing power from the Pi, causing it to crash.
  • Raspberry Pi
    • USB power supply for this (separate from above)
  • 8 GB SanDisk microSD card
    • for use in raspberry pi
    • microSD card reader
  • cables to connect all