## Sunday, February 3, 2013

### Factorial Design of Experiments

This blog post is about how scientific experiments can be designed so that the system being tested does not have to be measured at every possible combination of variables, or, if it is, how second order effects between variables can be calculated.  I'll work through an example to illustrate the principle:  in this example the "system" being tested is brewing beer.  Considering that brewing a batch of beer takes at least 2 weeks and involves many hours of work, it could be very worthwhile to find a way to get the same information from fewer experiments.

For some practical applications of factorial design of experiments (and statistical design more generally), here are some papers that Joshua L. Hertz and I wrote when we were postdocs in Stephen Semancik's group at NIST:

- Combinatorial Characterization of Chemiresistive Films Using Microhotplates
- A Combinatorial Study of Thin-Film Process Variables Using Microhotplates

## Introduction

Assume that we are interested in the effect on the color of the beer of three variables:  ferment temperature, type of yeast used, and mash temperature.  We will test each of these variables at 2 different settings:

- ferment temperature:  50 F, 55 F
- type of yeast:  WLP004, WLP005
- mash temperature:  145 F, 150 F

With 3 variables that have 2 settings each there are 2 x 2 x 2 = 8 possible combinations, so 8 possible experiments that can be run.

These three variables and their 2 settings each can be represented as a cube, where each dimension is a variable, each face has a fixed value for one of the variables, and each vertex represents one of the 8 possible combinations of the variables:

Each vertex corresponds to a combination of the 3 variables:
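As a minimal sketch, the 8 vertices can be listed in Python. The settings below are the ones used in the rest of this post (ferment temperatures 50 F and 55 F, yeasts WLP004 and WLP005); the variable names are only illustrative.

```python
from itertools import product

# Two settings per variable, as used in the rest of the post.
yeasts = ["WLP004", "WLP005"]
ferment_temps_f = [50, 55]
mash_temps_f = [145, 150]

# Each tuple is one vertex of the cube: (yeast, ferment temp, mash temp).
vertices = list(product(yeasts, ferment_temps_f, mash_temps_f))

for yeast, ferment_f, mash_f in vertices:
    print(f"{yeast}, ferment {ferment_f} F, mash {mash_f} F")

print(len(vertices))  # 8
```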

## Running all 8 experiments - full factorial design

If we run all 8 of these experiments it is called the full factorial design.  We would calculate the effect of a variable (e.g. yeast) by averaging the results of the experiments with one type of yeast (WLP005) and subtracting the average of the results with the other type of yeast (WLP004):
We can visualize each of these calculations by imagining that we average over the results of the experiments on one face of the cube, and then subtract the average of the experiments on the opposite face of the cube.  For example, for the calculation of the effect of yeast, we average over the measurements on the right face (in which all measurements use yeast WLP005) and then subtract the average over the left face (in which all measurements use yeast WLP004).
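Here is a small Python sketch of that face-minus-opposite-face calculation. The colour numbers are made up purely for illustration (they are not measurements), and `main_effect` is a hypothetical helper name.

```python
from statistics import mean

# Made-up beer colours, keyed by (yeast, ferment temp F, mash temp F).
# These values are invented for illustration only.
color = {
    ("WLP004", 50, 145): 8.0,
    ("WLP004", 50, 150): 9.0,
    ("WLP004", 55, 145): 8.0,
    ("WLP004", 55, 150): 9.0,
    ("WLP005", 50, 145): 10.0,
    ("WLP005", 50, 150): 11.0,
    ("WLP005", 55, 145): 12.0,
    ("WLP005", 55, 150): 13.0,
}

def main_effect(results, position, high, low):
    """Average over the face where the variable is `high`,
    minus the average over the opposite face where it is `low`."""
    high_face = [v for key, v in results.items() if key[position] == high]
    low_face = [v for key, v in results.items() if key[position] == low]
    return mean(high_face) - mean(low_face)

yeast_effect = main_effect(color, 0, "WLP005", "WLP004")
ferment_effect = main_effect(color, 1, 55, 50)
mash_effect = main_effect(color, 2, 150, 145)

print(yeast_effect, ferment_effect, mash_effect)  # 3.0 1.0 1.0
```

Each effect is built from 4 measurements per face, so noise in any single batch is averaged down.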

What if there is a 2nd order effect - a synergistic effect between 2 of the variables?  We can calculate this effect explicitly, and by doing so show how it is cancelled out in the calculation of the 1st order effect.

#### Calculating 2nd order effects from the full factorial design

The second order effect of yeast and ferment temperature is calculated by averaging the measurements when the yeast and temperature change are both in the "same direction", and then subtracting the average of measurements when the yeast and temperature change are in "opposite directions":

We can visualize this as subtracting the experiments that lie on 2 diagonal planes through the cube - in the above illustration the dashed lines represent the 2 diagonal planes that we use for the yeast-ferment temperature second order effect, and the color coding indicates how each plane corresponds to one of the averages above.

The first average in the above equation has the change of yeast from WLP004 to WLP005 correlated with a temperature change from 50 F to 55 F.  The second average has them anti-correlated:  WLP004 to WLP005 corresponds to a temperature change from 55 F to 50 F.  How does this work?  Take a hypothetical situation where yeast WLP005 is affected by ferment temperature, but yeast WLP004 is not.  The measurements of WLP004 in the above will all be the same - and thus cancel out.  That "leaves behind" the measurements with yeast WLP005:  in the first average they are both at 55 F, in the second average they are both at 50 F.  According to our proposed situation these are different, and thus the above calculation correctly identifies this difference as the second order effect.
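The hypothetical situation above can be sketched in Python. The colours are invented so that WLP004 ignores ferment temperature while WLP005 gets darker at 55 F; the function name is illustrative only.

```python
from statistics import mean

# Made-up colours keyed by (yeast, ferment temp F, mash temp F).
# WLP004 is unaffected by ferment temperature, WLP005 is affected.
color = {
    ("WLP004", 50, 145): 8.0,
    ("WLP004", 50, 150): 9.0,
    ("WLP004", 55, 145): 8.0,
    ("WLP004", 55, 150): 9.0,
    ("WLP005", 50, 145): 10.0,
    ("WLP005", 50, 150): 11.0,
    ("WLP005", 55, 145): 12.0,
    ("WLP005", 55, 150): 13.0,
}

def yeast_ferment_interaction(results):
    """Average of the 'same direction' diagonal plane minus the
    average of the 'opposite direction' diagonal plane."""
    same, opposite = [], []
    for (yeast, ferment_f, _mash_f), value in results.items():
        yeast_high = yeast == "WLP005"
        ferment_high = ferment_f == 55
        if yeast_high == ferment_high:
            same.append(value)      # (WLP005, 55 F) or (WLP004, 50 F)
        else:
            opposite.append(value)  # (WLP005, 50 F) or (WLP004, 55 F)
    return mean(same) - mean(opposite)

interaction = yeast_ferment_interaction(color)
print(interaction)  # 1.0
```

The WLP004 values (8.0 and 9.0) appear in both averages and cancel; what remains is the difference between WLP005 at 55 F and at 50 F.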

#### 2nd order effect is cancelled out in 1st order effect calculation

If we look at the cube representing the calculation of 2nd order effects, we see that the points used in one average of 2nd order effects appear equally in both averages of the 1st order effects - thus they are cancelled out in the calculation of 1st order effects.
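One way to check this numerically is to build a made-up response with a yeast-ferment interaction of adjustable size and confirm that the calculated yeast effect does not move. This is a sketch under assumed numbers: the settings are coded as -1 (WLP004, 50 F, 145 F) and +1 (WLP005, 55 F, 150 F), and the coefficients are arbitrary.

```python
from itertools import product
from statistics import mean

# All 8 runs in coded form: (yeast, ferment, mash), each -1 or +1.
runs = list(product([-1, 1], repeat=3))

def response(run, interaction_size):
    """Invented colour model with a tunable yeast-ferment interaction."""
    yeast, ferment, mash = run
    return (10.0 + 1.5 * yeast + 0.5 * ferment + 0.5 * mash
            + interaction_size * yeast * ferment)

def yeast_effect(interaction_size):
    """WLP005 face average minus WLP004 face average."""
    high = [response(r, interaction_size) for r in runs if r[0] == 1]
    low = [response(r, interaction_size) for r in runs if r[0] == -1]
    return mean(high) - mean(low)

effects = [yeast_effect(size) for size in (0.0, 0.5, 2.0)]
print(effects)  # [3.0, 3.0, 3.0]
```

Each yeast face holds two "same direction" and two "opposite direction" points, so the interaction term adds and subtracts equally within each face average.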

## Running fewer experiments - fractional factorial design

Now consider a subset of the above experiments:

In the above we've chosen 4 of the 8 experiments.  We've "spaced" them out as far as possible from each other.  Note that this arrangement is very different than the standard used by many scientists or engineers:  typically we pick one of the corners as our best guess at a starting point, and then we test along each of the edges.  While also reducing the number of experiments, this method has problems that we'll discuss below.

If you are familiar with geometry or chemistry, one way to remember the above arrangement of points is to realize that the above points form a tetrahedron within the cube:

The "lower" legs of the structure are perpendicular to the "upper" legs, which is another way to visualize how these points do a good job of spanning the space.
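A sketch of how these 4 points can be picked programmatically: code each setting as -1 or +1 and keep the runs whose three codes multiply to +1. The coding is an assumption of this sketch (-1 for WLP004, 50 F, 145 F and +1 for WLP005, 55 F, 150 F); it reproduces the points implied by the confounding discussion later in this section.

```python
from itertools import combinations, product

# Assumed coding of the settings as -1 / +1.
levels = {
    "yeast": {-1: "WLP004", 1: "WLP005"},
    "ferment_f": {-1: 50, 1: 55},
    "mash_f": {-1: 145, 1: 150},
}

runs = list(product([-1, 1], repeat=3))  # (yeast, ferment, mash)

# Keep the half of the cube where the product of the three codes is +1.
half = [r for r in runs if r[0] * r[1] * r[2] == 1]

for yeast, ferment, mash in half:
    print(levels["yeast"][yeast],
          levels["ferment_f"][ferment],
          levels["mash_f"][mash])

# Every pair of chosen points differs in exactly 2 of the 3 variables,
# i.e. each pair is joined by a face diagonal: the edges of a tetrahedron.
differences = [sum(a != b for a, b in zip(p, q))
               for p, q in combinations(half, 2)]
print(differences)  # [2, 2, 2, 2, 2, 2]
```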

We can calculate the effect of the variables using the above points as we did before:  for effect of yeast, average the 2 points using yeast WLP005, subtract the average of the 2 points using WLP004:

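Note that the second order effects involving yeast are again cancelled out of this calculation (the second order effect between ferment temperature and mash temperature is not:  with only these 4 points it follows exactly the same pattern as the yeast effect).  We can also still calculate 2nd order effects: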

We can apply the same logic used in the full factorial experiment to show how this will measure the second order effect (left as an exercise to the reader, aka I'm feeling lazy).  Note however that if mash temperature itself has an effect, it will confound the above result!  The first average in the above equation has measurements that are both at 150 F - it does not contain an experiment at a mash temperature of 145 F.  Similarly, the second average contains only measurements at a mash temperature of 145 F.  Therefore the difference in mash temperature is not cancelled out in the above.
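The confounding can be seen directly in a short Python sketch. It assumes the 4 runs are the ones whose -1/+1 codes multiply to +1, and it uses an invented response in which only mash temperature matters and there is no interaction at all.

```python
from itertools import product
from statistics import mean

# Assumed half design: coded (yeast, ferment, mash) with product +1.
half = [r for r in product([-1, 1], repeat=3) if r[0] * r[1] * r[2] == 1]

def contrast(values, signs):
    """Average of the values with sign +1 minus average with sign -1."""
    plus = [v for v, s in zip(values, signs) if s == 1]
    minus = [v for v, s in zip(values, signs) if s == -1]
    return mean(plus) - mean(minus)

# Invented colours: mash temperature shifts the colour, nothing else does.
colors = [10.0 + 0.5 * mash for _yeast, _ferment, mash in half]

yeast_ferment_signs = [yeast * ferment for yeast, ferment, _mash in half]
mash_signs = [mash for _yeast, _ferment, mash in half]

# In this half design the two sign patterns are identical.
print(yeast_ferment_signs == mash_signs)  # True

apparent_interaction = contrast(colors, yeast_ferment_signs)
print(apparent_interaction)  # 1.0
```

The calculation reports a yeast-ferment "second order effect" of 1.0 even though the invented data contain none: it is the mash temperature effect showing up under another name.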

#### Old-school design: One variable at a time

Compare the above design to this choice of 4 measurements:

This is pretty typical of what we do in science and engineering:  for example we pick our best guess (WLP004, 50 F, 145 F), and then we vary each variable individually.  If we calculate the effect of each variable, it is just the difference between two individual measurements, for example:
effect of yeast = E(WLP005, 50 F, 145 F) - E(WLP004, 50 F, 145 F)

So instead of averaging 4 measurements or even 2 measurements we are relying on just 1 measurement at each variable setting.  The other problem is that we have no way of estimating 2nd order effects.  This can cause a real problem if, for example, we decided from the above that changing 2 of the variables independently has a beneficial effect - in this situation we have no way of knowing what will happen when we change the variables simultaneously.  With the full factorial design we have already made the measurement, but have spent a lot of time making all the possible measurements.  With the fractional factorial design we have an estimate of the second order effects (subject to the confounding discussed above), so even if we have not made the measurement directly we can estimate what would happen when we change both variables simultaneously.
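To make the one-variable-at-a-time problem concrete, here is a sketch with invented colours in which WLP005 responds to ferment temperature but WLP004 does not. All numbers, including the "unmeasured" batch, are made up for illustration.

```python
# Invented one-variable-at-a-time measurements,
# keyed by (yeast, ferment temp F, mash temp F).
ofat = {
    ("WLP004", 50, 145): 8.0,   # starting point (best guess)
    ("WLP005", 50, 145): 10.0,  # only yeast changed
    ("WLP004", 55, 145): 8.0,   # only ferment temperature changed
    ("WLP004", 50, 150): 9.0,   # only mash temperature changed
}

base = ofat[("WLP004", 50, 145)]
yeast_effect = ofat[("WLP005", 50, 145)] - base    # 2.0
ferment_effect = ofat[("WLP004", 55, 145)] - base  # 0.0

# Naive prediction for changing yeast AND ferment temperature together:
# just add the two single-variable effects.
predicted = base + yeast_effect + ferment_effect

# What the batch would actually give in this invented scenario, where
# WLP005 darkens by 2.0 at 55 F. This batch was never brewed.
actual_unmeasured = 12.0

print(predicted, actual_unmeasured)  # 10.0 12.0
```

The 4 measurements say ferment temperature does nothing, because it was only ever tested with the yeast that does not respond to it.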