For some examples of practical applications / examples of factorial design of experiments (and statistical design more generally) here are some papers that Joshua L. Hertz and I wrote when we were in Stephen Semancik's group as post-doc's at NIST:
Combinatorial Characterization of Chemiresistive Films Using Microhotplates
A Combinatorial Study of Thin-Film Process Variables Using Microhotplates
Introduction
Assume that we are interested in the effect on the color of the beer of three variables: ferment temperature, type of yeast used, and mash temperature. We will test each of these variables at 2 different settings:ferment temperature: 50 F, 45 F
type of yeast: WP004, WP005
mash temperature: 145 F, 150 F
With 3 variables that have 2 settings each there are 8 possible combinations of experiments that can be run.
These three variables and their 2 settings each can be represented as a cube, where each dimension is a variable, each side has fixed value for one of the variables, and each vertex represents one of 8 possible combinations of the variables:
Each vertex corresponds to a combination of the 3 variables:
Running all 8 experiments - full factorial design
If we run all 8 of these experiments it is called the full factorial design. We would calculate the effect of a variable (e.g. yeast) by averaging the results of the experiments with one type of yeast (WLP005) and subtracting the average of the results with the other type of yeast (WLP004):We can visualize each of these calculations by imagining that we average over of the results of experiments on one face of the cube, and then subtract the average of the experiments on the corresponding opposite face of the cube. For example, for the calculation of the effect of yeast, we average over the measurements on the right face (in which all measurements use yeast WLP005) and then subtract the average over the left face (in which all measurements use yeast WLP004).
What if there is a 2nd order effect - a synergistic effect between 2 of the variables? We can calculate this effect explicitly, and by doing so show how it is cancelled out in the calculation of the 1st order effect.
Calculating 2nd order effects from the full factorial design
The second order effect of yeast and ferment temperature is calculated by averaging the measurements when the yeast and temperature change are both in the "same direction", and then subtracting the average of measurements when the yeast and temperature change are in "opposite directions":We can visualize this as subtracting the experiments that lie on 2 diagonal planes through the cube - in the above illustration the dashed lines represent the 2 diagonal planes that we use for the yeast-ferment temperature second order effect, and the color coding indicates how each plane corresponds to one of the averages above.
2nd order effect is cancelled out in 1st order effect calculation
If we look at the cube representing the calculation of 2nd order effects, we see that the points used in one average of 2nd order effects appear equally in both averages of the 1st order effects - thus they are cancelled out in the calculation of 1st order effects.
Running fewer experiments - fractional factorial design
Now consider a subset of the above experiments:In the above we've chosen 4 of the 8 experiments. We've "spaced" them out as far as possible from each other. Note that this arrangement is very different than the standard used by many scientists or engineers: typically we pick one of the corners as our best guess at a starting point, and then we test along each of the edges. While also reducing the number of experiments, this method has problems that we'll discuss below.
If you are familiar with geometry or chemistry, one way to remember the above arrangement of points is to realize that the above points form a tetrahedron within the cube:
The "lower" legs of the structure are perpendicular to the "upper" legs, which is another way to visualize how these points do a good job of spanning the space.
We can calculate the effect of the variables using the above points as we did before: for effect of yeast, average the 2 points using yeast WLP005, subtract the average of the 2 points using WLP004:
Note that again second order effects are cancelled out and we can also still measure 2nd order effects:
Old-school design: One variable at a time
Compare the above design to this choice of 4 measurements:This is pretty typical of what we do in science and engineering: for example we pick our best guess (WLP004, 50 F, 145 F), and then we vary each variable individually. If we calculate the effect of each variable, it is just the difference between individual variables, for example:
effect of yeast = E(WLP005, 50 F, 145 F) - E(WLP004, 50 F, 145 F)
So instead of averaging 4 measurements or even 2 measurements we are relying just on 1 measurement at each variable setting. The other problem is that we have no way of estimating 2nd order effects. This can cause a real problem if for example we decided from the above that changing 2 of the variables independently have a a beneficial effect - in this situation we have no way of knowing what will happen when we change the variables simultaneously. With the full factorial design we have already made the measurement, but have spent a lot of time making all the possible measurements. With the fractional factorial design we have an estimate of all the second order effects, so if we have not made the measurement directly we can estimate what would happen when we change both variables simultaneously.
No comments:
Post a Comment