**Summary**: From statistical mechanics, the size of a polymer is generally estimated using the statistics of a random walk. Here I investigate the assumption that the size of the polymer is proportional to the distance between the start and end points of a random walk as it is generally taught in statistical mechanics.

#### Review of random walk in 1 dimension

Start at the origin of the x-axis (x = 0). At each step, there is a 50% chance of moving 1 unit to the right, 50% chance of moving 1 unit to the left.

Here are some examples of random walks:

For N steps, the probability of having ended up at position x is given by the binomial distribution:

Here are some examples of random walks:

For N steps, the probability of having ended up at position x is given by the binomial distribution:

(from the above page at Wolfram). The full width at half max of the above distribution is:

sqrt(# of steps)

sqrt(# of steps)

#### Statistical mechanics: size of polymer

Statistical mechanics describing the size of polymers starts with the basic assumption that you can model a polymer as a series rigid links that can have any relative orientation. It is simplified even further by assuming that it exists in only 1 dimension. Then, putting one end of the polymer at the origin, the position of the monomers in the polymer are modeled using a random walk in 1 dimension. One important conclusion from this analysis is that the average size of a polymer N units long will be proportional to the width of the binomial distribution (above):

sqrt(N)

#### Assumption / problem with the assumption

Modeling the polymer as a random walk means that one end of the polymer is at the origin, and the probability of finding the other end of the polymer at position x is given by the binomial distribution. Then, the binomial distribution of the end point of the polymer is used to calculate the average size of the polymer.

**This assumes that the spacing between the start and end points of the polymer are representative of the size of the polymer.**But for the random walk there are many cases where the distance between the start and end point of the walk is less than the greatest extent of the walk.Here are some examples of individual random walks, illustrating the full width of the walk or size of the polymer and comparing that to the distance between the start and end point of the walks:

For walk 1, note that the distance from the origin to the end is only 1, compared to the actual width of the walk / size of the polymer which is 5

For walk 2, the difference between the origin and the end point (9) is closer to the width (11) but still not the same.

#### Testing the assumption

In order to test the above assumption, I calculated the actual widths of polymers modeled by random walks (as defined above).

Interruption: I subsequently found a derivation that analytically derives that the min and max are proportional to the square root of the number of steps - problem solved! It is in this pdf. The result is at the end of example 12.3, within the section "Expected Number of Equalizations".

That aside, here is how I initially solved the problem: I wrote code to generate all of the possible random walks for some # of steps, and for each of these walks calculating the furthest negative position reached (min), the furthest positive position reached (max), and then calculated the distance between them to be the size of the polymer (width). Initially this was done for walks up to 22 steps in size - for a random walk with 22 steps there are 2^22 random walks = 8,388,608. Below is plotted the size of the polymer versus the number of steps:

This appears like it could be that the

average size = sqrt(# steps)

If this is true, then if I I plot the above data as average size vs. sqrt(# steps) it should be linear:

And it is!

Still linear! This calculation took just shy of 7 hours to run.

I used 4 runs which had these timing results:

I plotted the above, fit to an exponential, and then used the fit equation to estimate about how many steps could be run in 6 hours. The result was ~31, so I set the calculation to that number of steps and called it a night. Here is a graph of the timing results (blue diamonds), the exponential fit (black line), estimate of the time for 31 steps (dashed green line), and the actual run time for 31 steps:

Interruption: I subsequently found a derivation that analytically derives that the min and max are proportional to the square root of the number of steps - problem solved! It is in this pdf. The result is at the end of example 12.3, within the section "Expected Number of Equalizations".

That aside, here is how I initially solved the problem: I wrote code to generate all of the possible random walks for some # of steps, and for each of these walks calculating the furthest negative position reached (min), the furthest positive position reached (max), and then calculated the distance between them to be the size of the polymer (width). Initially this was done for walks up to 22 steps in size - for a random walk with 22 steps there are 2^22 random walks = 8,388,608. Below is plotted the size of the polymer versus the number of steps:

This appears like it could be that the

average size = sqrt(# steps)

If this is true, then if I I plot the above data as average size vs. sqrt(# steps) it should be linear:

And it is!

#### Notes on code design and more results

I initially ran the calculation by building a tree containing all the possible walks up to a certain # of steps (depth-first), but that was memory intensive. I reduced the memory requirement by doing the calculation for all walks of a certain # of steps at a time (breadth-first, results shown above), but this required storing the endpoints of all the walks of the previous # of steps which ran out of memory. I then switched back to a depth-first calculation, but did not store the tree in memory - now the constraint is on the amount of time I'm willing to wait for the results - I calculated the results out to 31 steps:Still linear! This calculation took just shy of 7 hours to run.

#### Engineering details and code

I used 3 methods to run this calculation

- depth-first, stored tree (didn't actually run calculation, just generated all random walks)
- breadth-first, stored previous walk endpoints
- depth-first, store statistics only

The last is the only way to achieve the higher number of steps. I never implemented calculating the results from the stored tree of the first method because the memory restriction was immediately too prohibitive to be useful. Code to be shared below, but first, here is how I estimated the run time for the calculation out to 31 steps.

I used 4 runs which had these timing results:

# steps | run time [s] |

15 | 0.5 |

20 | 13 |

21 | 28.26 |

22 | 54 |

I plotted the above, fit to an exponential, and then used the fit equation to estimate about how many steps could be run in 6 hours. The result was ~31, so I set the calculation to that number of steps and called it a night. Here is a graph of the timing results (blue diamonds), the exponential fit (black line), estimate of the time for 31 steps (dashed green line), and the actual run time for 31 steps:

The calculation took almost 7 hours to run, was over the estimated time by 24%. Interesting to note that in the logarithmic scale above, that 24% discrepancy is hardly noticeable. A good warning about looking at fits on logarithmic scales.

### Code

Written in python, run in python 2.7.

#### Common class

```
```

`class StepData:`

` pos = None`

` max_pos = None`

` min_pos = None`

` depth = None`

` def __init__(self, pos, prev_min_pos, prev_max_pos, depth):`

` self.pos = pos`

` self.max_pos = pos if pos > prev_max_pos else prev_max_pos`

` self.min_pos = pos if pos < prev_min_pos else prev_min_pos`

` self.depth = depth`

` def __repr__(self):`

` return "{}, {}, {}, {}".format(self.pos, self.min_pos, self.max_pos, self.depth)`

```
```

#### depth first, no storage

Recursive algorithm, does a running calculation of the statistics at for each number of steps in the calculation as it does a depth-first calculation.

```
```

```
```

`import step_data as sd`

`import math`

`class Stats:`

` sum = None`

` sum_sq = None`

` count = None`

` `

` def __init__(self):`

` self.sum = 0`

` self.sum_sq = 0`

` self.count = 0`

` `

` def add_value(self, value):`

` self.sum += value`

` self.sum_sq += value*value`

` self.count += 1`

`def add_step(prev_step_data, stats_list):`

` depth = prev_step_data.depth + 1`

` current_stats = stats_list[depth]`

` `

` next_pos_tuple = (prev_step_data.pos - 1, prev_step_data.pos + 1)`

` `

` for next_pos in next_pos_tuple:`

` step_data = sd.StepData(next_pos, prev_step_data.min_pos, `

` prev_step_data.max_pos, depth)`

`# print step_data`

` `

` width = step_data.max_pos - step_data.min_pos`

` current_stats.add_value(width)`

` `

` if (depth < max_depth):`

` add_step(step_data, stats_list)`

` `

`max_depth = 31`

`stats_list = [Stats() for i in range(0, max_depth+1)]`

`origin = sd.StepData(0, 0, 0, 0)`

`stats_list[0].add_value(0)`

`add_step(origin, stats_list)`

`for depth in range(1, len(stats_list)):`

` stat = stats_list[depth]`

` `

` ave = float(stat.sum) / float(stat.count)`

` sdev = math.sqrt((float(stat.sum_sq) / float(stat.count)) - (ave*ave))`

` `

` print depth, stat.count, ave, sdev`

```
```

```
```

#### breadth-first calculation, store only previous # of steps

```
```

`import step_data as sd`

`import math`

`prev_step_data_list = [sd.StepData(0,0,0,0)]`

`max_depth = 35`

`for depth in range(0,max_depth):`

` step_data_list = list()`

` for prev_step_data in prev_step_data_list:`

` next_pos_tuple = (prev_step_data.pos - 1, prev_step_data.pos + 1)`

` `

` for next_pos in next_pos_tuple:`

` next_step_data = sd.StepData(next_pos, prev_step_data.min_pos, `

` prev_step_data.max_pos, depth)`

` step_data_list.append(next_step_data)`

`# print next_step_data`

` sum_ = 0`

` sum_sq = 0 `

` for prev_step_data in step_data_list:`

` width = prev_step_data.max_pos - prev_step_data.min_pos`

` sum_ += width`

` sum_sq += width*width`

` N = len(step_data_list)`

` ave = float(sum_) / float(N)`

` sdev = math.sqrt((float(sum_sq) / float(N)) - (ave*ave))`

` print depth, N, ave, sdev`

`# print`

` prev_step_data_list = step_data_list`

```
```

#### depth-first calculation, store all random walks then print left most side of tree

```
```

`class TreeNode:`

` children = None`

` data = None`

` def __init__(self, data):`

` self.data = data`

` self.children = list()`

```
```

```
```

```
import tree_node as tn
import step_data as sd
```

max_depth = 30

def add_step(tree_node, prev_depth):
depth = prev_depth + 1

next_pos_tuple = (tree_node.data.pos - 1, tree_node.data.pos + 1)
for next_pos in next_pos_tuple:
next_step_data = sd.StepData(next_pos, tree_node.data.min_pos, tree_node.data.max_pos, depth)
next_tree_node = tn.TreeNode(next_step_data)

tree_node.children.append(next_tree_node)

if (depth < max_depth):
add_step(next_tree_node, depth)

origin = sd.StepData(0, 0, 0, 0)
root = tn.TreeNode(origin)

add_step(root, 0)

node = root

while (node is not None):
print node.data.pos, node.data.min_pos, node.data.max_pos, node.data.depth

node = node.children[0] if len(node.children) > 0 else None

#### Edit: Some further questions and answers

I do not know the origin of the phrase "random walks". I've done a quick search, and it appears to just happen to be how the problem was stated originally by Karl Pearson.

How do you calculate the random walk?

There are computer programs called "random number generators" that can provide you with what is essentially a random number every time you make a request. You might remember that I built you some excel spreadsheets once that generated math problems for your students - they used a random number generator within excel. I have used the same thing to generate the examples of the random walks on the blog post.

Is the point of departure or starting point a given?

Generally the starting point is a given and is fixed at x = 0, but that does not have to be the case. The other thing that is then specified (usually) is the number of steps taken in the walk.

Is the ending fixed, or is it stretched to infinity?

Generally the number of steps is fixed, but since the walk is random the end point is not fixed. Sometimes they will do the calculation for an infinite number of steps (to prove some other point). I can imagine also a situation where they would fix the endpoint and then ask the question "which or how many random walks out of all possible random walks will reach this endpoint?"

## No comments:

## Post a Comment