Price Equation derivation and gripes

This page has been superseded by my thesis.

Biological evolution is the change over generations of the genetic composition of populations due to natural factors, typically including significant randomness. Describing this mathematically, and developing quantitative tools to predict what might evolve under which conditions, is a great challenge. One place to begin is by describing, in a nice way, a population’s change in genetic character from one generation to the next. By “a nice way”, we mean that we’d like to be able to attribute changes to the appropriate influences. What changes are due to random mutations creating new variations, for example, and what changes are due to natural selection winnowing out varieties which cannot survive in their environment?

We can make a crude measure of a population’s genetic composition by counting how many organisms in the population have a certain gene of interest. We can express this count as a fraction of the total population, saying, for example, “The frequency of gene A in this population is 0.22.”

In this section, we use the notation of van Veelen (2005), which unfortunately does not seem to be freely available online.

We consider two populations, $S_1$ and $S_2$. All the offspring of organisms in $S_1$ belong to $S_2$, and all the parents of organisms in $S_2$ are in $S_1$. We write $N$ for the size of population $S_1$. For an individual $i \in S_1$, the frequency of gene A is

$q_i = \frac{g_i}{l_z},$

where $g_i$ is the number of copies of gene A carried by individual $i$ and $l_z$ is the zygotic ploidy. The frequency of gene A in population $S_1$ is

$Q_1 = \frac{\sum_{i\in S_1} g_i}{l_z N} = \frac{\sum_i q_i}{N}.$
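As a concrete check on these definitions, here is a minimal Python sketch. The gene counts are made up for illustration, and it assumes a diploid population ($l_z = 2$). `fractions.Fraction` keeps the arithmetic exact, so the two expressions for $Q_1$ can be compared with `==`.

```python
from fractions import Fraction

# Hypothetical example: copies of gene A carried by each individual in S_1
# (illustrative values, not data).
g = [0, 1, 2, 1, 0]
l_z = 2  # zygotic ploidy (diploid, assumed)
N = len(g)

# Individual frequencies q_i = g_i / l_z
q = [Fraction(gi, l_z) for gi in g]

# Population frequency Q_1, computed both ways from the text
Q1_from_counts = Fraction(sum(g), l_z * N)
Q1_from_q = sum(q) / N

assert Q1_from_counts == Q1_from_q
print(Q1_from_q)  # 2/5
```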

We want to relate $Q_2$ and $Q_1$. One simple way to do so is to take their difference:

$\Delta Q = Q_2 - Q_1.$

We can write $Q_2$ as

$Q_2 = \frac{\sum_i g_i'}{l_g\sum_i z_i},$

where $l_g$ is the gametic ploidy, $z_i$ is the number of successful gametes from individual $i$, and $g_i'$ is the number of A-type genes in the set of all successful gametes from individual $i$. The proportion of A-type genes in that set is

$q_i' = \frac{g_i'}{z_i l_g}.$

From this,

$Q_2 = \frac{\sum_i z_i l_g q_i'}{l_g \sum_i z_i} = \frac{\sum_i z_i q_i'}{\sum_i z_i}.$
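The two expressions for $Q_2$ can be compared the same way. This sketch assumes haploid gametes ($l_g = 1$) and uses hypothetical values for $z_i$ and $g_i'$. One wrinkle the formulas gloss over is that $q_i'$ is undefined for a parent with $z_i = 0$. Such a parent contributes nothing to either sum, so the sketch simply skips it.

```python
from fractions import Fraction

l_g = 1  # gametic ploidy (haploid gametes, assumed)

# Hypothetical per-parent data for the N = 5 individuals in S_1
z = [1, 3, 2, 0, 2]        # successful gametes z_i
g_prime = [0, 2, 2, 0, 1]  # copies of A among those gametes, g_i'

# Direct definition: Q_2 = sum_i g_i' / (l_g * sum_i z_i)
Q2_direct = Fraction(sum(g_prime), l_g * sum(z))

# Per-parent frequencies q_i' = g_i' / (z_i * l_g). These are undefined when
# z_i = 0, but such parents carry zero weight below, so they are skipped.
q_prime = {
    i: Fraction(gp, zi * l_g)
    for i, (zi, gp) in enumerate(zip(z, g_prime))
    if zi > 0
}

# Weighted form: Q_2 = sum_i z_i q_i' / sum_i z_i
Q2_weighted = Fraction(sum(z[i] * qp for i, qp in q_prime.items()), sum(z))

assert Q2_direct == Q2_weighted
print(Q2_direct)  # 5/8
```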

Therefore,

$\Delta Q = \frac{\sum_i z_i q_i'}{\sum_i z_i} - \frac{1}{N}\sum_i q_i.$

We’d like our expression for the change in $Q$ to be written in terms of the changes in the individual $q_i$, so we subtract and add the $z_i$-weighted average of the $q_i$:

$\Delta Q = \frac{\sum_i z_i (q_i' - q_i)}{\sum_i z_i} + \frac{\sum_i z_i q_i}{\sum_i z_i} - \frac{1}{N}\sum_i q_i.$

Next, we gather the last two terms over a common denominator:

$\Delta Q = \frac{\sum_i z_i (q_i' - q_i)}{\sum_i z_i} + \frac{\sum_i z_i q_i - \frac{1}{N}\sum_i q_i\sum_j z_j}{\sum_i z_i}.$

Now, we factor an $N$ out of the latter term.

$\Delta Q = \frac{\sum_i z_i (q_i' - q_i)}{\sum_i z_i} + \frac{N}{\sum_i z_i} \left[\frac{1}{N}\sum_i z_i q_i - \frac{1}{N^2}\sum_i q_i\sum_j z_j\right].$
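Each rewriting step so far is pure algebra, so on any list of numbers every intermediate expression must give the same $\Delta Q$. The following sketch checks this on a small hypothetical population of $N = 4$ parents; the values of $q_i$, $q_i'$ and $z_i$ are illustrative only.

```python
from fractions import Fraction as F

# Hypothetical list of numbers for N = 4 parents (illustrative only)
q = [F(0), F(1, 2), F(1), F(1, 2)]   # q_i in S_1
qp = [F(0), F(2, 3), F(1), F(1)]     # q_i' among successful gametes
z = [2, 3, 1, 2]                     # successful gametes z_i
N, Z = len(q), sum(z)

# Sums reused by several steps
sum_z_dq = sum(zi * (qpi - qi) for zi, qpi, qi in zip(z, qp, q))  # sum_i z_i (q_i' - q_i)
sum_zq = sum(zi * qi for zi, qi in zip(z, q))                      # sum_i z_i q_i

# Definition: Delta Q = Q_2 - Q_1
dQ = sum(zi * qpi for zi, qpi in zip(z, qp)) / Z - sum(q) / N
# Subtract and add the z-weighted mean of the q_i
step1 = sum_z_dq / Z + sum_zq / Z - sum(q) / N
# Put the last two terms over the common denominator sum_i z_i
step2 = sum_z_dq / Z + (sum_zq - sum(q) * Z / N) / Z
# Factor N out of the second term
step3 = sum_z_dq / Z + F(N, Z) * (sum_zq / N - sum(q) * Z / N**2)

assert dQ == step1 == step2 == step3
print(dQ)  # 1/8
```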

We rearrange this just a bit to yield the **Price Equation**:

$\boxed{\Delta Q = \frac{N}{\sum_i z_i} \left[\frac{1}{N}\sum_i z_i q_i - \left(\frac{1}{N}\sum_i q_i\right)\left(\frac{1}{N}\sum_j z_j\right)\right]
+ \frac{\sum_i z_i (q_i' - q_i)}{\sum_i z_i}.}$
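The bracketed quantity is the covariance of $z_i$ and $q_i$ over the parents (with $1/N$ normalization), and $N/\sum_i z_i$ is $1/\bar{z}$, so the identity can also be read as $\Delta Q = \mathrm{Cov}(z, q)/\bar{z} + \mathrm{E}[z\,\Delta q]/\bar{z}$. A minimal sketch computing the two terms separately on hypothetical data; the helper names `mean`, `cov` and `price_terms` are our own, not from van Veelen (2005).

```python
from fractions import Fraction as F

def mean(xs):
    """Arithmetic mean over the N parents in S_1, kept exact."""
    return F(sum(xs), len(xs))

def cov(xs, ys):
    """Covariance with 1/N normalization, matching the bracket in the Price equation."""
    return mean([x * y for x, y in zip(xs, ys)]) - mean(xs) * mean(ys)

def price_terms(z, q, qp):
    """Return the covariance term and the expectation term of the Price equation."""
    zbar = mean(z)
    cov_term = cov(z, q) / zbar
    exp_term = mean([zi * (qpi - qi) for zi, qi, qpi in zip(z, q, qp)]) / zbar
    return cov_term, exp_term

# Hypothetical data for N = 4 parents (illustrative only)
z = [2, 3, 1, 2]
q = [F(0), F(1, 2), F(1), F(1, 2)]
qp = [F(0), F(2, 3), F(1), F(1)]

cov_term, exp_term = price_terms(z, q, qp)
Q1 = mean(q)
Q2 = F(sum(zi * qpi for zi, qpi in zip(z, qp)), sum(z))
assert cov_term + exp_term == Q2 - Q1
print(cov_term, exp_term, Q2 - Q1)  # -1/16 3/16 1/8
```

Note that `statistics.covariance` in the Python standard library uses $n - 1$ normalization, which is why the sketch defines its own `cov`.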

This is just an algebraic identity: we took the compositions of the two populations as given, and we wrote a fancy expression for the change of gene frequency between them. *We have not said anything about dynamics from which this change could be derived, nor have we made any claims about what changes are more probable than others.*

Van Veelen *et al.* (2012) make the point in the following way:

> [W]hat is most important is that we realize that the numerical input of the Price equation is a list of numbers. It is a list that concerns two generations, and which tracks who is whose offspring. But whatever it reflects, it is crucial to realize that the point of departure is nothing but a list of numbers. This list of numbers is used twice. First we use it to compute the frequencies of the gene under consideration in generations 1 and 2, respectively, and subtract the latter from the former. This amounts to the change in gene frequency. Then we use the same list to compute a few other, slightly more complex quantities. The essence of the Price equation is that these quantities also add up to the change in gene frequency. One way of computing the change in frequency therefore can be rewritten as the other and vice versa. What they are, therefore, is nothing but two equivalent ways to compute the change in gene frequency, given a list of numbers concerning genes in two subsequent generations […] Whether this particular second generation is likely to follow the first or not, the two ways of computing the change in frequency return the same number.
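The point of the quotation can be checked mechanically: generate lists with no biological story behind them (random $z_i$, random $q_i$, and $q_i'$ drawn independently of $q_i$) and confirm that the direct difference $Q_2 - Q_1$ and the right-hand side of the boxed equation always agree. A sketch with a fixed random seed and exact fractions; the list sizes and value ranges are arbitrary choices.

```python
import random
from fractions import Fraction as F

def mean(xs):
    return F(sum(xs), len(xs))

def delta_q_direct(z, q, qp):
    """Q_2 - Q_1 computed straight from the definitions."""
    Q2 = F(sum(zi * qpi for zi, qpi in zip(z, qp)), sum(z))
    return Q2 - mean(q)

def delta_q_price(z, q, qp):
    """Right-hand side of the boxed Price equation."""
    N, Z = len(z), sum(z)
    bracket = mean([zi * qi for zi, qi in zip(z, q)]) - mean(q) * mean(z)
    second = F(sum(zi * (qpi - qi) for zi, qi, qpi in zip(z, q, qp)), Z)
    return F(N, Z) * bracket + second

rng = random.Random(0)  # fixed seed so the run is reproducible
checked = 0
for _ in range(1000):
    N = rng.randint(1, 20)
    z = [rng.randint(1, 10) for _ in range(N)]        # z_i >= 1 keeps q_i' defined
    q = [F(rng.randint(0, 4), 4) for _ in range(N)]   # arbitrary q_i in [0, 1]
    qp = [F(rng.randint(0, 6), 6) for _ in range(N)]  # q_i' unrelated to q_i
    assert delta_q_direct(z, q, qp) == delta_q_price(z, q, qp)
    checked += 1
print(f"identity held in all {checked} random lists")
```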

To make a physics analogy, what we have done is like starting with Newton’s second law, $\vec{F} = m\vec{a}$, and writing it as

$m\vec{a} = m\vec{a}.$

We could then rewrite the $\vec{a}$ vectors in some elaborate way. For example, we could write one side of the equation in Cartesian coordinates and the other in spherical coordinates, giving some complicated formulas involving trigonometric functions all over the place. These formulas would be *true,* in the sense that Euclidean geometry is *true,* but they would contain no *physics.* In some circumstances, they might be useful, but we could not wring value out of them without some extra assumptions about the dynamics at work.

*(leaving in telegraphic form for now)*

Donors of effort increase the number of successful gametes produced by the recipient, at the expense of their own. We parameterize this in the following way: denote by $c$ a donor’s decrease in successful gametes of its own, and denote by $b$ the increase in successful gametes of the recipient. We idealize interactions as pairwise events, and so we keep track of them using two matrices, $S_{i\alpha}$ for benefits received and $Q_{i\alpha}$ for costs incurred. The first index, $i$, denotes an individual in population $S_1$. The second index, $\alpha$, ranges over the occasions on which interactions can take place. We start again from the Price equation:

$\Delta Q = \frac{N}{\sum_i z_i} \left[\frac{1}{N}\sum_i z_i q_i - \left(\frac{1}{N}\sum_i q_i\right)\left(\frac{1}{N}\sum_j z_j\right)\right]
+ \frac{\sum_i z_i (q_i' - q_i)}{\sum_i z_i}.$

We can be slightly more general and allow each individual to have their own ploidy, $l_i$. So, instead of using the population size $N$, we use $\sum_i l_i$. Following the literature, we calculate the number of successful gametes per haploid set, $w_i = z_i / l_i$.

$\Delta Q = \frac{\sum_i l_i}{\sum_i l_i w_i} \left[\frac{\sum_i l_i w_i q_i}{\sum_i l_i} - \left(\frac{\sum_i l_i w_i}{\sum_i l_i}\right)\left(\frac{\sum_i l_i q_i}{\sum_i l_i}\right)\right]
+ \frac{\sum_i l_i w_i (q_i' - q_i)}{\sum_i l_i w_i}.$
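The variable-ploidy form can be checked numerically in the same spirit. This sketch assumes a hypothetical mixed-ploidy population (haploid and diploid individuals), takes $Q_1 = \sum_i l_i q_i / \sum_i l_i$ and, as in the earlier derivation, $Q_2 = \sum_i z_i q_i' / \sum_i z_i$. All numbers are illustrative.

```python
from fractions import Fraction as F

# Hypothetical mixed-ploidy population (values are illustrative only)
l = [1, 2, 2, 1]                          # ploidy l_i
z = [2, 4, 6, 1]                          # successful gametes z_i
q = [F(1), F(1, 2), F(0), F(0)]           # q_i = g_i / l_i
qp = [F(1), F(3, 4), F(1, 6), F(0)]       # q_i' among successful gametes
w = [F(zi, li) for zi, li in zip(z, l)]   # w_i = z_i / l_i, gametes per haploid set

L = sum(l)                                 # plays the role of N
LW = sum(li * wi for li, wi in zip(l, w))  # sum_i l_i w_i, which equals sum_i z_i

# Right-hand side of the variable-ploidy Price identity, term by term
bracket = (sum(li * wi * qi for li, wi, qi in zip(l, w, q)) / L
           - (LW / L) * (sum(li * qi for li, qi in zip(l, q)) / L))
second = sum(li * wi * (qpi - qi) for li, wi, qi, qpi in zip(l, w, q, qp)) / LW
rhs = (L / LW) * bracket + second

# Left-hand side from the definitions
Q1 = F(sum(li * qi for li, qi in zip(l, q)), L)
Q2 = F(sum(zi * qpi for zi, qpi in zip(z, qp)), sum(z))
assert LW == sum(z)
assert rhs == Q2 - Q1
print(rhs)  # 5/39
```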

We now make two assumptions:

- The second term in this form of the Price identity is negligible.
- The fitnesses $z_i$ can be written as $z_i = l_i w_i = f_i + b \sum_\alpha S_{i\alpha} - c\sum_\alpha Q_{i\alpha}.$
Here, $f_i$ is individual $i$’s baseline number of successful gametes in the absence of interactions, $\sum_\alpha S_{i\alpha}$ is the total number of times individual $i$ received a benefit, and $\sum_\alpha Q_{i\alpha}$ is the number of times individual $i$ incurred a cost.
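A minimal sketch of how the interaction matrices feed into the fitnesses, under loudly stated assumptions: four individuals, three interaction occasions, exactly one donor and one recipient per occasion, and made-up values for $b$, $c$ and the baseline $f_i$. The cost matrix is named `Qm` in code to keep it distinct from the gene frequencies $Q_1$ and $Q_2$; the function name `fitness` is hypothetical.

```python
# Hypothetical parameters: benefit b and cost c in successful gametes
b, c = 3, 1
f = [4, 4, 4, 4]  # hypothetical baseline gametes f_i, absent interactions

# Rows are individuals i in S_1; columns are interaction occasions alpha.
# Assumed pairwise pattern: on occasion 0, individual 0 helps 1;
# on occasion 1, individual 0 helps 2; on occasion 2, individual 1 helps 0.
S = [[0, 0, 1],   # S[i][a] = 1 if i received a benefit on occasion a
     [1, 0, 0],
     [0, 1, 0],
     [0, 0, 0]]
Qm = [[1, 1, 0],  # Qm[i][a] = 1 if i paid a cost on occasion a
      [0, 0, 1],
      [0, 0, 0],
      [0, 0, 0]]

def fitness(f, S, Qm, b, c):
    """z_i = f_i + b * sum_a S[i][a] - c * sum_a Qm[i][a]."""
    return [fi + b * sum(Si) - c * sum(Qi) for fi, Si, Qi in zip(f, S, Qm)]

z = fitness(f, S, Qm, b, c)
print(z)  # [5, 6, 7, 4]
```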

We introduce the abbreviation

$\bar{q} = \frac{\sum_i l_i q_i}{\sum_i l_i}.$

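Dropping the last term of $\Delta Q$ and substituting in our chosen form for $l_i w_i$, we arrive after some algebra at the following, provided the baseline contribution $\sum_i f_i (q_i - \bar{q})$ vanishes (as it does, for example, when $f_i$ is proportional to $l_i$):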

$\Delta Q = \left(\frac{\sum_{i,\alpha} Q_{i\alpha}(q_i - \bar{q})}{\sum_i l_i w_i}\right) \left[\left(\frac{\sum_{i,\alpha} S_{i\alpha}(q_i - \bar{q})}{\sum_{i,\alpha} Q_{i\alpha}(q_i - \bar{q})}\right) b - c \right].$

The quantity in square brackets has the form of Hamilton's condition, if we identify the quotient multiplying $b$ as a measure of assortment:

$r = \frac{\sum_{i,\alpha} S_{i\alpha}(q_i - \bar{q})}{\sum_{i,\alpha} Q_{i\alpha}(q_i - \bar{q})}.$
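A sketch checking that the first term of the Price identity, computed directly, matches the factored Hamilton-like form, and computing $r$ along the way. It assumes a small hypothetical interaction pattern (four individuals, three occasions, one donor and one recipient per occasion), uniform diploidy and a uniform baseline $f_i$, so that the baseline term $\sum_i f_i (q_i - \bar{q})$ vanishes; the second term of the identity is dropped, as in the first assumption. All numbers are illustrative.

```python
from fractions import Fraction as F

b, c = 3, 1                         # hypothetical benefit and cost
l = [2, 2, 2, 2]                    # uniform diploidy (assumed)
f = [4, 4, 4, 4]                    # uniform baseline gametes f_i (assumed)
q = [F(1), F(1, 2), F(1, 2), F(0)]  # hypothetical q_i

# Hypothetical pairwise interactions: rows are individuals, columns occasions
S = [[0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 0]]   # benefits received
Qm = [[1, 1, 0], [0, 0, 1], [0, 0, 0], [0, 0, 0]]  # costs paid

z = [fi + b * sum(Si) - c * sum(Qi) for fi, Si, Qi in zip(f, S, Qm)]
L, Z = sum(l), sum(z)               # sum_i l_i and sum_i l_i w_i = sum_i z_i
qbar = F(sum(li * qi for li, qi in zip(l, q)), L)

# First term of the variable-ploidy identity, using l_i w_i = z_i
first_term = F(L, Z) * (F(sum(zi * qi for zi, qi in zip(z, q)), L) - F(Z, L) * qbar)

# Assortment r and the factored form Q_dev / Z * (r b - c)
dev = [qi - qbar for qi in q]
S_dev = sum(sum(Si) * di for Si, di in zip(S, dev))
Q_dev = sum(sum(Qi) * di for Qi, di in zip(Qm, dev))
r = S_dev / Q_dev                   # undefined if Q_dev == 0
hamilton_form = (Q_dev / Z) * (r * b - c)

assert sum(fi * di for fi, di in zip(f, dev)) == 0  # baseline term vanishes
assert first_term == hamilton_form
print(r, first_term)  # 1/2 1/44
```

In this example $rb - c = 1/2 > 0$ and the prefactor is positive, so $\Delta Q > 0$.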

- M. van Veelen (2005), “On the use of the Price equation”, *Journal of Theoretical Biology* **237**(4): 412–26. PMID: 15953618.

- B. Allen and C. E. Tarnita (2012), “Measures of success in a class of evolutionary models with fixed population size and structure”, *Journal of Mathematical Biology*, DOI: 10.1007/s00285-012-0622-x.

- M. van Veelen, J. Garcia, M. W. Sabelis and M. Egas (2012), “Group selection and inclusive fitness are *not* equivalent; the Price equation vs. models and statistics”, *Journal of Theoretical Biology* **299**: 64–80. PMID: 21839750.

- B. Allen, M. A. Nowak and E. O. Wilson (2013), “Limitations of inclusive fitness”, *Proceedings of the National Academy of Sciences*, online before print.

category: blog