Thursday, May 7, 2009

Degenerate oligos II

(EDITED to add another graph)

In the first post on our planned degenerate oligo experiment, I described how we might expect the input degenerate oligo pools to look. I used the binomial probability to calculate the chance that a given oligo would have k incorrect bases in an oligo of length n.

So Pn(k) = n! / [ k! * (n-k)! ] * p^(n-k) * (1-p)^k, where p was the probability of the correct base being added at a particular position.
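As a quick check on that formula, here is a minimal Python sketch. The names `binom_pmf` and `p_correct` are mine, and I'm assuming that "12% degenerate" (the pool used later in this post) means a 12% chance of an incorrect base at each position of the 32-mer, i.e. p = 0.88.

```python
from math import comb

def binom_pmf(k, n, p_correct):
    """Probability of exactly k incorrect bases in an oligo of length n."""
    return comb(n, k) * p_correct ** (n - k) * (1 - p_correct) ** k

# Assumed values: a 32-mer with a 12% chance of a wrong base per position.
n = 32
p = 0.88
pool = [binom_pmf(k, n, p) for k in range(n + 1)]

for k in range(6):
    print(f"k = {k}: {pool[k]:.4f}")
```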

As a thought experiment, I now want to assume that only some of the bases in the n-mer matter to the uptake machinery. I’ll call this number m. The first difficulty was to figure out the probability of hitting h of the m important bases, given k substitutions in a particular oligo.

This took me a while to figure out. I knew it had something to do with "sampling without replacement" and was able to work out the probabilities by hand for k = 0, 1, or 2, but then I started to get flummoxed. Eventually, I stumbled across the right equation to use: The Hypergeometric Distribution. (That’s seriously cool-sounding.)

Since I don’t know how to write a clean looking equation in HTML (see the above morass given for the binomial probability), I’ll again use the notation ( x y ) to mean "x choose y", or x! / [ y! * (x-y)! ] and will carefully add a * where I mean to indicate multiplication. The hypergeometric is then given as:

f(h; n, m, k) = ( m h ) * ( n-m k-h ) / ( n k )

h = hits, or the # substitutions in the important bases
n = total length of the oligo
m = # of important bases in the oligo
k = # of substitutions in the oligo
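The same formula as a short Python sketch, using the standard library's `math.comb` for "x choose y". The function name `hypergeom_pmf` is just my label. A sanity check: with a single substitution in a 32-mer with 8 important bases, the chance of hitting an important base should be 8/32.

```python
from math import comb

def hypergeom_pmf(h, n, m, k):
    """Probability that h of the k substitutions land in the m important bases."""
    # Impossible combinations get probability zero.
    if h < 0 or h > m or k - h < 0 or k - h > n - m:
        return 0.0
    return comb(m, h) * comb(n - m, k - h) / comb(n, k)

# One substitution in a 32-mer with 8 important bases.
print(hypergeom_pmf(1, 32, 8, 1))  # 8/32
print(hypergeom_pmf(0, 32, 8, 1))  # 24/32
```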

So if we define only 1/4 of the bases as important to uptake, then for a 32-mer, m = 8. Then, for a 12% degenerate oligo pool, we get a histogram that looks like this:

If we say that every substitution that hits an important base causes a 10-fold decrease in uptake efficiency, but every hit in an unimportant base causes no change in uptake efficiency, we can then tell what our histogram would look like for both our input oligo pool (IN) and our periplasm-enriched oligo pool (OUT):

On the left side of the graph, the output is higher than the input, while on the right side of the graph, the opposite is true. This is what we expect, since the more substitutions an oligo carries, the more likely some of them will hit important bases.
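Here is a rough Python sketch of how the IN and OUT histograms can be computed. Two things are my assumptions rather than anything stated above: that "12% degenerate" means p = 0.88 per position, and that the OUT pool is rescaled to sum to 1 so it can be compared with IN on the same axis. The function names are just labels.

```python
from math import comb

def binom_pmf(k, n, p_correct):
    """Probability of exactly k incorrect bases in an oligo of length n."""
    return comb(n, k) * p_correct ** (n - k) * (1 - p_correct) ** k

def hypergeom_pmf(h, n, m, k):
    """Probability that h of the k substitutions land in the m important bases."""
    if h < 0 or h > m or k - h < 0 or k - h > n - m:
        return 0.0
    return comb(m, h) * comb(n - m, k - h) / comb(n, k)

def in_out_pools(n=32, m=8, p_correct=0.88, fold=10.0):
    """Return (IN, OUT) distributions over k, the number of substitutions."""
    pool_in = [binom_pmf(k, n, p_correct) for k in range(n + 1)]
    raw_out = []
    for k in range(n + 1):
        # Average uptake of oligos with k substitutions: each hit on an
        # important base divides uptake by `fold`, other hits do nothing.
        uptake = sum(hypergeom_pmf(h, n, m, k) * fold ** -h
                     for h in range(min(k, m) + 1))
        raw_out.append(pool_in[k] * uptake)
    # Assumption: rescale OUT so it sums to 1, like IN.
    total = sum(raw_out)
    pool_out = [x / total for x in raw_out]
    return pool_in, pool_out

pool_in, pool_out = in_out_pools()
for k in range(9):
    print(f"k = {k}: IN = {pool_in[k]:.4f}  OUT = {pool_out[k]:.4f}")
```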


And here's a third graph addressing Rosie's comment, which shows the ratio of OUT/IN for four sets of conditions:
Here it is again on a log-plot:
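The OUT/IN ratio for one set of conditions can be sketched in Python as below. The four conditions in the graph aren't listed in the text, so I only show the two parameter pairs that come up in the discussion (8/32 important bases with 10-fold hits, and 16/32 with 2-fold hits). As before, p = 0.88 and rescaling OUT to sum to 1 are my assumptions, and `out_in_ratio` is just a label. The normalization uses the fact that, over the whole pool, the number of hits in the m important bases is itself binomial, so the average uptake factor is (p + (1-p)/fold)^m.

```python
from math import comb, log10

def out_in_ratio(k, n, m, fold, p_correct=0.88):
    """OUT/IN ratio for oligos with k substitutions (OUT rescaled to sum to 1)."""
    # Hypergeometric average of the uptake factor fold**-h over possible hits h.
    lo = max(0, k - (n - m))
    hi = min(k, m)
    mean_uptake = sum(
        comb(m, h) * comb(n - m, k - h) / comb(n, k) * fold ** -h
        for h in range(lo, hi + 1)
    )
    # Average uptake factor over the whole input pool.
    norm = (p_correct + (1 - p_correct) / fold) ** m
    return mean_uptake / norm

# (important bases, fold decrease per hit); both pairs come from the discussion.
conditions = [(8, 10.0), (16, 2.0)]
for m, fold in conditions:
    ratios = [out_in_ratio(k, 32, m, fold) for k in range(0, 13, 4)]
    print(m, fold, [f"{r:.3f} (log10 {log10(r):+.2f})" for r in ratios])
```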


  1. I can see that if hits decrease uptake by more than 10-fold, then the effect will be stronger. But what if more sites matter, but they have an effect of only, say, 5-fold?

    First, I'll think about what happens if the proportion of sites that matter is increased (keeping the 10-fold effect). That would increase the red and yellow parts of the bars in the upper figure, and decrease the blue parts. And that would increase the differences between the blue and green bars in the lower figure (the low-k bars would be higher, and the high-k bars would be lower).

    If we then think about reducing the effect, there might be an effect level that would put the green bars about back where they started.

  2. Indeed, if we increase the number of important bases to 16/32 and say each hit causes only a 2-fold decrease, the result looks a lot like the case with only 8/32 important bases and 10-fold hits. See the newly added graph above.

  3. That's lovely. Now we should get back to the proposal writing....