
The main motivation for doing this, besides practice, is that I am fairly sure we should be ordering degenerate oligos with more degeneracy than we have previously considered. I won't make that argument here; I'll just reproduce some analytical graphs I'd previously made.
It took a while (since I'm learning), but was still much more straightforward than doing it in a spreadsheet. The exercise was extremely useful, as I learned a bunch of stuff (especially about plots in R) while doing the following:
Problem #1: Given a degeneracy per base, d, in an oligo of length n, what proportion of oligos have k mismatches?
Answer #1: Use the binomial distribution. For a 32mer with different levels of degeneracy (shown in legend):
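A minimal sketch of that calculation in R, using `dbinom`. Only the 0.12 level appears elsewhere in this post; the other degeneracy levels below are placeholders for illustration and may not be the ones in the legend.

```r
# Proportion of oligos with exactly k mismatches, for a 32mer.
n <- 32                          # oligo length
k <- 0:n                         # possible numbers of mismatches
d_levels <- c(0.03, 0.06, 0.12)  # per-base degeneracy; only 0.12 is taken from the post

# One row per k, one column per degeneracy level.
p_k <- sapply(d_levels, function(d) dbinom(k, size = n, prob = d))
colnames(p_k) <- paste0("d=", d_levels)

# Each column is a full probability distribution over k.
print(round(p_k[1:6, ], 4))
```

Passing the result to something like `matplot(k, p_k, type = "b")` should give one curve per degeneracy level.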

Answer #2: Simply adjust each of the above values by dividing by the number of classes within each of k mismatches (i.e. choose(n, k)):
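As a rough sketch of that adjustment, assuming a "class" means one specific set of k mismatched positions (my reading, not something stated above), the division looks like this for the 0.12 level:

```r
n <- 32
k <- 0:n
d <- 0.12                              # one degeneracy level; others work the same way

p_k <- dbinom(k, size = n, prob = d)   # proportion of oligos with k mismatches
classes <- choose(n, k)                # ways to place k mismatches among n positions
p_per_class <- p_k / classes           # proportion falling in any one class

# Under the assumption above this reduces to d^k * (1 - d)^(n - k).
print(signif(p_per_class[1:6], 3))
```

Because the binomial coefficient cancels, the adjusted values fall off geometrically with k, so a log-scaled Y axis may be easier to read.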

Answer #3: Use the hypergeometric distribution. The plot below is the same as for Problem #1 at 0.12 degeneracy, but with the number of hits broken down for each k:
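A sketch of how that breakdown could be computed, assuming a "hit" is a mismatch landing in one of m important positions and taking m = 8 (the value given in the comments below). The variable names are mine.

```r
n <- 32    # oligo length
m <- 8     # positions presumed important (value given in the comments)
d <- 0.12  # per-base degeneracy
k <- 0:n   # total mismatches per oligo
j <- 0:m   # hits: mismatches that land in important positions

p_k <- dbinom(k, size = n, prob = d)

# joint[j, k] = P(j hits | k mismatches) * P(k mismatches);
# the conditional part is hypergeometric: k draws from m important and n - m other positions.
joint <- sapply(k, function(kk) dhyper(j, m, n - m, kk) * dbinom(kk, size = n, prob = d))
dimnames(joint) <- list(paste0("hits=", j), paste0("k=", k))

p_any_mismatch <- 1 - p_k[1]         # roughly 0.98
p_any_hit <- 1 - sum(joint[1, ])     # roughly 0.64
print(c(any_mismatch = p_any_mismatch, any_hit = p_any_hit))
```

Calling `barplot(joint)` on this matrix should draw one stacked bar per k, with the segments giving the number of hits.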

In #1 and #2, is it possible to have R draw the Y axis going through zero? That would make values easier to estimate.
And in #3, what's the value of m? Am I right in thinking that the graph shows that 98% of oligos will have at least one mismatch to the consensus but only about 65% will have at least one of these in an important position?
Yeah. I figured out how to add lines using the "abline" function. For #2, I should probably be focused on only a part of the displayed graph too.
As for #3, m=8, so 1/4 of the 32 positions are presumed to be important. And your approximations are about right...