Wednesday, August 11, 2010

How to spend some dough?

We have some money to spend (kindly provided by Genome BC) for identifying recombination tracts in individual transformants, where donor genomic DNA from the clinical isolate NP is used to transform Rd competent cells (NP-into-Rd transformations). This will help us answer some very basic questions about natural transformation: What are the numbers and sizes of recombination tracts in individual transformants? Are different parts of the genome equally transformable? Do mismatch repair and other degradative mechanisms limit transformation?

We already have a few clones' worth of data (now mostly analyzed), but we need more clone sequences to do any worthwhile statistics. Actually, I've made nicer pictures and done some re-analysis for the poster I took to the Evolution meeting. I should post some of that sometime...

We’d originally had a cost-saving plan of sequencing overlapping pools of clones to obtain data from 64 clones. We even came up with a slicker plan (since sequence yields keep increasing), in which we'd get 128 clones' worth of data, and 64 of these would be "de-convoluted".
But now we’ve discovered that our local genome center provides inexpensive indexing of DNA samples (i.e. barcoding). This means we can obtain 92 clones' worth of data for only ~$15K. Wow!
So now I need to collect the clones for sequencing and extract their DNA. The question is: which clones? I will be sequencing 4 sets of 23 clones (plus a control), so I might think of the data collection in those terms. (This number of indexed clones per pool should be sufficient to obtain ~50-100-fold median coverage per clone, given current estimated yields.) There are two things to consider: (a) technical concerns, and (b) using some clones to look at different kinds of transformations.
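The coverage estimate is simple division: the bases sequenced in one indexed pool are shared among the samples in it. A minimal sketch, assuming an Rd genome of roughly 1.83 Mb, a perfectly even split of reads across the 24 indexed samples in a pool (23 clones plus the control), and treating the ~$15K as covering all 92 clones. The function names are hypothetical, and real pool yields are whatever the genome center delivers, not numbers from this post.

```python
# Planning sketch only: the genome size is approximate and reads are assumed
# to split evenly across indexed samples (real pools are never perfectly even).
GENOME_SIZE_BP = 1_830_000  # approximate length of the H. influenzae Rd genome


def bases_needed(target_coverage: float, samples_per_pool: int,
                 genome_size: int = GENOME_SIZE_BP) -> float:
    """Total bases one pool must yield to reach target_coverage per sample."""
    return target_coverage * samples_per_pool * genome_size


def coverage_per_sample(pool_yield_bases: float, samples_per_pool: int,
                        genome_size: int = GENOME_SIZE_BP) -> float:
    """Average fold coverage per sample if the pool's yield is shared evenly."""
    return pool_yield_bases / (samples_per_pool * genome_size)


def cost_per_clone(total_cost: float, n_clones: int) -> float:
    """Simple average sequencing cost per clone."""
    return total_cost / n_clones


if __name__ == "__main__":
    SAMPLES_PER_POOL = 24  # 23 clones plus one control per indexed set
    for target in (50, 100):
        gb = bases_needed(target, SAMPLES_PER_POOL) / 1e9
        print(f"{target}x per sample needs ~{gb:.1f} Gb per pool")
    print(f"~${cost_per_clone(15_000, 92):.0f} per clone")
```

Under these assumptions, 50-100-fold coverage corresponds to roughly 2.2-4.4 Gb per pool, and the indexed run works out to about $163 per clone.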

This post will cover the basic technical concerns, and I’ll use the next post to discuss some alternative things we might do with some of this sequencing capacity.

To deal with potential criticism later, I should collect the NP-into-Rd transformants from three separate transformation experiments using three separate competent cell preps. This will allow me to make statements about how reproducible the distribution of recombination tracts is between transformations. Because each replicate will only have a couple dozen clones, this will likely not be enough to detect subtle differences between the independent replicates, but it will be enough to assess the overall consistency of different transformations.
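One way to put a number on "how reproducible" is a permutation test across the three replicate transformations. This is an assumed approach, not an analysis from the post: it takes one summary value per clone (for example, the number of tracts or the total converted length in that clone), measures how far the replicate means spread apart, and asks how often randomly reshuffling clones among replicates gives a spread at least that large. Using one value per clone keeps the observations independent, since tracts within the same clone are not. The function names are hypothetical.

```python
import random
from statistics import mean


def between_replicate_spread(groups):
    """Weighted sum of squared deviations of replicate means from the pooled mean."""
    pooled = [x for g in groups for x in g]
    grand = mean(pooled)
    return sum(len(g) * (mean(g) - grand) ** 2 for g in groups)


def permutation_test(groups, n_perm=10_000, seed=0):
    """Return (observed spread, permutation p-value) for per-clone values per replicate.

    Clones are shuffled among replicates while keeping each replicate's size fixed.
    """
    rng = random.Random(seed)
    observed = between_replicate_spread(groups)
    pooled = [x for g in groups for x in g]  # a copy, so the input lists are untouched
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if between_replicate_spread(shuffled) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

With only ~23 clones per replicate, a large p-value means "no large difference detected" rather than "identical distributions".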

Selection for transformants:
Since only a fraction of cells in a competent culture appear to be competent (as measured by co-transformation of unlinked marker loci), I will again select for either the NovR or NalR alleles of gyrB and gyrA (respectively) that I’ve already added to the donor NP strain. This selection step ensures that every sequenced clone has at least one recombination tract, thereby selecting against clones derived from non-competent cells.

With indexing, it shouldn’t matter how I organize the clones submitted for sequencing, but the data can also be analyzed without paying attention to the index (for which I have my reasons), so it makes sense to organize the plate accordingly.
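The plate layout itself isn't reproduced here, so the following is a hypothetical layout, not necessarily the one actually used: each set of 23 clones plus its control takes three whole columns of a 96-well plate (filled A to H, column by column), keeping every indexed set together as a contiguous block. `plate_map` and the `clone01`-style labels are illustrative names.

```python
ROWS = "ABCDEFGH"
COLUMNS = range(1, 13)


def plate_map(n_sets: int = 4, clones_per_set: int = 23) -> dict:
    """Map well name -> (set number, sample label) for a hypothetical 96-well layout.

    Wells are filled column by column (A1, B1, ..., H1, A2, ...); the last well
    of each set holds that set's control.
    """
    wells = [f"{row}{col}" for col in COLUMNS for row in ROWS]
    wells_per_set = clones_per_set + 1
    if n_sets * wells_per_set > len(wells):
        raise ValueError("layout does not fit on one 96-well plate")
    layout = {}
    for i, well in enumerate(wells[: n_sets * wells_per_set]):
        set_index, position = divmod(i, wells_per_set)
        label = "control" if position == clones_per_set else f"clone{position + 1:02d}"
        layout[well] = (set_index + 1, label)
    return layout


if __name__ == "__main__":
    layout = plate_map()
    for well in ("A1", "H3", "A4", "H12"):
        print(well, layout[well])
```

With the defaults, set 1 fills columns 1-3 with its control in H3, and set 4 fills columns 10-12 with its control in H12.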

We’d originally decided 64 additional clones (atop the 4 we’ve got) would be sufficient for a basic picture. Now we can do 92 for much less money. Should we then do 92 NP->Rd selected transformants instead of 64, or should we use some of this new sequencing space to investigate new things?

Doing more clones would give us greater statistical power, but the more we do, the more we run into diminishing returns. Since we estimate that ~60-70 clones will be sufficient for a nice first look, maybe we don’t gain much by doing 92 of the “same” thing and would be better served by using some of the extra space for other endeavours.
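The diminishing returns follow from how a standard error shrinks with sample size, roughly as 1/√n for an average over independent clones. A small sketch, assuming that scaling holds for whatever per-clone statistic is being estimated; `relative_se` is a hypothetical helper.

```python
import math


def relative_se(n: int, reference_n: int) -> float:
    """Standard error at n clones relative to reference_n clones, assuming SE ~ 1/sqrt(n)."""
    return math.sqrt(reference_n / n)


if __name__ == "__main__":
    for n in (64, 69, 92):
        print(f"n={n:3d}: SE is {relative_se(n, 64):.2f}x that at n=64")
```

Under this assumption, going from 64 to 92 clones (about 44% more clones) narrows the standard error by only about 17%.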

For example, if we reserved one set (23 clones) for one or more other DNA samples, we’d still get 69 clones worth of recombination tracts. What could we do with this extra space (keeping in mind that these should be transformations that can be done immediately)?

The most important factor to consider with each of these is whether the sample size (23 clones) would be sufficient to obtain useful data, despite giving up some statistical power for the primary set of 69 clones. (In my mind this means that what we get should be publishable on its own or together with the primary dataset, but it could also mean several small pilot experiments for “preliminary data”.)
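Whether 23 clones is enough depends partly on how common the thing being looked for is. For a feature that shows up in a fraction p of transformants, the chance of seeing it at least once in n independent clones is 1 - (1 - p)^n. A sketch of that binomial calculation, where p is a hypothetical per-clone frequency rather than anything measured here, and the function names are illustrative:

```python
import math


def prob_seen_at_least_once(p: float, n: int) -> float:
    """Chance that an event with per-clone frequency p appears in at least one of n clones."""
    return 1.0 - (1.0 - p) ** n


def clones_needed(p: float, target: float = 0.95) -> int:
    """Smallest n giving at least `target` chance of seeing the event once (0 < p < 1)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


if __name__ == "__main__":
    for p in (0.05, 0.1, 0.2):
        print(f"p={p}: P(seen in 23 clones)={prob_seen_at_least_once(p, 23):.2f}, "
              f"clones for 95%: {clones_needed(p)}")
```

This is only a rough guide: it says nothing about estimating a frequency precisely, only about detecting a feature at all.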

There are a whole lot of things we might do with this space… (stay tuned)

1 comment:

  1. If this one set of 92 clones was all we would be able to do, then getting as much publishable data as possible would be a good principle. But we'll have enough funds to do at least one more set of 92, so using some clones to get data that helps us make the most of the later sequencing may be a good investment.