Friday, March 5, 2010

Multiplexing sans barcodes

Previously, I’d said I wanted to go over how we might obtain many recombinant genotypes by deep sequencing pools of recombinants, since our tiny genome is TOO EASILY SEQUENCED using modern methods, making the sequencing of individual clones inefficient. The challenge, then, is assigning donor DNA segments in the pools to individual clones. To a first approximation, this isn’t really necessary, since one of our main motivations for sequencing recombinants is simply to determine whether the locations and endpoints of donor DNA segments are biased: i.e. whether there are recombination hotspots, or whether certain types of donor-recipient differences are recalcitrant to recombination.

However, our preliminary data showed that donor segments were often clustered in individual recombinants, probably due to mismatch repair disrupting larger donor fragments during transformation. We were only able to pin this down because we individually sequenced 2 of the 4 transformants that we’d pooled. To illustrate, here’s a zoom of the region containing one of our selected sites at gyrB. Two of the four clones carry the causal allele (the red dot), but the pool data indicates several additional segments:

Are they in different clones? The same clone? How do we disentangle them without sequencing individual clones (as was done here; shown as sets of colored bars at the top)?
Several methods for handling pooled data exist. The one typically referred to is “barcoding,” where samples are processed individually and have unique sequence codes added during library construction, so that individual sequence reads can be assigned to individual clones. This is a powerful method, but extremely expensive and labor-intensive. It surely has useful contexts, but for our purposes, we don’t really need to assign every read to a clone… only the donor segments.

An alternative approach, outlined below, would simply ensure that any given clone appears in two different, otherwise non-overlapping pools. In its simplest form, this means pooling by rows and also by columns (other more involved ways are here and here). I recently did a transformation experiment, after which I grew up independent transformants in 64 wells of a 96-well culture plate.

They were arrayed in a checkerboard grid of 8×8 clones (yellow = NalR, blue = NovR). If I prep DNA from all these clones, I could then produce Row Pools 1-8 and Column Pools A-H, and each pool would have four clones of each resistance type. One issue would be distinguishing which endpoints belong together when segments are overlapping; another issue would be deciding which segments belong in the same clone.
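As a quick sketch of the design (the clone and pool names here are just illustrative labels, not anything from our actual plate layout), each clone contributes to exactly one row pool and one column pool:

```python
# Sketch of the row-and-column pooling design: 64 clones arrayed in an
# 8x8 grid, each contributing DNA to exactly one row pool (1-8) and one
# column pool (A-H). Names like '3C' are hypothetical labels.

ROWS = "12345678"
COLS = "ABCDEFGH"

def pools_for(clone):
    """Return the two pools a clone named like '3C' belongs to."""
    row, col = clone[0], clone[1]
    assert row in ROWS and col in COLS
    return ("row-" + row, "col-" + col)

# Every clone lands in exactly two otherwise non-overlapping pools:
grid = [r + c for r in ROWS for c in COLS]
print(len(grid))           # 64 clones
print(pools_for("3C"))     # ('row-3', 'col-C')
```

So 16 pools (8 rows + 8 columns) cover all 64 clones, with each pool containing 8 clones.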

If a donor segment appeared in clone 3C, for example, and it had unique endpoints (i.e. that donor segment is present only in clone 3C), then we would see those unique endpoints solely in pool 3 and pool C.

So we would have no difficulty assigning the segment to clone 3C.

On the other hand, if the segment were NOT unique but present in, say, clones 3C and 7E, its endpoints would show up in pools 3, 7, C, and E. We’d then be unable to assign the segment to particular clones because of "ghost" signals: we’d know there were two identical segments, but they could be in either 3C and 7E, or in 3E and 7C.

(We’d still be able to count the copies, since the segment’s frequency in each pool would tell us how many clones in that pool carry it.)
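The decoding logic above can be sketched in a few lines (again, a toy illustration with hypothetical names, assuming pool frequencies tell us there is one copy per implicated row and column):

```python
from itertools import permutations

def candidate_assignments(row_hits, col_hits):
    """Given the row pools and column pools where a segment's endpoints
    were detected, list the possible placements of the segment's copies
    (one copy per implicated row, paired with one implicated column)."""
    rows = sorted(row_hits)
    return [tuple(r + c for r, c in zip(rows, perm))
            for perm in permutations(sorted(col_hits))]

# Unique segment, seen only in pools 3 and C: unambiguous.
print(candidate_assignments({"3"}, {"C"}))
# [('3C',)]

# Duplicated segment, seen in pools 3, 7, C, and E: two "ghost"
# solutions -- either clones 3C and 7E, or clones 3E and 7C.
print(candidate_assignments({"3", "7"}, {"C", "E"}))
# [('3C', '7E'), ('3E', '7C')]
```

With more than two copies sharing identical endpoints the number of consistent pairings grows factorially, which is why the scheme works best when identical segments are rare.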

So this is a good plan. We could first sequence by rows, giving us 64 more clones’ worth of data. And as long as there aren’t a whole bunch of identical endpoints for independent donor segments, we could then sequence the pooled columns to assign segments to clones. If there were tons of identical endpoints, that would be such a shocking result that we’d need to re-think our next step anyway…
