Friday, February 5, 2010


Whew! So another grant out of the way for now, another one almost done, and my own postdoc re-application in the works… A brief respite… Maybe it’s time to blog.

Last time (two months ago), I started showing some IGV pictures of our raw sequence data aligned to the Rd genome. Here, I’ll do yet another summary of our preliminary experiment, as we pitched it in our grant application, which will lead nicely into what I’d like to talk about next time: alternatives to multiplexing DNA samples by barcoding…

So we’re studying transformational recombination in H. influenzae, where cells take up DNA from the media and incorporate it into their chromosomes. We think we have a decent model of the mechanism from studies in H. influenzae and other organisms:

(My figures here might have been a little degraded on their journey into the blog)
But until our little sequencing experiment, we could only infer the extent of transformational recombination of a chromosome based on the transformation and co-transformation frequencies of phenotypic markers. We’ve learned a lot just from obtaining four recombinant genotypes. Here’s what the experiment looked like in overview:
DNA from one isolate (NP NovR NalR) was incubated with competent cells of another (Rd), and transformants were selected. Two were NovR and two were NalR. We got sequence data from our collaborator for all four of these in a pool, two of them individually, and each parent (Rd and NP) individually. Here’s the figure we used to illustrate what our data looked like:
Hmm… probably that’s a low-resolution picture, but working from the bottom of the figure:
The lower panel shows the frequency of NP-specific SNP alleles across the Rd chromosome for the pool of four chromosomes. Blue dots at 25% indicate that 1 of 4 recombinants contained the donor-specific allele, while blue dots at 50% indicate that 2 of 4 recombinants did. The two red dots indicate the two selected markers (NovR and NalR), which as expected are at 50%.
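
To make the pool arithmetic concrete, here is a minimal sketch of how a donor-allele frequency at a SNP translates into a count of recombinants. The function names and the example frequencies are made up for illustration; the only thing taken from our experiment is the pool size of four.

```python
def expected_frequency(carriers, pool_size=4):
    """Donor-allele frequency expected if `carriers` of the pooled
    recombinants have the donor (NP) allele at a SNP."""
    return carriers / pool_size


def carriers_from_frequency(freq, pool_size=4):
    """Nearest whole number of pooled recombinants carrying the donor
    allele, given an observed (possibly noisy) frequency."""
    return round(freq * pool_size)


# Hypothetical observed frequencies, not real data
for freq in (0.25, 0.5, 0.27):
    print(freq, "->", carriers_from_frequency(freq), "of 4 recombinants")
```

Real read counts are noisy, so simple rounding like this is only a first pass, and it assumes the four chromosomes are equally represented in the pool.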

In the upper panel, a zoomed view around the NovR-containing region is shown. The blue dots clearly define the donor DNA segments, but since there are overlapping donor segments, their appropriate assignment to different recombinants is unclear:
But because we also sequenced one of the NovR recombinants, the assignment of all the segments is made apparent. The green bars at the top of the figure show the donor DNA segments in Recombinant A, and so the donor segment spanning NovR in Recombinant B is unambiguously inferred.
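
The logic of that inference can be sketched in a few lines: subtract Recombinant A's donor alleles from the per-SNP pool counts, and whatever is left must belong to someone else. This is just a toy under stated assumptions: the positions and counts below are invented, and attributing the leftover alleles to Recombinant B assumes the two NalR recombinants carry no donor DNA in this region.

```python
def residual_counts(pool_counts, recomb_a):
    """Donor-allele counts left in the pool after removing Recombinant A."""
    return [count - a for count, a in zip(pool_counts, recomb_a)]


def donor_segments(positions, calls):
    """Group runs of consecutive donor-allele SNPs into (start, end) segments."""
    segments = []
    start = prev = None
    for pos, call in zip(positions, calls):
        if call:
            if start is None:
                start = pos
            prev = pos
        elif start is not None:
            segments.append((start, prev))
            start = None
    if start is not None:
        segments.append((start, prev))
    return segments


# Hypothetical SNP positions and counts, not real data
positions = [100, 200, 300, 400, 500, 600, 700]
pool_counts = [0, 1, 2, 2, 1, 1, 0]   # recombinants in the pool with the donor allele
recomb_a = [0, 1, 1, 1, 0, 0, 0]      # 1 = Recombinant A has the donor allele

residual = residual_counts(pool_counts, recomb_a)
print("Recombinant A segments:", donor_segments(positions, recomb_a))
print("Inferred other segments:", donor_segments(positions, residual))
```

With more than two recombinants overlapping at the same spot, the residual counts alone would no longer say which chromosome carries which segment, which is exactly the "linkage" problem with pooling.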

Notably, there are several clustered donor segments in Recombinant A. This suggests that processes like mismatch repair may be disrupting larger original DNA fragments during recombination. For example, in the upper panel of Figure 3 above, the area shown by the small purple circle appears to be a mismatch repair event around an insertion/deletion (indel) difference between Rd and NP. Here is what that region looks like in IGV:
This IGV picture shows our sequencing reads against the NP genome (the donor). The top track shows our Rd reads mapped to NP, the middle track shows NP reads mapped to NP, and the bottom track shows Recombinant A reads mapped to NP. I looked at the whole-genome alignment in this interval and found that the structural variation here is due to an indel: the alignment breaks, and NP has 128 bp that don’t align with 52 bp of Rd.

Here is how I interpreted this event in the context of the larger NP donor segment:
Okay! Cool!

It’s going to take a while to fully parse this data, but more important is how we should go about collecting more. We certainly think we can increase our pool size, but as it is now, we can’t obtain “linkage” information from the pool. The obvious solution, barcoding individual DNA samples, presents monetary, technical, and computational problems. However, there may be another way…