Thursday, July 2, 2009

Mutation versus Transformation: Structural Variation

Here's an idea: Compare structural changes (indels and rearrangements) in transformed and untransformed cultures.

One major goal of our planned experiments is to measure the transformation rate across the genome. This is a fairly ambitious prospect, mainly because of the amount of sequencing it will require.

If we assume that the average donor allele transforms recipient chromosomes 1% of the time, then we would need to sequence the average locus 100 times to see the donor allele just once, on average. But a reliable measurement of the transformation rate would require considerably more, perhaps 10,000 times. This would give us an average of 100 donor alleles per 10,000 alleles sequenced. Even on the Illumina platform, measuring the transformation rate for every SNP at that depth would get fairly expensive.
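To make the arithmetic concrete, here is a rough sketch of how the expected number of donor alleles and its sampling noise scale with depth. It assumes each sequenced allele is an independent draw (a simple binomial model, which is my assumption and not part of the experimental plan), and the function name is just for illustration.

```python
import math

def donor_allele_sampling(transformation_rate, depth):
    """Expected donor-allele count and its relative sampling error.

    Assumes each sequenced allele is an independent binomial draw
    (an illustrative simplification, not a claim about real libraries).
    """
    expected = transformation_rate * depth
    sd = math.sqrt(depth * transformation_rate * (1 - transformation_rate))
    return expected, sd / expected

for depth in (100, 10_000):
    expected, rel_err = donor_allele_sampling(0.01, depth)
    print(f"depth {depth}: ~{expected:.0f} donor alleles, "
          f"relative error ~{rel_err:.0%}")
```

Under this model, 100X depth gives about one donor allele with roughly 100% relative error, while 10,000X gives about 100 donor alleles with roughly 10% relative error.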

One way we could do a preliminary sequencing experiment would be to ignore single-nucleotide differences between donor and recipient and focus solely on indel and rearrangement differences (a.k.a. structural variation). This data would speak to interesting hypotheses regarding the role of natural competence in maintaining the core genome and diversifying the accessory genome, and it would require considerably less sequencing.

So the goal of the preliminary experiment would be to measure structural variation due to mutation versus structural variation due to transformation.

How would this work, and why would it be cheap and easy?

First of all, the experimental side is extremely easy. Naturally competent recipient cultures would be split in two. One part would be used to prepare untransformed recipient chromosomes; the other part would be incubated with donor DNA for a while and allowed to recover, and then the transformed chromosomes would be purified. (We could also include a selection for a donor marker at this point to increase the relative transformation rate.) That’s it.

The sequencing side would also be comparatively easy. Untransformed and transformed chromosome preparations would be sheared, end-repaired, and size-fractionated by gel to 500 bp (as precisely as possible). This DNA would be ready for sequencing library construction and paired-end sequencing.

For a 500 bp library, I previously estimated ~2500X “spanning coverage” in one lane of Illumina sequencing using conservative estimates of the sequencing parameters. “Spanning coverage” is defined as how many times a particular genomic position is found between two mapped paired-end reads. So we’d get to 10,000X spanning coverage in four lanes, or ~1/2 a full run.
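As a quick sketch of the two ideas in that paragraph: the first function counts spanning coverage at a position straight from the definition, and the second works out how many lanes hit a target. The pair representation (end of the left read, start of the right read) and the figure of 8 lanes per run are assumptions for illustration; the 2500X per lane number is the estimate quoted above.

```python
import math

LANES_PER_RUN = 8  # assumption: one flow cell of 8 lanes per full run

def spanning_coverage(position, pairs):
    """Count read pairs whose unsequenced middle contains `position`.

    `pairs` is a list of (left_read_end, right_read_start) coordinates,
    a hypothetical representation chosen for this sketch.
    """
    return sum(1 for left_end, right_start in pairs
               if left_end < position < right_start)

def lanes_needed(target_coverage, coverage_per_lane):
    """Smallest whole number of lanes reaching the target coverage."""
    return math.ceil(target_coverage / coverage_per_lane)

# Toy pairs: two of the three span position 300.
toy_pairs = [(100, 400), (150, 450), (500, 800)]
print(spanning_coverage(300, toy_pairs))

lanes = lanes_needed(10_000, 2_500)
print(lanes, lanes / LANES_PER_RUN)
```

With the numbers above this comes to 4 lanes, which is where the half-run figure comes from.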

How would this help us measure the transformation of donor structural variants into the recipient?

Let’s use an example to illustrate. In the alignment shown above (made with GenomeMatcher), the donor genome (86-028NP) is shown on top, while the recipient genome (Kw20) is shown on the bottom. As is pretty clear, genes Hi_0512 and Hi_0513 are absent from the donor genome, indicating a deletion of those genes from the donor (or possibly an insertion into the recipient). These genes happen to be the HindII restriction enzyme and methylase. The flanking genes are syntenic (so Hi_0511 = tchA, etc.).

Since we know the size-distribution of the library (500 bp), it would be quite simple to spot the deletion allele. Paired-end reads with one end in tchA and the other in rpoC would define the deletion. By contrast the insertion allele would always have tchA and rpoC mappings on different fragments (with the other ends in the HindII methylase or restriction enzyme).

Here’s a way of illustrating what paired-end reads of different kinds of alleles relative to the recipient would look like:

So for our deletion, we’d see paired ends that mapped to positions further apart than they should be. By having extremely high spanning coverage, we could count how often these kinds of mappings occurred versus the recipient mappings. This would give us the rate of deletion.
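Here is a minimal sketch of that counting step. It assumes read pairs have already been mapped to the recipient genome and reduced to (left_start, right_end) coordinates; the 100 bp tolerance, the labels, and the toy numbers are all hypothetical. Note that the "insertion-like" class only catches insertions small enough to fit inside a single fragment; a large insertion like the HindII genes would instead show up as pairs with one end in the inserted sequence.

```python
from collections import Counter

LIBRARY_SIZE = 500  # bp, the size-fractionation target
TOLERANCE = 100     # bp, hypothetical allowance for library size spread

def classify_pair(left_start, right_end,
                  library_size=LIBRARY_SIZE, tolerance=TOLERANCE):
    """Label a read pair by its mapped span on the recipient genome."""
    span = right_end - left_start
    if span > library_size + tolerance:
        return "deletion-like"    # ends map further apart than expected
    if span < library_size - tolerance:
        return "insertion-like"   # ends map closer together than expected
    return "recipient-like"

def deletion_frequency(pairs):
    """Fraction of deletion-like pairs among deletion- and recipient-like pairs."""
    counts = Counter(classify_pair(left, right) for left, right in pairs)
    informative = counts["deletion-like"] + counts["recipient-like"]
    return counts["deletion-like"] / informative if informative else 0.0

# Toy data: 98 pairs with the expected span, 2 spanning a deletion.
toy_pairs = [(1000, 1500)] * 98 + [(1000, 3300)] * 2
print(deletion_frequency(toy_pairs))
```

In practice the tolerance would need to be set from the observed size distribution of the library, which is why cutting the gel as precisely as possible matters.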

In our untransformed library, if we saw the deletion allele, we’d be seeing mutation, while in the transformed library we’d be seeing mutation and/or transformation. Since we know the donor genome sequence, we can distinguish paired-end reads that look like the donor sequence versus de novo mutations that occurred when we grew out the cells. Better still, we could spot transformation-induced de novo mutations by comparing to the untransformed library.
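The logic in that paragraph can be sketched as a small decision function. Everything here is illustrative: the variant identifiers, the count dictionaries, and the labels are hypothetical, and real data would need thresholds and not just a presence/absence test.

```python
def interpret_variant(variant_id, donor_variants,
                      untransformed_counts, transformed_counts):
    """Rough interpretation of a structural variant from the two libraries.

    donor_variants: set of variant ids expected from the donor genome.
    *_counts: dict mapping variant id -> number of supporting read pairs.
    All names and labels are hypothetical, chosen for this sketch.
    """
    in_untransformed = untransformed_counts.get(variant_id, 0) > 0
    in_transformed = transformed_counts.get(variant_id, 0) > 0

    if not in_untransformed and not in_transformed:
        return "not observed"
    if in_untransformed:
        # Seen without any donor DNA, so mutation must contribute.
        return "mutation"
    if variant_id in donor_variants:
        # Matches the known donor sequence, transformed library only.
        return "candidate transformation"
    # Not donor-like and absent from the control library.
    return "candidate transformation-induced mutation"

donor = {"del_HindII"}
untransformed = {"del_X": 3}
transformed = {"del_X": 2, "del_HindII": 40, "del_Y": 1}

for v in ("del_X", "del_HindII", "del_Y", "del_Z"):
    print(v, "->", interpret_variant(v, donor, untransformed, transformed))
```

A donor-like variant that also shows up in the untransformed library lands in the "mutation" bin here, which is the conservative choice: it can't be attributed to transformation without comparing frequencies between the two libraries.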

Why else sequence the untransformed chromosomes at all? Well, structural mutations are likely to occur at a much higher rate than single-nucleotide mutations in many instances. Independent of our interest in natural transformation, doing the control experiment may reveal regions of the genome that are unstable, along with the mutation rate of different types of structural variation. This last part is non-trivial. If, for example, we see a particular deletion on 50% of the untransformed chromosomes we looked at, it could be that this is an extremely frequent mutation, but it could also simply have occurred once, early in the grow-out of the culture.

For the purposes of a control for transformation rates, we don’t have to worry about telling those two possibilities apart, but doing the untransformed control would make us confident that the changes we saw were induced by transformation and not simply due to such kinds of mutation.
