Tuesday, March 16, 2010

Chromosome "Position Effect"

I got some data for the transformation frequency at five different markers. They varied. This isn’t anything ground-breaking; reports of a "position effect" for transformation go back decades and a couple of recent studies in other organisms bear it out. The underlying cause of variation in transformation rate at different positions likely stems from two sources: the physical structure of the chromosome and the sequence composition of the recombination substrates. The former case is reasonably well worked out for analogous processes in eukaryotes: for example, in yeast, heterochromatic regions are recalcitrant to recombination, but these sites become recombinogenic in mutants with defective heterochromatin assembly. Sequence composition has also been shown to affect the efficiency of recombinational strand exchange in several different contexts, both genetic and biochemical. This latter type of variation is not traditionally considered a "position effect", but is difficult to distinguish from the former.

Anyways, I wanted preliminary data showing that I can, in fact, detect differences in transformation at different genomic positions, since a big part of my proposed work will involve measuring this position effect at very high resolution...

I used MAP7 donor DNA to transform three independent Rd competent cell preps. MAP7 is highly similar to Rd, except that it carries several point mutations that confer antibiotic resistance. There are likely other unselected differences between Rd and MAP7, but few. Thus, differences between marker transformation rates are likely to predominantly reflect chromosome position effects, rather than sequence divergence between donor and recipient.

This latter point isn’t strictly true: in order to see transformation, a genetic change has to be made, and the selected MAP7 point mutations are genetic differences. But because in our preliminary sequencing data, we saw long stretches of donor-specific DNA with dozens to hundreds of SNPs, I don’t think these single-nucleotide differences are contributing too hugely to the observed variation in transformation rate.

Here’s the data for the five markers individually. Vertical bars indicate the mean transformation frequency per viable cell to the indicated antibiotic resistance allele. The inset circle shows a rough map of the location of the MAP7 markers. (Sorry about the lack of an origin.)
Indeed, I see a ~5-fold range of transformation frequencies, from ~1/500 to ~1/100. Since this is only an arbitrary sampling of five sites, the range of variation across the chromosome could be much higher.

As previously discussed, these values underestimate the transformation frequency per competent cell. Competent cultures typically have both competent and non-competent cells, and the “fraction competence” is typically measured by looking at co-transformation frequencies. These are often higher than expected, even for unlinked markers, a phenomenon termed “congression” and interpreted as a binary distinction between competent and non-competent cells in the culture.

The technical value of this is that I can elevate the observed transformation frequency at one locus by selecting for transformation at another unlinked locus, since this eliminates all non-competent cells from the culture, providing potentially higher sensitivity on our proposed sequencing experiments. It also dampens differences in culture-to-culture variation caused by big differences in fraction competence (not shown).

However, there is at least one old report using Bacillus that suggests congression does not simply reflect a binary distinction between competent and non-competent cells. If cells only came in those two flavors, we would predict that any pair of unlinked markers would show the same level of congression, yet they report that different pairs of markers had different congression frequencies (beyond any differences due to linkage). They go on to suggest an interesting model for their observations, but my concern is more technical:

Does selecting for transformation at different loci affect the transformation rate at a second unlinked locus?

If the answer is yes, then selection for transformants at a locus would be a poor way to elevate the transformation rate at other unlinked loci, since it would be biased in an unknown way. I also measured co-transformation of Nal resistance and each of the other four. Nal is “unlinked” from all the others (i.e. DNA fragments from standard DNA preps will always be too short to contain the NalR allele with another antibiotic resistance allele), so I can measure “congression” four times.

Here is the data. So, for example, the first bar was calculated as: f(kanR nalR) / f(kanR). This normalizes each bar to the nalR rate (i.e. “the frequency of nalR among kanR transformants”).
The first thing to note is that the scale has changed relative to the transformation/cfu plot. For each of the 3 cultures, there was a ~3.5-fold increase in the observed transformation rate, which would be expected if ~1/3 of cells in each culture were competent.
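The arithmetic behind that last statement can be sketched as follows. This is only a minimal illustration of the binary competent/non-competent model, assuming unlinked markers transform independently within the competent fraction; the function name and all the numbers are made up for illustration and are not the measured data.

```python
def congression(f_a, f_b, f_ab):
    """Binary-competence model for two unlinked markers a and b.

    f_a, f_b : single-marker transformation frequencies per cfu
    f_ab     : double-transformant frequency per cfu
    Returns (frequency of b among a-transformants,
             fold increase over the per-cfu frequency of b,
             implied fraction of competent cells).
    """
    b_among_a = f_ab / f_a            # e.g. f(kanR nalR) / f(kanR)
    fold_increase = b_among_a / f_b   # elevation caused by selecting for a
    fraction_competent = 1.0 / fold_increase
    return b_among_a, fold_increase, fraction_competent


# Hypothetical culture: 30% of cells competent, with made-up
# per-competent-cell frequencies for the two markers.
c = 0.3
p_kan, p_nal = 0.01, 0.02
f_kan = c * p_kan             # observed per cfu
f_nal = c * p_nal             # observed per cfu
f_both = c * p_kan * p_nal    # independent within competent cells

nal_among_kan, fold, c_est = congression(f_kan, f_nal, f_both)
print(nal_among_kan, fold, c_est)
```

Under this model the fold increase is simply 1 / (fraction competent), so a ~3.5-fold increase corresponds to roughly 29% competent cells, i.e. about 1/3.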

The second thing to note is that selecting for any of the four markers had no effect on the NalR transformation frequency. So the answer to the above question is no. Phew! The Bacillus result was cool, but I’m glad it isn’t the case here. A binary competent/non-competent model is perfectly reasonable in our system (though this does not exclude the possibility of variation among competent cells). With this in hand, I can now plot the co-transformation data with respect to NalR. If selecting for NalR only eliminates non-competent cells but does not change the underlying transformation frequencies per competent cell at the other unlinked markers, then life is good.

Here’s the data. So for example, the first bar was calculated as f(kanR nalR) / f(nalR). This normalizes each bar to its own rate (i.e. “the frequency of kanR among nalR transformants”). For the nalR/competent cells, I used the average of all 12 points in the previous plot.
This data closely resembles that of the first figure, except all the values are ~3.5-fold higher.

Woo! Next I should probably repeat congression data for linked markers, and repeat experiments with more divergent donor DNA.

Sunday, March 14, 2010

Repression of competence induction by purines

My illustrious colleagues have been re-examining some of the lab’s old data regarding the repression of competence by purines. The work has been slowly ongoing for years, and it may be close to being a complete story. I want to try and express what I think their model is and what seem to be its predictions, so they can tell me whether my understanding is straight…

First, a schematic depiction of how I interpret what we already know about the induction of the competence regulon:
What we knew: Transferring cells growing in rich medium to competence medium induces 15 operons driven from the novel CRP-S promoter, and then cells become naturally transformable. Cells in competence medium have elevated cyclic AMP levels, directing the CRP protein to induce expression of genes with canonical CRP-N promoters, including the sxy gene. Sxy protein alters the binding specificity of CRP to also bind at CRP-S promoters, thereby inducing competence gene expression.

But Sxy levels are also regulated at translation, in addition to transcription. The wild-type sxy mRNA transcript contains a stem-loop structure that inhibits its translation. Mutations that disrupt the stem-loop structure in the 5’-UTR cause hypercompetence (e.g. the sxy-1 mutation). In wild-type cells, unknown factor(s) disrupt the stem-loop to induce the translation of sxy transcript in competence medium.

Now a schematic depiction of how I interpret the model for purine repression of competence:
The observation: Addition of purines to competence medium represses competence. Purine biosynthesis is repressed by the PurR protein when cellular pools of purines are high. Deletion of the purR gene reduces competence (presumably indirectly, by increasing cellular pools of purine), but mutations disrupting the sxy 5’UTR’s stem-loop suppress the purR mutant defect (Rosie’s last post).

A hypothesis: The sxy transcript stem-loop is stabilized in the presence of purines (either directly or indirectly), blocking the production of Sxy protein and thus the activation of the competence regulon. When purine pools are depleted, the stem-loop is disrupted. This predicts that addition of purines and purR mutations will inhibit sxy translation more than sxy transcription.

A corollary hypothesis: Purines block DNA translocation by PurR-dependent repression of the rec-2 gene, whose promoter contains a putative PurR binding site. A potential test of this hypothesis would be to treat sxy-1 competent cultures with purines. We would predict that if PurR directly represses rec-2, DNA translocation would be inhibited (but DNA uptake would not). Obviously, checking rec-2 transcription relative to other competence genes would make sense here as well, but the functional test would be most compelling.

Is that the basic notion? I know there’s a bunch of other experiments that have been done that I need to find out about…

Thursday, March 11, 2010

What have I got?

Okay, grant planning part 2... Below is a dense description of the preliminary data I have/will have for writing this next grant...

PRELIMINARY DATA

Transformation frequency depends on chromosome position:
DNA from a multiply marked derivative of Rd (MAP7) was briefly incubated with competent Rd cultures. The resulting transformation frequency at each of four loci was evaluated by selecting for cells that acquired the corresponding MAP7-specific antibiotic resistance allele. MAP7 DNA transformed each Rd locus at a different frequency. (Repeat experiment in progress… stay tuned but looks good.)

Sequence divergence decreases transformation frequency:
DNA from an antibiotic-resistant derivative of NP (1350NN) transformed Rd competent cells less efficiently than did DNA from MAP7, and vice versa. NP differs from Rd by ~2.4% per alignable base position (and an additional 10% of each genome is absent from the other, contained in indel polymorphisms), while the transformation frequencies at two loci were affected ~2- to 4-fold. (Data in hand.)

Co-transformation frequencies are non-random due to congression and linkage:
(Wish I didn’t have to describe this, but it’s too fundamental. Data mostly in hand.)

Transformants acquire hundreds of donor-specific alleles:
Several large DNA fragments recombined into the chromosomes of four individual Rd competent cells, as revealed by genome sequencing. Each of the four transformants was selected for resistance to one of two antibiotics encoded in the 1350NN strain (two NalR and two NovR), and the corresponding donor-specific allele was present in each of the four. In all, 24 donor segments (contiguous stretches of donor-specific alleles) were found across the 4 transformants, with an average of 1.4% of each recipient chromosome replaced with donor DNA (~25 kb and ~600 SNPs each). Mismatch repair is likely responsible for the disruption of contiguous stretches of donor-specific alleles in the transformants; assuming that for closely adjoined segments this was true, a total of 10 (instead of 24) independent transformation events occurred across the four transformants (6 of which were unselected; notably two of these were overlapping in independent transformants). (Data in hand; re-analysis in progress.)
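As a quick sanity check on those numbers, here is a back-of-the-envelope sketch. It assumes an Rd chromosome of roughly 1.83 Mb (my assumption, not stated above) and reuses the ~2.4% divergence quoted for alignable positions.

```python
# Assumed approximate size of the Rd chromosome (not stated in the post).
GENOME_BP = 1_830_000
FRACTION_REPLACED = 0.014   # average fraction of chromosome replaced
DIVERGENCE = 0.024          # Rd vs NP differences per alignable position

donor_bp = GENOME_BP * FRACTION_REPLACED   # donor DNA per transformant
expected_snps = donor_bp * DIVERGENCE      # donor SNPs per transformant

# Segment bookkeeping from the four sequenced transformants.
segments_per_clone = 24 / 4   # contiguous donor segments
events_per_clone = 10 / 4     # after merging closely adjoined segments

print(f"~{donor_bp / 1000:.1f} kb and ~{expected_snps:.0f} SNPs per transformant")
print(f"{segments_per_clone} segments, {events_per_clone} events per transformant")
```

Both values land close to the ~25 kb and ~600 SNPs quoted above, so the three figures are at least mutually consistent.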

DNA uptake signal sequences (USS) are densely distributed in the two genomes:
Both the Rd and NP chromosomes contain USSs nearly every kilobase and most are syntenic. (Cursory data only. Need a better analysis.)
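A better analysis could start from something like the sketch below, which finds USS cores on both strands of a sequence. It assumes the 9-bp core AAGTGCGGT and exact matching only; the function names and the toy sequence are purely illustrative, and a real analysis would read the genome from a FASTA file and probably score the flanking positions too.

```python
USS_CORE = "AAGTGCGGT"  # assumed 9-bp H. influenzae USS core
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def uss_positions(genome, core=USS_CORE):
    """Return sorted (0-based start, strand) pairs for exact core matches."""
    genome = genome.upper()
    hits = []
    for motif, strand in ((core, "+"), (revcomp(core), "-")):
        start = genome.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = genome.find(motif, start + 1)  # allow overlapping hits
    return sorted(hits)


# Toy sequence with one core on each strand (not real genome data).
toy = "TTT" + USS_CORE + "CCC" + revcomp(USS_CORE) + "GG"
hits = uss_positions(toy)
print(hits)
print(f"{len(hits) / len(toy) * 1000:.1f} USS per kb in the toy sequence")
```

Running the same function over both chromosomes and comparing hit positions in an alignment would give the density and synteny numbers.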

Sequence preferences in DNA uptake can be captured by periplasmic DNA purification:

DNA fragments containing uptake signal sequences are efficiently taken up into cells, and taken up fragments can be cleanly purified away from both free DNA and chromosomal DNA. The use of rec-2 and rec-1 mutations will facilitate separating sequence biases at different stages of natural transformation. (Data in hand, except rec-1.)

LIST OF FIGURES:
  • Four/five marker transformation rates
  • Rd vs NP transformation rates
  • SNP spacing histogram with embedded SV table
  • Genome sequencing figure (pool data)
  • USS analysis
  • Molecular biology figure (uptake data)


Tuesday, March 9, 2010

Another day, another attempt to get a dollar


Sigh... another grant due soon; this time, it's my last attempt to get an NIH postdoctoral fellowship. My last reviews mainly took issue with my proposal, which they found to be overly ambitious and somewhat unfocused. So below is my first attempt at a summary/specific aims page, followed by a couple of preliminary data collection things I'd like to do before it's due (on April 8)...


The introduction:
Naturally competent bacteria take up intact DNA from their surroundings and can incorporate it into their chromosomes by homologous recombination. Akin to sexual recombination in eukaryotes, this natural transformation pathway moves alleles and genes between otherwise clonal lineages; and human bacterial pathogens have used this pathway to share antibiotic resistance genes, antigenic determinants, and virulence factors. To better elucidate the mechanism of transformation and to inform population/epidemiological studies, the proposed work will use the opportunistic Gram-negative bacterium Haemophilus influenzae to disentangle the sequence biases intrinsic to the DNA uptake and DNA recombination mechanisms by combining classical microbiology with modern DNA sequencing.

The specific aims:
  1. Define the genetic consequences of natural competence to H. influenzae. Transformation frequencies vary for different sequences and at different chromosomal locations, and this could strongly influence the rate of sequence evolution and adaptation along the genome. I will transform competent cultures of the standard lab strain with the genomic DNA of a clinical isolate and use deep sequencing to measure transformation across the lab strain’s chromosome for all the ~40,000 sites differing in the clinical isolate. This will provide an unparalleled dataset for investigating the sequence factors that promote and limit genetic exchange between bacterial cells.
  2. Measure the contribution of DNA uptake specificity to natural transformation. In several human pathogens, including H. influenzae, the uptake machinery prefers DNA fragments containing short “uptake sequences”, and abundant sequence motifs in many bacterial chromosomes suggest that biased DNA uptake has had a profound influence on genome evolution. I will purify the intact DNA molecules taken up into the periplasm and cytosol of competent cultures and use deep sequencing to measure the sequence biases of the uptake machinery. In combination with Aim 1, this will disentangle the contributions of DNA uptake from those of DNA recombination during natural transformation.
The platitudes: The proposed work will link molecular studies of transformation to the growing genome sequence data being collected from many isolates of many bacterial species. By establishing my approach with completely sequenced chromosomes and using a well-defined experimental system, later studies could include a greater diversity of sequences or mimic more and more natural conditions. As a directly applicable outcome, the work will also produce the beginnings of a new type of genetic resource for mapping traits that differ between natural bacterial isolates (as in eukaryotic quantitative genetics) by generating fully genotyped recombinants. In the future, such studies will give empirical underpinnings to population genomic studies of bacterial genetic exchange, as well as provide new testable hypotheses for investigating the molecular mechanism of transformation.

Preliminary data I would like: (besides what I’ve got)
  • Properly replicated transformation frequencies for several markers. (I did this before, but it hasn’t been properly replicated.)
  • Follow a few molecules through uptake and recombination?
  • Population genetic inferences of “recombination” in H. influenzae? (I did this before, but it sucked.)


Friday, March 5, 2010

Multiplexing sans barcodes

Previously, I’d said I wanted to go over how we might obtain many recombinant genotypes by deep sequencing pools of recombinants, since our tiny genome is TOO EASILY SEQUENCED using modern methods, making the sequencing of individual clones inefficient. The challenge is then in assigning donor DNA segments in the pools to individual clones. To a first approximation, this isn’t really necessary, since one of our main motivations for sequencing recombinants is simply to determine whether the locations and endpoints of donor DNA segments are biased: i.e. whether there are recombination hotspots or whether certain types of donor-recipient differences are recalcitrant to recombination.

However, our preliminary data showed that donor segments were often clustered in individual recombinants, probably due to mismatch repair disrupting larger donor fragments during transformation. We were only able to pin this down, because we individually sequenced 2 of the 4 transformants that we’d pooled. To illustrate, here’s a zoom of the region containing one of our selected sites at gyrB. 2 of 4 clones carry the causal allele (the red dot). But the pool data indicates several additional segments:


Are they in different clones? The same clone? How do we disentangle, without sequencing individuals (as was done here; shown as sets of colored bars at the top)?
Several methods for handling pooled data exist. The one typically referred to is “barcoding”, where samples are processed individually and have unique sequence codes added during library construction, so that individual sequence reads can be assigned to individual clones. This is a powerful method, but extremely expensive and labor-intensive. It surely has useful contexts, but for our purposes, we don’t really need to assign every read to every clone… only donor segments.

An alternate approach, outlined below, would simply ensure that any given clone appears in two different otherwise non-overlapping pools. In its simplest form this would simply be to pool by rows and also by columns (other more involved ways are here and here). I recently did a transformation experiment, where afterwards I grew up independent transformants in 64 wells of a 96-well culture plate.

They were arrayed in a checkerboard grid… 8X8 clones (yellow = NalR, and blue=NovR). If I prep DNA from all these clones, I could then produce Row Pools 1-8 and Column Pools A-H and each would have four clones of each resistant type. One issue would be distinguishing which endpoints belong together when segments are overlapping; another issue would be deciding which segments belong in the same clone.

If a donor segment appeared in clone 3C, for example, and it had unique endpoints (i.e. that donor segment is present only in clone 3C), then we would see those unique endpoints solely in pool 3 and pool C.

So we would have no difficulty assigning the segment to clone 3C.

On the other hand, if the segment was NOT unique, but present in, say, clones 3C and 7E, we’d see it in pools 3, 7, C, and E. Because of "ghost" signals, we’d be unable to assign the segment to particular clones; instead, we’d know only that there were two identical segments, either in 3C and 7E or in 3E and 7C.



(We’d be able to do this, since we’d still know the frequency of the segment in the different pools.)
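To make the row/column logic concrete, here is a minimal sketch that enumerates which sets of clones are consistent with the pools a segment shows up in. It assumes we can convert the segment's frequency in each pool into a whole number of carrier clones; the function name and inputs are hypothetical, and brute-force enumeration is only sensible for the handful of pools a given segment appears in.

```python
from itertools import combinations, product


def consistent_assignments(row_counts, col_counts):
    """Enumerate clone sets consistent with per-pool carrier counts.

    row_counts / col_counts map a pool name (e.g. "3" or "C") to the
    number of clones in that pool carrying the segment, as inferred
    from the segment's frequency in the pool.
    """
    # Only wells at the intersection of positive pools can carry the segment.
    cells = [r + c for r, c in product(sorted(row_counts), sorted(col_counts))]
    n_carriers = sum(row_counts.values())
    solutions = []
    for combo in combinations(cells, n_carriers):
        rows_ok = all(sum(cell[0] == r for cell in combo) == k
                      for r, k in row_counts.items())
        cols_ok = all(sum(cell[1] == c for cell in combo) == k
                      for c, k in col_counts.items())
        if rows_ok and cols_ok:
            solutions.append(set(combo))
    return solutions


# Unique endpoints: seen only in row pool 3 and column pool C.
unique = consistent_assignments({"3": 1}, {"C": 1})
print(unique)   # a single solution: clone 3C

# Shared endpoints: seen once each in pools 3, 7, C and E.
ghost = consistent_assignments({"3": 1, "7": 1}, {"C": 1, "E": 1})
print(ghost)    # two solutions: 3C + 7E, or 3E + 7C
```

The second case is the "ghost" ambiguity: the pool frequencies say there are two copies, but the two diagonal pairings cannot be told apart without another pooling dimension.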

So this is a good plan. We could first sequence by rows, giving us 64 more clones’ worth of data. And as long as there aren’t a whole bunch of identical endpoints for independent donor segments, we could then sequence pooled columns to assign segments to clones. If there were tons of identical endpoints, this would be such a shocking result, we’d need to re-think our next step anyways…