Friday, May 29, 2009
Enumerating their differences
I took a break from working on GBrowse today to try some genome-wide alignments of the sequenced Haemophilus influenzae isolates, and it's gone remarkably smoothly. The more I play around with the command line, the more I'm finding it an excellent and efficient way to compute. It still hurts my head after a while, though.
I went to GenBank and downloaded FASTA and GenBank files for the four completely sequenced strains: our reference KW20-Rd and three clinical isolates found in ear infections...
I first used a program called MAUVE, recommended by my predecessor postdoc, which produced the lovely-looking colorful plot above, comparing KW20-Rd and 86-028NP. It was extremely easy to use (though I only tried default settings). The only thing I had to do was change the GenBank file extensions from .gb to .gbk, and everything worked great.
The colored blocks indicate syntenic regions, and the relative sequence similarity within a block is indicated on the y-axis. When a block is in an inverted orientation, it falls below the line. White space indicates a large insertion absent from the other strain. I can sort of piece together the rearrangements separating the two strains by eye, but even easier is a button in the MAUVE window that took me to a GRIMM analysis page, which simply gave me a minimal path of inversions between the sequences. Sweet! It counts six big inversions (some overlapping) to get from KW20-Rd to 86-028NP.
My next task was to actually enumerate the differences between the strains. MAUVE makes a pretty picture (and also has a nice browser-like visual annotation based on the GenBank file), but I'd also simply like to count all the SNPs, indels, and other rearrangement breakpoints between the strains. I couldn't figure out how to do this in MAUVE and didn't feel like trying to parse its alignment file.
So I turned to an altogether different suite of genome-wide alignment programs called MUMmer. At first I was scared of it, since it involved command-line arguments, and I'm sort of fried on that for the week. But after compiling and installing the suite of programs, I found it incredibly easy to use. For this I used the FASTA files and invoked the dnadiff program, which uses a bunch of the MUMmer utilities for pairwise comparisons. It did each pairwise analysis in under 10 seconds, providing me with a series of different output files. I was impressed.
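In case I need to rerun this later, the invocation was about as simple as it gets; something like the following, with placeholder file names standing in for the downloaded FASTA files:

# Pairwise comparison of the reference against one clinical isolate;
# -p sets the prefix for the output files (.report, .snps, .1coords, etc.)
dnadiff -p Rd_vs_86-028NP KW20-Rd.fasta 86-028NP.fasta

Giving each pairwise run its own prefix keeps the output files for the different strain comparisons from clobbering each other.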
I spent most of the day learning what these files contained and playing with the settings, and it's pretty much got everything I want, as long as I can parse the files correctly. I decided to start with the SNPs between the reference KW20-Rd genome and the other three. There was one little glitch in the nice .report output file, in which it didn't call all classes of SNPs (it was excluding G-T differences from the report), so I dug into the code and successfully altered it to give the full list. Here's what I found:
There are >40K SNPs between the reference and each of the others. As would be expected, the transition/transversion ratios are ~2. The numbers of SNPs going from X->Y and from Y->X were effectively identical for all comparisons, so I added them together in the above plot. It also looks like toggling between G and C is a particularly difficult transversion for some reason. Anyways, piles of SNPs, as expected.
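If I ever need to redo this tally without patching dnadiff's report code, I should be able to count the substitution classes straight from the .snps output with a one-liner along these lines (this assumes, as in show-snps tabular output, that columns 2 and 3 hold the reference and query bases, with '.' marking indel rows):

# Tally substitution classes from the dnadiff .snps file, skipping indel rows
awk '$2 != "." && $3 != "." { n[$2 ">" $3]++ } END { for (k in n) print k, n[k] }' Rd_vs_86-028NP.snps | sort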
For my next trick, I'll look at the distribution of indel and rearrangement polymorphism...
Thursday, May 28, 2009
GBrowse Update (boring, except the bit about the fish)
I finally succeeded in my attempt to put a functioning GBrowse installation on my computer. My problem turned out to be extremely trivial and way upstream of the actual GBrowse installation. Now I’ve got to figure out how to correctly produce and configure a basic Haemophilus influenzae genome database...
My first big problem--now solved--was that I was unable to view webpages from localhost; that is, I couldn't view webpages served by Apache2 on my own computer. All kinds of other people have had such problems, but none of the help I found on forums seemed to apply to mine. Nevertheless, I did learn a lot more about the way files are organized in Mac's UNIX.
Luckily, in my forum perusal, I stumbled across a nice switch to the apachectl command, so that at the command line I could type "sudo apachectl -t" or "sudo apachectl configtest". This gave me a syntax check of my Apache webserver configuration, which returned a convoluted syntax error. After checking the several files where the errors were called, I figured out that I'd failed to add a space between two separate statements in the configuration file I'd made to set my own permissions. Gah. Anyways, it works now. The problem WAS actually covered in several of the forums I'd searched (because Apple moved some things around in their latest Leopard upgrade), but that's no help when you've typed the needed filename in wrong...
After that I had one additional problem, which was that I used all the default settings with the GBrowse install script, which put everything into the wrong or non-existent directories. When I repeated the install with the correct paths for Mac OS X 10.5.7 and Apache2, I suddenly had the Generic Genome Browser on my computer!
I found this site, which told me to redirect the installer to the following paths when prompted:
Apache conf directory? [/usr/local/apache/conf] /etc/apache2/
Apache htdocs directory? [/usr/local/apache/htdocs] /var/www/localhost/htdocs/
Apache cgibin directory? [/usr/local/apache/cgi-bin] /var/www/localhost/cgi-bin/
Presto! A working web server with a working genome browser!
I went to http://localhost/gbrowse/, and got a page announcing:
Welcome to the Generic Genome Browser!
A happier moment of web surfing I've not had since I found out about the fish with a transparent head.
Okay, nonetheless, I still haven’t gotten the Haemophilus influenzae KW20 genome properly working in the browser. I’ve gone through the tutorial pretty thoroughly and have correctly configured their tutorial Volvox database using a MySQL backend. It works fine.
My KW20 database seems to be correctly imported into MySQL, but nothing shows up on the webpage. I originally thought it had something to do with my configuration file, but now I suspect some kind of import defect. I’ve tried it two ways.
(1) Doing it in memory: getting the GenBank file from NCBI, converting it with BioPerl's conversion program, bp_genbank2gff3.pl, then loading it with BioPerl's bp_bulk_load_gff.pl.
(2) Doing it in MySQL: using the GFF and FASTA files from TIGR's homepage and loading them with BioPerl's MySQL loader, bp_seqfeature_load.pl (for which I can find no good link or man page).
Neither of these worked. I've been tweaking the configuration file and trying to reload the database in several ways. So far with no luck. But progress! I'm fairly certain I need to understand the Adaptors better...
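For the record, the two loading routes were roughly as follows (the database name and file names are placeholders, and the flags are from memory, so the BioPerl documentation is the real authority here):

# Route 1: convert the GenBank record to GFF3, then bulk-load features plus sequence
bp_genbank2gff3.pl KW20.gbk                      # writes a GFF3 file alongside the input
bp_bulk_load_gff.pl -d hflu_kw20 KW20.fna KW20.gbk.gff

# Route 2: load the TIGR GFF and FASTA into MySQL through the SeqFeature::Store adaptor
bp_seqfeature_load.pl -c -a DBI::mysql -d hflu_kw20 KW20.gff KW20.fna

Presumably, whichever route I settle on, the browser's configuration file needs to point its adaptor and database arguments at the same adaptor and database name used for loading; that's the part I still need to sort out.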
Wednesday, May 27, 2009
As easy as that? Nahh...
For one of our planned experiments, I want to purify periplasmic uptake DNA away from chromosomal DNA in a clean efficient manner. There are likely several good ways to do this, some of which may be relatively complicated. But purity will be very important for our downstream sequencing plans, so complications are okay.
Nevertheless I did a silly little experiment today to see what kind of size bias our in-house GenElute columns from Sigma have...
The columns are based on DNA adsorption to silica under high-salt conditions and elution under low salt. However, larger DNAs have a difficult time eluting off the column, even under low salt. That's why the manufacturer states that the columns are only good for fragments up to 10 kb.
The DNA I'll feed to cells will be of a discrete size distribution much smaller than chromosomal DNA, so I simply mixed genomic DNA with DNA size standards and ran them over the column. Here are the results on a 0.6% gel:
Lane 1: 1-kb ladder alone.
Lane 2: INPUT: MAP7 + 1-kb ladder.
Lane 3: OUTPUT: the input run over a silica column with high salt.
Lane 4: lambda ladder alone.
Lane 5: INPUT: MAP7 + lambda ladder.
Lane 6: OUTPUT: the input run over a silica column with high salt.
Lane 7: MAP7 DNA alone.
It looks like the large genomic DNA fragments were pretty efficiently cleaned away from the smaller ladder fragments. The largest lambda fragment is ~23 kb, and it seems to have been depleted quite a bit as well. Some smaller sheared genomic DNA is clearly coming through, though, as can be seen by comparing lane 1 to lane 3 and lane 4 to lane 6.
I don’t think this is really enough size bias for our purposes, but I’m really quite surprised at how well it worked, so maybe I’m just being pessimistic.
I wonder how well this will work in real life. Assuming all our ladder fragments were efficiently taken up by cells into the periplasm, the association of the uptake DNA with the membranes is a major concern. If the uptake DNA is only loosely associated with the membranes, then a standard plasmid mini-prep may very well work quite nicely. Since most of the chromosomal DNA will pellet with the other cellular debris and lysed membranes, the column would then take care of most of the contaminating large molecular weight DNA.
Hmmm... I need some real uptake fragments, so I can try this with cells...
Monday, May 25, 2009
Congression versus linkage
Yet again, I did a transformation of Haemophilus influenzae cells with slightly variant H. influenzae genomic DNA. I fed MAP7 DNA-- containing several antibiotic resistance-conferring mutations-- to KW20 competent cells (RR722). I used my third competent cell prep for the third time (experiment 3-3), using frozen stocks. Things looked pretty good, and now I’ve got some real co-transformation numbers to play with to distinguish “congression” from “linkage”...
This time, I selected for four different markers: Resistance to Kan, Nov, Spc, and Nal. As expected, last week’s failure with Spc was due to a mistaken antibiotic concentration. Here’s what the transformation frequencies for each independent marker looked like:
Apparently, all four mutations in MAP7 are point mutations. The good news is that there is indeed variation in transformation rates (~4-fold); the bad news is that the rates I’m getting will require much greater than 10,000X sequence coverage in our planned genome-wide experiment. I may very well need to turn to older methods involving blood in the media to see if I can get rates substantially higher... Or possibly a hyper-rec mutation in the recipient strain.
While Kan and Nov are tightly linked, the other two are unlinked. I should be able to distinguish "congression" from "linkage" by looking at co-transformation rates. I decided to look at co-transformation of the Kan marker relative to the other three, rather than make every possible kind of antibiotic plate for the four markers (which would be 2^4 = 16 kinds of plates).
Here’s what the observed and expected rates of co-transformation looked like (where “expected” was calculated as the product of the independent transformation rates and the ratio of obs/exp is indicated by the number above each pair of bars):
Indeed the co-transformation rate of Kan and Nov dramatically exceeded the expected rate, relative to that of Kan versus Spc or Nal. This is what we expect for linked markers. So the obs/exp ratio I’m seeing for KanSpc and KanNal are presumably due to “congression” rather than “linkage”. That is, not all cells are equally competent in a culture, so I see excess co-transformation as an artifact.
However, if we use the measure Cf to calculate the fraction of competent cells in a culture, we arrive at different answers depending on whether we use the KanSpc or KanNal rates.
So, for some definitions (as per Goodgal and Herriott 1961 and others):
If markers behave totally independently:
f(ab) = f(a) * f(b).
Since they don’t, we can calculate the fraction of competent cells as:
Cf = f(a) * f(b) / f(ab).
But then, for two different pairs of unlinked markers:
Cf (KanSpc) = 41%
Cf (KanNal) = 18%
So something more is going on here, since we’re getting different fractions of competent cells, despite it being only a single culture. Thus the assumptions required for the Cf value to work are not entirely valid.
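Just to keep the arithmetic straight, here is the Cf calculation as a one-liner, with made-up frequencies standing in for the real numbers:

# Fraction of competent cells from a pair of unlinked markers: Cf = f(a)*f(b)/f(ab)
# (the frequencies below are hypothetical, just to show the calculation)
awk -v fa=0.001 -v fb=0.0005 -v fab=0.000002 'BEGIN { printf "Cf = %.0f%%\n", 100 * fa * fb / fab }'
# prints: Cf = 25%

Plugging in the actual single- and double-transformant frequencies for each marker pair is what gives the 41% and 18% figures above.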
One explanation for this was put forward by Erickson and Copeland 1973, who observed differences in congression for different sets of unlinked markers in B. subtilis. They showed a relationship between co-transformation rates and the position of markers relative to the origin of replication. Thus, co-transformation rates may be influenced by whether or not the transforming markers are recombining into the recipient chromosome before or after the replication fork.
I’ll need more information about the exact identity of the markers I’m using to examine this model with these data. It also would’ve been nice in this case to have SpcNal co-transformation rates.
As an aside, if competence is maximal during DNA replication, this lends support to a conflated DNA repair/food hypothesis for natural competence. Nucleotides are needed during DNA replication, and DNA that's taken up during this period could serve to facilitate DNA replication, or maybe DNA repair...
-------
A couple other notes:
Now that I’ve done three experiments with the same competent cells, I can look at how reproducible I am. For exp3-3 and exp3-2, I used frozen stocks, while exp3-1 used fresh cells:
Not stellar, but it looks to me that fresh is better than frozen.
Another note: I might be able to increase relative transformation rate for the genome-wide experiment by selecting for a marker right away. This creates some problems, but it allows me to remove “incompetent” cells from the population and focus only on those cells that were actually able to take up DNA and be transformed, possibly significantly improving observed transformation rates:
Of course, tightly linked markers will look artificially high, but I’d get 2X the Spc transformants and 6X the Nal transformants under this regime. I’ll definitely need to understand congression better, as well as the problem of dead cells before seriously considering this for the bulk transformation experiments.
Another (probably better) possibility would be to somehow fractionate transformable cells from others in a competent culture as has been done with renografin gradients in B. subtilis.
Thursday, May 21, 2009
Sequencing, Ho!
In response to Rosie’s last post, I wanted to outline my take on two of our planned experiments that involve deep sequencing:
- Transform a sequenced isolate’s genomic DNA (sheared) into our standard Rd strain, collect the transformed chromosomes, and sequence the daylights out of them to measure the transformation potential across the whole genome.
- Feed a sequenced isolate’s genomic DNA to competent Rd, purify the DNA that gets taken up into the periplasmic space, and sequence the daylights out of this periplasm-enriched pool to measure the uptake potential across the whole genome.
The GA2 can generate massive amounts of sequence data at low cost. The technology is pretty involved, but in short, it's a version of "sequencing-by-synthesis" from "polonies" (or clusters) amplified from single DNA molecules that have settled down on a 2D surface. (This reminds me tremendously of my first real job after college at Lynx, Inc., which developed MPSS.) The original instrument could get ~32 bases from one end of an individual DNA fragment in a polony; the new GA2 instrument can get 30-75 bases from each end of an individual DNA fragment on a polony. The number of clusters in a given run is quite large (though apparently pretty variable), so in aggregate, several gigabases of sequence can be read in a single sequencing "run", costing about $7000.
For reasonable estimates of coverage, we can use the following conservative parameters:
• 60 million paired-end reads per flow cell (7.5 million / lane)
• 50 bases per read (that’s a very conservative 25 bases per end)
That's ~3 gigabases! For our ~2 megabase genome, we'd conservatively get 1500X sequence coverage from a full run, costing ~$7000.
Apparently, if we’re doing things just right, this number would more than double:
• 80 million paired-ends per flow cell (10 million / lane)
• 2 X 50 bases per paired-end read
That’s 8 Gb, or 4000X coverage of a 2 Mb genome! (Or 500X coverage in a single lane)
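The back-of-the-envelope arithmetic, command-line style, using the optimistic numbers above:

# Sequence coverage = (read pairs per flow cell * bases per pair) / genome size
pairs=80000000; bases=100; genome=2000000     # 80M pairs at 2 x 50 bases; ~2 Mb genome
echo "$(( pairs * bases / genome ))x coverage per flow cell"
echo "$(( pairs * bases / genome / 8 ))x coverage per lane"
# 4000x per flow cell, 500x per lane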
(1) TRANSFORMATION: Unfortunately, a single run, as described above, is probably not quite what we’d need to get decent estimates of transformation potential for every single-nucleotide polymorphism of our donor sequences into recipient chromosomes. While some markers may transform at a rate of 1/100, we probably want to have the sensitivity to make decent measurements down to 1/1000 transformation rates. For this, I think we’ll still need several GA2 runs. However, we could get very nice measurements with even a few lanes for indel and rearrangement polymorphisms, using spanning coverage (see below).
But, in a single run, we could do some good co-transformation experiments by barcoding several independent transformants, pooling, and sequencing. So if we asked for 100X coverage, we could pool 5 independent transformants into each of 8 lanes (40 individuals total). This wouldn’t give us transformation rates per polymorphic site, but since we’d know which individual each DNA fragment came from, we’d be able to piece together quite a bit of information about co-transformation and mosaicism.
(2) PERIPLASMIC UPTAKE: On the other hand, even one run is extreme overkill for measuring periplasmic uptake efficiency across the genome, because in this instance, we don’t really need sequence coverage, but only “spanning coverage” (aka physical coverage). Since we can do paired-ends, any given molecule in our periplasmic DNA purification only needs to get a sequence tag from each end of the molecule. Then we’ll know the identity of the uptake DNA fragment, based on those ends (since we have a reference genome sequence). Sequence tags that map to multiple locations in the reference will create some difficulties, but the bulk should be uniquely mappable.
So, if we started with 500 bp input DNA libraries, spanning coverage in only a single lane (rather than a full run of eight) would give us a staggering 2500X spanning coverage! (500 bp spans X 10 million reads / 2 Mb genome = 2500.) To restate: for <$1000, each nucleotide along the genome of our input would be spanned 2500 times. That is, we'd paired-end tag 2500 different DNA fragments that all contained any particular nucleotide in the genome.
If we graphed our input with chromosome position on the x-axis and spanning coverage on the y-axis, in principle we'd simply get a flat line crossing the y-axis at 2500. In reality, we'll get a noisy line around 2500, since some sequences may end up over- or under-represented by our library construction method or sequencing bias. In our periplasmic DNA purification, however, we expect to see all kinds of exciting stuff: regions that are taken up very efficiently would have vastly more than 2500X coverage, while regions that are taken up poorly would have substantially less.
This resolution is most certainly higher than we need, but heck, I'll take it. I would almost be tempted to use a barcoding strategy and compete several genomes against each other for the periplasmic uptake experiments. If we could accept, say, a mere 250X spanning coverage for our uptake experiments, we could pool 10 different genomes together and do it that way. We could even skip the barcodes, with some loss of resolution, if the genomes were sufficiently divergent that most paired-end reads would contain identifying polymorphisms...
The periplasmic uptake experiment is our best one so far, assuming we can do a clean purification of periplasmic DNA. The cytosolic translocation experiment has some bugs, but should be similarly cheap. The transformation experiment will, no matter how much we tweak our methods, require a lot of sequencing. But it's still no mouse.
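The same sort of back-of-the-envelope calculation for spanning coverage (500 bp inserts, 10 million read pairs in one lane, ~2 Mb genome, as above):

# Spanning (physical) coverage = insert size * read pairs / genome size
insert=500; pairs=10000000; genome=2000000
echo "$(( insert * pairs / genome ))x spanning coverage from a single lane"
# 2500x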
Always check your concentrations (Edited)
Okay, enough about the evolution of sex for now. I got some data again, but sadly discovered that I'd used the wrong selection for my third marker.
Previously, I'd transformed Rd--the standard lab strain of Haemophilus influenzae--with MAP7 genomic DNA, which contains several antibiotic resistance alleles. The frequencies of transformation for the two markers I checked looked reasonable, and the frequency of double transformants exceeded expected (that is, in excess of the product of the independent frequencies).
Since the genes for the two resistances are near each other (linked), this was to be predicted, but I had no third unlinked marker to check how many cells in my competent cell prep were actually competent. If only a small fraction of cells in the culture were competent, the excess double transformants I observed could be spurious, rather than due to linkage.
To more accurately measure the linkage between the loci conferring Kan and Nov resistance, I repeated the transformation with frozen stocks of the same competent cells, adding a third selection for Spc resistance, which is encoded at a locus far from the other two and so acts as an unlinked control. This involved plating onto eight kinds of media: no antibiotic, each of the three single antibiotics, each of the three pairwise combinations, and all three antibiotics together.
Sadly, I discovered yesterday that cell counts on the +Spc plates were quite similar to those on the no-antibiotic plates, so there was not much selection for transformation at the unlinked locus. Sigh. I initially thought this was because my strain was already resistant to Spc, but then I remembered something Sunita told me: "Double check the antibiotic concentrations in the stock before adding it to the media. You don't always add the same amount." So this post could also be entitled, "Always listen to Sunita." I had added 10-fold too little antibiotic to the +Spc media.
So instead of evaluating the fraction of my cells that were competent, I simply ended up repeating my previous experiment with a bunch of extra work. The bad news is that now I have to do it again. The good news is that my no DNA controls were perfectly clean and my technique appears to be somewhat reproducible:
Total viable cell counts were on the order of a billion per mL, and transformation frequencies looked like this, where blue bars show the transformation frequencies for the current experiment and red bars show the previous experiment (calculated as per Rosie's method):
...by no means flawless. The obs/exp ratios of double recombinants were 65 and 8 in the second and first experiments, respectively. But the frequencies themselves were in the right order of magnitude. It is also possible that there are differences between fresh and frozen competent cells. I might predict that the fraction of competent cells was reduced in the second experiment.
Just for good measure, I checked out whether spectinomycin had ANY effect. The ratio of + : - spectinomycin should be 1, if the antibiotic had no effect:
CFU / SpcR = 1.2
NovR / NovSpcR = 1.2
KanR / KanSpcR = 1.3
KanNovR / KanNovSpcR = 1.2
So it looks like maybe the Spc did kill some cells; it just wasn't a strong selection.
Try try again...
Wednesday, May 20, 2009
Interfering Clones
Since I’m new around here, I thought I’d try and start reflecting on the big picture surrounding the Redfield lab’s research, starting with that last post on Sciara. This will end up taking me several posts, because I’d like to articulate my own understanding of the evolution of sex, which is somewhat muddled and rather uneducated. So this post is meant to be introductory to make sure I have the most basic elements down.
A primary mission of the Redfield Lab is to understand why some bacteria are naturally competent. That is, why do some bacteria have the ability to take up DNA from their environment? Since natural competence requires upwards of a couple dozen gene functions, the pathway must have some direct selective benefit to naturally competent bacteria. Several non-mutually exclusive hypotheses have been posed to account for the maintenance of the natural competence mechanism, namely:
- “The Repair Hypothesis”: as a source of DNA templates for the repair of DNA double-stranded breaks;
- “The Food Hypothesis”: as a source of nucleotides for food; and
- “The Sex Hypothesis”: as a way of shuffling genetic material by transformation with uptake DNA.
My own understanding of the evolution (or rather, the maintenance) of sex has been hampered mostly by one big problem: introductory material almost always frames it implicitly in the context of animal sex, or more specifically, obligate sexual reproduction between dioecious, anisogamous diploids. When I go to a textbook or the internet for an introduction to the evolution of sex, the issues are almost always partially confounded by this context. Never mind that for many sexual eukaryotes, sex is facultative or even rare. Never mind that for many sexual eukaryotes, haploidy is the norm. Never mind that there are plenty of critters whose gametes are of equal sizes (isogamous) and make equal contributions to the zygote (this is really extreme in some protists, where the parents are the zygotes!). Never mind that the sexes are not always kept in separate individuals. Etc. I tend to think that this framing is extremely misleading, and I'll likely return to my favorite hypothesis for animal sex (“we’re stuck with it”) at a later date.
When I start thinking about bacteria, what is even meant by “sex” is confusing. In many cases (with the exception of natural competence) sex-like mechanisms are mediated by parasites, like transduction and conjugation systems. To quote the eminent John Roth, “In bacteria, sex is a venereal disease.” But since transformation via natural competence is often touted as an analogy to sex, I’d better get some idea of what kind of help it could provide outside the typical higher eukaryote context.
There is one common pedagogy that helps me considerably in understanding a possible advantage of sex, but this is really about the putative advantage of recombination. Even in the case of recombination, there seems to be a fine balance, as many organisms have modulated how much recombination is allowed through subtle alterations in their lifestyles. So while recombination can offer advantages, it is clearly not an absolute advantage.
Okay, so why can recombination be useful? First and foremost, recombination is used for repairing broken DNA, not for sex. This is why recombination exists. All cellular lifeforms (along with many cellular parasites) need the long-term ability to repair DNA double-stranded breaks and some other types of DNA damage by recombination. DNA replication itself becomes a problem in the absence of recombination mechanisms. In this context, rather than increasing diversity, recombination is anti-mutagenic. So first, the advantage of recombination between individuals has to be disentangled from the advantage of recombination within individuals. At least for me, this isn't as easy as it sounds, but for now, I'll just distinguish sexual recombination as the kind that shuffles genetic material between two related individuals.
One common explanation given for the advantage of sex is to escape Muller’s ratchet, which posits that in the absence of sexual recombination, genetic drift can cause weakly deleterious mutations to fix and accumulate in populations merely by chance. This requires the effective population size to be small enough for a population to feel the effects of genetic drift. In infinite populations, all deleterious mutations, no matter how weak, will be eliminated by natural selection. But in finite populations the ratchet causes more and more weakly deleterious mutations to slowly arise in a genetic background, slowly leading to the extinction of the population as its mean fitness inexorably goes to zero... i.e. mutational meltdown.
An affiliated idea (or perhaps simply a more generalized form?), the Hill-Robertson effect, suggests that selection is less efficient for two linked loci under selection than for two unlinked loci under selection. (In a perfectly asexual organism, all loci are perfectly linked.) The following figure (from Wikipedia) illustrates a version of this, called clonal interference.
The x-axis is time and the y-axis is the abundance of a given genotype, where everyone starts out ab. The top graph illustrates a sexual population, while the bottom graph illustrates an asexual population. In the asexual population, beneficial mutations (capital letters) that arise at different loci in different individuals cannot recombine to make the most fit genotype (AB), so one ends up lost (that’s the interference). In order for the two beneficial mutations to end up on the same genetic background, they must independently arise on the same background. With sexual recombination, the two mutations can arise on different individuals and still end up on the same genetic background. I think that this type of clonal interference would still occur in infinite populations, but am not really sure. (This is relevant, since in bacteria, population sizes may often be so large as to reduce the problem of drift.)
I can imagine that clonal interference could be a problem for bacteria, even with large population sizes, and how natural competence could allow a group of bacteria to escape from this interference. So the “Sex Hypothesis” is certainly plausible. The difficulty, then, is how to demonstrate that this is really why natural competence is maintained. Maybe “why?” is simply always a difficult question to approach experimentally...
Okay, so there we go: Presumably, one major advantage of sex is that it allows for recombination between different individuals, which lets fit alleles get together and bad alleles to get culled by selection more rapidly.
For now, I’ll leave it there and just mention that it isn’t necessarily a good thing for the individual to have recombination with some arbitrary related DNA from its environment (or by sex), even assuming that it isn’t somehow damaged. If you have perfectly good co-adapted genes in your genome, replacing one of your alleles with some other one may end up making you less fit...
Tuesday, May 19, 2009
Sciara and the problems of sex
At the risk of getting too far afield, I want to talk about one of the craziest animals around with respect to transmission genetics. I promise to bring this post back around to bacteria in the future.
The sciarid flies (fungus gnats) illustrate some of the most tantalizing mysteries of genetics in one compact package. When I first learned about the fungus gnats, I found them interesting from the perspective of chromosome mechanics, but now I think these little flies also illustrate something interesting (or maybe just confusing) about the putative benefits of a sexual lifestyle (one of the major hypotheses for why bacteria are naturally competent).
There are three kinds of chromosomes in fungus gnats, each of which behaves in an unusual way compared to familiar eukaryotic transmission genetics. In this post, I’ll focus on the autosomes, which illustrate how tangled and confusing the issues can become when thinking about the evolution of sex.*
In fungus gnats, males only transmit their mother's autosomes! That's right: the males are diploid, obtaining a complement of autosomes from mom and a complement from dad. But somehow, their sperm only include the maternal autosomes. (Females transmit their autosomes in the ordinary fashion.) From a chromosome dynamics and epigenetics standpoint, this makes for a very interesting germline development and meiosis** (and offers some entertaining genetics thought-experiments), but what about with respect to the evolution of sex?
Fungus gnats aren’t the only group of animals where males only transmit their maternal autosomes; it’s also true in haplo-diploid hymenopterans, like honey bees. In these animals, males arise from unfertilized eggs (sometimes called arrhenotoky), so that male honey bees are typically haploid. Thus the sperm these males make is also non-recombinant (by necessity, since there’s no homolog there at all) and contain only maternal genetic material.
Why is this of interest? I personally often get confused when reading even simplistic writings on the evolution (or maintenance) of sex. Part of the problem is that numerous issues are often entangled, in this case the issue of ploidy and several overlaying issues of recombination. So describing this unusual lifestyle is not meant to simplify things, but rather to point out just how difficult it can be to think about why an organism might adopt a particular lifestyle.
I’ll talk more extensively about the (simpler) theoretical reasons why sex is thought to be advantageous in another post, but for now the one really major reason that’s always cropping up is that sex offers the chance to recombine two different genotypes to produce new combinations of alleles in the next generation. So sexual recombination, or generating haplotypic diversity, is often considered a major force that maintains a sexual lifestyle.
This is all fine and good, but there appear to be numerous instances of sexual lifestyles that have ameliorated-- but not eliminated-- recombination, raising a variety of questions. A good example of this is the absence of recombination between autosomes in drosophila (fruit fly) males. For a given autosome, a male fruit fly will transmit to his offspring either his mother’s chromosome (which will be recombinant with respect to his maternal grandparents) or his father’s chromosome (which will be non-recombinant with respect to his paternal grandparents). So all else being equal, linkage disequilibrium will break down more slowly in such a population where one sex doesn’t undergo crossing over, as opposed to one where both sexes do.
Genetic transmission is a bit different from this for the fruit fly sex chromosome, the fungus gnat sex chromosomes and autosomes, and all the chromosomes in a honey bee. So for sex chromosomes, since males in all three groups get only a single X from their mother, it will be recombinant with respect to their maternal grandparents but transmitted intact to a male’s progeny. And for the autosomes of both fungus gnats and honey bees, the same thing is true.
With fungus gnats, though, males adopt a diploid state but still behave like a bee with respect to transmitting their autosomes. What's going on here? One common explanation for the alternation of generations and for haplodiploid lifestyles (i.e. keeping separate haploid and diploid states) is that haploids and diploids feel the effects of natural selection differently. While a recessive allele in a diploid population only manifests when it is homozygous, it is always exposed in a haploid. So selection acts more quickly on haploids, and greater allelic diversity (and slower selection) can exist in diploid populations. By having both states in a population, the benefits of both can be enjoyed.
Okay, so that’s all pretty confusing, but I think the take-home is this (all else being equal, except ploidy and genetic transmission mechanism):
(1) Fungus gnat males usually get to cover their recessive alleles by being diploid-- as with fruit flies. But honey bee populations feel selection much more strongly, since their recessive mutations are always uncovered by another allele in males. So deleterious alleles can presumably be weeded out more efficiently in the honey bee than the other two, while recessive alleles have more of a chance to lurk in the fly populations.
(2) But on the other hand (all else being equal), linkage disequilibrium would presumably break down more quickly in fungus gnats and honey bees than in fruit flies, though not as quickly as if homologous pairs recombined during meiosis in both sexes.
So there seem to be numerous forces balancing each other here. At a minimum, even ignoring the rest of the factors, the way alleles and combinations of alleles behave over generations in these three animals will be distinct, depending on the mode of genetic transmission and the ploidy of the males. The only thing that (all else being equal) is the same across all three animals is the sex chromosome. So the fungus gnat lifestyle offers an interesting hedged strategy as compared to the more familiar fruit fly and bee lifestyles, as well as having some very compelling chromosome mechanics.
* I’ll mention the other two types of chromosomes, just to illustrate how much interesting stuff is going on here: (1) The sex chromosome (X) always non-disjoins in meiosis II in males, such that every sperm is disomic for the X chromosome and thus every zygote is trisomic for the X. Later in development, the elimination of 1 or 2 X chromosomes determines the sex of the animal (female or male, respectively). (2) The limited chromosomes (L) are small heterochromatic knob chromosomes that are only present in the germline, and are somehow eliminated from somatic tissues by selective non-disjunction. So those are two very interesting phenomena in sciarids that will hopefully deserve future mention...
** How do the males accomplish this feat of maternal-only autosome transmission? First of all, like many other male dipterans, there is no crossing over between homologous chromosomes. This is also the case in the model organism Drosophila melanogaster. Second, somehow only the maternal autosomes form kinetochores at their centromeres in the meiosis I division, so the paternal autosomes get left behind by the unusual asymmetric spindle of the male meiosis. This means that throughout the germline mitotic divisions, something distinguishes the maternal from the paternal chromosomes, and that something then manifests in the first meiotic division...
Labels:
apis,
drosophila,
evolution of sex,
ploidy,
recombination,
sciara
Wednesday, May 13, 2009
Using a Computer
I’ve been working on installing the stripped-down open source genome browser, GBrowse, onto my computer. In order to do all this cool sequencing stuff I’m planning, I’ve got a lot of computer-learning curve to overcome. One thing that will be extremely useful both for learning and for our future plans will be to build my own Haemophilus influenzae genome browser.
So GBrowse installation looks pretty easy. But I do need to have various things installed (and configured properly) for it to work:
- Apache (a webserver),
- PHP (a webpage-writing scripting language),
- MySQL (a database server),
- Perl (a scripting language that’s useful for manipulating text),
- BioPerl (a set of bioinformatics modules written in Perl),
- and finally of course, GBrowse.
Luckily all of these packages are open-source and freely available on the web. Unluckily, my UNIX command-line skills are slow, and my ability to understand system files is apparently extremely limited.
I initially caused all sorts of havoc on my computer trying to get the basic Apache and PHP set-up working. If things were functioning properly, I should be able to serve my own webpages and see them in my browser. No such luck. After digging around in some forums, I found that many others had had similar problems, so I started mucking around with sensitive system files as recommended there.
This involved uncommenting some lines in the seriously ugly-looking configuration files used by Apache and making sure that I had permission to view my own webpages... I never made it work the way it was supposed to. I feel comfortable enough with text editors and the basic stuff on the command line, but I really don't understand the hierarchy of the directories in UNIX. I'm sure that most of my problems stem from that.
Fortunately, I had actually been taking notes on everything I was doing to my computer! So I could back out and start over. I also discovered that some clever programmers had made things a lot easier for me by leaving back-ups of a lot of files around. For example, the apache2.conf file that I'd tinkered with could simply be replaced by the apache2.conf.default sitting right there in the same directory, so at the command line: cp apache2.conf.default apache2.conf
Next I discovered an easy way out: XAMPP, which was written specifically for the computer semi-literate to make all the configuring happen smoothly. I installed it, and Presto! Apache, PHP, and MySQL up and running with default localhost webpages appearing in my browser magically. WHEW!
Okay, so moving onto GBrowse... I already have Perl, and GBrowse has a handy install script that will get me BioPerl along with GBrowse itself. This ended up taking forever, because the CPAN mirror (which carries all the Perl stuff) kept kicking me off. Eventually, it all seemed to install, and I got a happy message:
########################################################
GBrowse is now installed. Read INSTALL for further setup instructions.
Go to http://localhost/gbrowse for the online tutorial and reference manual.
########################################################
But to NO AVAIL! The web address didn't work, and I couldn't find that INSTALL file for the life of me. (There are a lot of files entitled INSTALL.) Criminy. I could already tell what had happened, and despite more forum-searching and more tinkering, I still couldn't fix it... Paths. My programs are looking for each other in the wrong locations.
XAMPP's MySQL got broken, and Apache doesn't know where to look for GBrowse. I actually haven't found it myself yet. So no dice. I need to figure out how to redirect the paths, so that the XAMPP installation knows where GBrowse has put itself, and so that XAMPP knows what happened to MySQL after the BioPerl and GBrowse installs. Since I can't tell it those things myself, I guess I'm going to have to go back to the drawing board again...
Sigh... So close yet so far. But the note-taking has been utterly crucial. I, like Rosie, have always thought that I was just doing "preliminary analysis" when using my computer for something other than Microsoft Office or Firefox, so I always end up having to repeat work. But especially when mucking around with my system files, I'm truly glad I kept track.
It may almost be time to wrassle up a real computer person to give me a hand... (continued...)
Monday, May 11, 2009
Radioactive bacteria... What could possibly go wrong?
RADIATION SAFETY TRAINING! This afternoon, tomorrow afternoon, and Wednesday morning, I’ll be taking the radiation safety and methodology course here at UBC. I need to get certified to use radionuclides, since many of our plans will involve using radiolabeled DNAs to optimize our protocols for purifying uptake DNA from different cellular compartments.
I believe this will be the 5th or 6th time I've been through such a training course, but I'm sure I can use a refresher. The types of radioactive isotopes used by biologists are typically not very dangerous. I'll be using 32P, an isotope of phosphorus, which decays quickly and emits beta-particles. Luckily beta-particles are easily blocked by just a little bit of shielding, so it's easy to avoid exposure simply by keeping some plexiglass between yourself and the source. The amount of exposure that a lab worker will get from using such a radioactive source is typically less than that faced by someone living at high altitude (with less atmosphere overhead to block cosmic rays, as in Denver, Colorado).
Indeed, I'd rather work with radioactive materials than a serious mutagen, like EMS. If I spilled a little radioactive material, I'd easily be able to tell where it dropped using a Geiger counter, facilitating clean-up. But if I dropped a little bit of EMS somewhere, I wouldn't be able to tell where it was, so getting exposed would be much more likely. So despite many people's fears about working with radiation, it's really quite safe, as long as proper protocols are followed. (continued...)
Thursday, May 7, 2009
Degenerate oligos II
(EDITED to add another graph)
In the first post on our planned degenerate oligo experiment, I described how we might expect the input degenerate oligo pools to look. I used the binomial probability to calculate the chance that a given oligo would have k incorrect bases in an oligo of length n.
So Pn(k) = n! / [ k! * (n-k)! ] * p^(n-k) * (1-p)^k, where p was the probability of the correct base being added at a particular position.
As a thought experiment, I now want to assume that only some of the bases in the n-mer matter to the uptake machinery. I'll call this number m. The first difficulty was to figure out the probability of hitting h of the m important bases, given k substitutions in a particular oligo.
This took me a while to figure out. I knew it had something to do with "sampling without replacement" and was able to work out the probabilities by hand for k = 0, 1, or 2, but then I started to get flummoxed. Eventually, I stumbled across the right equation to use: The Hypergeometric Distribution. (That’s seriously cool-sounding.)
Since I don’t know how to write a clean looking equation in HTML (see the above morass given for the binomial probability), I’ll again use the notation ( x y ) to mean "x choose y", or x! / [ y! * (x-y)! ] and will carefully add a * where I mean to indicate multiplication. The hypergeometric is then given as:
f(h; n, m, k) = ( m h ) * ( n-m k-h ) / ( n k )
h = hits, or the # substitutions in the important bases
n = total length of the oligo
m = # of important bases in the oligo
k = # of substitutions in the oligo.
So if we define only 1/4 of the bases as important to uptake, then for a 32-mer, m = 8. Then, for a 12% degenerate oligo pool, we get a histogram that looks like this:
If we say that every substitution that hits an important base causes a 10-fold decrease in uptake efficiency, but every hit in an unimportant base causes no change in uptake efficiency, we can then tell what our histogram would look like for both our input oligo pool (IN) and our periplasm-enriched oligo pool (OUT). On the left side of the graph, the output is higher than the input, while on the right side of the graph, the opposite is true. This is what we expect, since the more substitutions in the oligo, the more likely they'll hit important bases.
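Since I'll eventually have to do this bookkeeping for real in the analysis, here's a minimal Python sketch of the calculation behind these histograms. The specific numbers (n = 32, m = 8, 12% degeneracy, a 10-fold uptake penalty per hit to an important base) are just the illustrative assumptions from above, not anything measured, and the little helper functions are my own names for illustration:

from math import comb

n, m = 32, 8        # oligo length; assumed number of "important" bases (1/4 of 32)
q = 0.12            # per-position chance of an incorrect base (12% degeneracy)
penalty = 0.1       # assumed 10-fold drop in uptake per substitution in an important base

def p_k_mistakes(k):
    # binomial probability that an oligo carries k incorrect bases
    return comb(n, k) * q**k * (1 - q)**(n - k)

def p_h_hits(h, k):
    # hypergeometric probability that h of the k substitutions land in the m important bases
    return comb(m, h) * comb(n - m, k - h) / comb(n, k)

IN, OUT = [], []
for k in range(n + 1):
    p_in = p_k_mistakes(k)
    # average uptake efficiency of a k-substitution oligo, weighted over where the hits fall
    eff = sum(p_h_hits(h, k) * penalty**h for h in range(min(k, m) + 1))
    IN.append(p_in)
    OUT.append(p_in * eff)

total = sum(OUT)
OUT = [x / total for x in OUT]    # renormalize the periplasm-enriched pool to sum to 1

for k in range(6):
    print(k, round(IN[k], 4), round(OUT[k], 4), round(OUT[k] / IN[k], 2))

The last column is the OUT/IN ratio, which starts above 1 and drops as k grows, matching the qualitative picture described above.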
ADDED:
And here's a third graph addressing Rosie's comment, which shows the ratio of OUT/IN for four sets of conditions:
Here it is again on a log-plot:
(continued...)
Third transformation's a charm?
My third competent cell preparation and transformation looks like it was fairly successful. Not only were there transformants, but the numbers look a lot more reasonable.
Picking dilutions with reasonable numbers of colonies on the plates:
K = KanR / CFU = 1.54 e -2
N = NovR / CFU = 1.82 e -3
D = KanR NovR / CFU = 1.95 e -4
K*N = 2.8 e -5
D / (K*N) = 7.71
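Just to double-check myself, here's the same arithmetic as a quick Python sketch. Plugging in the rounded frequencies above gives an excess of about 7-fold; I assume the 7.71 above comes from the unrounded colony counts, so the small difference is just rounding:

K = 1.54e-2            # KanR transformants / CFU
N = 1.82e-3            # NovR transformants / CFU
D = 1.95e-4            # double (KanR NovR) transformants / CFU

expected = K * N       # expected double frequency if the markers transformed independently
print(expected)        # ~2.8e-05
print(D / expected)    # ~7, i.e. a several-fold excess of double transformants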
So there’s an excess of nearly 8x double transformants relative to expected. Since the Kan and Nov markers are linked, I can’t accurately measure either linkage or the fraction of the competent cells in the prep. This requires a third marker. I’ll next turn to my frozen stocks and use a third antibiotic resistance marker that’s unlinked from KanR and NovR, perhaps SpcR (spectinomycin resistance). This should allow me to measure the fraction of competent cells in the culture and use this to correct the apparent linkage of Kan and Nov to an accurate measurement.
Notably, the KanR and NovR rates were again about an order of magnitude different. So indeed, there’s substantial variation in transformation rates for different markers. The underlying cause of this variation is something I hope to get at with my planned genome-wide transformation rate measurements in the future.
Other issues:
- The culture’s cell density still looks a bit high (1.4 e 9 / ml), but it’s <1/3>
- My NO DNA CONTROLS also had a few colonies on the Kan plates, indicating either contaminants or a relatively high mutation rate of KanS -> KanR in Rd. I’ll check this by taking the KanR colonies from my negative control plates and streaking to LB to see if they are really Haemophilus. KanR / CFU = 1.8 e -5. This was still substantially lower than cells transformed with DNA, so the rates are still reasonably accurate.
- Some of my plates (particularly the double antibiotic plates) had pretty uneven spreading and heterogeneous colony sizes. I suspect this was due to unevenly spreading fresh hemin onto the plates prior to spreading the cell dilutions. I hadn't yet carefully observed Sunita's masterful plate spreading technique when I spread the hemin, but I had by the time I spread the cells.
Time to make fresh media! (continued...)
Wednesday, May 6, 2009
An apparently unnecessary step
I love old papers. It’s true that they’re often long and obscure, but they also asked questions at a more fundamental level than papers today and with significantly less jargon. My favorite example is this early paper by Seymour Benzer, in which he does things like invent the complementation test and show the likely linear nature of the genetic material.
I’ve been slowly working my way through another old paper, this one on transformation in Haemophilus influenzae Rd:
GOODGAL and HERRIOTT. Studies on transformations of Hemophilus influenzae. I. Competence. J Gen Physiol (1961) vol. 44 pp. 1201-27
This work established many important facts about Haemophilus transformation that we luckily can take for granted today. Later, I’ll undoubtedly touch on other aspects of this paper (namely their measurements of co-transformation and calculations of how many cells in a culture are competent), but right now, I just want to point out a bit of the methods. As I read this section, I was at first incredulous:
Preparation of Levinthal Stock for Growth of H. influenzae (1): To a 6 liter flask add 2100 ml distilled H2O and 74 gm Difco dehydrated brain heart infusion. Bring this solution to a vigorous boil, remove from heat, and add cautiously 200 ml defibrinated sheep's blood. The defibrinated sheep's blood is obtained by stirring fresh blood with a rough wooden paddle immediately after it is taken from the animal. Stirring vigorously for 5 minutes with a motor-driven paddle or for 20 minutes by hand should be sufficient. The blood is strained through cheese-cloth and stored frozen before use. Medium made from unfrozen blood is usually turbid. The first addition of blood caused a heavy evolution of gas from the broth producing a violent foaming. Therefore, only 5 ml of blood should be added initially to the hot infusion, and no more added until the gas evolution ceases. Continue to add as rapidly as foaming will permit, until the entire 200 ml is introduced...
The protocol continues on, but when I reached footnote (1), indicated in the heading, I actually laughed aloud on the bus the other day (emphasis mine):
(1) Since this work was completed, Mr. John Cameron and Mr. Harold Isaacson working in this laboratory have found that growth and development of competence of H. influenzae equal to that obtained with Elev broth were observed in a 3.5 per cent solution of Difco brain heart infusion mixed 3:1 with 3 per cent Eugonbroth appropriately supplemented with 10 µg/ml of hemin and 2 µg/ml of DPN. This medium has been used with uniform success for a number of months. It eliminates a time-consuming and apparently unnecessary step of the addition of fresh blood.

Science marches on (and gets easier). I am thankful to Mr. Cameron and Mr. Isaacson for making my life significantly less disgusting. (Unfortunately for me, their methods do report 5-10% transformation frequencies... If these are to be believed, I might need to try using Levinthal Stock for some of my more transformation rate sensitive experiments. Alas.) (continued...)
Tuesday, May 5, 2009
Degenerate Oligos
Rosie and I have been working our way through editing my unfunded NIH postdoc proposal. There's a lot of work to do on it, but I think we'll make it excellent by the time we're done.
This is our first aim (so far):
Determine the selectivity of the outer membrane uptake machinery for USS-like sequences using degenerate USS constructs.
This is a meaty Aim and will require some time to fully flesh out. I'm not even fully convinced it belongs in this grant. It may be better for the NIH R01 submission exclusively.
The idea is to feed Haemophilus synthetic DNA containing a USS with a few random mistakes in each independent DNA molecule. We'd then purify the DNA that was preferentially taken up by the bacteria into the periplasm and sequence it to high coverage using Illumina's GA2 platform. This experiment has a lot of potential, but also a number of issues we need to work out first before we can seriously discuss the possible analyses.
To start with, I want to try and outline the basic properties of the degenerate USS donor construct we’re planning on having made. Later, I’ll try and tackle issues such as sequencing error and competition between different USS-like sequences that we expect with the actual experiment, as well as the analysis and significance of the data we hope to obtain.
As donor DNA, we want to make a long oligo (~200mer), such that the central 32 bp correspond to a known and previously studied USS, except that there will be a chance at each base in the USS for a mistake to be introduced (i.e. one of the three alternative bases could be added, instead of the correct one).
This will entail doping each bottle of dNTPs in the oligo synthesizing machine (A, T, G, or C) with a few percent of each of the other bases. For example, for a 9% degenerate oligo, the bottle that would normally be filled with 100% dATP would instead be 91% dATP, 3% dTTP, 3% dGTP, and 3% dCTP.
What do I expect the input degenerate oligo pools to look like in a sequencer?
I’ll start somewhat simply. Assuming perfect degenerate oligo synthesis and no sequencing error, I can use binomial probabilities to calculate how many oligos per million would be expected to have a certain number of incorrect bases. And I can further calculate the coverage of a given oligo species per million sequence reads. (A million sequence reads is used as a convenient frame of reference for thinking about deep sequencing. Normalizing to parts per million is also something I've seen others do.)
Variables:
n = length of the oligo to be considered (n=32 hereafter)
p = probability that the correct base is added at a particular position
q = 1-p = probability that an incorrect base is added at a particular position
k = the number of incorrect bases in a particular oligo
Then,
( n k ) = number of ways of choosing which k positions in the oligo are incorrect
= “n choose k”
= n! / [ k! * (n-k)! ]
nPk = binomial probability of an oligo having k incorrect bases
= ( n k ) * p^(n-k) * q^k
ppm = expected number of sequence reads per million with k incorrect bases
= nPk * 10^6
seq = number of sequences with k mistakes
= ( n k ) * 3^k
coverage = expected reads per million of a given oligo species with k incorrect bases
= ppm / seq
Example,
n = 32
p = 0.88
q = 0.12
This means that if we produce a degenerate 32mer oligo, in which each position has a 12% chance of incorporating an incorrect base (any one of the alternate 3 bases with equal probability), we would expect 21% of the molecules to have three mistakes.
But there are 4,960 different ways in which a 32mer could have 3 mistakes ( n k ). Furthermore, for each of those, there are 3^3 = 27 different combinations of wrong bases at the three positions, yielding 133,920 distinct oligo species that have 3 mistakes (seq). So we would only expect to see a given species of oligo with three mistakes about 1.57 times per million sequence reads (i.e. 1.57X coverage of each possible 3-mistake oligo).
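Just to make the bookkeeping explicit, here's a small Python sketch that reproduces the numbers in this example (nothing here beyond the formulas defined above, with n = 32 and p = 0.88):

from math import comb

n, p = 32, 0.88
q = 1 - p
k = 3

nPk = comb(n, k) * p**(n - k) * q**k   # binomial probability of exactly k incorrect bases
ppm = nPk * 1e6                        # expected reads per million with k mistakes
seq = comb(n, k) * 3**k                # number of distinct oligo species with k mistakes
coverage = ppm / seq                   # expected reads per million of each such species

print(comb(n, k))          # 4960 ways to place 3 mistakes
print(round(nPk, 2))       # ~0.21 of all molecules
print(seq)                 # 133920 distinct 3-mistake species
print(round(coverage, 2))  # ~1.57 reads per million per species

Looping k from 0 to 32 and sweeping the degeneracy q gives the curves in the two graphs below.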
Here are some graphs to illustrate what this looks like for different levels of degeneracy. These should probably be histograms, but for now, I’m keeping them as they are.
Graph #1: The expected number of sequence reads (per million) with k incorrect bases in the 32mer oligo are shown for different levels of degeneracy.
Graph #2: The expected fold-coverage (per million reads) of individual oligo species with k incorrect bases are shown for different levels of degeneracy.
The take-away message: With moderate levels of degeneracy, we can obtain large amounts of sequence data for oligos with a few incorrect bases, but we will have a substantially more difficult time in getting high coverage of individual oligo species for any more than 2 or 3 incorrect bases per oligo. This means we’ll have to carefully consider how we’ll do the analysis.
Much more on this in the future... (continued...)
Monday, May 4, 2009
Transform? Yes. Well? No.
So I’ve processed my second “successful” transformation from frozen stocks (i.e. counted the colonies on the different plates), and things appear to have gone okay with a huge caveat.
I again appear to have WAY too many viable cells (CFU) and the apparent transformation frequencies of single markers (KanR or NovR) were quite low. Examining the double transformants (cells that picked up both KanR and NovR) implies that only a small percentage of cells were competent (even given linkage between my markers), which makes sense, if I’d drastically overshot the cell density before switching to M-IV.
What to do? I simply need to start over. Technical problems of my dilutions and plating notwithstanding, I’ve still got a pretty poor competent cell culture. Nothing of any utility. Try try again...
Full results follow:
For Experiment MAP7 -> Rd #2.2:
CFU: 6.1 e 10 / mL
KanR: 2.4 e 6 / mL
NovR: 2.8 e 5 / mL
KanRNovR: 8.9 e 3 / mL
Okay, so the CFU numbers are again ridiculously off compared to expectations (expect perhaps a few million per mL). Obviously something is seriously wrong. One possibility is that the way I did the serial dilutions was extremely far off. This doesn’t appear to be entirely the case, since for the mid-range dilutions with a reasonable number of colonies, there were ~10-fold differences between 10-fold dilutions. Here are my colony counts for 100 uL plated (for plus MAP7 DNA; the NO DNA CONTROL worked just fine... no transformants):
CFU:
1e-8: 61 colonies
1e-7: uncountable (not just density, but spreading)
1e-6: uncountable (“ “)
KanR:
1e-5: 6 colonies
1e-4: 26 colonies
1e-3: 236 colonies
NovR:
1e-5: 1 colony
1e-4: 5 colonies
1e-3: 28 colonies
KanR NovR:
1e-3: 1 colony
1e-2: 11 colonies
1e-1: 89 colonies
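For the record, here's how those colony counts convert into the titers above (100 uL plated per dilution), as a minimal Python sketch (the helper function is just my own, for illustration):

def titer_per_ml(colonies, dilution, plated_ml=0.1):
    # colonies on a plate -> cells per mL of the undiluted culture
    return colonies / (dilution * plated_ml)

print(titer_per_ml(61, 1e-8))    # CFU: ~6.1e10 / mL
print(titer_per_ml(236, 1e-3))   # KanR: ~2.4e6 / mL
print(titer_per_ml(28, 1e-3))    # NovR: ~2.8e5 / mL
print(titer_per_ml(89, 1e-1))    # KanR NovR: ~8.9e3 / mL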
Those counts are certainly not perfect, but also not the seriously dramatic error that would be caused by my not changing tips along my serial dilutions. The error could be much worse for my initial, uncounted dilutions, though: going from the undiluted M-IV culture to the first couple of serial dilutions may have put far too many cells into the 1e-2 dilution and onwards, but once the cells had been somewhat diluted, maybe the error wasn't too dramatic. Regardless, I will certainly be changing tips in all future serial dilutions, just for good measure.
Another (not mutually exclusive) possibility is that I had dramatically overshot the cell density going into M-IV. I think this must be part (but not all) of the problem. I’d let the cultures go for 4.5 hrs instead of the typical 2.5 hrs, so despite my taking an OD measurement of ~0.26, perhaps it was really higher. I took the OD straight without diluting. The blank I used was correct (from the original sBHI I used to grow up the cells, made that morning), but perhaps I’m doing something else stupid.
And the evidence suggests that I've got quite a lot more double transformants than expected. We do expect that the markers are linked, so some excess is expected, but I don't think it's on the order I'm seeing here.
KanR / CFU: 3.9 e -5
NovR / CFU: 4.6 e -6
KanR NovR / CFU : 1.5 e -7
Expected: 1.8 e -10
Excess (o/e): 822X !
Given that we expect transformation rates on the order of 1 / 100 and the extremely high excess of double transformants, these results imply that only a tiny fraction of the cells in my competent cell preparation were actually competent.
Another way that ignores CFU:
KanR NovR / KanR: 0.38%
KanR NovR / NovR: 3.18%
Thus, the apparent transformation rate of NovR among KanR colonies (and the converse) is dramatically higher than what I'd expect for independent markers (even if I only trust the CFU counts to within a couple orders of magnitude). Even with linkage, this seems pretty excessive.
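And the same double-transformant arithmetic in Python, using the titers above (with these rounded numbers the excess comes out near 800X rather than exactly 822X; I assume the difference is just rounding):

cfu, kan, nov, double = 6.1e10, 2.4e6, 2.8e5, 8.9e3   # per mL, from the counts above

f_kan = kan / cfu                  # ~3.9e-5
f_nov = nov / cfu                  # ~4.6e-6
f_double = double / cfu            # ~1.5e-7

expected = f_kan * f_nov           # expected double frequency if the markers were independent
print(expected)                    # ~1.8e-10
print(round(f_double / expected))  # ~800-fold observed/expected excess

# The CFU-free ratios:
print(double / kan)                # ~0.0037 -> ~0.4% of KanR are also NovR
print(double / nov)                # ~0.032  -> ~3.2% of NovR are also KanR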
(continued...)
Friday, May 1, 2009
Can I, too, transform Haemophilus?
To kick things off, I’ve performed the lab’s most basic transformation protocol of Haemophilus, using the reference strain (Rd KW20 or RR722) as a recipient of MAP7 DNA, which contains seven antibiotic resistance markers. My plan was to look at the transformation rates of two linked markers and their co-transformation rates. Ideally, I’d also include a third unlinked marker to measure the frequency of competent cells in the preparation, but that will wait until next time. For now I just wanted to get the mechanics down.
Luckily, my years in the Burgess Lab working with budding yeast seem to have prepared me pretty well for this, so I've only had to hassle my new lab mate, Sunita, for help minimally.
The protocol is pretty straightforward:
- Grow an overnight culture in sBHI* from frozen stocks;
- Subculture in the morning (1:100 and 1:50 dilutions) for a couple hours in sBHI media until the OD(600nm) reaches 0.20 (indicative of log-phase growth);
- Wash and resuspend in M-IV** media for 100 minutes to make the cells maximally competent;
- Transform with 1 ug of MAP7 DNA (plus a No DNA Control) in 1 ml of competent cells (with remaining aliquots frozen in 15% glycerol) for 15 minutes;
- Perform a dilution series and plate to 4 kinds of sBHI media: No antibiotic, +kanamycin, +novobiocin, and +both.
My first attempt failed to yield a subculture that reached the appropriate optical density by the end of the day. After several hours, I realized that I was blanking the spectrophotometer with the wrong media (unsupplemented BHI), and upon reblanking with freshly made sBHI, the cultures seemed appropriately dense, but still something was wrong: I went through the remainder of the protocol and plated cells onto media, but the total number of viable cells observed the next day was two orders of magnitude too low, and there were no observable transformants. Obviously, I need to blank with the same media as was used for the subculture. Oops!
Why the cells refused to grow remains a mystery, especially since the overnights seemed to grow just fine. Alas.
My second attempt went far more smoothly. I realized that my overnight cultures had a somewhat lower than expected density (~0.7, instead of >2.0 as reported by Maughan and Redfield 2009 for Rd), so I gave the subcultures some extra time without worrying (4.5 hrs, instead of 2.5 hrs). They reached appropriate densities (interestingly, both the 1:100 and 1:50 dilutions had roughly the same OD by this time), went into M-IV for 100 minutes, got transformed ±DNA, and were either diluted and plated or frozen as 1.25 ml aliquots in 15% glycerol. I also did some spot tests on extra plates just for fun.
The next day, I found that I’d plated at far too high of a density to count viable cells, but... TRANSFORMANTS! Perhaps not a flawlessly executed experiment, but at least the phenomenon I came here to study does indeed happen in my own hands. A happy day.
But I drastically overshot the mark on the plain sBHI plates. Furthermore, I’d had some issues evenly spreading new hemin onto the plates (I think they were too dry), so growth was somewhat uneven. As far as the high density goes, possibly my serial dilution was messed up (not changing tips?... I doubt it, unless the cells are very sticky) or the density of cells was initially higher than I thought. So I returned to the frozen competent cell preparations and re-plated to fresh plates, hopefully in range of being able to accurately count transformation and co-transformation frequencies. I’ll have to wait until this afternoon to get a clearer idea of how my plating went. Stay tuned!
* supplemented Brain-Heart-Infusion broth
** starvation media (at least of nucleotides and their precursors)
(continued...)