<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-3017697080484068141</id><updated>2011-09-08T13:40:08.886-07:00</updated><category term='grants'/><category term='neglected diseases'/><category term='plans'/><category term='rearrangements'/><category term='DNA'/><category term='funny'/><category term='vacation'/><category term='apis'/><category term='human USS'/><category term='funding'/><category term='degenerate'/><category term='alignment'/><category term='supragenome'/><category term='recombination'/><category term='old school'/><category term='computers'/><category term='speculation'/><category term='mutation'/><category term='pattern matching'/><category term='congression'/><category term='linkage'/><category term='evolution of sex'/><category term='periplasm'/><category term='sciara'/><category term='browser'/><category term='ploidy'/><category term='sequencing'/><category term='foolishness'/><category term='public health relevance'/><category term='influenza'/><category term='lab'/><category term='USS'/><category term='cytosol'/><category term='training'/><category term='drosophila'/><title type='text'>No DNA Control</title><subtitle type='html'>Repository for my contemplations of transmission genetics and natural transformation in Haemophilus influenzae, as well as notes on becoming a hack bioinformaticist</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default?max-results=100'/><link rel='alternate' type='text/html' 
href='http://nodnacontrol.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>90</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-7187521917595915485</id><published>2011-03-16T00:15:00.000-07:00</published><updated>2011-03-16T00:46:08.363-07:00</updated><title type='text'>Phase II</title><content type='html'>&lt;div style="text-align: center;"&gt;&lt;img src="http://3.bp.blogspot.com/-Dn5NBSvCVmQ/TYBl6O5AqlI/AAAAAAAAAnY/Naq5RfqGWE4/s400/060228_more_petri_3.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5584575589043448402" style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 10px; margin-left: auto; text-align: center; cursor: pointer; width: 400px; height: 400px; " /&gt;&lt;/div&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/-Dn5NBSvCVmQ/TYBl6O5AqlI/AAAAAAAAAnY/Naq5RfqGWE4/s1600/060228_more_petri_3.jpg"&gt;&lt;/a&gt;&lt;div style="text-align: right;"&gt;&lt;a href="http://pruned.blogspot.com/2006/02/more-gardens-in-petri.html"&gt;image credit&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;The &lt;a href="http://hmpdacc.org/"&gt;Human Microbiome&lt;/a&gt; meeting was awesome...&lt;br /&gt;&lt;br /&gt;&lt;div&gt;Here's a 16S phylogeny of bacteria that live in the human airway:&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;a 
href="http://1.bp.blogspot.com/-KAdw-iHJMtQ/TYBmwsTgAlI/AAAAAAAAAng/rYyqup_ZEDI/s1600/air.png"&gt;&lt;img src="http://1.bp.blogspot.com/-KAdw-iHJMtQ/TYBmwsTgAlI/AAAAAAAAAng/rYyqup_ZEDI/s400/air.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5584576524652118610" style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 10px; margin-left: auto; text-align: center; cursor: pointer; width: 400px; height: 384px; " /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="text-align: right;"&gt;&lt;a href="http://hmpdacc.org/analyses_phylo.php"&gt;Here are&lt;/a&gt; some other places on the body.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Anyways, cool stuff.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;So my ears are itching, because yesterday the local genome centre was doing quality control on our libraries of &lt;i&gt;Haemophilus influenzae&lt;/i&gt; &lt;a href="http://nodnacontrol.blogspot.com/2010/12/sequence-submission-ho.html"&gt;recombinants&lt;/a&gt; (and controls)… If all goes well, we'll get scads of data in a couple of weeks!&lt;br /&gt;&lt;br /&gt;But I also need to get the work for our "Phase II" plan for Genome BC off the ground quickly, so that we'll have samples to submit within a couple of months of getting this incoming "Phase I" data.&lt;br /&gt;&lt;br /&gt;The goal is to map transformation frequency across the H.flu genome when our standard lab strain Rd is transformed by DNA from clinical isolates with varying distributions of genetic variation. We hope to investigate the sequence factors that control the chance of homologous recombination: not only the effects of divergence, but also the role of chromosomal position (for example, proximity to transformation hotspots).&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;span class="fullpost"&gt;&lt;br /&gt;There are two parts to making these transformation maps: the experiment itself and the sequence analysis. 
For the latter, we have worked out a strategy, but I'll just give a brief synopsis here: the challenge is basically to estimate transformation frequencies that are nearly as low as the sequencing error rate. We plan to handle this by exploiting the very high co-transformation rate of adjacent SNPs and the high density of SNPs between H.flu isolates: a genuine recombinant should carry donor alleles at several nearby SNPs at once, whereas independent sequencing errors should rarely line up that way. Thus we will use information within reads and between paired reads to distinguish true transformation from spurious errors.&lt;br /&gt;&lt;br /&gt;The basic experiment is really straightforward: (1) Incubate donor DNA with competent cells, allowing for uptake and recombination. (2) Purify the recipient chromosomes. (3) Sequence these to extremely high genomic coverage and measure the donor-specific allele frequency at each polymorphic site.&lt;br /&gt;&lt;br /&gt;While that sounds simple, there will be a bunch of challenges ahead in getting this done properly (and on time). We've got a pretty good plan for handling the issues, so I think we'll be able to make appropriate samples for sequencing:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;We need cultures that are as highly competent as possible. Our sensitivity will depend on sequencing chromosomal DNA fragments from &lt;a href="http://nodnacontrol.blogspot.com/2010/03/chromosome-position-effect.html"&gt;as few non-competent cells&lt;/a&gt; as possible.&lt;/li&gt;&lt;li&gt;We need very high-complexity libraries, since we will be sequencing each library to several-thousand-fold genomic coverage.&lt;/li&gt;&lt;li&gt;We need replication.&lt;/li&gt;&lt;li&gt;We need more than one clinical isolate to donate its DNA (so that we can actually make this into a real experiment).&lt;/li&gt;&lt;li&gt;We need controls.&lt;/li&gt;&lt;li&gt;We need to show that we don't have to worry about donor DNA contamination (i.e., donor sequences that weren't recombined into the chromosome).&lt;/li&gt;&lt;/ol&gt;So the first thing to do is make the competent-est competent cultures I can. 
Our sensitivity will depend on sequencing DNA fragments from as few non-competent cells as possible.&lt;br /&gt;&lt;br /&gt;This week, I'm going to test MIV transformation of wild-type Rd and a couple of our hypercompetent mutants (murE and sxy-1), with and without cyclic AMP, to see which gives the highest transformation frequencies (and least congression). (I'll start with our regular old MAP7 DNA.) We reason that these changes will not alter the transformation frequency map in an important way, because the hypercompetent mutations and the addition of cAMP affect the regulation of competence rather than its consequences.&lt;br /&gt;&lt;br /&gt;I am tempted to try transforming H.flu the way it was done circa 1961: make the crazy blood brain heart media and do an anoxic transformation. Transformation frequencies were so high back in the sixties! It might be worth doing, though the difference could be due to something other than the culture conditions (perhaps our strain has changed?)… It would also require, &lt;a href="http://nodnacontrol.blogspot.com/2009/05/apparently-unnecessary-step.html"&gt;ahem&lt;/a&gt;, some fairly odious preparation...&lt;br /&gt;&lt;br /&gt;Anyways, I'll try to keep the blog updated on this experiment's progress (and later the sequence analysis) as things move along…&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-7187521917595915485?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/7187521917595915485/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2011/03/phase-ii.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/7187521917595915485'/><link rel='self' 
type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/7187521917595915485'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2011/03/phase-ii.html' title='Phase II'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-Dn5NBSvCVmQ/TYBl6O5AqlI/AAAAAAAAAnY/Naq5RfqGWE4/s72-c/060228_more_petri_3.jpg' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-4252924104195632736</id><published>2011-03-01T19:20:00.000-08:00</published><updated>2011-03-01T19:51:22.451-08:00</updated><title type='text'>Uptake specificity as a tool for microbiome research</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/-yp9IjE_D5P8/TW27ocAiCcI/AAAAAAAAAnI/KEp25wb6cIk/s1600/autolyticVector.gif"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 320px; height: 209px;" src="http://3.bp.blogspot.com/-yp9IjE_D5P8/TW27ocAiCcI/AAAAAAAAAnI/KEp25wb6cIk/s320/autolyticVector.gif" border="0" alt="" id="BLOGGER_PHOTO_ID_5579321816769497538" /&gt;&lt;/a&gt;&lt;a href="http://www.natx.com/AutolyticCellLines.html"&gt;image credit&lt;/a&gt;&lt;br /&gt;&lt;div&gt;&lt;br /&gt;Sorry for neglecting you, blog.&lt;br /&gt;&lt;br /&gt;So Rosie and I are headed to the &lt;a href="http://www.cvent.com/events/international-human-microbiome-congress/event-summary-c4aa192ce47c44fbb1dc15dd34e5b2c2.aspx"&gt;Human Microbiome Meeting&lt;/a&gt; here in Vancouver in just over a week.  
This means there are two posters to make!  AAccK!  We'll be a little out of place at the meeting, since our work is mostly in the lab, but we're hoping to learn more about the normal environment of &lt;i&gt;H. influenzae&lt;/i&gt; and maybe even find people with access to mucus who are interested in bacterial genetic exchange.&lt;br /&gt;&lt;br /&gt;So our posters won't be directly about the microbiome: one will be about our genomics of natural transformation experiment, and the other about the uptake specificity measurements using deep sequencing.  As I was thinking about the future of the uptake specificity work, it occurred to me that we could use &lt;i&gt;H. influenzae&lt;/i&gt; uptake as a way to filter &lt;i&gt;Haemophilus&lt;/i&gt; and other Pasteurellaceae chromosomal fragments from, for example, a lung mucus DNA sample.  This would, at minimum, be a good way to screen out the gobs and gobs of human DNA likely present &lt;a href="http://nodnacontrol.blogspot.com/2009/06/eating-our-dna.html"&gt;(at least most of it)&lt;/a&gt; and maybe focus a microbiome project on a narrowish group... 
save a lot of money that way...&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-4252924104195632736?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/4252924104195632736/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2011/03/uptake-specificity-as-tool-for.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/4252924104195632736'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/4252924104195632736'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2011/03/uptake-specificity-as-tool-for.html' title='Uptake specificity as a tool for microbiome research'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-yp9IjE_D5P8/TW27ocAiCcI/AAAAAAAAAnI/KEp25wb6cIk/s72-c/autolyticVector.gif' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-6262316515459827361</id><published>2010-12-11T12:49:00.000-08:00</published><updated>2010-12-11T12:53:08.726-08:00</updated><title type='text'>Sequence submission, ho!</title><content type='html'>&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 309px; height: 320px;" 
src="http://2.bp.blogspot.com/_7qRGxl6StM4/TQPkGjL-WpI/AAAAAAAAAmw/gRFj4VwqSKg/s320/i_analyser_02.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5549529967026395794" /&gt; &lt;a href="http://seq.molbiol.ru/i_analyser.html"&gt;link&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Hi blog!  Long time, no update…&lt;br /&gt;&lt;br /&gt;On Thursday, I submitted 91 genomic DNA samples to our local genome centre (center, for you American readers).  They’ll be sheared, converted into indexed (barcoded) libraries, and sequenced on an Illumina GA2 over 4 “lanes”.  Whew!  That should have been easier.&lt;br /&gt;&lt;br /&gt;I probably won’t get the data until around February, so with that out of the way for now, I’d best get my act together.  But before that, here’s what this first round of sequencing for our Genome BC grant is about:&lt;br /&gt;&lt;br /&gt;We want to get a basic genetic picture of what happens when we do our standard transformation experiments; we lack simple rules of genetic transmission by natural transformation.  How many DNA fragments do competent cells add to their chromosomes by homologous recombination?  How long are the segments?  How does genetic divergence affect transformation?&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 320px; height: 256px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/TQPkKoXos0I/AAAAAAAAAm4/7k8vDieOylk/s320/ch7f16-1.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5549530037136962370" /&gt; &lt;a href="http://www.ncbi.nlm.nih.gov/books/NBK21993/"&gt;link&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Haemophilus influenzae KW20 cells can be made naturally competent by starving log-phase cells in M-IV medium.  A fraction of the cells in the culture becomes competent, able to take up added DNA fragments into their cytosol (in H. influenzae, uptake is biased toward fragments containing USS, the uptake signal sequence motifs that are abundant in H. influenzae chromosomes).  
DNA fragments that are taken up can recombine with the cell’s chromosome if the fragment shares enough sequence identity with part of the chromosome.&lt;br /&gt;&lt;br /&gt;We added genomic DNA from an antibiotic-resistant derivative of a divergent clinical isolate, NP (86-028NP NovR NalR), to competent KW20 cultures in three separate experiments.  In each case, we isolated antibiotic-resistant and antibiotic-sensitive clones, grew these up in overnight cultures, saved frozen stocks, and extracted genomic DNA from the rest of each culture for sequencing.&lt;br /&gt;&lt;br /&gt;In these experiments, we are not getting a particularly “natural” view of natural transformation, since the donor DNA we’re using is purified chromosomal DNA from a single divergent isolate.  But since we know effectively nothing about the actual DNA in the natural environment of naturally competent H. influenzae (its source, size distribution, or associated muck), this seems like a good place to start.  We know the genome sequence of the donor and can use the genetic differences between donor and recipient to identify transforming DNA segments.&lt;br /&gt;&lt;br /&gt;Based on our preliminary work, we think that these transformed clones will contain long(ish) segments from NP replacing segments of the KW20 chromosome.  We’re using multiplexed Illumina GA2 paired-end sequencing to identify recombinant segments in a large set of clones from these three experiments.  We won’t get perfect genome sequences out the other end; instead we expect a median read depth of ~60 per clone, which is probably enough to capture the vast majority of diagnostic polymorphisms in each clone.  Then we’ll look at the distribution of recombinant segments across the independent transformants.  
That’ll be the Science part.&lt;br /&gt;&lt;br /&gt;Here’s what the 91 samples were:&lt;br /&gt;&lt;br /&gt;Selected set: 72&lt;br /&gt;Because we estimate that only about 10% of the cells in each culture were actually competent, we selected either novobiocin- or nalidixic acid-resistant clones (3x12 of each) to ensure that each clone was derived from a competent cell.  To allow selection of transformed cells, the NP donor DNA was purified from a clone made NovR and NalR by PCR-mediated transformation.  This means that every selected clone is predicted to carry a recombinant segment spanning the selected locus.  At the two selected loci, we will be able to ask about “LD decay” away from the selected positions.  We also expect a large number of independent recombinant segments in these clones; these will provide basic information about the distribution, size, and breakpoints of recombination tracts in natural transformants.&lt;br /&gt;&lt;br /&gt;Unselected set: 8 pools of 2&lt;br /&gt;We estimate percent competence from the “congression” (unexpectedly high co-transformation) of “unlinked” markers (markers not on the same DNA fragment), but this calculation relies on an assumption about competent cultures, namely that cells come in only two flavors: either non-competent, unable to take up DNA, or competent, able to take up several long fragments of DNA.  Alternatively, “congression” could arise from a more quantitative distribution of states, where cells are more or less competent, i.e., able to take up and recombine a variable number of fragments.&lt;br /&gt;For the most part, using “congression” to evaluate the overall competence of a culture is probably valid, but to generalize from the large set of selected transformants above, we’d like to know how well this assumption holds.  
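&lt;br /&gt;&lt;br /&gt;To make the two-class assumption concrete, here is a minimal Python sketch of the congression calculation it implies (an illustration, not necessarily exactly how the lab computes it). It assumes a fraction c of cells is competent and that, within that fraction, the two unlinked markers transform independently, so the double-transformant frequency is f_AB = c × (f_A/c) × (f_B/c) = f_A × f_B / c, which gives c = f_A × f_B / f_AB. The function name and the example frequencies are hypothetical, not data from these experiments.&lt;br /&gt;&lt;pre&gt;
```python
def fraction_competent(f_a, f_b, f_ab):
    """Estimate the competent fraction of a culture from congression.

    Two-class assumption: a fraction c of cells is competent, and within that
    fraction two unlinked markers transform independently with probabilities
    f_a / c and f_b / c. Then f_ab = c * (f_a / c) * (f_b / c) = f_a * f_b / c,
    which rearranges to c = f_a * f_b / f_ab.
    """
    if f_ab == 0:
        raise ValueError("no double transformants observed; c is undefined")
    return f_a * f_b / f_ab


# Hypothetical single- and double-transformant frequencies (illustration only).
c = fraction_competent(f_a=0.01, f_b=0.01, f_ab=0.001)
print(f"estimated fraction competent: {c:.2f}")  # prints: estimated fraction competent: 0.10
```
&lt;/pre&gt;If cells are instead "more or less" competent, the same formula still returns a number, but it no longer corresponds simply to the fraction of competent cells, which is one reason to check the assumption directly by sequencing unselected clones.&lt;br /&gt;&lt;br /&gt;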
Because we expect that most unselected clones will look just like the recipient KW20 chromosome, we pooled 16 unselected clones into pairs and included these pools as 8 of our submitted samples.  Since we predict that only 10% of cells in the cultures were competent, our null hypothesis is that 1-3 of the 16 clones will look like selected clones (i.e., carry a handful of independent recombination tracts) and the rest will look just like KW20.  We may, however, find that many more clones show evidence of transformation but carry only a single recombination tract or only very short ones.&lt;br /&gt;&lt;br /&gt;Selected late-log set: 8&lt;br /&gt;Similarly, we also included 8 NovR clones selected from a late-log transformation.  Cultures of cells in late-log phase are considerably less competent than MIV cultures (by ~1-2 orders of magnitude), and Rosie showed me some old data in her notebook suggesting that this difference could roughly be accounted for by percent competence.  The suggestion is that late-log cultures have lower transformation frequencies than MIV cultures only because fewer of their cells are competent, not because late-log competent cells are less transformable.  Sequencing a handful of transformants from a late-log culture will help sort this out, in the same way as sequencing unselected clones from MIV cultures.&lt;br /&gt;&lt;br /&gt;Controls: 3&lt;br /&gt;We included both the donor and recipient genomes as samples; although we’ve already obtained ridiculous coverage of these genomes, including them here acts as an internal “coverage control”.  We also threw in some MAP7 DNA, from a KW20 derivative carrying a bunch of antibiotic-resistance markers.  While it’s going to look almost like KW20, we use it on a practically daily basis, so it’d be nice to see its sequence.&lt;br /&gt;&lt;br /&gt;So that’s the set of 91 samples we’re having sequenced with our first round of Genome BC cash.  
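&lt;br /&gt;&lt;br /&gt;The null hypothesis for the unselected set can be checked with a simple binomial calculation. Below is a minimal Python sketch, assuming that each of the 16 unselected clones independently has a 10% chance of coming from a competent cell and that every such clone carries recombination tracts; the constant and function names are illustrative only.&lt;br /&gt;&lt;pre&gt;
```python
from math import comb

N_CLONES = 16      # 8 pools of 2 unselected clones
P_COMPETENT = 0.1  # assumed fraction of competent cells per culture


def binomial_pmf(k, n, p):
    """Probability that exactly k of n clones derive from competent cells."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


expected = N_CLONES * P_COMPETENT
p_none = binomial_pmf(0, N_CLONES, P_COMPETENT)
p_one_to_three = sum(binomial_pmf(k, N_CLONES, P_COMPETENT) for k in range(1, 4))

print(f"expected transformed clones: {expected:.1f}")
print(f"P(no transformed clones): {p_none:.2f}")
print(f"P(1 to 3 transformed clones): {p_one_to_three:.2f}")
```
&lt;/pre&gt;Under these assumptions, 1-3 of the 16 clones should be transformants about three quarters of the time (P ≈ 0.75), with roughly a one-in-five chance (P ≈ 0.19) that none are; finding many more transformed clones would be hard to reconcile with the simple two-class picture.&lt;br /&gt;&lt;br /&gt;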
I’m keeping my fingers crossed that the data is good and abundant and reveals new things…&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-6262316515459827361?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/6262316515459827361/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2010/12/sequence-submission-ho.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/6262316515459827361'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/6262316515459827361'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2010/12/sequence-submission-ho.html' title='Sequence submission, ho!'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_7qRGxl6StM4/TQPkGjL-WpI/AAAAAAAAAmw/gRFj4VwqSKg/s72-c/i_analyser_02.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-6828401741820548976</id><published>2010-08-21T00:37:00.000-07:00</published><updated>2010-08-21T11:39:59.579-07:00</updated><title type='text'>It's more delicious because it's more nutritious</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" 
href="http://www.jbc.org/content/280/45.cover-expansion"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 241px; height: 320px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/TG-Eax7pRyI/AAAAAAAAAmI/cyI3RIDPkpQ/s320/F1.medium.gif" alt="" id="BLOGGER_PHOTO_ID_5507766464912377634" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;I think I got somewhere with my degenerate sequencing data... and possibly even discovered something unexpected.  The first thing I had to do was more or less start my analysis over, temporarily forget everything I’ve read about “information”, “bits”, “logos”, etc., and just go back to thinking about the experiment from the get-go...&lt;br /&gt;&lt;br /&gt;When it comes to DNA, &lt;span style="font-style: italic;"&gt;H. influenzae&lt;/span&gt; are picky eaters.  They seem to find some DNA fragments tastier than others.  When I came to the lab, Rosie provided me with a DNA sequence: AAAGTGCGGTTAATTTTTACAGTATTTTTGG.  She told me that DNA fragments containing this sequence (the “consensus USS”) are especially delicious to cells, and she wanted to know why.&lt;br /&gt;&lt;br /&gt;Well, before I can aspire to help with that, it'd be useful to know which parts (or combinations of parts) of this special sequence actually make it delicious.  Are all positions equally important?  Do different positions make independent contributions to deliciousness, or do they work together to make an especially tasty morsel?  Perhaps such insight into H.flu’s palate will get me closer to the bigger project of understanding why H.flu is so picky.&lt;br /&gt;&lt;br /&gt;I did the following experiment last month to this end:&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;(1) I provided a massive smorgasbord of DNA to a whole bunch of cells... hundreds of culinary choices per cell to a billion cells.  This tremendous buffet also had ridiculous variety.  
Every DNA snack was based on Rosie’s extra-tasty sequence, but there were differences in every bite (because the DNA was synthesized with 24% degeneracy at each base).  Indeed, there were hundreds of billions of different taste sensations in the input DNA pool.&lt;br /&gt;&lt;br /&gt;(2) I harvested the undigested meals from the cells (facilitated by a mutation that kept the food from getting digested).  The cells probably had terrible heartburn at that point, but each managed, on average, to stuff down dozens of DNA molecules.&lt;br /&gt;&lt;br /&gt;(3) I sent two DNA samples to my friend to get sequenced: a bit of the original DNA buffet, and the DNA that I purified from the cells’ guts.  By comparing the relative abundance of different DNA molecules, we can learn which positions are important to making the sequence tasty and, hopefully, how the different positions work together to impart deliciousness.&lt;br /&gt;&lt;br /&gt;(4) Now I have two giant lists of &gt;10 million sequences each from my friend.  So far, I’m working with the first 100,000 of each dataset so I can figure out how to process the data.  For the analysis below, this is more than enough to make accurate frequency estimates.&lt;br /&gt;&lt;br /&gt;First, I asked how the uptake sample (the H.flu guts) compares to the input sample, treating each position in the sequence independently.  To keep things extra simple, I only considered whether a given position in a particular sequence read matches or mismatches the original sequence.&lt;br /&gt;&lt;br /&gt;So, for each position, I counted how many of the 100,000 reads in each set were mismatched.  To calculate the relative enrichment of mismatches in the uptake sample compared to the input sample, I simply took the ratio of uptake:input counts, then took the log (base 2) of that ratio.  On this scale, positive numbers indicate enrichment in the uptake sample relative to the input, negative numbers indicate depletion, and zero means no difference.  
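&lt;br /&gt;&lt;br /&gt;The counting and log-ratio steps above can be sketched in Python. This is a minimal reconstruction for illustration, not the script behind the figures. It assumes each read has already been trimmed to the 31-base USS region. It divides counts by the number of reads in each sample; the two samples here are the same size, so this does not change the ratio. It also adds an optional pseudocount (a hypothetical choice, not part of the analysis described) so that a position with no mismatches does not give log(0). The sketch computes a matrix over all pairs of positions at once: its diagonal holds the single-position values, and the off-diagonal cells correspond to the pairwise analysis described further down.&lt;br /&gt;&lt;pre&gt;
```python
import math

# Consensus USS given in this post (31 bases; 1-based position 7 is the C of GCG).
CONSENSUS = "AAAGTGCGGTTAATTTTTACAGTATTTTTGG"


def mismatch_positions(read, reference):
    """Indices (0-based) at which a trimmed read differs from the reference."""
    return [i for i, (base, ref) in enumerate(zip(read, reference)) if base != ref]


def pairwise_mismatch_counts(reads, reference=CONSENSUS):
    """counts[i][j] = number of reads mismatched at both i and j.

    The diagonal counts[i][i] is the single-position mismatch count.
    """
    size = len(reference)
    counts = [[0] * size for _ in range(size)]
    for read in reads:
        mism = mismatch_positions(read, reference)
        for i in mism:
            for j in mism:
                counts[i][j] += 1
    return counts


def log2_enrichment(uptake_reads, input_reads, reference=CONSENSUS, pseudocount=0.5):
    """log2((uptake count / uptake reads) / (input count / input reads)).

    Positive values: mismatches enriched after uptake.
    Negative values: mismatches depleted (the consensus base favors uptake).
    pseudocount is an assumed smoothing term; set it to 0 for raw ratios.
    """
    up = pairwise_mismatch_counts(uptake_reads, reference)
    inp = pairwise_mismatch_counts(input_reads, reference)
    n_up, n_in = len(uptake_reads), len(input_reads)
    size = len(reference)
    return [
        [
            math.log2(((up[i][j] + pseudocount) / n_up)
                      / ((inp[i][j] + pseudocount) / n_in))
            for j in range(size)
        ]
        for i in range(size)
    ]


# Toy example with a 2-base reference and made-up reads (not real data).
toy_input = ["TG", "TG", "AC", "AC"]
toy_uptake = ["TG", "AC", "AC", "AC"]
matrix = log2_enrichment(toy_uptake, toy_input, reference="AC", pseudocount=0)
print(matrix)  # prints [[-1.0, -1.0], [-1.0, -1.0]]
```
&lt;/pre&gt;Counting each read once for every pair of its mismatched positions, including a position paired with itself, is what makes the matrix symmetric and puts the single-position values on its diagonal.&lt;br /&gt;&lt;br /&gt;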
Here’s what I get:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_7qRGxl6StM4/TG-CqXvRfwI/AAAAAAAAAlo/fbQu6nTZmN0/s1600/greydepleteONES.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 400px; height: 200px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/TG-CqXvRfwI/AAAAAAAAAlo/fbQu6nTZmN0/s400/greydepleteONES.png" alt="" id="BLOGGER_PHOTO_ID_5507764533735816962" border="0" /&gt;&lt;/a&gt;Lovely.  It looks like some parts of the original sequence are more important than others.  For example, sequences that didn’t have a C at position 7 were depleted more than 64-fold (a log2 ratio below -6) in the uptake sample compared with the input sample.  This must be an especially important flavor.  On the other hand, some of the other positions look like they contribute less to tastiness, and a few positions don’t seem to contribute at all.&lt;br /&gt;&lt;br /&gt;Great!  We could now go and compare this to some other information we have, but for now, I’m going to ignore all of that and move on to a pairwise mismatch analysis, which will tell us how combinations of two mismatches affect tastiness.&lt;br /&gt;&lt;br /&gt;First, here’s that barplot again, with the bars re-colored by the amount of depletion.  
Mismatches at red positions make the DNA taste bad, while mismatches at blue positions don’t.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_7qRGxl6StM4/TG-CucbJlAI/AAAAAAAAAlw/Vn2VZHICQyI/s1600/RdGrBudepleteONES.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 400px; height: 200px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/TG-CucbJlAI/AAAAAAAAAlw/Vn2VZHICQyI/s400/RdGrBudepleteONES.png" alt="" id="BLOGGER_PHOTO_ID_5507764603713065986" border="0" /&gt;&lt;/a&gt;That’s the legend for the next image, which shows log2(uptake/input) for each pair of positions.  To make the dataset, I counted how many of the 100,000 reads in each set had mismatches at both positions of a particular pair.  As above, I did not condition on the total number of mismatches in the read; a read counts toward a pair as long as at least those two positions are mismatched.  The diagonal shows the consensus sequence and the values from the barplot above, and the plot is symmetric about this diagonal.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7qRGxl6StM4/TG-CzSB3OhI/AAAAAAAAAl4/_NblzWb9cDE/s1600/RdGnBuPAIRWISE.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 400px; height: 400px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/TG-CzSB3OhI/AAAAAAAAAl4/_NblzWb9cDE/s400/RdGnBuPAIRWISE.png" alt="" id="BLOGGER_PHOTO_ID_5507764686821997074" border="0" /&gt;&lt;/a&gt;The big red cross matches what we saw in the 1D barplot above...  The same important positions are still important.  The cross tells us that, for example, not having a C at position 7 means you taste terrible, regardless of where else you have a mismatch.&lt;br /&gt;&lt;br /&gt;BUT, Hot damn!  
I think there’s a discovery here beyond that delectable flavor packet of GCG at positions 6-8 (and possibly a few flanking bases).&lt;br /&gt;&lt;br /&gt;Note the 3x3 grid of regularly spaced “mismatch interactions”.  These three interacting parts of the sequence are spaced just about one helical turn of DNA apart (10-12 base pairs).&lt;br /&gt;&lt;br /&gt;I think this picture has some implications for our molecular models...&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;brain whirring...&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;(Notes:  After getting the number-crunching steps nice and tidy, I spent an inordinate amount of&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_7qRGxl6StM4/TG-DyQY4hpI/AAAAAAAAAmA/FvS7Obb0f8E/s1600/colorlegend.png"&gt;&lt;img style="float: right; margin: 0pt 0pt 10px 10px; cursor: pointer; width: 133px; height: 200px;" src="http://3.bp.blogspot.com/_7qRGxl6StM4/TG-DyQY4hpI/AAAAAAAAAmA/FvS7Obb0f8E/s200/colorlegend.png" alt="" id="BLOGGER_PHOTO_ID_5507765768713438866" border="0" /&gt;&lt;/a&gt; time making these plots.  I started with heatmap(), graduated to heatmap.2(), and ultimately took it to the next level and went fully custom with the image() function.  It also took a ridiculously long time to figure out how to get fine control over the color scaling, which was necessary to make the legend.  I also made a more normal-looking legend (below), which was quite tricky to do right...   
If I wasn't so distracted by helical turns at the moment, I might explain how I made these in more detail.)&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-6828401741820548976?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/6828401741820548976/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2010/08/its-more-delicious-because-its-more.html#comment-form' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/6828401741820548976'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/6828401741820548976'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2010/08/its-more-delicious-because-its-more.html' title='It&apos;s more delicious because it&apos;s more nutritious'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_7qRGxl6StM4/TG-Eax7pRyI/AAAAAAAAAmI/cyI3RIDPkpQ/s72-c/F1.medium.gif' height='72' width='72'/><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-959210398066373686</id><published>2010-08-17T00:44:00.001-07:00</published><updated>2010-08-17T02:25:54.819-07:00</updated><title type='text'>Direct uptake specificity measurements</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) 
{}" href="http://www.devx.com/DevX/Article/43469"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 320px; height: 278px;" src="http://3.bp.blogspot.com/_7qRGxl6StM4/TGpUa8MDCJI/AAAAAAAAAlg/PA6wbJdZOxw/s320/43469figure1.jpg" alt="" id="BLOGGER_PHOTO_ID_5506306316223842450" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;So I got a bit distracted from our plans with the Genome BC dough, since I got some more sequencing data!  Woo!  I'll return to the design of our 92 clones' worth of sequencing later.  For now, I am trying to work through some more grant writing, incorporating our findings from the new sequence data.&lt;br /&gt;&lt;br /&gt;Below, I'll (try to) quickly summarize what I've found with my (very) preliminary data analysis and then try to describe the thing I'm having a hard time writing clearly about...&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;First off:  It worked!  My scheming panned out quite nicely.  I got ~15 million single-end sequence reads from each of two DNA pools.  One was the input pool of 24%-degenerate USS-containing fragments.  The other was the periplasm-purified DNA fragments.  The experiment and its outcome were described &lt;a href="http://nodnacontrol.blogspot.com/2010/08/anticipation-is-palpable.html"&gt;here&lt;/a&gt; and &lt;a href="http://nodnacontrol.blogspot.com/2009/09/reverse-engineering.html"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold;"&gt;Filtering the data:&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The first thing I had to do was somehow parse the raw data file.  For my first pass, I used a quick and dirty, but probably quite stringent, quality filter:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;I excluded any read whose first four and last six bases were not exactly what they should be.  
(These sequenced positions were supposed to be non-degenerate, so they should have zero mismatches from the expected input sequence.)  This excludes reads with known "bad" bases and forces the alignment to be correct.  The unfiltered data contained a bunch of reads that clearly had insertions or deletions within the USS, probably due to the limitations of the original oligo synthesis.&lt;/li&gt;&lt;li&gt;I excluded all reads that contained any Ns.  This makes the analysis a bit simpler, especially when counting how many positions in the degenerate USS differ from the consensus.&lt;/li&gt;&lt;/ol&gt;This left me with ~10 million reads in each dataset.  That's a lot of loss, and I will certainly try to return to those sequence reads, but for now, leaving them out is a decent filter that simplifies the analysis (mainly by forcing a "true" alignment).&lt;br /&gt;&lt;br /&gt;Details:&lt;br /&gt;&lt;br /&gt;I piped a few greps together to pull all the sequence reads matching my criteria (the first four bases ATGC, the last six bases GGTCGA, and no Ns).  From a UNIX terminal:&lt;br /&gt;&lt;blockquote style="font-family: courier new;"&gt;grep '^ATGC' sequence.txt | grep 'GGTCGA$' | grep -v N &gt; newfile.txt&lt;/blockquote&gt;That's it!  (Someday, I should probably do something with all the quality scores, but for now, I'm ignoring them.  
This operation also lost all the read IDs.)&lt;br /&gt;&lt;br /&gt;I soon found that my poor computer-foo was producing very slow-running scripts on these new files, so from here on I worked with just the first 100,000 reads of each filtered dataset, taken with a simple:&lt;br /&gt;&lt;blockquote style="font-family: courier new;"&gt;head -n 100000 newfile.txt&lt;/blockquote&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold;"&gt;Making a position weight matrix:&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I next calculated the frequency of each base at each position for the two datasets (generating two position weight matrices, or PWMs).  Luckily, I have previously worked out how to use the input PWM to background-correct the periplasm-purified PWM &lt;a href="http://nodnacontrol.blogspot.com/2009/10/corrected-logos.html"&gt;here&lt;/a&gt; and &lt;a href="http://nodnacontrol.blogspot.com/2009/09/struggling-with-background.html"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Details:  On to some R...&lt;br /&gt;&lt;br /&gt;I read in the data with &lt;a href="http://rss.acs.unt.edu/Rdoc/library/base/html/scan.html"&gt;scan&lt;/a&gt;.  
I've usually used things like read.table or read.csv, but scan looks like the way to go for one big vector:&lt;br /&gt;&lt;blockquote style="font-family: courier new;"&gt;sequences &lt;- scan("newfile.txt", what = "character")&lt;/blockquote&gt;Next, I calculated the per-position base counts:&lt;br /&gt;&lt;blockquote&gt;&lt;span style="font-family:courier new;"&gt;basesread &lt;- 42  # length of every filtered read&lt;br /&gt;bases &lt;- c("A", "C", "G", "T")&lt;br /&gt;newmatrix &lt;- matrix(0, basesread, 4, dimnames = list(NULL, bases))&lt;br /&gt;for (i in 1:basesread) {&lt;br /&gt;     # fixed factor levels keep all four columns in A/C/G/T order,&lt;br /&gt;     # so a base that never occurs at position i is counted as 0&lt;br /&gt;     newmatrix[i, ] &lt;- table(factor(substr(sequences, i, i), levels = bases))&lt;br /&gt;}&lt;/span&gt;&lt;br /&gt;&lt;/blockquote&gt;Hmmm... Would've been nice to avoid that loop...&lt;br /&gt;&lt;br /&gt;From the two matrices, I got a background-corrected periplasmic uptake specificity motif that looks like this:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7qRGxl6StM4/TGpQsIK5dRI/AAAAAAAAAlA/c1WCMOjMZak/s1600/directuptakemotif.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 400px; height: 178px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/TGpQsIK5dRI/AAAAAAAAAlA/c1WCMOjMZak/s400/directuptakemotif.png" alt="" id="BLOGGER_PHOTO_ID_5506302213451511058" border="0" /&gt;&lt;/a&gt;This excludes the fixed bases and one other anomalous base at the beginning of the read.&lt;br /&gt;(Sorry I can't use WebLogo or other tools that make nicer logos, as none of them accommodate position-specific background base composition.)&lt;br /&gt;&lt;br /&gt;Hey, look at that C at position 7!  It just sticks way out!  That must be a particularly important residue for uptake.  
A previous postdoc and a graduate student had both measured uptake efficiency for fragments carrying point mutations in a consensus USS, and their data offer a great independent validation of this basic result.  Here's what their data look like:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7qRGxl6StM4/TGpSjG2xdvI/AAAAAAAAAlI/ZvGS0xqjOU8/s1600/mutantuptake.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 320px; height: 80px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/TGpSjG2xdvI/AAAAAAAAAlI/ZvGS0xqjOU8/s320/mutantuptake.png" alt="" id="BLOGGER_PHOTO_ID_5506304257503098610" border="0" /&gt;&lt;/a&gt;It looks like the two approaches are giving comparable results.  Fabulous!&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;The next step:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;So what else can we learn from this much larger deep sequencing dataset?  I'm sure all sorts of things.  For now, the only other thing I've done is calculate the number of mismatches between each of the 100,000 filtered sequence reads and the consensus USS the degenerate construct was based on.&lt;br /&gt;&lt;br /&gt;Here's the distribution of mismatch counts for the two datasets:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7qRGxl6StM4/TGpS5p9PCaI/AAAAAAAAAlQ/VS4vQTJkK1k/s1600/mismatchdist.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 320px; height: 197px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/TGpS5p9PCaI/AAAAAAAAAlQ/VS4vQTJkK1k/s320/mismatchdist.png" alt="" id="BLOGGER_PHOTO_ID_5506304644882565538" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Awesome!  
Shifted, just as previously simulated!&lt;br /&gt;&lt;br /&gt;Details:&lt;br /&gt;&lt;br /&gt;At first I tried a bunch of extremely laborious and slow ways of doing this.  Then I found a package called "&lt;a href="http://cran.r-project.org/web/packages/cba/index.html"&gt;cba&lt;/a&gt;" with a function called "sdists", which can compute edit distances for vectors of character strings.  (The sdists step is how I murdered the computers when I tried using the full dataset.)&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;span style="font-family:courier new;"&gt;install.packages("cba")  # only needed once&lt;br /&gt;library(cba)&lt;br /&gt;consensus &lt;- "ATGCCAAAGTGCGGTTAATTTTTACAGTATTTTTGGGTTCGA"&lt;br /&gt;# edit distance between the consensus and each filtered read&lt;br /&gt;distances &lt;- as.vector(sdists(consensus, sequences, weight = c(1, 0, 1)))&lt;/span&gt;&lt;/blockquote&gt;&lt;br /&gt;I guess I could just do it in chunks for the whole dataset to save on RAM...&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Anyways, this vector of distances is going to serve as an important index to the sequence reads for the next set of analyses (which will look at co-variation between positions and how specific mismatches affect the motif).&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold;"&gt;Surprise:&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Okay, now to try to describe the thing I'm having trouble saying succinctly.  
The information content in motifs like the one above is measured in bits.  "Information content" can also be thought of as "surprise", and this is why the USS motif derived from the genome has so much higher information content than the new, experimentally determined uptake motif.  (Here's the genomic USS:)&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_7qRGxl6StM4/TGpTOT2vDEI/AAAAAAAAAlY/iRb8G-2OD3g/s1600/genomicUSS.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 320px; height: 80px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/TGpTOT2vDEI/AAAAAAAAAlY/iRb8G-2OD3g/s320/genomicUSS.png" alt="" id="BLOGGER_PHOTO_ID_5506304999726976066" border="0" /&gt;&lt;/a&gt;Because most of the bases at a given position in the degenerate USS pool (an expected 76% of them, given the 24% degeneracy) were already the preferred consensus base, little "surprise", as measured in bits, was elicited when the preferred base was enriched in the periplasm-purified pool.  To put numbers on it:  even if every periplasm-purified read carried the consensus base at some position, that position could score at most log2(1/0.76) ≈ 0.4 bits relative to the input, whereas the same complete preference measured against a uniform background (25% per base) would be worth 2 bits.  That's why the scale is so different between the two motifs.&lt;br /&gt;&lt;br /&gt;Anyways, the issue arises when thinking about some other possible experiments.  For example, if uptake specificity were partially modified (as we think we can accomplish using engineered strains), the positions with a new specificity would become extremely informative relative to the others when using the normal degenerate fragments.  
But if a new construct were designed based on the altered specificity, these positions would no longer have such elevated information content relative to the other bases...&lt;br /&gt;&lt;br /&gt;This isn't actually a bad thing--in fact we can exploit it--it's just something hard to describe clearly...&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-959210398066373686?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/959210398066373686/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2010/08/direct-uptake-specificity-measurements.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/959210398066373686'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/959210398066373686'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2010/08/direct-uptake-specificity-measurements.html' title='Direct uptake specificity measurements'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_7qRGxl6StM4/TGpUa8MDCJI/AAAAAAAAAlg/PA6wbJdZOxw/s72-c/43469figure1.jpg' height='72' 
width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-2608638541389501375</id><published>2010-08-11T15:14:00.000-07:00</published><updated>2010-08-11T15:27:19.876-07:00</updated><title type='text'>How to spend some dough?</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://blog.craftzine.com/archive/2009/02/fiveminute_pizza_dough.html"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 320px; height: 217px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/TGMgwQKuZ0I/AAAAAAAAAkg/S_1VpqD2fkk/s320/Five_min_pizza_dough.jpg" alt="" id="BLOGGER_PHOTO_ID_5504279182922639170" border="0" /&gt;&lt;/a&gt;We have some money to spend (kindly provided by Genome BC) for identifying recombination tracts in individual transformants, where donor genomic DNA from the clinical isolate NP is used to transform Rd competent cells (NP-into-Rd transformations).  This will help us answer some very basic questions about natural transformation:  What are the numbers and sizes of recombination tracts in individual transformants?  Are different parts of the genome equally transformable?  Do mismatch repair and other degradative mechanisms limit transformation?&lt;br /&gt;&lt;br /&gt;We already have a few clones worth of data (&lt;a href="http://nodnacontrol.blogspot.com/2010/02/grantgrantgrant.html"&gt;which is now mostly analyzed&lt;/a&gt;), but we need more clone sequences to do any worthwhile statistics.   Actually, I've made nicer pictures and done some re-analysis for the poster I took to the &lt;a href="http://www.evolutionsociety.org/SSE2010/"&gt;Evolution&lt;/a&gt; meeting.  
I should post some of that sometime...&lt;br /&gt;&lt;br /&gt;We’d &lt;a href="http://nodnacontrol.blogspot.com/2010/03/multiplexing-sans-barcodes.html"&gt;originally had a cost-saving plan &lt;/a&gt;of sequencing overlapping pools of clones to obtain data from 64 clones.  We even came up with a slicker plan (since sequence yields keep increasing), in which we'd get 128 clones' worth of data, and 64 of these would be "de-convoluted": &lt;span class="fullpost"&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_7qRGxl6StM4/TGMiUFdHwlI/AAAAAAAAAkw/_wDcj4vI-WA/s1600/mixed.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 200px; height: 195px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/TGMiUFdHwlI/AAAAAAAAAkw/_wDcj4vI-WA/s200/mixed.png" alt="" id="BLOGGER_PHOTO_ID_5504280898033926738" border="0" /&gt;&lt;/a&gt;But now we’ve discovered that our local genome center provides inexpensive indexing of DNA samples (i.e. barcoding).  This means we can obtain 92 clones' worth of data for only ~$15K.  Wow!&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.solexa.com/technology/multiplexing_sequencing_assay.ilmn"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 147px; height: 320px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/TGMjbyLHaKI/AAAAAAAAAk4/e7IJKimPGhk/s320/multiplex_sequencing_lg.gif" alt="" id="BLOGGER_PHOTO_ID_5504282129808713890" border="0" /&gt;&lt;/a&gt;So now, I need to collect the clones for sequencing and extract their DNA.  The question is: which clones?  I will be sequencing 4 sets of 23 clones (plus a control), so it makes sense to plan the data collection around those sets.  (This number of indexed clones per pool should be enough to obtain ~50-100-fold median coverage per clone, given current estimated yields.)  
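&lt;br /&gt;&lt;br /&gt;The coverage estimate above is simple arithmetic:  the bases sequenced in one pool, divided evenly among the indexed samples sharing it, divided by the genome size.  Here's a rough sketch in shell.  The coverage function is just a hypothetical helper for illustration.  It assumes an even split across 24 indexed samples per pool (23 clones plus the control) and a ~1.83 Mb Rd genome.  The read count and read length in the example call are hypothetical placeholders, not our actual yields:&lt;br /&gt;&lt;br /&gt;

```shell
# Expected mean fold coverage per indexed sample in one sequencing pool.
# Assumes reads are split evenly among samples (real index balance varies),
# so this gives a mean, not the median coverage quoted above.
# Arguments: reads per pool, read length (bp), samples per pool, genome size (bp)
coverage() {
    reads=$1
    read_len=$2
    samples=$3
    genome=$4
    # integer arithmetic: total bases, split per sample, over genome size
    echo $(( reads * read_len / samples / genome ))
}

# Hypothetical example: 40 million 100-bp reads, 24 samples, 1.83 Mb genome
coverage 40000000 100 24 1830000
```

For a fixed yield, per-clone coverage falls in proportion to the number of samples sharing a pool.  That is the main trade-off when deciding how many clones to index together.&lt;br /&gt;&lt;br /&gt;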
There are two things to consider:  (a) technical concerns, and (b) using some clones to look at different kinds of transformations.&lt;br /&gt;&lt;br /&gt;This post will cover the basic technical concerns, and I’ll use the next post to discuss some alternative things we might do with some of this sequencing capacity.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold;"&gt;Replication&lt;/span&gt;&lt;/span&gt;:&lt;br /&gt;To deal with potential criticism later, I should collect the NP-into-Rd transformants from three separate transformation experiments using three separate competent cell preps.  This will allow me to make statements about how reproducible the distribution of recombination tracts is between transformations.  Because each replicate will only have a couple dozen clones, this will likely not be sufficient for detecting subtle differences between the independent replicates, but it will be sufficient for determining the overall consistency of different transformations.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold;"&gt;Selection for transformants:  &lt;/span&gt;&lt;/span&gt;&lt;br /&gt;Since only a fraction of cells in a competent culture appear to be competent (as measured by co-transformation of unlinked marker loci), I will again select for either the NovR or NalR alleles of gyrB and gyrA (respectively) that I’ve already added to the donor NP strain.  
This selection step ensures that every sequenced clone has at least one recombination tract, thereby selecting against clones derived from non-competent cells.&lt;br /&gt;&lt;br /&gt;With indexing, it shouldn’t matter how I organize the clones submitted for sequencing, but the data can also be analyzed without paying attention to the index (for which I have my reasons), so it would make sense to organize a plate like this:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7qRGxl6StM4/TGMh02j9IWI/AAAAAAAAAko/kp7XdkaJUqg/s1600/scheme.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 320px; height: 169px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/TGMh02j9IWI/AAAAAAAAAko/kp7XdkaJUqg/s320/scheme.png" alt="" id="BLOGGER_PHOTO_ID_5504280361460113762" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;We’d originally decided 64 additional clones (atop the 4 we’ve got) would be sufficient for a basic picture.  Now we can do 92 for much less money.  Should we then do 92 NP-&gt;Rd selected transformants instead of 64, or should we use some of this new sequencing space to investigate new things?&lt;br /&gt;&lt;br /&gt;Doing more clones would give us greater statistical power, but each additional clone buys less than the one before (diminishing returns).  Since we estimate that ~60-70 clones will be sufficient for a nice first look, maybe we don’t gain much by doing 92 of the “same” thing and would be better served by using some of the extra space for other endeavours.&lt;br /&gt;&lt;br /&gt;For example, if we reserved one set (23 clones) for one or more other DNA samples, we’d still get 69 clones’ worth of recombination tracts.  
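&lt;br /&gt;&lt;br /&gt;Here's a rough number for those diminishing returns.  Suppose the quantity we care about behaves like a per-clone average (say, mean tract length or number of tracts per clone).  Its standard error then scales as 1/sqrt(n), assuming independent clones with similar variance.  Dropping from 92 to 69 clones inflates the standard error by only sqrt(92/69), about 15%.  A quick check with awk follows; the se_ratio helper is just a hypothetical illustration:&lt;br /&gt;&lt;br /&gt;

```shell
# Ratio of standard errors when using fewer clones than the maximum,
# assuming SE is proportional to 1/sqrt(n) (independent clones, equal variance).
# Arguments: larger clone count, smaller clone count
se_ratio() {
    awk -v big="$1" -v small="$2" 'BEGIN { printf "%.3f\n", sqrt(big / small) }'
}

se_ratio 92 69   # all 92 slots vs. a primary set of 69
se_ratio 92 64   # all 92 slots vs. the original 64-clone plan
```

By the same logic, a 23-clone side experiment would have standard errors about sqrt(69/23) = sqrt(3), or roughly 1.7 times, those of the 69-clone primary set.&lt;br /&gt;&lt;br /&gt;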
What could we do with this extra space (keeping in mind that these should be transformations that can be done immediately)?&lt;br /&gt;&lt;br /&gt;The most important factor to consider with each of these is whether the sample size (23 clones) would be sufficient to obtain useful data, despite having given up some statistical power for the primary set of 69 clones.  (In my mind this means what we get should be publishable on its own or together with the primary dataset, but it could also mean several small pilot experiments for “preliminary data”.)&lt;br /&gt;&lt;br /&gt;There are a whole lot of things we might do with this space… (stay tuned)&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-2608638541389501375?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/2608638541389501375/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2010/08/how-to-spend-some-dough.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/2608638541389501375'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/2608638541389501375'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2010/08/how-to-spend-some-dough.html' title='How to spend some dough?'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://1.bp.blogspot.com/_7qRGxl6StM4/TGMgwQKuZ0I/AAAAAAAAAkg/S_1VpqD2fkk/s72-c/Five_min_pizza_dough.jpg' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-3871630051773839527</id><published>2010-08-10T13:32:00.001-07:00</published><updated>2010-08-10T16:23:40.199-07:00</updated><title type='text'>The anticipation is palpable...</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7qRGxl6StM4/TGHebriI9CI/AAAAAAAAAkI/ujVWRsis1AA/s1600/uss.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 231px; height: 320px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/TGHebriI9CI/AAAAAAAAAkI/ujVWRsis1AA/s320/uss.png" alt="" id="BLOGGER_PHOTO_ID_5503924786747208738" border="0" /&gt;&lt;/a&gt;We’re resubmitting a grant soon, in part of which we propose to measure the specificity of DNA uptake in &lt;span style="font-style: italic;"&gt;H. influenzae&lt;/span&gt; for the “USS motif”, a ~28 bp sequence motif that is sufficient to mobilize efficient DNA uptake.  One of the things we want to know is just how the motif lends itself to efficient uptake by competent cells.  In previous work, the lab has shown that point mutations at positions in the motif with very high consensus sometimes affect uptake efficiency, but other times do not.&lt;br /&gt;&lt;br /&gt;Just before leaving for vacation, I managed to submit exciting samples to my friend for sequencing.  These DNA samples are being sequenced AS WE SPEAK!!!!!!!  
RIGHT NOW!!!&lt;br /&gt;&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;The two samples were: (1) a complex mix of 200 bp fragments containing a degenerate USS with a 24% chance of a mismatch at each base of a consensus USS 32mer; and (2) an enriched fraction of these fragments taken up by competent cells into the periplasm.&lt;span style=""&gt;  &lt;/span&gt;&lt;a href="http://nodnacontrol.blogspot.com/2010/06/degenerate-uptake-pilot-study.html"&gt;My previous post on my pilot experiment &lt;/a&gt;describes the experiment in more detail with additional links to other posts on the degenerate USS.&lt;br /&gt;&lt;br /&gt;In brief, I am trying to compare a complex input pool of DNA fragments with what is actually taken up by cells to make a new uptake motif, based on the uptake process itself, rather than inferring it from genome sequence analysis.&lt;br /&gt;&lt;br /&gt;This is the uptake saturation experiment I did with the consensus USS (USS-C) and the degenerate USS (USS-24D):&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_7qRGxl6StM4/TGHegW6YR4I/AAAAAAAAAkQ/x3UfJKK3tPA/s1600/degup.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 320px; height: 200px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/TGHegW6YR4I/AAAAAAAAAkQ/x3UfJKK3tPA/s320/degup.png" alt="" id="BLOGGER_PHOTO_ID_5503924867111077762" border="0" /&gt;&lt;/a&gt;As I've seen before, USS-24D is not taken up as well as USS-C, since there are many suboptimal sequences that are inefficiently taken up.  At very high DNA concentrations, similar amounts of USS-C and USS-24D are taken up.  This is presumably because there are enough optimal fragments to saturate the uptake machinery at very high USS-24D concentrations.&lt;br /&gt;&lt;br /&gt;I purified the periplasmic DNA from three of these USS-24D uptake samples:  SUB (10 ng/ml), MID (70 ng/ml), and SAT (508 ng/ml).  
Because the yields were quite low, I used PCR to produce more material from these periplasmic DNA preps.  It took a little effort to optimize this PCR.  Perhaps I will get back to the weird artifacts I discovered at some later point, but suffice it to say that I made sure the PCR was clean and kept the number of cycles low, so that I wouldn't alter the complexity of the library too much.&lt;br /&gt;&lt;br /&gt;I then took this periplasmic DNA (which should be enriched for optimal sequences) and re-incubated it with fresh competent cells.  If the experiment worked (i.e. the DNA samples are worth sequencing), then the periplasmic USS-24D prep should be taken up better than the original input.  Indeed this was the case:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7qRGxl6StM4/TGHen8rP0ZI/AAAAAAAAAkY/P4eEusZfPPY/s1600/perup.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 320px; height: 286px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/TGHen8rP0ZI/AAAAAAAAAkY/P4eEusZfPPY/s320/perup.png" alt="" id="BLOGGER_PHOTO_ID_5503924997507240338" border="0" /&gt;&lt;/a&gt;Note that this experiment did not take the saturation curve as far out as the original, due to limits on the number of samples I could process in a single experiment and on the material available before I left for vacation.&lt;br /&gt;&lt;br /&gt;Unfortunately, I didn't see the appreciable difference I'd hoped for between the three periplasmic preps (SUB, MID, and SAT).  I had originally thought that periplasmic DNA recovered from unsaturated uptake experiments would include more suboptimal sequences, while saturated experiments would yield only optimal sequences.  I hoped that this contrast would allow me to investigate competition between different sequences.  Oh well.&lt;br /&gt;&lt;br /&gt;So I decided to send only two samples for sequencing:  the input and MID.  
These are the ones BEING SEQUENCED RIGHT NOW!!! WOO!!!&lt;br /&gt;&lt;br /&gt;It's still possible the different purifications would behave differently at the higher end of the saturation curve (e.g. SAT might saturate sooner than SUB).  So it's probably worth running some more saturation curves to see if it's worth sequencing the other samples.  It'd be really nice to have an experimental condition that actually changes the outcome, to get the most accurate depiction of uptake specificity.&lt;br /&gt;&lt;br /&gt;SO, what do I need to do next?&lt;br /&gt;&lt;br /&gt;(1) &lt;span style="font-style: italic;"&gt;Prepare for the coming data deluge&lt;/span&gt;:  I think I know how to start the data analysis once I've processed the raw read data into a comprehensible ~2e7 rows X 32 columns (though probably in a computationally slow fashion... meh).  What I'm less prepared for is the initial processing of the data.  I designed the experiment to sequence 4 non-degenerate bases upstream and 6 non-degenerate bases downstream of the USS, so I will probably use a simple, crude quality filter that requires the first 4 and last 6 bases to be exactly correct.  This will tend to eliminate poor reads and ensure that the 32 degenerate bases in the middle are already correctly aligned.  This filter will also exclude fragments in which faulty oligo synthesis or fragment construction introduced indels into the USS, which will simplify things initially.  At this point, I will also need to apply a base quality filter to ensure I ignore base calls that have low confidence.  Sequencing error is a problem for this analysis, so it's important I use stringent filters.  
Even if I only ended up with 10% of the raw reads, I'd have an enormous amount of data for sequence motif analysis.&lt;br /&gt;&lt;br /&gt;(2) &lt;span style="font-style: italic;"&gt;Prepare for abject failure.&lt;/span&gt;  It's possible I screwed up the design of the constructs or that there is some unanticipated challenge in sequencing through those 32 bases or in using this approach.  I'll know in a few days, but I need to think about what the next step would be if I did indeed misplace some bases in my reverse-engineering scheme.&lt;br /&gt;&lt;br /&gt;(3) &lt;span style="font-style: italic;"&gt;Prepare for a seeming failure that isn't really one.&lt;/span&gt;  I may have screwed nothing up but get back data from Illumina's pipeline that says something's terribly wrong.  I don't fully understand the details of this, but the base-calling step in Illumina's pipeline (which reads the raw image files from each sequencing cycle) may go screwball because of the extremely skewed base composition my constructs will have at each cycle.  E.g. the first base for every single cluster should be "A", whereas 76% of clusters should have "A" at the fifth base.  Apparently, this can create some weird artifacts in the data processing step, which I need to be prepared for (and not despair when I find out "it didn't work").  Mostly this will involve working with my friend and his sequencing facility to re-run the base-calling with an alternate pipeline.&lt;br /&gt;&lt;br /&gt;(4) &lt;span style="font-style: italic;"&gt;Work on updating the grant for re-submission.&lt;/span&gt;  There are several paragraphs of our grant application that need to be modified to show our progress.  
To some extent, this will involve the soon-to-come data, but I can begin by identifying the parts that will need changes and including some different figures.&lt;br /&gt;&lt;br /&gt;(5) &lt;span style="font-style: italic;"&gt;Work out the molecular biology.&lt;/span&gt;  I've done a bunch of uptake experiments in a bunch of settings, and I should now have enough to put together some kind of little story, even without the sequence data.  With a mind towards a paper, I need to work out just what I need to do.  Do I need to do another sequencing experiment with a different construct (more or less degenerate) or under different conditions?  Or is a single periplasmic enrichment enough?  If the latter, I will certainly want to show a bunch of other experimental data, but which experiments?  And which need to be repeated?  My first step in this direction is to go through my notebook and figure out what I have...&lt;br /&gt;&lt;br /&gt;(6) &lt;span style="font-style: italic;"&gt;Work out a large-scale periplasmic prep.  &lt;/span&gt;I circumvented this for the degenerate USS experiment by doing PCR.  Since I know my exact construct, I could use PCR to amplify it, and didn't need a good yield or a particularly pure prep (since genomic DNA won't amplify using the construct's primers).  However, if I want to look at uptake across a chromosome, I will need to fractionate periplasmic DNA away from chromosomal DNA both to a high level of purity and with a high yield.  I've accomplished each of these individually, but so far have not managed a large-scale periplasmic prep that leaves me with enough pure DNA to make sequencing libraries reliably.  I refuse to use whole-genome amplification for this experiment. 
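&lt;br /&gt;&lt;br /&gt;For item (1), here is a minimal Python sketch of the flank-plus-quality filter; it is not necessarily the filter that will actually be used. Several things in it are assumptions: the read is taken to start at the first fixed upstream base, the 4-base and 6-base flank strings are hypothetical placeholders (the real construct sequence isn't given here), qualities are assumed to be Phred+33 (Illumina pipelines of this era wrote Phred+64, so check the offset), and Q20 is an arbitrary cutoff. Rather than discarding a whole read for one bad call, low-quality core bases are masked as N so they can simply be ignored in the motif analysis.&lt;br /&gt;&lt;br /&gt;

```python
# Sketch of the crude read filter from item (1).
# HYPOTHETICAL placeholders: the real flank sequences are not given in the post.
UPSTREAM = "AGTC"      # 4 fixed bases 5' of the degenerate USS (placeholder)
DOWNSTREAM = "GATCCA"  # 6 fixed bases 3' of the degenerate USS (placeholder)
N_DEGENERATE = 32      # degenerate positions between the two flanks


def extract_core(seq, qual, min_q=20, offset=33):
    """Return the 32-base degenerate core of a read, or None if it fails.

    seq, qual: read bases and the matching ASCII-encoded quality string.
    min_q: minimum Phred score for keeping a core base call (arbitrary Q20).
    offset: 33 for Phred+33; use 64 for older Illumina Phred+64 files.
    """
    start = len(UPSTREAM)
    end = start + N_DEGENERATE
    stop = end + len(DOWNSTREAM)
    if stop > len(seq) or len(qual) != len(seq):
        return None  # too short to span both flanks, or malformed record
    # Exact flank matches reject poor reads and indel-containing molecules,
    # and keep the degenerate core in register without any alignment step.
    if seq[:start] != UPSTREAM or seq[end:stop] != DOWNSTREAM:
        return None
    # Mask low-confidence core base calls as N instead of trusting them.
    return "".join(
        base if ord(q) - offset >= min_q else "N"
        for base, q in zip(seq[start:end], qual[start:end])
    )


if __name__ == "__main__":
    core = "ACGT" * 8  # toy 32-mer, not real data
    good = UPSTREAM + core + DOWNSTREAM
    print(extract_core(good, "I" * len(good)))           # passes unchanged
    print(extract_core(good, "I" * 4 + "#" + "I" * 37))  # first core base masked
    print(extract_core("T" + good[1:], "I" * len(good)))  # bad flank, rejected
```

Collecting the non-None cores row by row gives the ~2e7 × 32 matrix described in item (1). Masking rather than dropping keeps more reads, at the cost of having to handle N in every downstream count.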
&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-3871630051773839527?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/3871630051773839527/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2010/08/anticipation-is-palpable.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/3871630051773839527'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/3871630051773839527'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2010/08/anticipation-is-palpable.html' title='The anticipation is palpable...'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_7qRGxl6StM4/TGHebriI9CI/AAAAAAAAAkI/ujVWRsis1AA/s72-c/uss.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-5002174927740280324</id><published>2010-08-10T13:18:00.000-07:00</published><updated>2010-08-10T13:29:09.349-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='vacation'/><title type='text'>Hi blog!</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://blog.baliwww.com/arts-culture/1383"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: 
center; cursor: pointer; width: 400px; height: 268px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/TGG0c9vGSwI/AAAAAAAAAjI/TlyzPOSLvOo/s400/baris.jpg" alt="" id="BLOGGER_PHOTO_ID_5503878629324901122" border="0" /&gt;&lt;/a&gt;So I’ve returned to work after a 3.5-week vacation to Bali!  Woo!  Time to catch up after a long stretch of non-blogging.  But before that...&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;Here’s where my crazy Uncle Jimmy took me and the missus for a few days… Gili Gede (meaning “Little Island Big”) off the coast of Lombok (which appears to be experiencing a gold rush, so go now before it’s all built up!).&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_7qRGxl6StM4/TGG01YuEp-I/AAAAAAAAAjQ/UI6zyTOge0s/s1600/giligede.jpg"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 400px; height: 368px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/TGG01YuEp-I/AAAAAAAAAjQ/UI6zyTOge0s/s400/giligede.jpg" alt="" id="BLOGGER_PHOTO_ID_5503879048885217250" border="0" /&gt;&lt;/a&gt;The ferry between Bali and Lombok took us across the legendary "&lt;a href="http://en.wikipedia.org/wiki/Wallace_Line"&gt;Wallace Line&lt;/a&gt;", though my total ignorance of systematics means I couldn't really tell the difference in flora and fauna between the islands.  (Though we did see a HUGE monitor lizard near a brick-making operation.)&lt;br /&gt;&lt;br /&gt;I stayed out at the end of this pier on the lower left at the “Secret Island Resort”.  (Note the vessel that took us snorkeling to the immediate south... 
the "Scorpio".)&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_7qRGxl6StM4/TGG1DejiEoI/AAAAAAAAAjY/aiXcTpom_kA/s1600/rockydocky.jpg"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 400px; height: 368px;" src="http://3.bp.blogspot.com/_7qRGxl6StM4/TGG1DejiEoI/AAAAAAAAAjY/aiXcTpom_kA/s400/rockydocky.jpg" alt="" id="BLOGGER_PHOTO_ID_5503879290969789058" border="0" /&gt;&lt;/a&gt;And here's an actual shot of the infamous "Rocky Docky" with the lovely Heather in the foreground:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_7qRGxl6StM4/TGG1OsOJByI/AAAAAAAAAjg/0yZFiGPVmSQ/s1600/rocks.jpg"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 400px; height: 300px;" src="http://3.bp.blogspot.com/_7qRGxl6StM4/TGG1OsOJByI/AAAAAAAAAjg/0yZFiGPVmSQ/s400/rocks.jpg" alt="" id="BLOGGER_PHOTO_ID_5503879483616724770" border="0" /&gt;&lt;/a&gt;The reef was right over the edge of the pier, so snorkeling to see all the corals and fishies was optional.  You could just look over the edge!  Excellent!&lt;br /&gt;&lt;br /&gt;Anyways, I've been getting my head back into things and am writing a few blog posts about what’s going on in my world-of-science.  
So expect a deluge of catch-up posts during the next day or two…&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-5002174927740280324?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/5002174927740280324/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2010/08/hi-blog.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/5002174927740280324'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/5002174927740280324'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2010/08/hi-blog.html' title='Hi blog!'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_7qRGxl6StM4/TGG0c9vGSwI/AAAAAAAAAjI/TlyzPOSLvOo/s72-c/baris.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-4611675389157612820</id><published>2010-06-22T09:17:00.000-07:00</published><updated>2010-06-22T09:29:04.316-07:00</updated><title type='text'>Manuscript plans?</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://kieljohnson.com/kieljohnson.com/publish_or_perish.html"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; 
width: 400px; height: 270px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/TCDkmv7QL7I/AAAAAAAAAjA/uVf-MRzyHa4/s400/perish.jpg" alt="" id="BLOGGER_PHOTO_ID_5485635700488417202" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;So I’m a bit over one year into my postdoc.  What have I got to show for myself?  Well, plenty of work, but no papers, or even written manuscripts, so that’s a bit of a problem.  Can I turn my first set of genome sequencing data into a manuscript?&lt;br /&gt;&lt;br /&gt;Seems likely.  I collected &gt;5 gigabases of Illumina sequence data from several &lt;span style="font-style: italic;"&gt;Haemophilus influenzae&lt;/span&gt; chromosomes, and this could be used as the basis of a manuscript.  I obtained data from a donor strain (86-028NP NovR NalR) and a recipient strain (Rd, RR722) as controls, to evaluate how well the sequencing and read alignment identify polymorphisms.  I also obtained data from two individual transformants and a pool of four transformants to identify donor alleles in transformed recipient chromosomes.  I even found some things out.&lt;br /&gt;&lt;br /&gt;Does this a paper make?  One outstanding issue is that, although this is a lot of data (which has required a fair amount of work to get a handle on), there is not a tremendous amount of biologically relevant information in it.  Yes, I obtained extremely accurate and comprehensive data for the four transformants sequenced.  But it was still only four transformants.  There are some biologically meaningful results; they just aren’t terribly novel or statistically robust.  The bigger biologically meaningful results will have to wait until we can collect more data.&lt;br /&gt;&lt;br /&gt;So to turn this into something publishable, the approach and method need to be important enough (and made explicit enough) to be of value to others.  
So far, I have not done anything in my analysis that is truly novel, but I have managed to produce the bare bones of a “pipeline” for measuring allele frequencies from pools and identifying recombination tracts in transformants.  Our data also had extremely high coverage, so we were able to see the limits of the technology fairly well: i.e. depth-of-coverage variation, errors, and issues with read alignment.&lt;br /&gt;&lt;br /&gt;Though everything I’ve done so far uses “off-the-shelf” bioinformatics tools, there are so many people trying to do similar things that it might be useful to write a paper that is sort of an “application” of the technology and tools I’ve been using.  It took me months to piece everything together, so maybe I could save someone else some time by having everything in one place.  But with each passing day, the value of such a paper is probably diminishing, so I’d best get started!&lt;br /&gt;&lt;br /&gt;There are still a few analyses I’d like to do that would give the paper a little more spice:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Structural variant analysis:  This is something that will involve our collaborators at UVA, who are experts.  We can see these pretty well (at least the larger ones), but something systematic has yet to be done.&lt;/li&gt;&lt;li&gt;Reciprocal read mapping: I’ve mapped all the data to both the donor and recipient genomes, but I have not really fully leveraged this fact.  The read alignment artifacts that arise when mapping data from one strain onto the other could be handled much better if I were able to assign individual reads to one of the two reference genomes based on mapping quality.  I’d really like to do this.  It’d be novel, mainly because most people doing this kind of sequencing are doing SNP discovery.  I’ve already discovered all my SNPs, so doing an extra-good job of calling SNP frequencies using reciprocal alignment would at least be something new.  
This will take a bit of work, however, and I’ll need to figure out the best computational way to do it.  Aside from doing uber-detailed error analysis for a technical paper, I think this is really the best chance to make a novel bioinformatic contribution.&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;Here is a rough outline of the manuscript, as I’m viewing it so far:&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;1. Introduction&lt;/span&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Many bacteria become naturally competent.  Natural transformation is important in evolution.&lt;/li&gt;&lt;li&gt;Previous sequencing studies have focused on only a handful of defined constructs (&lt;span style="font-style: italic;"&gt;Bacillus&lt;/span&gt;, &lt;span style="font-style: italic;"&gt;Helicobacter&lt;/span&gt;, &lt;span style="font-style: italic;"&gt;Actinobacillus&lt;/span&gt;).&lt;/li&gt;&lt;li&gt;For any organism, the total extent of recombined fragments in individual transformants has never been directly evaluated, and as a result the factors dictating the chance of transformation are only poorly understood.&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Haemophilus influenzae&lt;/span&gt; is a model system for natural transformation.  The mechanism is well-defined.  Transformation is efficient in the lab strain.&lt;/li&gt;&lt;li&gt;The extensive natural genetic variation between &lt;span style="font-style: italic;"&gt;H. influenzae&lt;/span&gt; strains provides tens of thousands of markers to identify recombination tracts in individual transformants.  Not only single-nucleotide variants, but also structural variation in the form of indels and other rearrangements.  The “supragenome” hypothesis.&lt;/li&gt;&lt;li&gt;We investigated the use of massively parallel sequencing (or “next-generation sequencing”, NGS) to characterize natural transformation at a whole-genome scale.  
&lt;/li&gt;&lt;li&gt;Our results show the Illumina platform to be an excellent method for obtaining nearly exhaustive information on recombination tracts in individual transformants.  Our approach uses the alignment of sequence reads to both donor and recipient reference sequences.  We obtained donor and recipient genome sequences as controls for evaluating sequencing error, depth of coverage, and polymorphism identification.  We also obtained sequence data from two individual transformants and a pool of four transformants.&lt;/li&gt;&lt;li&gt;Individual recombination tracts are longer than previously appreciated and can bring hundreds of polymorphisms (single-nucleotide, insertion, deletion, and insertional-deletion) from donor to recipient chromosomes.  However, recombination tracts often appear interrupted by or terminated at sites of structural variation between the two genomes.  This shows that such variants are barriers to strand exchange and/or are preferred mismatch repair substrates.&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-weight: bold;"&gt;2. Materials and Methods&lt;/span&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Strains&lt;/li&gt;&lt;li&gt;DNA &lt;/li&gt;&lt;li&gt;Transformations&lt;/li&gt;&lt;li&gt;Library preparation&lt;/li&gt;&lt;li&gt;Illumina sequencing and initial data processing pipeline&lt;/li&gt;&lt;li&gt;Reference genome alignment by MUMmer, Mauve&lt;/li&gt;&lt;li&gt;Reciprocal read alignment by BWA&lt;/li&gt;&lt;li&gt;SAMtools pileup&lt;/li&gt;&lt;li&gt;Galaxy pileup parser&lt;/li&gt;&lt;li&gt;Variant frequency analysis&lt;/li&gt;&lt;li&gt;Assignment of reads (unimplemented)&lt;/li&gt;&lt;li&gt;Donor segment calling&lt;/li&gt;&lt;li&gt;Analysis of structural variation by HYDRA&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-weight: bold;"&gt;3. Results&lt;/span&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Genetic transformation of competent cells:  Marker-to-marker variation.  Dependence on sequence identity.  
Congression and linkage&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Illumina sequencing and read alignment:  Table of sequencing results and fraction of mapped reads.  Variation in depth-of-coverage.  Sources of sequencing error and read mapping artifacts. Reciprocal read alignment?  Varying alignment stringency?&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Comparison of donor and recipient strains.  Identification of SNPs and structural variants between the donor and recipient strains.  Comparison to whole-genome alignment methods.&lt;/li&gt;&lt;li&gt;Identification of donor alleles in transformed recipient chromosomes.  Accounting for SV alleles.  Identifying novel alleles.&lt;/li&gt;&lt;li&gt;Identification of allele frequencies in a pool of four transformants.&lt;/li&gt;&lt;li&gt;Identification of donor segments and putative recombination tracts&lt;/li&gt;&lt;li&gt;Enrichment of SVs at donor segment breakpoints&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-weight: bold;"&gt;4. Discussion&lt;/span&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;A first look at transformation… still few transformants&lt;/li&gt;&lt;li&gt;Excellent method.  Limitations are circumvented by very high coverage, knowledge of both donor and recipient genome sequences, and the use of reciprocal read alignment (unimplemented)&lt;/li&gt;&lt;li&gt;Big recombination tracts.  Evidence of mismatch repair.  SVs as blocks to recombination tract progression.&lt;/li&gt;&lt;li&gt;Speculations:  Hotspots?  Role of uptake specificity?  Supragenome transfer?&lt;/li&gt;&lt;li&gt;Future:  Aside from collecting more transformants, making a transformation frequency map to investigate the “cis-acting” factors controlling the efficiency of transformation.  Long-term utility in understanding the population genetics of human pathogens. 
&lt;/li&gt;&lt;/ul&gt;Still a rough outline, but something to start with....&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-4611675389157612820?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/4611675389157612820/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2010/06/manuscript-plans.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/4611675389157612820'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/4611675389157612820'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2010/06/manuscript-plans.html' title='Manuscript plans?'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_7qRGxl6StM4/TCDkmv7QL7I/AAAAAAAAAjA/uVf-MRzyHa4/s72-c/perish.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-7961943982467105417</id><published>2010-06-11T03:48:00.000-07:00</published><updated>2010-06-11T12:29:56.145-07:00</updated><title type='text'>Degenerate Uptake: Pilot Study</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.flickr.com/photos/wheatfields/2074121298/"&gt;&lt;img style="display: block; margin: 0px auto 
10px; text-align: center; cursor: pointer; width: 320px; height: 240px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/TBKNtKR0DDI/AAAAAAAAAiw/0hXpkMCieCE/s320/2074121298_9cd5285a31.jpg" alt="" id="BLOGGER_PHOTO_ID_5481599503456013362" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Something to blog about! (Wow; it's been over a month... sorry to my three loyal blog readers.)&lt;br /&gt;&lt;br /&gt;I’ve gotten around to doing a pilot-scale experiment on the specificity of H. influenzae DNA uptake for the “uptake signal sequence” (USS).  The USS is a ~29 base pair motif that is highly abundant in the H. influenzae genome, and sites that match the consensus USS are known to be preferred substrates for DNA uptake by competent cells.  The presence of many USS in the chromosome is presumed to be why H. influenzae competent cells prefer H. influenzae DNA over DNA from other organisms.&lt;br /&gt;&lt;br /&gt;However, little is known about how the structure of the USS contributes to uptake of USS-containing fragments.  Limited analyses of mutations in a DNA fragment containing a consensus USS suggest that some, but not all, informative positions in the USS motif are important for uptake, indicating that other forces (perhaps later steps in transformation) contribute to the structure of the USS motif.&lt;br /&gt;&lt;br /&gt;To carefully dissect uptake specificity for the USS motif, we have devised an enrichment experiment:&lt;br /&gt;(1) A complex pool of DNA fragments containing a degenerate USS library is incubated with competent cells.&lt;br /&gt;(2) The fragments preferentially taken up by cells are purified from the periplasm.&lt;br /&gt;(3) DNA sequencing is used to compare the input and periplasm-purified pools of sequences.&lt;br /&gt;&lt;br /&gt;Details and Pilot-scale Results:&lt;br /&gt;&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;I’ve previously discussed the design of the input DNA pools.  
The control 200 bp construct is designed to &lt;a href="http://nodnacontrol.blogspot.com/2009/09/reverse-engineering.html"&gt;already contain the sequences needed for Illumina single-end sequencing&lt;/a&gt;, along with a 32 bp consensus USS site near the middle of the fragment.  The test construct is the same, &lt;a href="http://nodnacontrol.blogspot.com/2009/05/rosie-and-i-have-been-working-our-way.html"&gt;except the USS is degenerate&lt;/a&gt;, having a 24% chance of a non-consensus base at each position.  Thus in the degenerate-USS pool, the average site has ~7-8 mismatches from the consensus sequence.&lt;br /&gt;&lt;a href="http://nodnacontrol.blogspot.com/2009/10/corrected-logos.html"&gt;&lt;br /&gt;The expectation is that&lt;/a&gt;, while the consensus-USS construct (USS-C) will be taken up well by cells, the degenerate-USS construct (USS-D) will be taken up more poorly, since it contains many suboptimal sequences (i.e. it is less uniformly delicious).  Indeed this is the case, with USS-C being taken up about 10 times better than USS-D at sub-saturating DNA concentrations (see below).  The notion is that comparing the USS-D input to the USS-D taken up by cells will provide a precise measurement of uptake specificity for the USS (i.e. which sequences are tastiest).  We think this will tell us a lot about the mechanism of uptake.&lt;br /&gt;&lt;br /&gt;It occurred to me a couple of weeks ago that before moving on to the data collection (i.e. the DNA sequencing), I should first make sure that the USS-D fragments recovered from the periplasmic purification are taken up better than the original USS-D input (i.e. that the competent cells selected more delicious sequences).  This would provide the clearest indication that the experiment worked and the material is worth sequencing.  It is!&lt;br /&gt;&lt;br /&gt;I compared the uptake of USS-C and USS-D before and after periplasmic purification of taken-up DNA from rec-2 competent cells across a range of DNA concentrations.  
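&lt;br /&gt;&lt;br /&gt;The ~7-8 mismatch figure quoted for USS-D is just the binomial expectation: 32 positions, each with a 24% chance of a non-consensus base, gives a mean of 32 × 0.24 = 7.68. Below is a short Python sketch of the full distribution; it assumes positions are mutated independently, which the synthesis design implies but which isn't verified here.&lt;br /&gt;&lt;br /&gt;

```python
from math import comb

N_POSITIONS = 32   # length of the degenerate USS in the construct
P_MISMATCH = 0.24  # per-position chance of a non-consensus base


def mismatch_distribution(n=N_POSITIONS, p=P_MISMATCH):
    """Binomial probability of exactly k mismatches, for k = 0..n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


dist = mismatch_distribution()
mean = sum(k * pk for k, pk in enumerate(dist))
mode = max(range(len(dist)), key=lambda k: dist[k])

print(f"mean mismatches:    {mean:.2f}")  # n * p = 7.68
print(f"most likely count:  {mode}")
print(f"P(exact consensus): {dist[0]:.1e}")
```

The exact-consensus probability (0.76^32, roughly 1.5 × 10^-4) is worth keeping in mind: even among ~2e7 USS-D reads, only about 3,000 would be expected to carry a perfect consensus site.&lt;br /&gt;&lt;br /&gt;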
Here are the results:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_7qRGxl6StM4/TBKOb6jEFoI/AAAAAAAAAi4/iSjswPlDbP0/s1600/uptaken.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 400px; height: 393px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/TBKOb6jEFoI/AAAAAAAAAi4/iSjswPlDbP0/s400/uptaken.png" alt="" id="BLOGGER_PHOTO_ID_5481600306687252098" border="0" /&gt;&lt;/a&gt;A and B show the % DNA uptake for USS-C and USS-D, respectively, for different amounts of added DNA (to 200 ml competent cultures).  C and D show the same data: C is a dose-response curve, and D is a double-reciprocal plot (since I used 2 ng of hot label, along with an additional amount of cold label for these experiments).&lt;br /&gt;&lt;br /&gt;Input USS-C and periplasm-purified USS-C were quite similar, while periplasm-purified USS-D was taken up substantially better than input USS-D.&lt;br /&gt;&lt;br /&gt;Notably, at low (sub-saturating) concentrations of DNA, periplasm-purified USS-D is taken up less well than USS-C, while at high (saturating) concentrations similar amounts of DNA are taken up.  Also of note, input USS-D does not saturate until it reaches higher concentrations than the other three samples do.&lt;br /&gt;&lt;br /&gt;This is all good news.  I left out a fair number of details, but this pilot-scale experiment is extremely encouraging.  Next week, I plan to repeat the experiment, but this time on an appropriate scale for recovering samples for sequencing.  I will also investigate how periplasm-purified USS-D samples behave when recovered from uptake experiments with varying amounts of DNA.  I expect that at sub-saturating concentrations the cells will be less “picky”, such that periplasm-purified USS-D recovered at those concentrations will be taken up less well than USS-D purified at saturating concentrations.  
This would provide a useful experimental condition, as in the sequence analysis we would be able to investigate the role of competition in shaping USS specificity.&lt;br /&gt;&lt;br /&gt;I think this might end up working swimmingly...  Onward!&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-7961943982467105417?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/7961943982467105417/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2010/06/degenerate-uptake-pilot-study.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/7961943982467105417'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/7961943982467105417'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2010/06/degenerate-uptake-pilot-study.html' title='Degenerate Uptake: Pilot Study'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_7qRGxl6StM4/TBKNtKR0DDI/AAAAAAAAAiw/0hXpkMCieCE/s72-c/2074121298_9cd5285a31.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-3502759238938405895</id><published>2010-04-13T12:35:00.000-07:00</published><updated>2010-04-13T12:42:52.596-07:00</updated><title type='text'>Mining some old array 
data</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_7qRGxl6StM4/S8TIqEtI6bI/AAAAAAAAAio/uk25DURQrBk/s1600/NONPTSpur.png"&gt;&lt;br /&gt;&lt;/a&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.molecularstructure.org/entry.php?pdb=2G9C"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 250px; height: 250px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/S8THwq4g5wI/AAAAAAAAAh4/N0ARfMJsSpM/s320/2G9C.jpg" alt="" id="BLOGGER_PHOTO_ID_5459708287238858498" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;So in an effort to re-examine some of the lab’s old array data, I made a fairly simple R script to plot the change in expression of competence genes, putative purR-regulated genes, and genes involved in utilizing secondary sugars.  We no longer have our expensive license for fancy-pants software, but all I needed to do was some arithmetic on columns and then find the rows of interest, so it’s R-tastic!&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;I looked at a time-course dataset, in which expression was monitored over the course of growth in sBHI and after transfer to MIV.  I also looked at a single one-off array comparing purR- to purR+ strains growing in late-log +cAMP.&lt;br /&gt;&lt;br /&gt;Here are the results for the time course.  All values are normalized to the first time point.  Blue points are sBHI time points, and red points are MIV time points.  
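&lt;br /&gt;&lt;br /&gt;Normalizing to the first time point is just a per-gene division, usually done on a log scale. Here is a minimal Python sketch of that step, not the actual R script; the log2 transform, the gene IDs, and the intensity values are all assumptions made up for illustration.&lt;br /&gt;&lt;br /&gt;

```python
from math import log2


def normalize_to_first(values):
    """log2 fold change of each time point relative to the first one."""
    baseline = values[0]
    return [log2(v / baseline) for v in values]


def gene_set_profiles(table, gene_ids):
    """Normalized profiles for the genes of interest present in the table."""
    return {g: normalize_to_first(table[g]) for g in gene_ids if g in table}


# Toy expression table: hypothetical gene IDs and made-up intensities.
expression = {
    "geneA": [100.0, 200.0, 800.0],
    "geneB": [50.0, 50.0, 25.0],
    "geneC": [10.0, 12.0, 9.0],
}

for gene, profile in gene_set_profiles(expression, ["geneA", "geneB"]).items():
    print(gene, [round(x, 2) for x in profile])
```

Selecting a gene set (competence, PurR-regulated, non-PTS) is then just a matter of passing a different list of IDs.&lt;br /&gt;&lt;br /&gt;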
MIV cultures were split from the sBHI cultures at t=0 minutes.&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_7qRGxl6StM4/S8TIU7PBANI/AAAAAAAAAiA/7LHX7aiKloY/s1600/compTC.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 320px; height: 161px;" src="http://3.bp.blogspot.com/_7qRGxl6StM4/S8TIU7PBANI/AAAAAAAAAiA/7LHX7aiKloY/s320/compTC.png" alt="" id="BLOGGER_PHOTO_ID_5459708910103494866" border="0" /&gt;&lt;/a&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_7qRGxl6StM4/S8TIY_uC9NI/AAAAAAAAAiI/VJklKLCkjWs/s1600/purTC.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 320px; height: 158px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/S8TIY_uC9NI/AAAAAAAAAiI/VJklKLCkjWs/s320/purTC.png" alt="" id="BLOGGER_PHOTO_ID_5459708980026864850" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_7qRGxl6StM4/S8TIc4r1x7I/AAAAAAAAAiQ/Y64SgtcFRwU/s1600/nonPTSTC.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 320px; height: 163px;" src="http://3.bp.blogspot.com/_7qRGxl6StM4/S8TIc4r1x7I/AAAAAAAAAiQ/Y64SgtcFRwU/s320/nonPTSTC.png" alt="" id="BLOGGER_PHOTO_ID_5459709046858041266" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;It’s pretty clear that the competence genes are strongly induced in MIV, but are also induced in late-log phase, as expected.  Putative PurR-regulated genes are strongly and quickly induced in MIV, indicating that purine pools are rapidly depleted and the purine biosynthetic pathway is activated (much faster than the competence genes, it appears).  
The “non-PTS” genes (several genes induced by CRP when cAMP levels are high) appear to be weakly and briefly induced in MIV, as well as weakly induced in late-log.&lt;br /&gt;&lt;br /&gt;Here are the same sets of genes plotted as the ratio of expression in purR- vs purR+ cultures (late-log, induced with cAMP).  Here, I plotted the ratios from both array elements for each gene (open and closed circles) and colored them just so they’d be easy to see.  Also note, I normalized everything to the median ratio to account for dye effects (under the assumption that the median gene is not PurR-regulated).  Again, there is strong induction of the putative purine-regulated genes, weak repression of the competence genes (presumably due to purine repression), and not much happening with the non-PTS sugars.&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_7qRGxl6StM4/S8TIgzDvmsI/AAAAAAAAAiY/nZHbHxVCOp8/s1600/COMPpur.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 320px; height: 154px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/S8TIgzDvmsI/AAAAAAAAAiY/nZHbHxVCOp8/s320/COMPpur.png" alt="" id="BLOGGER_PHOTO_ID_5459709114067163842" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_7qRGxl6StM4/S8TIlyFJ2fI/AAAAAAAAAig/K0_h7stH61g/s1600/PURpur.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 320px; height: 160px;" src="http://3.bp.blogspot.com/_7qRGxl6StM4/S8TIlyFJ2fI/AAAAAAAAAig/K0_h7stH61g/s320/PURpur.png" alt="" id="BLOGGER_PHOTO_ID_5459709199703988722" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_7qRGxl6StM4/S8TIqEtI6bI/AAAAAAAAAio/uk25DURQrBk/s1600/NONPTSpur.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: 
center; cursor: pointer; width: 320px; height: 160px;" src="http://3.bp.blogspot.com/_7qRGxl6StM4/S8TIqEtI6bI/AAAAAAAAAio/uk25DURQrBk/s320/NONPTSpur.png" alt="" id="BLOGGER_PHOTO_ID_5459709273423014322" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Conclusion:  Nothing we didn’t already suspect, but it’s good to see that things behave as expected.  One point of note: the hypothesized regulation of rec2 by PurR doesn’t jump out of these data, but if purine repression acts upstream of rec2, we wouldn’t be able to see the effects of deleting PurR here anyway…&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-3502759238938405895?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/3502759238938405895/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2010/04/mining-some-old-array-data.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/3502759238938405895'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/3502759238938405895'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2010/04/mining-some-old-array-data.html' title='Mining some old array data'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://1.bp.blogspot.com/_7qRGxl6StM4/S8THwq4g5wI/AAAAAAAAAh4/N0ARfMJsSpM/s72-c/2G9C.jpg' height='72' width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-5835115559555172897</id><published>2010-04-02T13:25:00.001-07:00</published><updated>2010-04-02T13:34:08.299-07:00</updated><title type='text'>SNP densities</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.genetics.ucla.edu/labs/sabatti/research.html"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 320px; height: 320px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/S7ZS2q5fFII/AAAAAAAAAhI/ecPl_s3zKmQ/s320/snp.jpg" alt="" id="BLOGGER_PHOTO_ID_5455639097787749506" border="0" /&gt;&lt;/a&gt;So I’ve been writing yet another grant, which has been distracting me from blogging (this isn't supposed to be a monthly blog, but this will hopefully be the last grant application for a while).&lt;br /&gt;&lt;br /&gt;But I’ve also been doing several analyses lately.  Here’s one.  I took the sequences of an ~300 kb restriction fragment from three H. influenzae isolates (Rd, 86-028NP, and PittGG).  They’re all similarly divergent from each other (~2.5%), and I wondered how well the divergence of Rd vs NP and of Rd vs GG correlated along the chromosome... &lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;So I aligned the sequences in &lt;a href="http://asap.ahabs.wisc.edu/mauve/"&gt;Mauve&lt;/a&gt;, took its SNP calling output, and did a couple of simple sliding-window analyses in R (using the &lt;a href="http://cran.r-project.org/web/packages/zoo/index.html"&gt;zoo&lt;/a&gt; package for &lt;a href="http://rss.acs.unt.edu/Rdoc/library/zoo/html/rollapply.html"&gt;rolling means&lt;/a&gt;).  
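&lt;br /&gt;&lt;br /&gt;As a rough illustration only (not the R/zoo code I actually ran), here is a minimal Python sketch of the same idea. It assumes the Mauve SNP calls have already been reduced to hypothetical (position, NP differs from Rd, GG differs from Rd) tuples with 0-based positions along the co-linear fragment; the function names and the 1 kb bin size are illustrative assumptions. It splits SNPs into shared and NP- or GG-specific classes, counts SNPs in 1 kb bins, smooths the counts with a rolling mean over 5 consecutive bins (a 5 kb window, similar to what zoo’s rollapply(x, 5, mean) returns), and compares two density tracks by r², computed here as the squared Pearson correlation, which equals the R² of a simple least-squares fit of one track on the other.

```python
import math
from statistics import mean


def classify(snps):
    """Split SNP positions into shared, NP-specific and GG-specific lists.

    Hypothetical input: (position, np_differs, gg_differs) tuples, where the
    flags say whether NP / GG carry a non-Rd base at that Rd position.
    """
    shared, np_only, gg_only = [], [], []
    for pos, np_diff, gg_diff in snps:
        if np_diff and gg_diff:
            shared.append(pos)
        elif np_diff:
            np_only.append(pos)
        elif gg_diff:
            gg_only.append(pos)
    return shared, np_only, gg_only


def binned_counts(positions, length, bin_size=1000):
    """Count SNPs in consecutive non-overlapping bins of bin_size bp.

    Positions are assumed to be 0-based and smaller than length.
    Indels are not handled here at all.
    """
    counts = [0] * math.ceil(length / bin_size)
    for pos in positions:
        counts[pos // bin_size] += 1
    return counts


def rolling_mean(values, width=5):
    """Mean of every run of `width` consecutive values (no padding at the ends)."""
    return [mean(values[i:i + width]) for i in range(len(values) - width + 1)]


def r_squared(x, y):
    """Squared Pearson correlation of two equal-length density tracks."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a track with zero variance has no defined correlation")
    return sxy * sxy / (sxx * syy)


if __name__ == "__main__":
    # Toy input only, not real data.
    toy = [(150, True, True), (1200, True, False), (2500, False, True),
           (2600, True, True), (4100, True, False), (5900, True, True),
           (6400, False, True), (7300, True, True)]
    shared, np_only, gg_only = classify(toy)
    # Rd-vs-NP divergence = shared + NP-specific SNPs; likewise for GG.
    dens_np = rolling_mean(binned_counts(shared + np_only, 8000))
    dens_gg = rolling_mean(binned_counts(shared + gg_only, 8000))
    print(dens_np, dens_gg, r_squared(dens_np, dens_gg))
```

The toy tuples only show the call sequence; real Mauve-derived positions and the real fragment length would be needed to get anything meaningful.&lt;br /&gt;&lt;br /&gt;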
Here’s what divergence looked like averaged over 5 kb windows (click to enlarge):&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7qRGxl6StM4/S7ZTGobj9BI/AAAAAAAAAhQ/qLgKp4Ao9pg/s1600/denseALL.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 320px; height: 85px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/S7ZTGobj9BI/AAAAAAAAAhQ/qLgKp4Ao9pg/s320/denseALL.png" alt="" id="BLOGGER_PHOTO_ID_5455639372003275794" border="0" /&gt;&lt;/a&gt;The divergences between Rd and each of the two other isolates are quite well correlated (r² = 0.8, using &lt;a href="http://www.biostat.jhsph.edu/%7Eqli/biostatistics_r_doc/library/stats/html/lm.html"&gt;linear modeling&lt;/a&gt;).  But since NP and GG are similarly divergent, I made two other plots.&lt;br /&gt;&lt;br /&gt;First, here’s a comparison of the density of SNPs that are shared by NP and GG and those that are unique to either NP or GG:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_7qRGxl6StM4/S7ZTR5HVffI/AAAAAAAAAhg/2RX53sOubxE/s1600/densCOMB.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 320px; height: 84px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/S7ZTR5HVffI/AAAAAAAAAhg/2RX53sOubxE/s320/densCOMB.png" alt="" id="BLOGGER_PHOTO_ID_5455639565460405746" border="0" /&gt;&lt;/a&gt;The correlation is much worse (r² = 0.4).&lt;br /&gt;&lt;br /&gt;Next, I further broke the “unshared” line into NP-specific and GG-specific SNPs (i.e., 
positions that differ between Rd and NP but not between Rd and GG, and vice versa):&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_7qRGxl6StM4/S7ZTVOV8KkI/AAAAAAAAAho/1ecOkyNvMh4/s1600/densSPEC.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 320px; height: 79px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/S7ZTVOV8KkI/AAAAAAAAAho/1ecOkyNvMh4/s320/densSPEC.png" alt="" id="BLOGGER_PHOTO_ID_5455639622698412610" border="0" /&gt;&lt;/a&gt;The correlation is worse still (r² = 0.2).&lt;br /&gt;&lt;br /&gt;Smaller windows gave similar results, but the plots looked much messier.  Note that measuring SNP density is not entirely straightforward: what does one do at indels?  I just ignored them, so the results above are rough.  Part of the reason I focused on a single co-linear segment of the chromosome was to minimize this problem, but there are still several indels between each pair of the three strains.&lt;br /&gt;&lt;br /&gt;Indels aside, what does this mean?  One of the goals of my transformation frequency mapping is to distinguish the effects of sequence divergence on transformation from the effects of other local chromosomal properties (base composition, sequence motifs, etc.).  Since NP and GG have correlated SNP densities relative to Rd, their transformation frequencies across the Rd chromosome should also be correlated.  Discrepancies between transformation frequencies with NP and GG donor DNA could indicate that isolate-specific SNPs somehow modulate transformation independently of divergence per se.&lt;br /&gt;&lt;br /&gt;Distinguishing chromosome “position effects” from sequence divergence will probably require a third donor DNA, and choosing it requires some thought.  All of the sequenced H. 
influenzae are similarly divergent from Rd (and, for the most part, from each other), and phylogenies poorly resolve separate clades (i.e., they give something close to a star phylogeny).&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0005854"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 304px; height: 308px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/S7ZTpWSPeFI/AAAAAAAAAhw/PmTX1WG2Wvg/s320/MaughanReticulated.png" alt="" id="BLOGGER_PHOTO_ID_5455639968427767890" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;So I should use either a strain much more closely related to Rd or one more distantly related (perhaps another species).  A closely related strain has the advantage that transformation frequencies should be higher and divergence would play less of a role, shifting the focus to divergence-independent factors, but I would also have far fewer markers.&lt;br /&gt;&lt;br /&gt;Based on MLST comparisons, several strains are sisters of Rd (RM7033, RM7429, RM7271).  These assignments are made in several phylogenetic analyses, which put the three at ~0.5% divergence from Rd.  
So I would expect RM7033 (for example) to differ from Rd at ~6000 SNPs (far more than the differences between our Rd and the other sequenced Rd), ample to provide markers across the chromosome...&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-5835115559555172897?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/5835115559555172897/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2010/04/snp-densities.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/5835115559555172897'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/5835115559555172897'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2010/04/snp-densities.html' title='SNP densities'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_7qRGxl6StM4/S7ZS2q5fFII/AAAAAAAAAhI/ecPl_s3zKmQ/s72-c/snp.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-2949541144758680147</id><published>2010-03-16T10:25:00.000-07:00</published><updated>2010-03-16T17:58:38.459-07:00</updated><title type='text'>Chromosome "Position Effect"</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" 
href="http://www.ncbi.nlm.nih.gov/pubmed/17246266"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 183px; height: 220px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/S5_Al5LBCwI/AAAAAAAAAgo/NxyXEYkjBL8/s320/w_var_eye.jpg" alt="" id="BLOGGER_PHOTO_ID_5449285831376308994" border="0" /&gt;&lt;/a&gt;I got some data on the transformation frequency at five different markers.  They varied.  This isn’t anything ground-breaking; reports of a "position effect" for transformation go back decades, and a couple of recent studies in other organisms bear it out.  The variation in transformation rate at different positions likely stems from two sources:  the physical structure of the chromosome and the sequence composition of the recombination substrates.  The former is reasonably well worked out for analogous processes in eukaryotes:  for example, in yeast, heterochromatic regions are recalcitrant to recombination, but these sites become recombinogenic in mutants with defective heterochromatin assembly.  Sequence composition has also been shown to affect the efficiency of recombinational strand exchange in several different contexts, both genetic and biochemical.  This latter type of variation is not traditionally considered a "position effect", but it is difficult to distinguish from the former.&lt;br /&gt;&lt;br /&gt;Anyway, I wanted preliminary data showing that I can, in fact, detect differences in transformation at different genomic positions, since a big part of my proposed work will involve measuring this position effect at very high resolution...&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;I used MAP7 donor DNA to transform three independent Rd competent cell preps.  MAP7 is highly similar to Rd, except that it carries several point mutations that confer antibiotic resistance.  There are likely other unselected differences between Rd and MAP7, but probably few.  
Thus, differences between marker transformation rates are likely to reflect predominantly chromosome position effects rather than sequence divergence between donor and recipient.&lt;br /&gt;&lt;br /&gt;This latter point isn’t strictly true:  to see transformation, a genetic change has to be made, and the selected MAP7 point mutations are themselves genetic differences.  But because our preliminary sequencing data showed long stretches of donor-specific DNA carrying dozens to hundreds of SNPs, I don’t think these single-nucleotide differences contribute much to the observed variation in transformation rate.&lt;br /&gt;&lt;br /&gt;Here’s the data for the five markers individually.  Vertical bars indicate the mean transformation frequency per viable cell to the indicated antibiotic resistance allele.  The inset circle shows a rough map of the location of the MAP7 markers. (Sorry about the lack of an origin.)&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_7qRGxl6StM4/S5_AycmOm1I/AAAAAAAAAgw/HxA9sNI68mA/s1600-h/tfcfu.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 235px; height: 320px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/S5_AycmOm1I/AAAAAAAAAgw/HxA9sNI68mA/s320/tfcfu.png" alt="" id="BLOGGER_PHOTO_ID_5449286047044115282" border="0" /&gt;&lt;/a&gt;Indeed, I see a ~5-fold range of transformation frequencies, from ~1/500 to ~1/100.  Since this is only an arbitrary sample of five sites, the range of variation across the chromosome could be much larger.&lt;br /&gt;&lt;br /&gt;As previously discussed, these values underestimate the transformation frequency per competent cell.  Competent cultures typically contain both competent and non-competent cells, and the “fraction competence” is usually estimated from co-transformation frequencies.  
These are often higher than expected, even for unlinked markers, a phenomenon termed “congression” and interpreted as reflecting a binary distinction between competent and non-competent cells in the culture.&lt;br /&gt;&lt;br /&gt;The technical value of this is that I can elevate the observed transformation frequency at one locus by selecting for transformation at another unlinked locus, since this eliminates all non-competent cells from the selected population, potentially providing higher sensitivity in our proposed sequencing experiments.  It also dampens culture-to-culture variation caused by differences in fraction competence (not shown).&lt;br /&gt;&lt;br /&gt;However, there is at least &lt;a href="http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1212875/"&gt;one old report&lt;/a&gt; using Bacillus that suggests congression does not simply reflect a binary distinction between competent and non-competent cells.  If cells only came in those two flavors, we would predict that any pair of unlinked markers would show the same level of congression, yet they report that different pairs of markers had different congression frequencies (beyond any differences due to linkage).  They go on to suggest an interesting model for their observations, but my concern is more technical:&lt;br /&gt;&lt;br /&gt;Does selecting for transformation at different loci affect the transformation rate at a second unlinked locus?&lt;br /&gt;&lt;br /&gt;If the answer is yes, then selecting for transformants at one locus would be a poor way to elevate the transformation rate at other unlinked loci, since the result would be biased in an unknown way.  To test this, I also measured co-transformation of Nal resistance with each of the other four markers.  Nal is “unlinked” to all the others (i.e., DNA fragments from standard DNA preps are always too short to carry the NalR allele together with another antibiotic resistance allele), so I can measure “congression” four times.&lt;br /&gt;&lt;br /&gt;Here is the data.  
So, for example, the first bar was calculated as:  f(kanR nalR) / f(kanR).  Each bar therefore shows the NalR rate among cells selected for the other marker (i.e., “the frequency of nalR among kanR transformants”).&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_7qRGxl6StM4/S5_A-6EqYKI/AAAAAAAAAg4/krdJgCPqUoA/s1600-h/nalcomp.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 155px; height: 320px;" src="http://3.bp.blogspot.com/_7qRGxl6StM4/S5_A-6EqYKI/AAAAAAAAAg4/krdJgCPqUoA/s320/nalcomp.png" alt="" id="BLOGGER_PHOTO_ID_5449286261114822818" border="0" /&gt;&lt;/a&gt;The first thing to note is that the scale has changed relative to the transformants/cfu plot.  For each of the 3 cultures, there was a ~3.5-fold increase in the observed transformation rate, which would be expected if only ~30% of cells in each culture were competent (1/3.5 ≈ 0.29).&lt;br /&gt;&lt;br /&gt;The second thing to note is that selecting for any of the four markers had no effect on the NalR transformation frequency.  So the answer to the question above is no.  Phew!  The Bacillus result was cool, but I’m glad it isn’t the case here.  A binary competent/non-competent model is perfectly reasonable in our system (though this does not exclude the possibility of variation among competent cells).  With this in hand, I can now plot the co-transformation data with respect to NalR.  If selecting for NalR only eliminates non-competent cells but does not change the underlying transformation frequencies per competent cell at the other unlinked markers, then life is good.&lt;br /&gt;&lt;br /&gt;Here’s the data.  So, for example, the first bar was calculated as f(kanR nalR) / f(nalR).  Each bar therefore shows its own marker’s rate among NalR-selected cells (i.e., “the frequency of kanR among nalR transformants”).  
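&lt;br /&gt;&lt;br /&gt;To make the two ratios explicit, here is a minimal Python sketch of the arithmetic under a simple binary model (an assumption for illustration, not a fitted model): a fraction c of cells is competent, and a competent cell is transformed at unlinked markers X and Nal independently, with probabilities p_X and p_Nal. Then f(X) = c × p_X, f(NalR) = c × p_Nal and f(X NalR) = c × p_X × p_Nal, so f(X NalR) / f(X) = p_Nal and f(X NalR) / f(NalR) = p_X are per-competent-cell rates, and dividing p_Nal by f(NalR) gives 1/c, the fold enrichment expected from discarding non-competent cells. The function name and the counts in the example are hypothetical.

```python
def congression_summary(viable, n_x, n_nal, n_x_nal):
    """Co-transformation arithmetic for a marker X and unlinked NalR.

    All arguments are colony counts scaled to the same culture volume:
    viable = viable cells (cfu), n_x = X-resistant, n_nal = NalR,
    n_x_nal = resistant to both. Assumes a binary model: a fraction c of
    cells is competent, and unlinked markers transform competent cells
    independently.
    """
    f_x = n_x / viable          # transformants per viable cell at X
    f_nal = n_nal / viable      # transformants per viable cell at NalR
    f_x_nal = n_x_nal / viable  # double transformants per viable cell

    nal_among_x = f_x_nal / f_x    # "frequency of nalR among X transformants"
    x_among_nal = f_x_nal / f_nal  # "frequency of X among nalR transformants"
    fold = nal_among_x / f_nal     # enrichment from discarding non-competent cells

    return {
        "nal_among_x": nal_among_x,
        "x_among_nal": x_among_nal,
        "fold_enrichment": fold,
        "competent_fraction": 1 / fold,  # estimate of c under the binary model
    }


if __name__ == "__main__":
    # Hypothetical counts generated with c = 0.3, p_X = 0.02, p_Nal = 0.01
    # (illustrative numbers only, not data from these experiments).
    print(congression_summary(1e9, 6e6, 3e6, 6e4))
```

With these made-up counts the competent fraction comes out at about 0.3 and the fold enrichment at about 3.3; under the same model, the ~3.5-fold increase reported above corresponds to c ≈ 0.29.&lt;br /&gt;&lt;br /&gt;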
For the nalR/competent cells, I used the average of all 12 points in the previous plot.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_7qRGxl6StM4/S5_BIo2INnI/AAAAAAAAAhA/bZrZY-pxqE8/s1600-h/xcomp.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 192px; height: 320px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/S5_BIo2INnI/AAAAAAAAAhA/bZrZY-pxqE8/s320/xcomp.png" alt="" id="BLOGGER_PHOTO_ID_5449286428289152626" border="0" /&gt;&lt;/a&gt;This data closely resembles that of the first figure, except that all the values are ~3.5-fold higher.&lt;br /&gt;&lt;br /&gt;Woo!  Next I should probably do the congression experiments with linked markers, and repeat the experiments with more divergent donor DNA.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-2949541144758680147?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/2949541144758680147/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2010/03/chromosome-position-effect.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/2949541144758680147'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/2949541144758680147'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2010/03/chromosome-position-effect.html' title='Chromosome &quot;Position Effect&quot;'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' 
src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_7qRGxl6StM4/S5_Al5LBCwI/AAAAAAAAAgo/NxyXEYkjBL8/s72-c/w_var_eye.jpg' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-5918379641134217839</id><published>2010-03-14T13:01:00.000-07:00</published><updated>2010-03-14T13:18:39.612-07:00</updated><title type='text'>Repression of competence induction by purines</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://gibk26.bse.kyutech.ac.jp/jouhou/image/dna-protein/hth/hth.html"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 320px; height: 307px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/S51Dx98OiXI/AAAAAAAAAgQ/218-dSFdXk4/s320/small_N1qqb.gif" alt="" id="BLOGGER_PHOTO_ID_5448585649907992946" border="0" /&gt;&lt;/a&gt;My illustrious colleagues have been re-examining some of the lab’s old data regarding the repression of competence by purines.  The work has been slowly ongoing for years, and it may be close to being a complete story.  
I want to try to lay out what I think their model is and what its predictions seem to be, so they can tell me whether I have it straight…&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;First, a schematic depiction of how I interpret what we already know about the induction of the competence regulon:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_7qRGxl6StM4/S51EBFUD6wI/AAAAAAAAAgY/rW3dXOXHjAo/s1600-h/sxypath.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 282px; height: 320px;" src="http://3.bp.blogspot.com/_7qRGxl6StM4/S51EBFUD6wI/AAAAAAAAAgY/rW3dXOXHjAo/s320/sxypath.png" alt="" id="BLOGGER_PHOTO_ID_5448585909585046274" border="0" /&gt;&lt;/a&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-style: italic;"&gt;What we knew: &lt;/span&gt;&lt;/span&gt; Transferring cells growing in rich medium to competence medium induces 15 operons driven by novel CRP-S promoters, after which cells become naturally transformable.  Cells in competence medium have elevated cyclic AMP levels, directing the CRP protein to induce expression of genes with canonical CRP-N promoters, including the &lt;span style="font-style: italic;"&gt;sxy&lt;/span&gt; gene.  Sxy protein alters the binding specificity of CRP so that it also binds at CRP-S promoters, thereby inducing competence gene expression.&lt;br /&gt;&lt;br /&gt;But Sxy levels are regulated at the level of translation as well as transcription.  The wild-type &lt;span style="font-style: italic;"&gt;sxy&lt;/span&gt; mRNA transcript contains a stem-loop structure that inhibits its translation.  Mutations that disrupt the stem-loop structure in the 5’-UTR cause hypercompetence (e.g. the &lt;span style="font-style: italic;"&gt;sxy-1&lt;/span&gt; mutation).  
In wild-type cells, unknown factor(s) disrupt the stem-loop to induce translation of the &lt;span style="font-style: italic;"&gt;sxy&lt;/span&gt; transcript in competence medium.&lt;br /&gt;&lt;br /&gt;Now a schematic depiction of how I interpret the model for purine repression of competence:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7qRGxl6StM4/S51EGhFMhJI/AAAAAAAAAgg/fGg5GscdjDc/s1600-h/purpath.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 289px; height: 320px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/S51EGhFMhJI/AAAAAAAAAgg/fGg5GscdjDc/s320/purpath.png" alt="" id="BLOGGER_PHOTO_ID_5448586002938234002" border="0" /&gt;&lt;/a&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-style: italic;"&gt;The observation:&lt;/span&gt;&lt;/span&gt;  Addition of purines to competence medium represses competence.  Purine biosynthesis is repressed by the PurR protein when cellular pools of purines are high.  Deletion of the purR gene reduces competence (presumably indirectly, by increasing cellular pools of purine), but mutations disrupting the &lt;span style="font-style: italic;"&gt;sxy&lt;/span&gt; 5’UTR’s stem-loop suppress the &lt;span style="font-style: italic;"&gt;purR&lt;/span&gt; mutant defect (&lt;a href="http://rrresearch.blogspot.com/2010/03/overlooked-evidence-that-purine-pools.html"&gt;Rosie’s last post&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-style: italic;"&gt;A hypothesis:&lt;/span&gt;&lt;/span&gt;  The &lt;span style="font-style: italic;"&gt;sxy&lt;/span&gt; transcript stem-loop is stabilized in the presence of purines (either directly or indirectly), blocking the production of Sxy protein and thus the activation of the competence regulon.  When purine pools are depleted, the stem-loop is disrupted.  
This predicts that addition of purines and &lt;span style="font-style: italic;"&gt;purR&lt;/span&gt; mutations will inhibit &lt;span style="font-style: italic;"&gt;sxy&lt;/span&gt; translation more than &lt;span style="font-style: italic;"&gt;sxy&lt;/span&gt; transcription.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;font-size:130%;" &gt;A corollary hypothesis:&lt;/span&gt;  Purines block DNA translocation by PurR-dependent repression of the &lt;span style="font-style: italic;"&gt;rec-2&lt;/span&gt; gene, whose promoter contains a putative PurR binding site.  A potential test of this hypothesis would be to treat sxy-1 competent cultures with purines.  We would predict that if PurR directly represses &lt;span style="font-style: italic;"&gt;rec-2&lt;/span&gt;, DNA translocation would be inhibited (but DNA uptake would not).  Obviously, checking &lt;span style="font-style: italic;"&gt;rec-2&lt;/span&gt; transcription relative to other competence genes would make sense here as well, but the functional test would be most compelling.&lt;br /&gt;&lt;br /&gt;Is that the basic notion?  
I know there’s a bunch of other experiments that have been done that I need to find out about…&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-5918379641134217839?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/5918379641134217839/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2010/03/repression-of-competence-induction-by.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/5918379641134217839'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/5918379641134217839'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2010/03/repression-of-competence-induction-by.html' title='Repression of competence induction by purines'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_7qRGxl6StM4/S51Dx98OiXI/AAAAAAAAAgQ/218-dSFdXk4/s72-c/small_N1qqb.gif' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-4908881589329475052</id><published>2010-03-11T09:30:00.000-08:00</published><updated>2010-03-11T09:40:49.610-08:00</updated><title type='text'>What have I got?</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" 
href="http://4.bp.blogspot.com/_7qRGxl6StM4/S5krBR6XwZI/AAAAAAAAAfw/s5397CCCpDI/s1600-h/preps-norm-points.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 200px; height: 200px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/S5krBR6XwZI/AAAAAAAAAfw/s5397CCCpDI/s200/preps-norm-points.png" alt="" id="BLOGGER_PHOTO_ID_5447432525269418386" border="0" /&gt;&lt;/a&gt;Okay, grant planning part 2...  Below is a dense description of the preliminary data I have/will have for writing this next grant...&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;PRELIMINARY DATA&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Transformation frequency depends on chromosome position:  &lt;/span&gt;&lt;br /&gt;DNA from a multiply marked derivative of Rd (MAP7) was briefly incubated with competent Rd cultures.  The resulting transformation frequency at each of four loci was evaluated by selecting for cells that acquired the corresponding MAP7-specific antibiotic resistance allele.  MAP7 DNA transformed each Rd locus at a different frequency.  &lt;span style="font-weight: bold;"&gt;(Repeat experiment in progress… stay tuned, but it looks good.)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Sequence divergence decreases transformation frequency: &lt;/span&gt;&lt;br /&gt;DNA from an antibiotic-resistant derivative of NP (1350NN) transformed Rd competent cells less efficiently than did DNA from MAP7, and vice versa.  NP differs from Rd by ~2.4% per alignable base position (and an additional 10% of each genome, contained in indel polymorphisms, is absent from the other), and the transformation frequencies at two loci were reduced ~2- to 4-fold.  
&lt;span style="font-weight: bold;"&gt;(Data in hand.)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Co-transformation frequencies are non-random due to congression and linkage:  &lt;/span&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;(Wish I didn’t have to describe this, but it’s too fundamental.  Data mostly in hand.)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Transformants acquire hundreds of donor-specific alleles:  &lt;/span&gt;&lt;br /&gt;Several large DNA fragments recombined into the chromosomes of four individual Rd competent cells, as revealed by genome sequencing.  Each of the four transformants had been selected for resistance to one of two antibiotics encoded in the 1350NN strain (two NalR and two NovR), and the corresponding donor-specific allele was present in each. In all, 24 donor segments (contiguous stretches of donor-specific alleles) were found across the 4 transformants, with an average of 1.4% of each recipient chromosome replaced with donor DNA (~25 kb and ~600 SNPs per transformant).  Mismatch repair is likely responsible for interrupting contiguous stretches of donor-specific alleles in the transformants; if closely adjoined segments are treated as products of single events, a total of 10 (instead of 24) independent transformation events occurred across the four transformants (6 of which were unselected; notably, two of these overlapped in independent transformants). &lt;span style="font-weight: bold;"&gt;(Data in hand; re-analysis in progress.)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;DNA uptake signal sequences (USS) are densely distributed in the two genomes:  &lt;/span&gt;&lt;br /&gt;Both the Rd and NP chromosomes contain about one USS per kilobase, and most USSs are syntenic.  &lt;span style="font-weight: bold;"&gt;(Cursory data only.  
Need a better analysis.)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;&lt;br /&gt;Sequence preferences in DNA uptake can be captured by periplasmic DNA purification:  &lt;/span&gt;&lt;br /&gt;DNA fragments containing uptake signal sequences are efficiently taken up into cells, and taken up fragments can be cleanly purified away from both free DNA and chromosomal DNA.  The use of rec-2 and rec-1 mutations will facilitate separating sequence biases at different stages of natural transformation. &lt;span style="font-weight: bold;"&gt;(Data in hand, except rec-1.)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;LIST OF FIGURES:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Four/five marker transformation rates&lt;/li&gt;&lt;li&gt;Rd vs NP transformation rates&lt;/li&gt;&lt;li&gt;SNP spacing histogram with embedded SV table&lt;/li&gt;&lt;li&gt;Genome sequencing figure (pool data)&lt;/li&gt;&lt;li&gt;USS analysis&lt;/li&gt;&lt;li&gt;Molecular biology figure (uptake data)&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-4908881589329475052?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/4908881589329475052/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2010/03/what-have-i-got.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/4908881589329475052'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/4908881589329475052'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2010/03/what-have-i-got.html' title='What have I 
got?'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_7qRGxl6StM4/S5krBR6XwZI/AAAAAAAAAfw/s5397CCCpDI/s72-c/preps-norm-points.png' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-7739883568236263772</id><published>2010-03-09T09:28:00.000-08:00</published><updated>2010-03-09T09:49:08.069-08:00</updated><title type='text'>Another day, another attempt to get a dollar</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://faculty.matcmadison.edu/mljensen/111CourseDocs/111Review/Unit2Reviews/haemophilus_answers.htm"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 320px; height: 261px;" src="http://3.bp.blogspot.com/_7qRGxl6StM4/S5aJp0AbGUI/AAAAAAAAAfg/H_VX6AMJJWU/s320/GS-direct.jpg" alt="" id="BLOGGER_PHOTO_ID_5446692150779255106" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Sigh...  another grant due soon; this time, it's my last attempt to get an NIH postdoctoral fellowship.  My last reviews mainly took issue with my proposal, which they found to be overly ambitious and somewhat unfocused.  So below is my first attempt at a summary/specific aims page, followed by a couple of bits of preliminary data I'd like to collect before it's due (on April 8)...&lt;br /&gt;&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;The introduction:&lt;/span&gt; Naturally competent bacteria take up intact DNA from their surroundings and can incorporate it into their chromosomes by homologous recombination.  
Akin to sexual recombination in eukaryotes, this natural transformation pathway moves alleles and genes between otherwise clonal lineages; and human bacterial pathogens have used this pathway to share antibiotic resistance genes, antigenic determinants, and virulence factors.  To better elucidate the mechanism of transformation and to inform population/epidemiological studies, the proposed work will use the opportunistic Gram-negative bacterium &lt;span style="font-style: italic;"&gt;Haemophilus influenzae&lt;/span&gt; to disentangle the sequence biases intrinsic to the DNA uptake and DNA recombination mechanisms by combining classical microbiology with modern DNA sequencing.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;The specific aims:&lt;/span&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Define the genetic consequences of natural competence to &lt;span style="font-style: italic;"&gt;H. influenzae&lt;/span&gt;.  &lt;/span&gt;Transformation frequencies vary for different sequences and at different  chromosomal locations, and this could strongly influence the rate of  sequence evolution and adaptation along the genome.  I will transform  competent cultures of the standard lab strain with the genomic DNA of a  clinical isolate and use deep sequencing to measure transformation  across the lab strain’s chromosome for all the ~40,000 sites differing  in the clinical isolate.  This will provide an unparalleled dataset for  investigating the sequence factors that promote and limit genetic  exchange between bacterial cells.&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Measure the contribution of DNA uptake specificity to natural transformation.&lt;/span&gt;  In several human pathogens, including &lt;span style="font-style: italic;"&gt;H. 
influenzae&lt;/span&gt;, the uptake machinery prefers DNA fragments containing short “uptake sequences”, and abundant sequence motifs in many bacterial chromosomes suggest that biased DNA uptake has had a profound influence on genome evolution.  I will purify the intact DNA molecules taken up into the periplasm and cytosol of competent cultures and use deep sequencing to measure the sequence biases of the uptake machinery.  In combination with Aim 1, this will disentangle the contributions of DNA uptake from those of DNA recombination during natural transformation.&lt;/li&gt;&lt;/ol&gt;  &lt;span style="font-weight: bold;"&gt;The platitudes:&lt;/span&gt;  The proposed work will link molecular studies of transformation to the growing genome sequence data being collected from many isolates of many bacterial species.  Once my approach is established with completely sequenced chromosomes and a well-defined experimental system, later studies could include a greater diversity of sequences or mimic increasingly natural conditions.  As a directly applicable outcome, the work will also produce the beginnings of a new type of genetic resource for mapping traits that differ between natural bacterial isolates (as in eukaryotic quantitative genetics) by generating fully genotyped recombinants.  In the future, such studies will give empirical underpinnings to population genomic studies of bacterial genetic exchange, as well as provide new testable hypotheses for investigating the molecular mechanism of transformation.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Preliminary data I would like:&lt;/span&gt; (besides what I’ve got)&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Properly replicated transformation frequencies for several markers.  (I did this before, but it hasn’t been properly replicated.)&lt;/li&gt;&lt;li&gt;Follow a few molecules through uptake and recombination?&lt;/li&gt;&lt;li&gt;Population genetic inferences of “recombination” in H. influenzae? 
(I did this before, but it sucked.)&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-7739883568236263772?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/7739883568236263772/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2010/03/another-day-another-attempt-to-get.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/7739883568236263772'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/7739883568236263772'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2010/03/another-day-another-attempt-to-get.html' title='Another day, another attempt to get a dollar'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_7qRGxl6StM4/S5aJp0AbGUI/AAAAAAAAAfg/H_VX6AMJJWU/s72-c/GS-direct.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-4626688331780518841</id><published>2010-03-05T16:01:00.001-08:00</published><updated>2010-03-05T17:13:10.410-08:00</updated><title type='text'>Multiplexing sans barcodes</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" 
href="http://www.darkroastedblend.com/2008/04/japanese-creative-barcodes.html"&gt;&lt;img style="float: left; margin: 0pt 10px 10px 0pt; cursor: pointer; width: 320px; height: 222px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/S5Gbi6ajMfI/AAAAAAAAAeg/yW6xCN2c5vM/s320/creative-barcodes.jpg" alt="" id="BLOGGER_PHOTO_ID_5445304448566112754" border="0" /&gt;&lt;/a&gt;Previously, I’d said I wanted to go over how we might obtain many recombinant genotypes by deep sequencing pools of recombinants, since our tiny genome is TOO EASILY SEQUENCED using modern methods, making the sequencing of individual clones inefficient.  The challenge then lies in assigning donor DNA segments in the pools to individual clones.  To a first approximation, this isn’t really necessary, since one of our main motivations for sequencing recombinants is simply to determine whether the locations and endpoints of donor DNA segments are biased: i.e. whether there are recombination hotspots or whether certain types of donor-recipient differences are recalcitrant to recombination.&lt;br /&gt;&lt;br /&gt;However, our preliminary data showed that donor segments were often clustered in individual recombinants, probably due to mismatch repair disrupting larger donor fragments during transformation.  We were only able to pin this down because we individually sequenced two of the four transformants that we’d pooled.  To illustrate, here’s a zoom of the region containing one of our selected sites at gyrB.  Two of the four clones carry the causal allele (the red dot), but the pool data indicate several additional segments:&lt;br /&gt;&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;Are they in different clones?  The same clone?  
How do we disentangle them without sequencing individuals (as was done here, shown as sets of colored bars at the top)?&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_7qRGxl6StM4/S5GcTVt4_HI/AAAAAAAAAeo/v7DJU9FoXGE/s1600-h/zoomer.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 320px; height: 161px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/S5GcTVt4_HI/AAAAAAAAAeo/v7DJU9FoXGE/s320/zoomer.png" alt="" id="BLOGGER_PHOTO_ID_5445305280528710770" border="0" /&gt;&lt;/a&gt;Several methods for handling pooled data exist.  The one most often mentioned is “barcoding”, where samples are processed individually and have unique sequence codes added during library construction, so that individual sequence reads can be assigned to individual clones.  This is a powerful method, but an extremely expensive and labor-intensive one.  It surely has useful contexts, but for our purposes, we don’t really need to assign every read to a clone… only donor segments.&lt;br /&gt;&lt;br /&gt;An alternate approach, outlined below, would ensure that any given clone appears in two different, otherwise non-overlapping pools.  In its simplest form, this means pooling by rows and also by columns (other, more involved ways are &lt;a href="http://genome.cshlp.org/content/19/7/1243"&gt;here&lt;/a&gt; and &lt;a href="http://genome.cshlp.org/content/19/7/1254.short"&gt;here&lt;/a&gt;).  
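&lt;br /&gt;&lt;br /&gt;To make the row/column logic concrete, here is a minimal Python sketch. It is not code from this project, and the function name consistent_layouts() is hypothetical. The sketch makes two assumptions. First, each well is labeled by a single row digit plus a single column letter (e.g. "3C"). Second, for each donor segment we already know how many copies it contributes to each row pool and each column pool, estimated from the segment's frequency in that pool. Under those assumptions, the sketch brute-forces every set of clones whose row and column tallies match the observed counts.&lt;pre&gt;
```python
from collections import Counter
from itertools import combinations, product


def consistent_layouts(row_counts, col_counts):
    """List every set of clones consistent with one donor segment's pool counts.

    row_counts maps a row label (e.g. "3") to the number of copies of the
    segment detected in that row pool; col_counts does the same for column
    labels (e.g. "C"). Only pools where the segment was detected are listed.
    Clone names are assumed to be row label + column label, one character each.
    """
    total = sum(row_counts.values())
    if total != sum(col_counts.values()):
        return []  # row and column pools disagree on the copy number
    # Any carrier must sit at the intersection of a positive row and column.
    candidates = [r + c for r, c in product(sorted(row_counts), sorted(col_counts))]
    layouts = []
    for combo in combinations(candidates, total):
        rows = Counter(clone[0] for clone in combo)
        cols = Counter(clone[1] for clone in combo)
        if rows == Counter(row_counts) and cols == Counter(col_counts):
            layouts.append(set(combo))
    return layouts


# Unique segment: seen once, in row pool 3 and column pool C.
print(consistent_layouts({"3": 1}, {"C": 1}))
# Shared segment: one copy each in row pools 3 and 7 and column pools C and E.
print(consistent_layouts({"3": 1, "7": 1}, {"C": 1, "E": 1}))
```
&lt;/pre&gt;A single layout means the segment can be assigned outright. Two layouts, such as {3C, 7E} versus {3E, 7C}, are exactly the "ghost" ambiguity: the copy number is resolved but the clone identities are not. Brute force is adequate for segments seen in only a few pools, but the number of combinations grows quickly for segments shared by many clones.&lt;br /&gt;&lt;br /&gt;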
I recently did a transformation experiment, after which I grew up independent transformants in 64 wells of a 96-well culture plate.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7qRGxl6StM4/S5Go146bVrI/AAAAAAAAAew/yREfSL2YGHE/s1600-h/grid.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 224px; height: 320px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/S5Go146bVrI/AAAAAAAAAew/yREfSL2YGHE/s320/grid.png" alt="" id="BLOGGER_PHOTO_ID_5445319068231620274" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;They were arrayed in an 8x8 checkerboard grid of clones (yellow = NalR, blue = NovR).  If I prep DNA from all these clones, I could then produce Row Pools 1-8 and Column Pools A-H, and each pool would contain four clones of each resistance type.  One issue would be distinguishing which endpoints belong together when segments overlap; another would be deciding which segments belong to the same clone.&lt;br /&gt;&lt;br /&gt;If a donor segment appeared in clone 3C, for example, and it had unique endpoints (i.e. 
that donor segment is present only in clone 3C), then we would see those unique endpoints solely in pool 3 and pool C.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7qRGxl6StM4/S5Gpqy0HZdI/AAAAAAAAAe4/0zts6zXpX3I/s1600-h/uniq.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 275px; height: 320px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/S5Gpqy0HZdI/AAAAAAAAAe4/0zts6zXpX3I/s320/uniq.png" alt="" id="BLOGGER_PHOTO_ID_5445319977127601618" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;So we would have no difficulty assigning the segment to clone 3C.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_7qRGxl6StM4/S5Gp6I9r8wI/AAAAAAAAAfA/PR7kEAzS5oo/s1600-h/uniqgrid.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 201px; height: 211px;" src="http://3.bp.blogspot.com/_7qRGxl6StM4/S5Gp6I9r8wI/AAAAAAAAAfA/PR7kEAzS5oo/s320/uniqgrid.png" alt="" id="BLOGGER_PHOTO_ID_5445320240771363586" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;On the other hand, suppose the segment was NOT unique but present in, say, clones 3C and 7E. In that case we’d be unable to assign it to a particular clone because of "ghost" signals: pools 3, 7, C and E would equally implicate 3E and 7C. Instead, we would know only that there were two identical segments, either in 3C and 7E or in 3E and 7C.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_7qRGxl6StM4/S5GqegPrbAI/AAAAAAAAAfI/6ocGXpWg5kU/s1600-h/share.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 320px; height: 192px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/S5GqegPrbAI/AAAAAAAAAfI/6ocGXpWg5kU/s320/share.png" alt="" id="BLOGGER_PHOTO_ID_5445320865496132610" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try 
{parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_7qRGxl6StM4/S5Gqidj0ZtI/AAAAAAAAAfQ/KEN2QHjRVE4/s1600-h/sharegrid.png"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 233px; height: 246px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/S5Gqidj0ZtI/AAAAAAAAAfQ/KEN2QHjRVE4/s320/sharegrid.png" alt="" id="BLOGGER_PHOTO_ID_5445320933494777554" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;(We’d be able to tell this, since we’d still know the frequency of the segment in each pool.)&lt;br /&gt;&lt;br /&gt;So this is a good plan.  We could first sequence by rows, giving us 64 more clones’ worth of data.  And as long as there aren’t a whole bunch of identical endpoints for independent donor segments, we could then sequence pooled columns to assign segments to clones.  If there were tons of identical endpoints, this would be such a shocking result that we’d need to rethink our next step anyway…&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-4626688331780518841?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/4626688331780518841/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2010/03/multiplexing-sans-barcodes.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/4626688331780518841'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/4626688331780518841'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2010/03/multiplexing-sans-barcodes.html' title='Multiplexing sans 
barcodes'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_7qRGxl6StM4/S5Gbi6ajMfI/AAAAAAAAAeg/yW6xCN2c5vM/s72-c/creative-barcodes.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-6492108599431698457</id><published>2010-02-05T12:16:00.000-08:00</published><updated>2010-02-05T12:32:31.693-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='recombination'/><category scheme='http://www.blogger.com/atom/ns#' term='sequencing'/><title type='text'>grantgrantgrant</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.colorado.edu/MCDB/MCDB1111/tutorials/nucleicAcid_1.htm"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 247px; height: 320px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/S2x9RyciqxI/AAAAAAAAAdg/tkwOGGdiGko/s320/bacterial+chromosome.jpg" alt="" id="BLOGGER_PHOTO_ID_5434856594882079506" border="0" /&gt;&lt;/a&gt;Whew!  So another grant out of the way for now; another one almost done; and my own postdoc re-application in the works…  A brief respite…  Maybe it’s time to blog.&lt;br /&gt;&lt;br /&gt;Last time (two months ago), I started posting some pictures from IGV showing our raw sequence data aligned to the Rd genome.  
Here, I’ll do yet another summary of our preliminary experiment, as we pitched it in our grant application, which will lead nicely into what I’d like to do next time… talk about alternatives to multiplexing DNA samples by barcoding…&lt;br /&gt;&lt;br /&gt;So we’re studying transformational recombination in H. influenzae, where cells take up DNA from the media and incorporate it into their chromosomes.  We think we have a decent model of the mechanism from studies in H. influenzae and other organisms:&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;(My figures here might have been a little degraded on their journey into the blog)&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7qRGxl6StM4/S2x_IxxIjYI/AAAAAAAAAdw/4gXqlpWWBHA/s1600-h/Slide4.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 267px; height: 315px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/S2x_IxxIjYI/AAAAAAAAAdw/4gXqlpWWBHA/s320/Slide4.png" alt="" id="BLOGGER_PHOTO_ID_5434858639104445826" border="0" /&gt;&lt;/a&gt;But until our little sequencing experiment, we could only infer the extent of transformational recombination of a chromosome based on the transformation and co-transformation frequencies of phenotypic markers.  We’ve learned a lot just from obtaining four recombinant genotypes.  
Here’s what the experiment looked like in overview:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_7qRGxl6StM4/S2x_eSNw6lI/AAAAAAAAAd4/FFc9k22Iz3w/s1600-h/Slide1.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 113px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/S2x_eSNw6lI/AAAAAAAAAd4/FFc9k22Iz3w/s320/Slide1.png" alt="" id="BLOGGER_PHOTO_ID_5434859008591718994" border="0" /&gt;&lt;/a&gt;DNA from one isolate (NP NovR NalR) was incubated with competent cells of another (Rd), and transformants were selected.  Two were NovR and two were NalR.  We got sequence data from our collaborator for all four of these in a pool, two of them individually, and each parent (Rd and NP) individually.  Here’s the figure we used to illustrate what our data looked like:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7qRGxl6StM4/S2x_pZiu31I/AAAAAAAAAeA/ToGrFV6tD-U/s1600-h/Slide5.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 204px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/S2x_pZiu31I/AAAAAAAAAeA/ToGrFV6tD-U/s400/Slide5.png" alt="" id="BLOGGER_PHOTO_ID_5434859199537274706" border="0" /&gt;&lt;/a&gt;Hmm…  probably that’s a low-resolution picture, but working from the bottom of the figure:&lt;br /&gt;The lower panel shows the frequency of NP-specific SNP alleles across the Rd chromosome for the pool of four chromosomes.  Blue dots at 25% indicate that 1 of 4 recombinants contained the donor-specific allele, while blue dots at 50% indicate that 2 of 4 recombinants did.  The two red dots indicate the two selected markers (NovR and NalR), which as expected are at 50%.&lt;br /&gt;&lt;br /&gt;In the upper panel, a zoomed view around the NovR-containing region is shown.  
The blue dots clearly define the donor DNA segments, but because several of them overlap, it is unclear how to assign them to different recombinants:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_7qRGxl6StM4/S2x_z8wKLZI/AAAAAAAAAeI/yiMqpwj0Vhw/s1600-h/Slide6.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 128px; height: 86px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/S2x_z8wKLZI/AAAAAAAAAeI/yiMqpwj0Vhw/s320/Slide6.png" alt="" id="BLOGGER_PHOTO_ID_5434859380787522962" border="0" /&gt;&lt;/a&gt;But because we also sequenced one of the NovR recombinants individually, the assignment of all the segments becomes apparent.  The green bars at the top of the figure show the donor DNA segments in Recombinant A, so the donor segment spanning NovR in Recombinant B can be inferred unambiguously.&lt;br /&gt;&lt;br /&gt;Notably, there are several clustered donor segments in Recombinant A.  This suggests that processes like mismatch repair may be disrupting larger original DNA fragments during recombination.  For example, in the upper panel of Figure 3 above, the area shown by the small purple circle appears to be a mismatch repair event around an insertional deletion difference between Rd and NP.   Here is what that region looks like in IGV:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_7qRGxl6StM4/S2yAIBrqAzI/AAAAAAAAAeY/7nOLLJSNT4Y/s1600-h/vsNP-650k-svrepair-long.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 220px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/S2yAIBrqAzI/AAAAAAAAAeY/7nOLLJSNT4Y/s400/vsNP-650k-svrepair-long.png" alt="" id="BLOGGER_PHOTO_ID_5434859725708198706" border="0" /&gt;&lt;/a&gt;This IGV picture shows our sequencing reads mapped against the NP genome (the donor).  
The top track shows our Rd reads mapped to NP; the middle track shows NP reads mapped to NP, and the bottom track shows Recombinant A reads mapped to NP.  I looked at the whole-genome alignment in this interval and found that the structural variation here is due to an insertional deletion: the alignment breaks, and NP has 128 bp that don’t align with 52 bp of Rd.&lt;br /&gt;&lt;br /&gt;Here is how I interpreted this event in the context of the larger NP donor segment:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_7qRGxl6StM4/S2x_7HkGBII/AAAAAAAAAeQ/7wJMhBQqEEs/s1600-h/Slide7.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 197px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/S2x_7HkGBII/AAAAAAAAAeQ/7wJMhBQqEEs/s320/Slide7.png" alt="" id="BLOGGER_PHOTO_ID_5434859503948792962" border="0" /&gt;&lt;/a&gt;Okay!  Cool!&lt;br /&gt;&lt;br /&gt;It’s going to take a while to fully parse this data, but more important is how we should go about collecting more.  We certainly think we can increase our pool size, but as things stand, we can’t obtain “linkage” information from the pool.  The obvious solution, barcoding individual DNA samples, presents monetary, technical, and computational problems.  
However, there may be another way…&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-6492108599431698457?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/6492108599431698457/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2010/02/grantgrantgrant.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/6492108599431698457'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/6492108599431698457'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2010/02/grantgrantgrant.html' title='grantgrantgrant'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_7qRGxl6StM4/S2x9RyciqxI/AAAAAAAAAdg/tkwOGGdiGko/s72-c/bacterial+chromosome.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-2132984580517738153</id><published>2010-01-07T13:51:00.000-08:00</published><updated>2010-01-07T15:12:45.194-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sequencing'/><title type='text'>Using IGV to look at Illumina data</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" 
href="http://bioinformatics.oxfordjournals.org/cgi/content/full/25/14/1754"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 182px; height: 200px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/S0Zo8neCyJI/AAAAAAAAAdY/gI7kKtWlkCU/s200/btp324f1.jpg" alt="" id="BLOGGER_PHOTO_ID_5424138191810775186" border="0" /&gt;&lt;/a&gt;Out with the oughts, in with the tens...  Out with &lt;a href="http://maq.sourceforge.net/"&gt;Maq&lt;/a&gt;, in with &lt;a href="http://bio-bwa.sourceforge.net/"&gt;BWA&lt;/a&gt;!&lt;br /&gt;&lt;br /&gt;Recap:  &lt;a href="http://nodnacontrol.blogspot.com/2009/11/recombinant-genomes-first-pass.html"&gt;I got a bunch of Illumina GA2 data from 5 DNA samples&lt;/a&gt;:  our recipient chromosome, Rd; our donor chromosome, NP; two transformed chromosomes; and a pool of four transformed chromosomes.  I’d previously done a bunch of work with this data using the Maq alignment algorithm and SNP caller and hacked away at the SNPs in R.&lt;br /&gt;&lt;br /&gt;Since then, I re-mapped our sequence reads using the BWA alignment algorithm, which was about as easy to install and use (from the command line) as Maq, though it has far more settings that can be tweaked.  BWA has two big advantages over Maq:&lt;br /&gt;&lt;br /&gt;1) It performs gapped alignment, so reads containing short indels can still be mapped to a reference sequence.  This partially helped overcome variation in read depth (coverage) due to mapping artifacts.&lt;br /&gt;2) It outputs data in the SAM format, a newly minted standard format for reference mappings of deep sequencing data.  Thus I can use several downstream analysis tools that have been developed to work with SAM files and their binary equivalent, BAM.  
Namely, the &lt;a href="http://samtools.sourceforge.net/"&gt;SAMtools&lt;/a&gt; package and the Broad Institute’s &lt;a href="http://www.broadinstitute.org/igv/"&gt;Integrative Genomics Viewer (IGV)&lt;/a&gt; work with this format.&lt;br /&gt;&lt;br /&gt;It shares one disadvantage with Maq: when a read maps to multiple locations in the genome, only one arbitrary mapping is kept, and all alternate mappings are discarded.  This can create some odd artifacts at multicopy sequences in the genome.  Largely this isn’t a problem for me, but it does make it harder to interpret the correct base at every position.&lt;br /&gt;&lt;br /&gt;To make a long story short, as expected, this re-analysis yielded the same recombinant donor segments in our transformed chromosomes at a gross level, but I was also able to examine the genomes we sequenced in much greater detail by manually scanning the raw data in IGV.&lt;br /&gt;&lt;br /&gt;To make the short story long...&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;I had an odd problem running IGV initially:  To conserve memory, IGV only loads a portion of the genome at a time.  For large genomes with low-coverage data, it works great with little RAM, so launching the application directly from the website is no problem.  But because our dataset has such tremendous coverage of the genome, IGV kept crashing when I tried to run the 2 GB web version, and the lab computer only has 6 GB, so I couldn’t run the 10 GB web version.&lt;br /&gt;&lt;br /&gt;So instead, I had to download the package and run it myself from the command line.  First, using &lt;a href="http://www.nano-editor.org/"&gt;nano&lt;/a&gt;, I edited the launcher script for my computer, igv_mac-intel.sh, changing the switch -Xmx750m to -Xmx4800m.  This raised the maximum Java heap available to IGV from 750 MB to 4800 MB.  
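&lt;br /&gt;&lt;br /&gt;For anyone who would rather script that edit than open nano, here is a minimal Python sketch; set_xmx() is a hypothetical helper, not part of IGV. It assumes only that the launcher passes Java a heap flag of the form -Xmx750m. Because -Xmx sets the maximum Java heap, the new value should stay comfortably below the machine's physical RAM (6 GB here).&lt;pre&gt;
```python
import re


def set_xmx(script_text, heap_mb):
    """Return script_text with its Java -Xmx heap flag set to heap_mb megabytes.

    set_xmx is a hypothetical helper; it assumes the launcher contains at
    least one flag like -Xmx750m (a number followed by a k, m or g unit).
    """
    new_text, n = re.subn(r"-Xmx\d+[kKmMgG]", f"-Xmx{heap_mb}m", script_text)
    if n == 0:
        raise ValueError("no -Xmx flag found in launcher script")
    return new_text


example = "java -Xmx750m -jar igv.jar"  # illustrative line, not the real script
print(set_xmx(example, 4800))  # prints java -Xmx4800m -jar igv.jar
```
&lt;/pre&gt;To apply it, read igv_mac-intel.sh into a string, pass it through set_xmx(), and write the result back before making the script executable.&lt;br /&gt;&lt;br /&gt;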
Then, after making the script executable (using chmod +x), I could run the script from the IGV directory by typing:&lt;br /&gt;./igv_mac-intel.sh&lt;br /&gt;&lt;br /&gt;I did run into one other issue: my FastA header was not identical to the chromosome name used in the SAM file, so before loading the genome, I had to change the FastA header from GenBank’s version to the name I had used for the chromosome in the SAM file.&lt;br /&gt;&lt;br /&gt;From there it was simple to load a genome and then load BAM files to look at the BWA alignments.  (While this worked like a charm, I completely failed to add annotation tracks; IGV can apparently load GFF3 and BED files, but mine wouldn’t load properly for whatever reason.)&lt;br /&gt;&lt;br /&gt;IGV’s visualization of read alignments really helped me understand the nature of this huge data set.  I have now scanned through all the datasets against the Rd sequence reference and am partway through scanning them all against the NP sequence reference.  It’s been a rather cumbersome and slow chore, which has taken up the last three days, but it is also rewarding to look at the raw data rather than something I just have the computer spit out.  I’ll get to that next.&lt;br /&gt;&lt;br /&gt;I will use the rest of this post to go over some basics of looking at this kind of data and will later show some interesting bits of our transformants.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold;"&gt;Rd versus Rd&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Here is what mapping our Rd sequence reads against the Rd reference genome looks like for a tiny segment of the genome.  I picked a location that seems to have an “error-prone” region.  
There’s a whole slew of locations that look like this.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_7qRGxl6StM4/S0ZoUq-Z3dI/AAAAAAAAAc4/LWqSeJbsnfk/s1600-h/expand.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 281px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/S0ZoUq-Z3dI/AAAAAAAAAc4/LWqSeJbsnfk/s400/expand.png" alt="" id="BLOGGER_PHOTO_ID_5424137505557044690" border="0" /&gt;&lt;/a&gt;The top panel shows the portion of the Rd chromosome in view.&lt;br /&gt;&lt;br /&gt;The next panel shows coverage, or “read depth”, per position (for this window, coverage ranges from ~800-1000 sequence reads mapped per position).&lt;br /&gt;&lt;br /&gt;The three colored columns in the read depth panel indicate potential sequence variants, based on a user-defined threshold percentage (I am using the early-access version of IGV to get this control; I set it at 10%.)  Just below the coverage plot is a running string of colored boxes, which indicate the bases in the Rd reference sequence.&lt;br /&gt;&lt;br /&gt;So at the central base (position 904,862), the reference has a C (blue), whereas in our dataset this position was covered by 770 reads, of which 78% were C and ~21% were T (along with a few other stragglers).&lt;br /&gt;&lt;br /&gt;In the lower panel is a pileup of the individual sequence reads.  Grey indicates a base matching the reference, whereas colors indicate mismatches.  Because Illumina sequencing is rather error-prone, a scattering of color is seen all over the place, but at most positions the vast majority of reads match the reference base.  The orientation of each read is also obvious in this view.&lt;br /&gt;&lt;br /&gt;However, this view shows only a handful of the reads across this region.  
I can also compress the reads by right-clicking and selecting the “Collapse Track” option:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_7qRGxl6StM4/S0ZoatN9dlI/AAAAAAAAAdA/ZFXD_ynVjCc/s1600-h/collapse.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 281px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/S0ZoatN9dlI/AAAAAAAAAdA/ZFXD_ynVjCc/s400/collapse.png" alt="" id="BLOGGER_PHOTO_ID_5424137609238378066" border="0" /&gt;&lt;/a&gt;Now the high rate of “errors” at several bases becomes evident.  Other features also become apparent, namely the red- and black-labeled reads:&lt;br /&gt;&lt;br /&gt;Red reads are “orphaned”, i.e., the other read from the same molecule wasn’t mapped.  This could be because the mate was a low-quality read, or because it came from sequence absent from the Rd genome.&lt;br /&gt;&lt;br /&gt;Black reads have a mate that maps beyond the user-defined maximum DNA fragment size (which I set at 300 bp); that is, the mate maps more than 300 bases away.  As with most of the sequencing errors outside the error-prone region, the red and black reads are fairly rare and evenly scattered.  Manually checking where the mates of the black reads mapped showed that most were just outside my threshold, so they probably do not indicate an indel or rearrangement.&lt;br /&gt;&lt;br /&gt;So what’s going on with the “error-prone” region?  
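&lt;br /&gt;&lt;br /&gt;You can count “strangely mapping” pairs in a region using the same two categories as above by classifying reads from their SAM fields. Here is a minimal Python sketch. It assumes plain-text SAM input (FLAG in column 2, RNAME in column 3, RNEXT in column 7, TLEN in column 9) and the 300 bp maximum fragment size. It illustrates the idea and is not how IGV colors reads internally:&lt;br /&gt;&lt;pre&gt;
```python
from collections import Counter

MAX_FRAGMENT = 300  # maximum fragment size used above (bp)


def bit_set(flag, bit):
    """True if bit number `bit` of a SAM FLAG is set (0 -> 0x1, 2 -> 0x4, 3 -> 0x8).

    Shift-and-modulo gives the same answer as a bitwise AND test.
    """
    return (flag >> bit) % 2 == 1


def classify_read(sam_line, max_fragment=MAX_FRAGMENT):
    """Label one SAM alignment line as orphaned, discordant, proper, or skip."""
    fields = sam_line.rstrip("\n").split("\t")
    flag, rname, rnext, tlen = int(fields[1]), fields[2], fields[6], int(fields[8])
    if not bit_set(flag, 0) or bit_set(flag, 2):
        return "skip"        # not paired, or this read itself is unmapped
    if bit_set(flag, 3):
        return "orphaned"    # 0x8: the mate did not map ("red" above)
    if rnext not in ("=", rname) or abs(tlen) > max_fragment:
        return "discordant"  # mate maps too far away ("black" above)
    return "proper"


def summarize(sam_lines):
    """Count categories, skipping SAM header lines (which start with '@')."""
    return Counter(
        classify_read(line) for line in sam_lines if not line.startswith("@")
    )


# Made-up records: QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL
toy = [
    "@SQ\tSN:chr\tLN:1000",
    "r1\t99\tchr\t100\t60\t42M\t=\t250\t192\t*\t*",
    "r2\t73\tchr\t400\t60\t42M\t=\t400\t0\t*\t*",
    "r3\t97\tchr\t500\t60\t42M\t=\t880\t422\t*\t*",
]
print(summarize(toy))
```
&lt;/pre&gt;The toy records are invented. Real input would be text SAM lines, for example as printed by samtools view for the window of interest.&lt;br /&gt;&lt;br /&gt;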
Because there aren’t many strangely mapping paired reads, I doubt this is a BWA artifact; it is more likely a sequencing artifact, possibly due to a stretch of DNA that is “slippery” for the polymerase.&lt;br /&gt;&lt;br /&gt;The interesting possibility that these are due to “clonal variation” (especially since the three variants shown in the coverage column are all at ~25%) is unlikely in this instance, since there seem to be error-prone regions immediately adjacent to these three that just didn’t make my threshold.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold;"&gt;NP versus Rd&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The next image adds the NP sequencing data to the mix, still using Rd as the reference.  Again, the error-prone bit shows up, further arguing against clonal variation, since this is an independent culture and strain.  However, on the left, several bona fide NP-specific SNPs are quite unambiguously identified.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7qRGxl6StM4/S0Zoe9zmPpI/AAAAAAAAAdI/kC9bd28LD1U/s1600-h/nptoo.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 270px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/S0Zoe9zmPpI/AAAAAAAAAdI/kC9bd28LD1U/s400/nptoo.png" alt="" id="BLOGGER_PHOTO_ID_5424137682410684050" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;The next image illustrates a couple of things:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_7qRGxl6StM4/S0ZokZSNx4I/AAAAAAAAAdQ/pZrV5Bmjrts/s1600-h/sv.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 270px;" src="http://3.bp.blogspot.com/_7qRGxl6StM4/S0ZokZSNx4I/AAAAAAAAAdQ/pZrV5Bmjrts/s400/sv.png" alt="" id="BLOGGER_PHOTO_ID_5424137775686207362" border="0" 
/&gt;&lt;/a&gt;First, the region I previously showed had rather even coverage, whereas many, if not most, regions show much more variation in coverage.  Here, coverage varies from ~200 to ~1800.  In most instances, all DNA samples showed similar coverage for the same region, unless there is a structural variant.&lt;br /&gt;&lt;br /&gt;Second, there is clearly a structural variant of some kind here between Rd and NP. The black discordant reads mostly map ~375 bp away, suggesting an ~75 bp insertion in Rd relative to NP.  The red orphaned reads (mostly flanking the black reads) suggest that there is also an insertion in NP relative to Rd... so an insertional deletion then.&lt;br /&gt;&lt;br /&gt;Okay, that’s it for now.  Later, I’ll put up some more interesting stuff...&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-2132984580517738153?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/2132984580517738153/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2010/01/using-igv-to-look-at-illumina-data.html#comment-form' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/2132984580517738153'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/2132984580517738153'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2010/01/using-igv-to-look-at-illumina-data.html' title='Using IGV to look at Illumina data'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' 
src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_7qRGxl6StM4/S0Zo8neCyJI/AAAAAAAAAdY/gI7kKtWlkCU/s72-c/btp324f1.jpg' height='72' width='72'/><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-958948180877017263</id><published>2009-12-28T11:05:00.001-08:00</published><updated>2009-12-28T11:11:07.455-08:00</updated><title type='text'>Vacation Post: Uncoiled!</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://species.asu.edu/2009_species06"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 307px; height: 320px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/SzkBmNepIBI/AAAAAAAAAcw/yWuK9kXUyFQ/s320/Opisthostoma+vermiculum.jpg" alt="" id="BLOGGER_PHOTO_ID_5420365382481944594" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Number 6...&lt;br /&gt;&lt;span class="fullpost"&gt;&lt;br /&gt;...in the &lt;a href="http://www.species.asu.edu/Top10"&gt;"Top 10 New Species of 2009"&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://species.asu.edu/2009_species10"&gt;Number 10&lt;/a&gt; is pretty mind-boggling...&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-958948180877017263?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/958948180877017263/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/12/vacation-post-uncoiled.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/3017697080484068141/posts/default/958948180877017263'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/958948180877017263'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/12/vacation-post-uncoiled.html' title='Vacation Post: Uncoiled!'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_7qRGxl6StM4/SzkBmNepIBI/AAAAAAAAAcw/yWuK9kXUyFQ/s72-c/Opisthostoma+vermiculum.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-7382406282176761550</id><published>2009-12-11T17:58:00.000-08:00</published><updated>2009-12-11T18:50:19.144-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sequencing'/><title type='text'>Update on problems with analysis</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.marcjohns.com/"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 258px; height: 320px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/SyL4-DLoVSI/AAAAAAAAAck/OVx5S41vy9A/s320/tumblr_kubsktG6Fq1qz6f9yo1_500.jpg" alt="" id="BLOGGER_PHOTO_ID_5414163446942422306" border="0" /&gt;&lt;/a&gt;Ugh.  Too much to blog about.  
The pace of computer work is totally different from that of lab work (though in the end they seem equally labor-intensive), and I've made so many figures and gotten so many numbers in the past week that I barely know where to start...&lt;br /&gt;&lt;br /&gt;Well, I guess I'll just blather about a couple of problems I've been working through, but leave out all the charts and graphs and figures for the time being:&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;500-fold sequence coverage still "misses" parts of the genome:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;We got a ton of data.  For our controls, it was surely massive overkill.  Nevertheless, "read depth" (how many read mappings cover a particular position) still varies substantially.  There are numerous sources of this variation (%GC being one that is quite apparent), but I am most worried about variation in "read depth" due to the alignment algorithm I'm using to map reads to the genome.&lt;br /&gt;&lt;br /&gt;As I try to root out artifacts in my discovery of &lt;a href="http://nodnacontrol.blogspot.com/2009/12/mutagenic-recombination.html"&gt;putative recombination-associated mutations&lt;/a&gt;, I have to confront the fact that "read depth" is, on average, reduced when recombinant donor segments are mapped back to the recipient genome, so the novel mutations I found in these strains are on average supported by far fewer sequence reads than the average base...  
Most of them still look pretty solid to me (though there are several obvious artifacts), but I don't have a good rationale for deciding whether or not to trust them.&lt;br /&gt;&lt;br /&gt;I'm trying several things to work this out, notably examining the reciprocal mappings (using the donor chromosome as the reference).&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;So far, my analysis has a huge risk of false negatives:  &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Several problems here.&lt;br /&gt;&lt;br /&gt;(a) Part of both this problem and the last one is that I am using an alignment package that does not allow gaps (&lt;a href="http://maq.sourceforge.net/"&gt;Maq&lt;/a&gt;).  This means that even a single-nucleotide indel dramatically reduces "read depth" on either side of it (out to ~42 bases, the read length).  See above.&lt;br /&gt;&lt;br /&gt;(b) Another issue I'm facing with several of Maq's downstream outputs is that "read depth" is capped at 255.  Presumably this was done to conserve memory, by assigning only a single byte to this number.  But what I haven't quite figured out is whether the SNP output (for example) ignores any possible SNPs where coverage exceeded 255.  My cursory look at the rawer output (the "pileup") suggests this might well be the case.  This could mean that I'm missing a lot, since the mean "read depth" per genome position in our datasets is ~500.&lt;br /&gt;&lt;br /&gt;(c) Finally, I've been ignoring all of Maq's self-SNP and "heterozygous" SNP calls in my downstream analysis using R.  I presume that SNPs called in my mapping of the recipient genome to the complete recipient sequence are simply mutations that distinguish our wild-type Rd strain from the sequenced one.  (As an aside, several hundred of the SNPs called by Maq were actually giving the correct base for an ambiguous base in the "complete" genome.  I'd like to find a way to revise the archived Rd sequence to get rid of all the ambiguous bases.)  
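&lt;br /&gt;&lt;br /&gt;Patching a reference that way is straightforward once the calls are in hand. Here is a minimal Python sketch. It assumes the SNP calls have already been reduced to a dictionary mapping 1-based position to a single unambiguous called base. `patch_ambiguous` and the toy sequence are hypothetical, and a real patch would still need the calls checked for quality first:&lt;br /&gt;&lt;pre&gt;
```python
IUPAC_AMBIGUOUS = set("NRYKMSWBDHV")  # IUPAC codes other than A, C, G, T
UNAMBIGUOUS = set("ACGT")


def patch_ambiguous(ref, calls):
    """Replace ambiguous reference bases with called bases.

    ref: reference sequence as a string.
    calls: hypothetical dict of 1-based position to called base.
    Returns the patched sequence and the number of bases replaced.
    """
    seq = list(ref)
    patched = 0
    for pos, base in calls.items():
        idx = pos - 1
        if seq[idx].upper() in IUPAC_AMBIGUOUS and base in UNAMBIGUOUS:
            seq[idx] = base
            patched += 1
    return "".join(seq), patched


def count_ambiguous(seq):
    """Number of IUPAC ambiguity codes left in a sequence."""
    return sum(1 for b in seq.upper() if b in IUPAC_AMBIGUOUS)


toy_ref = "ACGNTRAC"
toy_calls = {4: "T", 6: "G", 2: "A"}  # position 2 is already unambiguous, so it is left alone
new_ref, n = patch_ambiguous(toy_ref, toy_calls)
print(new_ref, n, count_ambiguous(new_ref))  # prints ACGTTGAC 2 0
```
&lt;/pre&gt;Only positions that are ambiguous in the reference are touched. A call that disagrees with an unambiguous reference base is left for the separate question of whether it is a real difference between strains.&lt;br /&gt;&lt;br /&gt;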
And I don't have a solid plan for dealing with the "heterozygous" calls.  Because the Maq assembly program assumes at most two haplotypes (i.e., a diploid genome), positions with mixed base signals get called as heterozygotes.  These are actually pretty interesting and could reflect cool stuff like clonal variation, but most are probably due to multiply mapping reads and/or persistent sequencing errors.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Solutions:&lt;/span&gt;  Initially, solving these problems will largely be a matter of redoing everything with different alignment software.  My plan is to use the &lt;a href="http://maq.sourceforge.net/"&gt;BWA&lt;/a&gt; aligner and &lt;a href="http://samtools.sourceforge.net/"&gt;SAMtools&lt;/a&gt;.  BWA allows for gaps (so the "read depth" issue should be partially solved), and SAMtools not only keeps everything in a new agreed-upon standard format, but has several other tools, including what looks to be a better SNP caller (at least it has more modifiable settings).  
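&lt;br /&gt;&lt;br /&gt;A small Python simulation shows why even a 1-bp indel costs depth on both sides when the aligner cannot open gaps. It is a simplified toy with several assumptions: 42-bp reads starting at every position, a random 300-bp reference, and a hypothetical cutoff of 2 mismatches per read. That cutoff is not necessarily Maq's exact rule:&lt;br /&gt;&lt;pre&gt;
```python
import random

READ_LEN = 42       # read length mentioned above
MAX_MISMATCHES = 2  # hypothetical per-read cutoff for the ungapped aligner


def mismatches(read, ref, offset):
    """Mismatches when the read is laid on ref at offset, with no gaps allowed."""
    window = ref[offset:offset + READ_LEN]
    return sum(1 for a, b in zip(read, window) if a != b)


def depth_after_ungapped_mapping(ref, del_pos):
    """Simulate reads from a genome lacking ref[del_pos]; return depth along ref."""
    sample = ref[:del_pos] + ref[del_pos + 1:]  # genome with a 1-bp deletion
    depth = [0] * len(ref)
    for i in range(len(sample) - READ_LEN + 1):
        read = sample[i:i + READ_LEN]  # one read starting at every position
        # Simplification: try only the read's true start, with and without the 1-bp shift.
        best = min((i, i + 1), key=lambda off: mismatches(read, ref, off))
        if mismatches(read, ref, best) > MAX_MISMATCHES:
            continue  # read is discarded, so it adds no depth anywhere
        for pos in range(best, best + READ_LEN):
            depth[pos] += 1
    return depth


rng = random.Random(1)
ref = "".join(rng.choice("ACGT") for _ in range(300))
depth = depth_after_ungapped_mapping(ref, del_pos=150)
print("depth far from the deletion:", depth[60])
print("depth at the deleted base:", depth[150])
```
&lt;/pre&gt;Reads that have the indel near one end still map with only a mismatch or two. Reads that have it near the middle are thrown out. So depth dips across roughly one read length on either side of the indel, not just at the indel itself. A gapped aligner can place those middle reads instead.&lt;br /&gt;&lt;br /&gt;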
I would also like to try to do some &lt;span style="font-style: italic;"&gt;de novo&lt;/span&gt; assembly, perhaps with &lt;a href="http://www.ebi.ac.uk/%7Ezerbino/velvet/"&gt;Velvet&lt;/a&gt;, since we have such absurd coverage and a simple enough genome.&lt;br /&gt;&lt;br /&gt;In the meantime, my R-fu has been improving, though I am convinced that I am missing some really basic principles that would make my code run a lot faster.&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-7382406282176761550?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/7382406282176761550/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/12/ugh.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/7382406282176761550'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/7382406282176761550'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/12/ugh.html' title='Update on problems with analysis'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_7qRGxl6StM4/SyL4-DLoVSI/AAAAAAAAAck/OVx5S41vy9A/s72-c/tumblr_kubsktG6Fq1qz6f9yo1_500.jpg' height='72' 
width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-3868189498708514812</id><published>2009-12-02T13:44:00.000-08:00</published><updated>2009-12-02T14:07:34.032-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mutation'/><category scheme='http://www.blogger.com/atom/ns#' term='recombination'/><category scheme='http://www.blogger.com/atom/ns#' term='sequencing'/><title type='text'>Mutagenic recombination?</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.cbs.dtu.dk/staff/dave/roanoke/genetics980415f.htm"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 236px; height: 320px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/Sxbg9LKx6aI/AAAAAAAAAcM/os3zF3sAzzg/s320/fig20_00.jpg" alt="" id="BLOGGER_PHOTO_ID_5410759343906875810" border="0" /&gt;&lt;/a&gt;Okay, this is pretty cool.  I will probably discover that it’s just some artifact of the mapping, but digging into the transformant data some more reveals what appears to be a high number of mutations  within recombined segments (alleles that have neither donor nor recipient identity)...&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;For this analysis, I used much more stringent criteria for calling SNPs, so that low quality SNP calls would not contaminate the result.  
This is particularly important here, since we might expect lower-quality SNP calls in the recombinant segments, due to the relatively high divergence between the donor and recipient genomes and the limitations of current mapping algorithms.&lt;br /&gt;&lt;br /&gt;For TfA, there were 802 unambiguous donor alleles and 19 high-quality novel alleles, while for TfB, there were 902 unambiguous donor alleles and 21 high-quality novel alleles.&lt;br /&gt;&lt;br /&gt;The two plots below show unambiguous donor alleles as blue bars going to 1 (these define the recombinant segments) and unambiguous mutant alleles as red bars going to -1. (Click to enlarge)&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7qRGxl6StM4/SxbkNAURMrI/AAAAAAAAAcU/FIlfNK9344o/s1600-h/TfAmut.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 99px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/SxbkNAURMrI/AAAAAAAAAcU/FIlfNK9344o/s400/TfAmut.png" alt="" id="BLOGGER_PHOTO_ID_5410762914406675122" border="0" /&gt;&lt;/a&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_7qRGxl6StM4/SxbkR5j1dOI/AAAAAAAAAcc/tJMbkX7Fao8/s1600-h/TfBmut.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 99px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/SxbkR5j1dOI/AAAAAAAAAcc/tJMbkX7Fao8/s400/TfBmut.png" alt="" id="BLOGGER_PHOTO_ID_5410762998492263650" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;That looks pretty striking!  Mutations are clearly clustered in the recombined segments!&lt;br /&gt;&lt;br /&gt;Six of the "novel alleles" are shared between the two genomes: four are in the first overlapping donor segment, and the other two are outside the donor segments.  It is still too early to be confident in this result, but still!  
It is very suggestive.&lt;br /&gt;&lt;br /&gt;I’ve never really taken the supposed causal connection between recombination and mutation too seriously, since the evidence mostly seems correlative to me, but if this result holds up, I think it will be a uniquely clear-cut example of mutations induced by recombination.&lt;br /&gt;&lt;br /&gt;Before being confident in the result, I need to map the data back to the donor genome, and cross-check the result.  If that works, some simple PCR and traditional sequencing should readily confirm or refute this deep sequencing result.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-3868189498708514812?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/3868189498708514812/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/12/mutagenic-recombination.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/3868189498708514812'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/3868189498708514812'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/12/mutagenic-recombination.html' title='Mutagenic recombination?'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://2.bp.blogspot.com/_7qRGxl6StM4/Sxbg9LKx6aI/AAAAAAAAAcM/os3zF3sAzzg/s72-c/fig20_00.jpg' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-1135999568908471354</id><published>2009-11-30T21:08:00.001-08:00</published><updated>2009-11-30T22:02:35.011-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='recombination'/><category scheme='http://www.blogger.com/atom/ns#' term='sequencing'/><title type='text'>Recombinant Genomes:  First Pass</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.pnas.org/content/101/12.cover-expansion"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 307px; height: 320px;" src="http://3.bp.blogspot.com/_7qRGxl6StM4/SxSlY3Ap21I/AAAAAAAAAbc/eUMhF0IYRCw/s320/F1.medium.gif" alt="" id="BLOGGER_PHOTO_ID_5410130898880944978" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;We’ve now got deep sequencing data of donor, recipient, and transformant genomes.  And it is, indeed, “deep”, at least in quantity.  &lt;a href="http://nodnacontrol.blogspot.com/2009/10/transformants-produced.html"&gt;Here’s&lt;/a&gt; the post where I &lt;a href="http://3.bp.blogspot.com/_7qRGxl6StM4/SuIfVkA4H4I/AAAAAAAAAYE/r6Anc47G4C4/s1600-h/transf.png"&gt;illustrate&lt;/a&gt; the details of how the sequenced DNA was obtained.  
There were five “lanes” of sequencing obtained, of which I will talk about the first four here:&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;Lane 1: Rd (the recipient genome)&lt;br /&gt;&lt;br /&gt;Lane 2: 1350NN (the donor genome 86-028NP plus NovR and NalR alleles of &lt;span style="font-style: italic;"&gt;gyrB&lt;/span&gt; and &lt;span style="font-style: italic;"&gt;gyrA&lt;/span&gt;, respectively)&lt;br /&gt;&lt;br /&gt;Lane 3: Transformant A (the genome of a NovR transformed clone)&lt;br /&gt;&lt;br /&gt;Lane 4: Transformant B (the genome of a NalR transformed clone)&lt;br /&gt;&lt;br /&gt;So I’ve done a first-pass analysis of the sequencing data using &lt;a href="http://nodnacontrol.blogspot.com/2009/10/sets-of-snps.html"&gt;Maq&lt;/a&gt; as the mapping algorithm.  I left everything at default settings, and so far have only analyzed the datasets with respect to the recipient genome, Rd.&lt;br /&gt;&lt;br /&gt;The expectation is that when I map Lane 1 to Rd, there will be few differences detected by Maq; when Lane 2 is mapped to Rd, the bulk of SNPs between Rd and 86-028NP will be detected; and when Lanes 3 and 4 are mapped to Rd, the donor DNA segments transformed into the recipient background will be identified.  For lanes 3 and 4, since we know the locus that we selected for donor information, we have controls for whether or not an appropriate donor segment was detected.&lt;br /&gt;&lt;br /&gt;Before continuing, I must specify that this is a very preliminary analysis:  I do not consider the quality of the SNP calls, beyond whatever Maq does to make the calls (and culling ambiguous SNPs, as described below).  I have not mapped anything with respect to the donor genome.  I have not considered any polymorphism between the strains other than simple single-nucleotide substitutions (since Maq only does ungapped alignment). I also missed regions of very high SNP density, since the Maq default will not map any read with &gt;2 mismatches from the reference.  
Finally, I have only cursorily examined depth of coverage across each genome (it is ~500-fold on average, ranging from ~100 to ~1000 over each dataset).&lt;br /&gt;&lt;br /&gt;However, even with these caveats, the approximate sizes and locations of transformed DNA segments were pretty clear...&lt;br /&gt;&lt;br /&gt;Here are the numbers of “SNPs” called by Maq between each dataset and Rd:&lt;br /&gt;&lt;br /&gt;Lane 1 (Rd): 933&lt;br /&gt;Lane 2 (1350NN): 30,284&lt;br /&gt;Lane 3 (TfA-NovR): 1,870&lt;br /&gt;Lane 4 (TfB-NalR): 1,881&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold;"&gt;Rd versus Rd&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The first obvious issue is that when the Rd DNA was mapped against the Rd genome sequence, Maq called 933 “SNPs”.  What is all this supposed “variation”?&lt;br /&gt;&lt;br /&gt;108 had “N” as the reference base&lt;br /&gt;432 had an ambiguous base as the query&lt;br /&gt;&lt;br /&gt;So 540 / 933 “SNPs” are easily explained artifacts: either ambiguous positions in the complete Rd reference genome sequence, or ambiguous base calls by Maq from the Illumina GA2 dataset.&lt;br /&gt;&lt;br /&gt;The remaining “SNPs” may also be persistent sequencing/mapping artifacts, or they may be true genetic differences between the “Rd” we sequenced (our lab’s wild type) and the original DNA sample sequenced back in the 1990s.&lt;br /&gt;&lt;br /&gt;To simplify matters, I culled any position called a “SNP” between Rd and Rd, as well as all other ambiguous positions, from the remaining datasets.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold;"&gt;Rd versus 1350NN&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Before turning to the transformants, I used Lane 2 to make a list of detectable SNP positions between the donor DNA and the recipient chromosome.  
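&lt;br /&gt;&lt;br /&gt;The culling rule described above amounts to a filter on position and base identity. Here is a minimal Python sketch. It assumes SNP calls stored as a dictionary of position to (reference base, called base) and treats the IUPAC ambiguity codes as “ambiguous”. The function name and toy calls are hypothetical:&lt;br /&gt;&lt;pre&gt;
```python
AMBIGUOUS = set("NRYKMSWBDHV")  # IUPAC codes other than A, C, G, T


def donor_specific(donor_calls, rd_self_positions):
    """Keep donor SNP calls not seen in Rd vs. Rd and free of ambiguous bases.

    donor_calls: dict of position to (reference base, called base).
    rd_self_positions: set of positions called as "SNPs" when Rd was mapped to Rd.
    """
    return {
        pos: (ref, alt)
        for pos, (ref, alt) in donor_calls.items()
        if pos not in rd_self_positions
        and ref not in AMBIGUOUS
        and alt not in AMBIGUOUS
    }


# Hypothetical toy calls, not real data.
donor_calls = {
    10: ("A", "G"),
    20: ("N", "T"),  # ambiguous reference base
    30: ("C", "Y"),  # ambiguous called base
    40: ("G", "A"),  # also called in Rd vs. Rd
    50: ("T", "C"),
}
kept = donor_specific(donor_calls, rd_self_positions={40})
print(sorted(kept))  # prints [10, 50]
```
&lt;/pre&gt;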
Of the 30,284 “SNPs” detected by Maq, 29,002 were neither in the Rd SNP set nor had an ambiguous base in the reference or query.&lt;br /&gt;&lt;br /&gt;Note that I am not using SNPs identified by comparing the two complete genome sequences, but rather those that were unambiguously determined by this sequencing experiment.  Rd remains the only reference genome I have used.&lt;br /&gt;&lt;br /&gt;I used this set of SNPs as the set of “Donor-specific alleles” to map transforming DNA segments.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold;"&gt;The transformants&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;To identify donor-specific alleles in the transformants, I took the intersection of the “Donor-specific alleles” and the unambiguous SNPs identified by Maq for the two transformants, yielding the following numbers of SNPs in each transformant:&lt;br /&gt;&lt;br /&gt;TfA: 890&lt;br /&gt;TfB: 975&lt;br /&gt;&lt;br /&gt;These are ~3.1% (890 / 29,002) and ~3.4% (975 / 29,002) of the donor-specific alleles, suggesting that about 3.0-3.5% of each transformant genome consists of donor DNA.  
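&lt;br /&gt;&lt;br /&gt;The intersection step can be sketched the same way. A donor-specific allele counts as present in a transformant only when the transformant carries the same base at the same position. This is a hypothetical Python sketch with toy calls (positions mapped to called bases), not the actual analysis code:&lt;br /&gt;&lt;pre&gt;
```python
def donor_alleles_in(transformant_calls, donor_alleles):
    """Positions where a transformant carries the donor-specific base.

    Both arguments are dicts of position to called base.
    """
    return {
        pos
        for pos, base in transformant_calls.items()
        if donor_alleles.get(pos) == base
    }


# Hypothetical toy calls, not real data.
donor_alleles = {100: "G", 200: "T", 300: "A", 400: "C"}
transformant = {100: "G", 300: "G", 500: "T"}
shared = donor_alleles_in(transformant, donor_alleles)
print(sorted(shared), len(shared) / len(donor_alleles))  # prints [100] 0.25
```
&lt;/pre&gt;Requiring the same base, not just the same position, keeps a transformant-specific change at a donor SNP position from being counted as donor DNA. The fraction printed at the end is the same kind of ratio as the 3.0-3.5% estimate. Reading it as a share of the genome assumes donor-specific alleles are spread fairly evenly along the chromosome.&lt;br /&gt;&lt;br /&gt;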
This value is consistent with what we might expect, based on the co-transformation frequency of the two donor markers into the recipient genome when I did the original transformation that produced the sequenced clones.&lt;br /&gt;&lt;br /&gt;Here are plots of the TfA and TfB genomes (genotype 0 = recipient allele, genotype 1 = donor allele):&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_7qRGxl6StM4/SxSmSGgEzhI/AAAAAAAAAbk/oTcPOrSteI0/s1600/TfAdonated.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 90px;" src="http://3.bp.blogspot.com/_7qRGxl6StM4/SxSmSGgEzhI/AAAAAAAAAbk/oTcPOrSteI0/s320/TfAdonated.png" alt="" id="BLOGGER_PHOTO_ID_5410131882291809810" border="0" /&gt;&lt;/a&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_7qRGxl6StM4/SxSmV1Db3fI/AAAAAAAAAbs/sM14bf0-xWE/s1600/TfBdonated.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 90px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/SxSmV1Db3fI/AAAAAAAAAbs/sM14bf0-xWE/s320/TfBdonated.png" alt="" id="BLOGGER_PHOTO_ID_5410131946327760370" border="0" /&gt;&lt;/a&gt;(Click on the images to enlarge them.)&lt;br /&gt;&lt;br /&gt;This makes me happy.  I was kind of hoping our thinking was wrong and that there’d be all kinds of kookiness going on, but in many ways, having our expectations met is a vastly better situation, since it means that the designs of our other planned experiments are probably sound.&lt;br /&gt;&lt;br /&gt;Note that the right-most "donor segment" represents only a single "donor-specific allele" that is shared by both recombinants and is surrounded by Rd vs. Rd SNPs.  It is highly likely that this singleton is an artifact.  
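&lt;br /&gt;&lt;br /&gt;Calling segments from the 0/1 genotype track, and spotting singletons like the right-most one, can be sketched as a simple run-merging pass in Python. This is a hypothetical illustration. The 5,000 bp gap allowed between neighboring donor alleles is an arbitrary assumption, not a value used for these plots, and the positions are toy values:&lt;br /&gt;&lt;pre&gt;
```python
def donor_segments(positions, max_gap=5000):
    """Merge donor-allele positions into segments.

    Positions no more than max_gap bp apart (a hypothetical cutoff) share a
    segment. Returns a list of (start, end, number_of_alleles) tuples.
    """
    segments = []
    for pos in sorted(positions):
        if segments and max_gap >= pos - segments[-1][1]:
            start, _, n = segments[-1]
            segments[-1] = (start, pos, n + 1)
        else:
            segments.append((pos, pos, 1))
    return segments


def singletons(segments):
    """Segments supported by a single donor-specific allele."""
    return [seg for seg in segments if seg[2] == 1]


# Hypothetical toy positions, not real data.
toy_positions = [1200, 1000, 1500, 250000, 251000, 1700000]
segs = donor_segments(toy_positions)
print(segs)  # prints [(1000, 1500, 3), (250000, 251000, 2), (1700000, 1700000, 1)]
print(singletons(segs))  # prints [(1700000, 1700000, 1)]
```
&lt;/pre&gt;The allele count per segment is what separates well-supported segments from one-allele calls, which deserve extra scrutiny before being trusted.&lt;br /&gt;&lt;br /&gt;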
All other donor segments are supported by many SNPs, including the overlapping segments at ~200,000 bp.  This latter shared segment may suggest a hotspot of recombination.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;Control loci&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The following two plots zoom in on the segments containing the selected control loci.  The first has red lines bounding the &lt;span style="font-style: italic;"&gt;gyrB&lt;/span&gt; gene in TfA, while the second has blue lines bounding the &lt;span style="font-style: italic;"&gt;gyrA&lt;/span&gt; gene in TfB.  Each plot also shows the masked positions (those left out due to ambiguity or presence in the Rd vs. Rd comparison), in orange for the first and in grey for the second:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_7qRGxl6StM4/SxSmfZ7tEsI/AAAAAAAAAb8/MILBffoDAOI/s1600/TfAzoom.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 71px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/SxSmfZ7tEsI/AAAAAAAAAb8/MILBffoDAOI/s320/TfAzoom.png" alt="" id="BLOGGER_PHOTO_ID_5410132110846268098" border="0" /&gt;&lt;/a&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7qRGxl6StM4/SxSmjDIXW2I/AAAAAAAAAcE/M5MreQcu5-U/s1600/TfBzoom.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 90px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/SxSmjDIXW2I/AAAAAAAAAcE/M5MreQcu5-U/s320/TfBzoom.png" alt="" id="BLOGGER_PHOTO_ID_5410132173444832098" border="0" /&gt;&lt;/a&gt;Within each marker gene, there are a few recipient-specific SNPs.  
These have to do with the fact that the PCR fragment I used to add the NovR and NalR alleles of &lt;span style="font-style: italic;"&gt;gyrB&lt;/span&gt; and &lt;span style="font-style: italic;"&gt;gyrA&lt;/span&gt; contained recipient SNPs, some of which ended up in the donor genome.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Okay!  That’s almost it for now.  There’s a looong way to go, but happily I suspect that there is indeed biology to learn even from these “meagre” couple of genome sequences.&lt;br /&gt;&lt;br /&gt;My next task will be to account better for where I am blind.  I used stringent criteria to determine allele identity here.  I am quite confident in what was found, but I’m not so sure about what I didn’t find.  That is, based on how I’ve done this so far, I suspect false negatives, but not false positives.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-1135999568908471354?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/1135999568908471354/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/11/recombinant-genomes-first-pass.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/1135999568908471354'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/1135999568908471354'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/11/recombinant-genomes-first-pass.html' title='Recombinant Genomes:  First Pass'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' 
width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_7qRGxl6StM4/SxSlY3Ap21I/AAAAAAAAAbc/eUMhF0IYRCw/s72-c/F1.medium.gif' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-4461538188292852443</id><published>2009-11-24T16:11:00.001-08:00</published><updated>2009-11-24T16:18:05.902-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sequencing'/><title type='text'>Incoming Data Deluge</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.iaac.us/seventh_film_festival2007/BrijSondhi.htm"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 253px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/Swx2UmRXKnI/AAAAAAAAAbU/q_SR-eeqBL4/s320/Deluge.jpg" alt="" id="BLOGGER_PHOTO_ID_5407827348807953010" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;I am waiting with anticipation...  I am about to be drowning in data...  &gt;64 million quality filtered paired-end sequencing reads, constituting 5.4 Gigabases!  Woo!!&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;The files are currently transferring to an FTP site that  I will then download from.  
Tomorrow afternoon, I may legitimately be a genomicist!&lt;br /&gt;&lt;br /&gt;Thanks to my friend and his tech for getting this done...&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-4461538188292852443?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/4461538188292852443/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/11/incoming-data-deluge.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/4461538188292852443'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/4461538188292852443'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/11/incoming-data-deluge.html' title='Incoming Data Deluge'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_7qRGxl6StM4/Swx2UmRXKnI/AAAAAAAAAbU/q_SR-eeqBL4/s72-c/Deluge.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-6632613580968981697</id><published>2009-11-20T10:19:00.000-08:00</published><updated>2009-11-20T10:46:40.924-08:00</updated><title type='text'>Recalculating</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" 
href="http://4.bp.blogspot.com/_7qRGxl6StM4/SwbjdNOrJUI/AAAAAAAAAa0/0lvfEPavNdQ/s1600/E%3Dmc%5E2.jpg"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 150px; height: 200px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/SwbjdNOrJUI/AAAAAAAAAa0/0lvfEPavNdQ/s200/E%3Dmc%5E2.jpg" alt="" id="BLOGGER_PHOTO_ID_5406258493611779394" border="0" /&gt;&lt;/a&gt;On Wednesday, I put up some &lt;a href="http://nodnacontrol.blogspot.com/2009/11/saturation-and-chaser.html"&gt;tables of data &lt;/a&gt;with bad calculations of molecules per cell.  Rosie's concerns about the data were misplaced; they should've been concerns about my brain!&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;I'd been saying that 8 ng of 200 bp DNA / 200 ul of competent cells was ~1 molecule per cell.  But I was off.  Not just because I calculated it for pg instead of ng, but because I'd screwed up other parts of the math...&lt;br /&gt;&lt;br /&gt;Here's a more correct way to approximate the number of molecules per cell, starting with one of Rosie's Universal Constants:&lt;br /&gt;&lt;br /&gt;(1e-18 g / one 1 kb DNA) * (one 1 kb DNA / five 200 bp DNA) = 2e-19 g / one 200 bp DNA;&lt;br /&gt;(2e-19 g / one 200 bp DNA) * (1e9 ng / 1 g) = 2e-10 ng / one 200 bp DNA.&lt;br /&gt;So that's about how much a single 200 bp DNA molecule weighs.&lt;br /&gt;&lt;br /&gt;Then the inverse gives molecules per ng, and multiplying by 8 ng gives the total:&lt;br /&gt;1 / (2e-10 ng / one 200 bp DNA) = 5e+9 molecules DNA / ng DNA;&lt;br /&gt;(5e+9 molecules DNA / ng DNA) * (8 ng) = 4e+10 molecules DNA.&lt;br /&gt;&lt;br /&gt;Adding 8 ng of DNA to 2e+8 cells (in 200 ul of competent cells) then gives:&lt;br /&gt;(4e+10 molecules) / (2e+8 cells) = 200 molecules per cell...&lt;br /&gt;&lt;br /&gt;That's better.  
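&lt;br /&gt;&lt;br /&gt;Here is the same arithmetic wrapped in a small function, so it can be redone for other DNA amounts, fragment lengths or cell numbers.  This is a minimal Python sketch, not lab code.  It assumes the rule of thumb above of ~1e-18 g per 1 kb double-stranded molecule, which is in line with the usual ~650 Da per base pair.  The function name molecules_per_cell is just illustrative.

```python
# Back-of-envelope: average number of DNA fragments offered per competent cell.
# Assumes ~1e-18 g per 1 kb double-stranded molecule (rule of thumb above);
# the function and parameter names are illustrative, not from any library.
GRAMS_PER_KB = 1e-18
NG_PER_GRAM = 1e9


def molecules_per_cell(ng_dna: float, fragment_bp: float, n_cells: float) -> float:
    """Return the average number of DNA fragments per cell."""
    grams_per_molecule = GRAMS_PER_KB * fragment_bp / 1000.0
    ng_per_molecule = grams_per_molecule * NG_PER_GRAM  # ~2e-10 ng for 200 bp
    molecules = ng_dna / ng_per_molecule                # ~4e10 molecules for 8 ng
    return molecules / n_cells


if __name__ == "__main__":
    # 8 ng of 200 bp DNA added to ~2e8 cells (200 ul of competent cells)
    print(round(molecules_per_cell(8, 200, 2e8)))  # prints 200
```

Plugging in the 4 ng and 16 ng doses from Wednesday's post gives ~100 and ~400 molecules per cell.&lt;br /&gt;&lt;br /&gt;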
It's also more in line with the lab's previous work...&lt;br /&gt;I have changed the tables in Wednesday's post to reflect this correction.&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-6632613580968981697?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/6632613580968981697/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/11/on-wednesday-i-put-up-some-tables-of.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/6632613580968981697'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/6632613580968981697'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/11/on-wednesday-i-put-up-some-tables-of.html' title='Recalculating'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_7qRGxl6StM4/SwbjdNOrJUI/AAAAAAAAAa0/0lvfEPavNdQ/s72-c/E%3Dmc%5E2.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-5468814856563460084</id><published>2009-11-19T17:08:00.000-08:00</published><updated>2009-11-19T17:12:16.540-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sequencing'/><title type='text'>Bacterial Genome Sequence Links</title><content type='html'>&lt;a 
onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://egcgolf.net/links"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 200px; height: 157px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/SwXslsvqIlI/AAAAAAAAAas/3t8XbrPrUIg/s200/PebbleBeachAriel7th.81124621_std.jpg" alt="" id="BLOGGER_PHOTO_ID_5405987060138517074" border="0" /&gt;&lt;/a&gt;I keep meaning to put up these links, so I can find them in the future.  They get you to NCBI's genome FTP site.  If you click on the genome you want, you get a listing of a bunch of file formats.  For the complete genomes, the FASTA file of the complete sequence is the .fna file.  But there are a bunch of other formats, which makes it easier to get correctly formatted data for application X:&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="ftp://ftp.ncbi.nih.gov/genomes/Bacteria/"&gt;Bacterial Genomes FTP&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;For the draft assemblies, the data are at this link:&lt;br /&gt;&lt;a href="ftp://ftp.ncbi.nih.gov/genbank/wgs/"&gt;Whole Genome Shotgun FTP&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;But finding out which accession you want is a little irritating; the best way is to search this list:&lt;br /&gt;&lt;a href="http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi?view=2&amp;amp;p1=5:0&amp;amp;p2=2"&gt;Whole Genome Shotgun HTML&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The links supplied in this last list are less useful for downloading the data than the FTP link, since you would have to click a maddening number of times to get all the contigs from a particular whole genome shotgun sequencing project.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-5468814856563460084?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' 
href='http://nodnacontrol.blogspot.com/feeds/5468814856563460084/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/11/genome-sequence-links.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/5468814856563460084'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/5468814856563460084'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/11/genome-sequence-links.html' title='Bacterial Genome Sequence Links'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_7qRGxl6StM4/SwXslsvqIlI/AAAAAAAAAas/3t8XbrPrUIg/s72-c/PebbleBeachAriel7th.81124621_std.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-8797369652310909133</id><published>2009-11-18T17:03:00.000-08:00</published><updated>2009-11-20T10:52:45.919-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='USS'/><category scheme='http://www.blogger.com/atom/ns#' term='degenerate'/><title type='text'>Saturation and Chaser</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://help.adobe.com/en_US/Fireworks/10.0_Using/WS4c25cfbb1410b0021e63e3d1152b00cce0-7ff1.html"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 234px;" 
src="http://1.bp.blogspot.com/_7qRGxl6StM4/SwShi_X_luI/AAAAAAAAAaM/Aql21SN0TW8/s320/bi_saturation.png" alt="" id="BLOGGER_PHOTO_ID_5405623075251132130" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;EDITED:  I used a bad molecules-per-cell calculation earlier.&lt;br /&gt;&lt;br /&gt;How much DNA does an average cell in a competent culture take up?  I did two experiments today aimed at deciding how much degeneracy we can tolerate in our degenerate USS experiment.  There are several issues, but a simple one is how much of an "optimal" USS is taken up in the presence of competing "suboptimal" USS... &lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;In the first experiment, I added different amounts of hot USS-1 DNA (the new-fangled ~200 bp constructs designed for Illumina sequencing) to a fixed amount of competent cells to find out how much DNA it takes to saturate the cells.  Here are the results for several amounts of USS-1 incubated with 200 ul of competent cells for 20 minutes:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7qRGxl6StM4/SwShs0p0v3I/AAAAAAAAAaU/qYC0k3mlvI8/s1600/satcurve.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 315px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/SwShs0p0v3I/AAAAAAAAAaU/qYC0k3mlvI8/s320/satcurve.png" alt="" id="BLOGGER_PHOTO_ID_5405623244171820914" border="0" /&gt;&lt;/a&gt;It looks like less than 8 ng (~200 molecules per cell) is sub-saturating and more than 8 ng is near saturation.&lt;br /&gt;&lt;br /&gt;In the second experiment, I looked at how well USS-1 is taken up in the presence of a competitor mutant USS (USS-V6, which is taken up ~10-fold less efficiently than USS-1).  Based on the results of the first experiment, I used two different amounts of hot USS-1, either sub-saturating (4 ng) or saturating (16 ng).  
For each of these, I had four samples to which different amounts of cold USS-V6 were added.  Here are the results for sub-saturating USS-1:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_7qRGxl6StM4/SwbkiGvvqxI/AAAAAAAAAbE/lbCgd_igt58/s1600/subsat.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 140px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/SwbkiGvvqxI/AAAAAAAAAbE/lbCgd_igt58/s320/subsat.png" alt="" id="BLOGGER_PHOTO_ID_5406259677282413330" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;And now for saturating USS-1:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7qRGxl6StM4/SwbklWHoSNI/AAAAAAAAAbM/UX2LyBLCeJo/s1600/sat.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 140px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/SwbklWHoSNI/AAAAAAAAAbM/UX2LyBLCeJo/s320/sat.png" alt="" id="BLOGGER_PHOTO_ID_5406259732948732114" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Adding the competitor did reduce uptake, but only by ~1/4, even with several-fold more competitor than USS-1.  Also, uptake appears to decline differently with excess competitor when USS-1 is saturating than when it is sub-saturating.&lt;br /&gt;&lt;br /&gt;So what does this all mean?  I think it means that even when "optimal" USS are only 1/10 as common as "suboptimal" USS, uptake of the optimal ones still proceeds quite nicely.  What does this mean practically for making our fancy oligo purchase?  Not too much, except that I think we can err on the side of more degeneracy without too many problems.  
(This will be good for several reasons, probably the most important of which is that our preliminary data collection plans will work best with more degeneracy.)&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-8797369652310909133?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/8797369652310909133/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/11/saturation-and-chaser.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/8797369652310909133'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/8797369652310909133'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/11/saturation-and-chaser.html' title='Saturation and Chaser'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_7qRGxl6StM4/SwShi_X_luI/AAAAAAAAAaM/Aql21SN0TW8/s72-c/bi_saturation.png' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-1191751378819443997</id><published>2009-11-17T17:39:00.000-08:00</published><updated>2009-11-17T17:57:07.953-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='USS'/><category scheme='http://www.blogger.com/atom/ns#' term='degenerate'/><title type='text'>USS uptake 
with Illumina adaptors</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.fotosearch.com/IMZ353/vmo0960/"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 300px; height: 312px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/SwNS8M9HgXI/AAAAAAAAAaE/IeD4YOVU3os/s320/montage-illustration-about_%7Evmo0960.jpg" alt="" id="BLOGGER_PHOTO_ID_5405255171997925746" border="0" /&gt;&lt;/a&gt;(I'm somehow deeply amused by the above image.  I think it has something to do with the arrow going from the DNA molecule to the sun.)&lt;br /&gt;&lt;br /&gt;I've finally gotten around to doing some uptake experiments with the control sequences I got for our degenerate USS experiments.  The important thing was to make sure that constructs with our &lt;a href="http://nodnacontrol.blogspot.com/2009/09/reverse-engineering.html"&gt;new-fangled design&lt;/a&gt;, which will allow Illumina sequencing directly from the purified periplasmic DNA (i.e. no library construction steps), behave as our older ones do...&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;And it worked quite well!  
I also scaled these experiments down, so that I could more easily do some saturation curves tomorrow.&lt;br /&gt;&lt;br /&gt;Results today for uptake of 8 ng of 200 bp DNA (~4e+10 molecules, i.e. ~200 molecules per cell) by ~200 million competent cells (in 200 ul):&lt;br /&gt;&lt;br /&gt;O.G.-USS1     -&gt;  33.5%&lt;br /&gt;New-USS1     -&gt;  35.4%&lt;br /&gt;New-USSV6  -&gt;    3.7%&lt;br /&gt;New-USSR    -&gt;    0.7%&lt;br /&gt;&lt;br /&gt;Right on target!&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-1191751378819443997?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/1191751378819443997/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/11/uss-uptake-with-illumina-adaptors.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/1191751378819443997'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/1191751378819443997'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/11/uss-uptake-with-illumina-adaptors.html' title='USS uptake with Illumina adaptors'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_7qRGxl6StM4/SwNS8M9HgXI/AAAAAAAAAaE/IeD4YOVU3os/s72-c/montage-illustration-about_%7Evmo0960.jpg' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-8249141275405884991</id><published>2009-11-17T17:36:00.000-08:00</published><updated>2009-11-17T17:38:50.139-08:00</updated><title type='text'>Missing Posts</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.artomatic.org/node/3934"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 246px;" src="http://3.bp.blogspot.com/_7qRGxl6StM4/SwNP5K2J2qI/AAAAAAAAAZ8/rskegRLXJro/s320/03934_fallingbehind.jpg" alt="" id="BLOGGER_PHOTO_ID_5405251821357357730" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Whew!  I keep falling behind on my blogging.... Here are the posts that I haven’t written:&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;Using R:&lt;br /&gt;&lt;br /&gt;(1) Simulation of degenerate USS experiment, assuming the USS motif defines uptake specificity.&lt;br /&gt;&lt;br /&gt;(2) Scoring genomes for USS sites.&lt;br /&gt;&lt;br /&gt;(3) Scoring alignments for USS sites.  
(This last bit is finally begun, but far from complete.)&lt;br /&gt;&lt;br /&gt;I've got the figures for these posts; now I just need text!&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-8249141275405884991?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/8249141275405884991/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/11/missing-posts.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/8249141275405884991'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/8249141275405884991'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/11/missing-posts.html' title='Missing Posts'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_7qRGxl6StM4/SwNP5K2J2qI/AAAAAAAAAZ8/rskegRLXJro/s72-c/03934_fallingbehind.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-1591463533947286081</id><published>2009-11-06T16:59:00.001-08:00</published><updated>2009-11-06T17:06:26.882-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='computers'/><category scheme='http://www.blogger.com/atom/ns#' term='degenerate'/><category scheme='http://www.blogger.com/atom/ns#' 
term='DNA'/><title type='text'>Weening myself off of Excel</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://seqcore.brcf.med.umich.edu/doc/educ/dnapr/pg1.html"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 120px; height: 200px;" src="http://3.bp.blogspot.com/_7qRGxl6StM4/SvTGm491I_I/AAAAAAAAAZc/LbEfxElmm9I/s200/oligo.gif" alt="" id="BLOGGER_PHOTO_ID_5401160224553116658" border="0" /&gt;&lt;/a&gt;In some sense, the computing I did today isn’t really useful, since I already worked out these things using Microsoft Excel.  But I’ve been ordered by my bioinformatics consultants to stop with the Excel already.  So as practice, I worked out &lt;a href="http://nodnacontrol.blogspot.com/2009/05/degenerate-oligos-ii.html"&gt;some of the expected features of degenerate oligos again&lt;/a&gt;, but this time using R.&lt;br /&gt;&lt;br /&gt;The main motivation for doing this besides practice is that I am fairly sure we should be ordering degenerate oligos with more degeneracy than we have &lt;a href="http://nodnacontrol.blogspot.com/2009/05/rosie-and-i-have-been-working-our-way.html"&gt;previously considered&lt;/a&gt;.  I won't make that argument here; I'll just repeat some analytical graphs I'd previously made.&lt;br /&gt;&lt;br /&gt;It took a while (since I’m learning), but was still much more straightforward than doing it in a spreadsheet.  The exercise was extremely useful, as I learned a bunch of stuff (especially about plots in R), while doing the following:&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;Problem #1:  Given a degeneracy per base, d (the fraction of non-wild-type bases at each position), in an n-length oligo, what is the proportion of oligos with k mismatches?&lt;br /&gt;Answer #1: Use the binomial distribution.  
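&lt;br /&gt;&lt;br /&gt;As a rough illustration, here is the binomial calculation in Python rather than R; it is not the script behind the plots.  It assumes each of the n positions independently carries a non-wild-type base with probability d.  The function name and the degeneracy values in the example are hypothetical.

```python
from math import comb


def mismatch_distribution(n: int, d: float) -> list:
    """P(k mismatches) for k = 0..n in an n-length degenerate oligo.

    Assumes each position independently carries a non-wild-type base
    with probability d, so k follows a binomial(n, d) distribution.
    """
    return [comb(n, k) * d**k * (1 - d) ** (n - k) for k in range(n + 1)]


if __name__ == "__main__":
    n = 32  # oligo length used in the post
    for d in (0.06, 0.09, 0.12):  # hypothetical degeneracy levels
        probs = mismatch_distribution(n, d)
        mode = max(range(n + 1), key=lambda k: probs[k])
        print(f"d={d}: P(0 mismatches)={probs[0]:.4f}, most likely k={mode}")
```

Summing the returned list over all k gives 1, which is a handy sanity check.&lt;br /&gt;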
For a 32mer with different levels of degeneracy (shown in legend):&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_7qRGxl6StM4/SvTHiUBh3HI/AAAAAAAAAZk/NNj8UCHhob8/s1600-h/binom.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 320px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/SvTHiUBh3HI/AAAAAAAAAZk/NNj8UCHhob8/s320/binom.png" alt="" id="BLOGGER_PHOTO_ID_5401161245428669554" border="0" /&gt;&lt;/a&gt;Problem #2:  Given a million instances of such an oligo, how many times would each possible oligo with k mismatches be expected to appear?&lt;br /&gt;Answer #2:  Scale each of the above proportions to a million oligos and divide by the number of mismatch classes for each k, i.e. choose(n, k), the number of ways to place k mismatches among n positions:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7qRGxl6StM4/SvTHmrjRJMI/AAAAAAAAAZs/_EB5gvqt8Y0/s1600-h/cover.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 320px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/SvTHmrjRJMI/AAAAAAAAAZs/_EB5gvqt8Y0/s320/cover.png" alt="" id="BLOGGER_PHOTO_ID_5401161320463672514" border="0" /&gt;&lt;/a&gt;Problem #3:  If some number of bases, m, in the n-length oligo are “important”, what proportion of oligos with k mismatches will have x “hits” (mismatches at important positions)?&lt;br /&gt;Answer #3:  Use the hypergeometric distribution.  
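&lt;br /&gt;&lt;br /&gt;Answers #2 and #3 can be sketched the same way, again in Python rather than R and with hypothetical function names.  The first function follows Answer #2 as written.  The second treats the k mismatched positions as a random subset of the n positions, m of which are important; that is what makes the number of hits hypergeometric.  The choice of m = 10 in the example is only for illustration.

```python
from math import comb


def expected_copies_per_class(n: int, d: float, k: int, pool_size: float) -> float:
    """Expected copies of each way of placing k mismatches among n positions.

    Follows Answer #2: the binomial proportion for k mismatches, scaled to
    pool_size oligos, divided by choose(n, k). If the three alternative bases
    are equally likely and are also told apart, divide by a further 3**k.
    """
    p_k = comb(n, k) * d**k * (1 - d) ** (n - k)
    return pool_size * p_k / comb(n, k)


def hits_distribution(n: int, m: int, k: int) -> list:
    """P(x of the k mismatches fall in the m "important" positions).

    The k mismatched positions are a random subset of the n positions,
    so x is hypergeometric; x runs from 0 to min(m, k).
    """
    return [comb(m, x) * comb(n - m, k - x) / comb(n, k) for x in range(min(m, k) + 1)]


if __name__ == "__main__":
    n, d, m = 32, 0.12, 10  # m = 10 important positions is a hypothetical choice
    print(expected_copies_per_class(n, d, 0, 1e6))  # the single wild-type class
    for k in range(1, 5):
        probs = hits_distribution(n, m, k)
        print(k, [round(p, 3) for p in probs])
```

Because the important positions and the mismatched positions are both just subsets of the n positions, the hits distribution does not depend on d.  The degeneracy only enters through how often each k occurs.&lt;br /&gt;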
The plot below is the Problem #1 plot for 0.12 degeneracy, but with the number of hits broken down for each k:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_7qRGxl6StM4/SvTHqt-7beI/AAAAAAAAAZ0/NKdTAKvOYtU/s1600-h/hyper.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 320px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/SvTHqt-7beI/AAAAAAAAAZ0/NKdTAKvOYtU/s320/hyper.png" alt="" id="BLOGGER_PHOTO_ID_5401161389836037602" border="0" /&gt;&lt;/a&gt;I didn't try super-hard to make the perfect graphs, but it did take some effort to make a stacked bar plot...&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-1591463533947286081?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/1591463533947286081/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/11/weening-myself-off-of-excel.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/1591463533947286081'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/1591463533947286081'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/11/weening-myself-off-of-excel.html' title='Weening myself off of Excel'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail 
xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_7qRGxl6StM4/SvTGm491I_I/AAAAAAAAAZc/LbEfxElmm9I/s72-c/oligo.gif' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-6152049167633071105</id><published>2009-11-06T10:45:00.001-08:00</published><updated>2009-11-06T15:13:23.622-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='grants'/><title type='text'>Funding versus Science</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.chrismadden.co.uk/cartoons/science-cartoons/science-cartoons-select.html"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 320px;" src="http://3.bp.blogspot.com/_7qRGxl6StM4/SvSrAgDeAmI/AAAAAAAAAZU/INA75jKf-aM/s320/scientific-research-grant-application-cartoon.gif" alt="" id="BLOGGER_PHOTO_ID_5401129878216901218" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Yesterday, I ended up doing none of the things I intended to do, but did a lot of cleaning instead:  (1) No bench-work, but a clean bench (and fridge and freezer spaces)!; (2) No emails, but a clean inbox!; (3) No thinking, but a clean mind! (Well, more so than usual.)&lt;br /&gt;&lt;br /&gt;My ears were also itching, because I realized that the NIH panel reviewing my postdoc grant application was meeting, and I was extra-worried, since the Genome BC grant will now almost certainly depend on the outcome of those scores.&lt;br /&gt;&lt;br /&gt;Anyway, towards the end of the day, I switched over to browsing journals, which I rarely do these days...&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;I found a lot of good stuff I probably already should've known about, but I also found &lt;a href="http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1000197"&gt;this opinion&lt;/a&gt; piece in PLoS Biology about "what's wrong with funding of research".  
Pretty much sounds about right.  I am not even in a position to have the stresses described in the article, but feel like if I don't get some kind of postdoc fellowship of my own, I'll be that much less likely to get hired as an independent researcher, since so much of what defines success is the ability to get money.  But all of the PIs I know are overwhelmed by exactly these issues.  It's especially daunting to realize that a full-sized NIH R01 grant can barely support a lab with 2-3 people.&lt;br /&gt;&lt;br /&gt;On the other hand, my recent experiences with Rosie do bring home the fact that grant-writing is a good way to think rigorously about one's plans, and the Genome BC experience in particular (whether the grant is funded or not) helped our brains wrap around exactly what we'll be doing in the next couple of months. &lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-6152049167633071105?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/6152049167633071105/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/11/funding-versus-science.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/6152049167633071105'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/6152049167633071105'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/11/funding-versus-science.html' title='Funding versus Science'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' 
src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_7qRGxl6StM4/SvSrAgDeAmI/AAAAAAAAAZU/INA75jKf-aM/s72-c/scientific-research-grant-application-cartoon.gif' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-6066775510476547488</id><published>2009-11-04T13:30:00.001-08:00</published><updated>2009-11-04T17:05:21.357-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='USS'/><category scheme='http://www.blogger.com/atom/ns#' term='sequencing'/><category scheme='http://www.blogger.com/atom/ns#' term='periplasm'/><category scheme='http://www.blogger.com/atom/ns#' term='grants'/><title type='text'>Decompression</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.explainthatstuff.com/scubadiving.html"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 312px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/SvIjpFb70zI/AAAAAAAAAZM/cG7f5rXsxUk/s320/decompression.jpg" alt="" id="BLOGGER_PHOTO_ID_5400418091911598898" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Whew!  Rosie and I just made what I consider a heroic effort to produce a grant application to &lt;a href="http://www.genomebc.ca/"&gt;Genome BC&lt;/a&gt; to use DNA sequencing to measure recombination biases during &lt;span style="font-style: italic;"&gt;H. influenzae&lt;/span&gt; natural transformation.&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;It was heroic not only because we only decided to apply late last week, but also because our co-funding support is tenuous at best.  Genome BC requires that we match their funds at least equally with funds from another source.  
Rosie has funding, but it was applied for long ago, and that proposal relates only indirectly to our planned sequencing.  We also recently applied for a CIHR grant, but will not have reviews until after the Genome BC committee meets in early January.  The best hope for adequate co-funding is my NIH postdoctoral fellowship application (a resubmission), submitted late this summer, for which I should have scores (or lack thereof) within a couple of weeks.  If the application gets a good score, we can tell Genome BC that the major threat to the success of our application has been ameliorated.&lt;br /&gt;&lt;br /&gt;Almost immediately after submitting the grant application with Rosie last night, I had to turn to editing my buddy's manuscript (on which I am an author), which takes on the weighty topic of detecting rearrangements in complex mammalian genomes from limited sequencing data.  I just turned my edits over to the corresponding author, and now, &lt;a href="http://en.wikipedia.org/wiki/Decompression_stop"&gt;after letting the excess nitrogen out of my bloodstream&lt;/a&gt;, I need to decide what to do next.&lt;br /&gt;&lt;br /&gt;Since the aforementioned buddy has also taken &lt;a href="http://nodnacontrol.blogspot.com/2009/10/transformants-produced.html"&gt;several DNA samples off my hands for sequencing&lt;/a&gt;, I think I'd best turn to purifying uptake DNA from the periplasm of competent cells.  I've already gotten things fairly well under way (see &lt;a href="http://nodnacontrol.blogspot.com/2009/09/scale-up.html"&gt;here&lt;/a&gt;, &lt;a href="http://nodnacontrol.blogspot.com/2009/09/reverse-engineering.html"&gt;here&lt;/a&gt;, and &lt;a href="http://nodnacontrol.blogspot.com/2009/09/eating-chromosomal-dna-fragments.html"&gt;here&lt;/a&gt;), but there are a bunch of uptake experiments waiting to be done.  
So tonight, I'll inoculate some cultures, so I can try a large-scale periplasmic DNA prep tomorrow, and tomorrow I'll also order some more radiolabel for doing more sensitive uptake experiments.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-6066775510476547488?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/6066775510476547488/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/11/decompression.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/6066775510476547488'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/6066775510476547488'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/11/decompression.html' title='Decompression'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_7qRGxl6StM4/SvIjpFb70zI/AAAAAAAAAZM/cG7f5rXsxUk/s72-c/decompression.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-1750532667441645087</id><published>2009-10-27T13:13:00.000-07:00</published><updated>2009-10-27T13:27:03.937-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='computers'/><category scheme='http://www.blogger.com/atom/ns#' term='alignment'/><title 
type='text'>Sets of Snps</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://science.marshall.edu/murraye/341/snps/Human%20Genetics%20MTHFR%20SNP%20Page.html"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 160px; height: 200px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/SudUoV9RtOI/AAAAAAAAAYc/EDze6lWhmVk/s200/416px-Dna-SNP_svg.png" alt="" id="BLOGGER_PHOTO_ID_5397375730492486882" border="0" /&gt;&lt;/a&gt;So I’ve got pure DNA from my transformants, pretty much ready to send off for our first Illumina runs.  I’m just doing a few simple checks with PCR and digestion to make sure everything is kosher.&lt;br /&gt;&lt;br /&gt;But there is this little fear in my head about what I’ll do when I get the massive datasets.  I got hold of some example Illumina GA2 paired-end data from E. coli K12.  Since I don’t have any data of my own yet from H. influenzae, this seemed like a good dataset for learning how to do the initial data analysis.&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;I decided to go with the most widely used reference-mapping software, called “&lt;a href="http://maq.sourceforge.net/index.shtml"&gt;Maq&lt;/a&gt;” for “Mapping and Alignment with Quality”.  I can see why it’s widely used: it is a breeze to install and use, which is the primary requirement for end-users to like a given piece of software.  I’ve started with just the single-end reads from one lane.  The data is several years old, so there are “only” a few million 35-bp reads.  Nevertheless, this represents nearly 20X sequence coverage of the E. 
coli K12 genome.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://maq.sourceforge.net/maqview.shtml"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 217px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/SudUwKm3ERI/AAAAAAAAAYk/GHwzpEn6QK0/s320/maqview-F2.png" alt="" id="BLOGGER_PHOTO_ID_5397375864884629778" border="0" /&gt;&lt;/a&gt;I’ll keep the Maq overview brief, since I went through it in lab meeting and Maq’s documentation is largely quite good (with the caveat that it can be challenging to work out the meaning of all the columns in the different outputs).  In short, Maq takes a “&lt;a href="http://maq.sourceforge.net/fastq.shtml"&gt;FastQ&lt;/a&gt;” file (the sequence data, including quality scores) and a “&lt;a href="http://en.wikipedia.org/wiki/FASTA_format"&gt;FastA&lt;/a&gt;” file (the reference sequence), converts them into a binary format (which apparently helps make the mapping algorithm fast), and then maps individual reads to the reference, allowing up to 2 mismatches.  The “pile-up” of reads at each position is used to decide the “consensus base” at that position, based on the majority base and the associated quality scores.  The mappings can be extracted from the binary output using additional commands.&lt;br /&gt;&lt;br /&gt;Here, I’ll focus on the .snp output (produced by the cns2snp command; see the &lt;a href="http://maq.sourceforge.net/maq-manpage.shtml"&gt;man&lt;/a&gt; page), since this will be the most straightforward way for us to define recombinant segments in our transformants.  I’m still keeping things simple, so there are plenty of other issues I could get tangled up in beyond the ones I discuss here.&lt;br /&gt;&lt;br /&gt;So I “Maqed” this one FastQ dataset from E. 
coli K12 (I’m calling it s11) against two different reference sequences:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://www.ncbi.nlm.nih.gov/nuccore/49175990"&gt;K12&lt;/a&gt;  -&gt; NC_000913.fna&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.ncbi.nlm.nih.gov/nuccore/15829254"&gt;O157:H7&lt;/a&gt; (which I abbreviate as O57) -&gt; NC_002695.fna&lt;/li&gt;&lt;/ul&gt;The first serves as a control for how well the consensus base caller is working.  Since the sequencing is from K12 DNA, most consensus bases called by Maq should match the reference.  Occasionally some “SNPs” might appear anyway.  These could have several sources:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Spurious calls due to low coverage and high error rates&lt;/li&gt;&lt;li&gt;Actual mutations between the originally sequenced K12 and the K12 used for the Illumina sequencing&lt;/li&gt;&lt;li&gt;Clonal variation, where the consensus call favors the variant sequence.&lt;/li&gt;&lt;/ol&gt;The second comparison calls SNPs between the two strains.  I would expect the number of SNPs called against O57 to vastly exceed the number called against K12.&lt;br /&gt;&lt;br /&gt;This is easily checked with a little UNIX command that counts the lines in the two files (each line corresponds to one SNP call):&lt;br /&gt;&lt;blockquote&gt;&gt; &lt;a href="http://unixhelp.ed.ac.uk/CGI/man-cgi?wc"&gt;wc&lt;/a&gt; -l K12s11.snp O57s11.snp&lt;br /&gt;6051 K12s11.snp&lt;br /&gt;64217 O57s11.snp&lt;/blockquote&gt;Indeed, running s11 against O57 gives about 10X more SNPs than running it against K12.  The several thousand SNPs called against K12 are probably mostly errors.  Maq doesn’t always call the consensus base as A, C, G, or T; it can also use the other &lt;a href="http://www.bio-soft.net/sms/iupac.html"&gt;IUPAC nucleotide codes&lt;/a&gt;, so many of these “SNPs” are probably quite low confidence.&lt;br /&gt;&lt;br /&gt;That’s all fine and good, but now what?  How can I tell how well this s11 dataset did at calling all the SNPs between K12 and O57?  
I resorted to using &lt;a href="http://mummer.sourceforge.net/"&gt;Mummer&lt;/a&gt;’s Dnadiff program (previously discussed &lt;a href="http://nodnacontrol.blogspot.com/2009/05/enumerating-their-differences.html"&gt;here&lt;/a&gt;) to compare the two reference sequences to each other and extract the SNPs.  If the s11 dataset really did a good job of sequencing the E. coli genome, the SNPs called by Mummer and those called by Maq should overlap strongly.&lt;br /&gt;&lt;br /&gt;(Here’s a GenomeMatcher DotPlot of the two genomes, run with Mummer.)&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_7qRGxl6StM4/SudWNj-MC9I/AAAAAAAAAYs/HMdFLph0FNs/s1600-h/mum.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 320px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/SudWNj-MC9I/AAAAAAAAAYs/HMdFLph0FNs/s320/mum.png" alt="" id="BLOGGER_PHOTO_ID_5397377469421194194" border="0" /&gt;&lt;/a&gt;Again, UNIX came to the rescue, thanks to &lt;a href="http://www.catonmat.net/blog/set-operations-in-unix-shell/"&gt;this page I found&lt;/a&gt; that provides UNIX commands for working with “sets”.&lt;br /&gt;&lt;br /&gt;First, I had to get the appropriate three columns from the two SNP files:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;the reference genome position (in O57)&lt;/li&gt;&lt;li&gt;the reference base (also from O57)&lt;/li&gt;&lt;li&gt;the query base (the SNP in K12; the consensus base call for Maq or the SNP call for Mummer).&lt;/li&gt;&lt;/ol&gt;Here's how I made those files:&lt;br /&gt;&lt;blockquote&gt;&gt; &lt;a href="http://www.manpagez.com/man/1/cut/"&gt;cut&lt;/a&gt; -f 2-4 O57s11.snp &gt; maqSnp.txt&lt;br /&gt;&gt; cut -f 1-3 O57vsK12.snps &gt; mumSnp.txt&lt;br /&gt;&lt;/blockquote&gt;This gave me my two sets, one called maqSnp.txt and the other mumSnp.txt.  
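&lt;br /&gt;&lt;br /&gt;Because Maq’s consensus column can contain IUPAC ambiguity codes (the low-confidence calls mentioned above), one optional variant of this step is to keep only unambiguous A, C, G or T calls before comparing the sets. Below is a minimal awk sketch of that filter. It is not the command I actually ran. Like the cut command, it assumes the Maq .snp file is tab-separated, with position, reference base and consensus base in columns 2-4. The function name acgt_only, the toy rows and the temporary directory are made up for illustration.

```shell
#!/bin/sh
# Sketch: keep only Maq SNP calls whose consensus base (column 4) is an
# unambiguous A, C, G or T, printing position, reference base and consensus
# base (columns 2-4), the same columns the cut command keeps.
# ASSUMPTION: tab-separated input laid out like Maq's cns2snp output.
acgt_only() {
    awk -F '\t' 'BEGIN { OFS = "\t" } $4 ~ /^[ACGT]$/ { print $2, $3, $4 }' "$1"
}

# Hypothetical toy rows: chromosome, position, reference, consensus, quality.
tmp=$(mktemp -d "${TMPDIR:-/tmp}/maqsnp.XXXXXX")
printf 'NC_002695\t120\tA\tG\t45\nNC_002695\t310\tC\tY\t12\nNC_002695\t877\tT\tC\t60\n' > "$tmp/toy.snp"

result=$(acgt_only "$tmp/toy.snp")   # the row with consensus Y is dropped
kept=$(printf '%s\n' "$result" | grep -c '')
printf '%s\n' "$result"
echo "kept $kept calls"
rm -r "$tmp"
```

On the real data, you would run acgt_only on O57s11.snp and redirect the output into a new set file. It keeps the same three columns as maqSnp.txt, so it can be compared with mumSnp.txt in exactly the same way.&lt;br /&gt;&lt;br /&gt;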
Here are their lengths:&lt;br /&gt;&lt;blockquote&gt;&gt; wc -l maqSnp.txt mumSnp.txt&lt;br /&gt;64217 maqSnp.txt&lt;br /&gt;76111 mumSnp.txt&lt;/blockquote&gt;Notably, Mummer called many more SNPs than Maq (76,111 versus 64,217).  I think this is largely because the SNP output from Mummer includes single-nucleotide indels, which Maq misses since it does an ungapped alignment.  I’m not sure how to deal with this, but in our real experiments such indels should still be discoverable, since we’ll map our reads to both the donor and recipient genomes.  Also, numerous “SNPs” in the Maq file have non-A, C, G, T consensus bases, and these will largely be absent from the Mummer comparison.&lt;br /&gt;&lt;br /&gt;So then it was a piece of cake to count the SNPs in the intersection of the two sets.  There were several ways to do it; my favorite was:&lt;br /&gt;&lt;blockquote&gt;&gt; &lt;a href="http://unixhelp.ed.ac.uk/CGI/man-cgi?grep"&gt;grep&lt;/a&gt; -xF -f maqSnp.txt mumSnp.txt | wc -l&lt;br /&gt;57179&lt;/blockquote&gt;Here -f reads the patterns from maqSnp.txt, -F treats each one as a fixed string rather than a regular expression, and -x counts only whole-line matches, so a Mummer SNP is counted only if its position and both bases exactly match a Maq call.&lt;br /&gt;&lt;br /&gt;So:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bioinfoman.com/free/bxarrays/venndiagram.php"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 246px; height: 320px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/SudXDY4ihTI/AAAAAAAAAY0/CX0TmDuE4Xs/s320/venndiagram_display.png" alt="" id="BLOGGER_PHOTO_ID_5397378394157647154" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;That’s a whole lotta overlap: 57,179 shared calls is roughly 89% of the Maq set and 75% of the Mummer set.  Happy!  
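&lt;br /&gt;&lt;br /&gt;The grep one-liner counts the overlap. The other two regions of the Venn diagram are the calls made only by Maq and the calls made only by Mummer. Both can be counted with the standard comm utility, which compares two sorted files line by line. Below is a minimal sketch, assuming both inputs are the three-column sets made with cut. comm needs both files sorted under the same collation, which is why LC_ALL=C is set for both sort and comm. The function name venn_counts and the toy rows are made up for illustration.

```shell
#!/bin/sh
# Count the three regions of a two-set Venn diagram, where each line of a
# file is one SNP (position, reference base, query base).
# comm needs both inputs sorted under the same collation, hence LC_ALL=C.
venn_counts() {
    LC_ALL=C sort -u "$1" > "$1.sorted"
    LC_ALL=C sort -u "$2" > "$2.sorted"
    shared=$(LC_ALL=C comm -12 "$1.sorted" "$2.sorted" | grep -c '')
    only_first=$(LC_ALL=C comm -23 "$1.sorted" "$2.sorted" | grep -c '')
    only_second=$(LC_ALL=C comm -13 "$1.sorted" "$2.sorted" | grep -c '')
    echo "shared: $shared, only in $1: $only_first, only in $2: $only_second"
}

# Hypothetical toy sets standing in for maqSnp.txt and mumSnp.txt.
tmp=$(mktemp -d "${TMPDIR:-/tmp}/venn.XXXXXX")
printf '100\tA\tG\n250\tC\tT\n400\tG\tA\n' > "$tmp/maq_toy.txt"
printf '100\tA\tG\n250\tC\tT\n900\tT\tC\n' > "$tmp/mum_toy.txt"
venn_counts "$tmp/maq_toy.txt" "$tmp/mum_toy.txt"
rm -r "$tmp"
```

On the real files, this would be venn_counts maqSnp.txt mumSnp.txt, which also leaves maqSnp.txt.sorted and mumSnp.txt.sorted behind. There is one difference from the grep version. sort -u collapses duplicate lines, so these counts are of distinct SNP lines, whereas grep -xF -f counts every matching line of mumSnp.txt.&lt;br /&gt;&lt;br /&gt;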
When I get the real transformant data, this will help me tremendously.&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-1750532667441645087?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/1750532667441645087/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/10/sets-of-snps.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/1750532667441645087'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/1750532667441645087'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/10/sets-of-snps.html' title='Sets of Snps'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_7qRGxl6StM4/SudUoV9RtOI/AAAAAAAAAYc/EDze6lWhmVk/s72-c/416px-Dna-SNP_svg.png' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-4269680324365177318</id><published>2009-10-23T14:20:00.000-07:00</published><updated>2009-10-23T14:31:34.226-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sequencing'/><category scheme='http://www.blogger.com/atom/ns#' term='DNA'/><category scheme='http://www.blogger.com/atom/ns#' term='lab'/><title type='text'>Transformants 
Produced!</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://www.delphigenetics.com/services.html?PHPSESSID=0a5639eca75ae605048ba511bd274f29"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 200px; height: 178px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/SuIeloQ7kbI/AAAAAAAAAXk/huBcWtwQeLk/s200/DelphiGenetics-services-sheep-2.gif" alt="" id="BLOGGER_PHOTO_ID_5395908935355699634" border="0" /&gt;&lt;/a&gt;I'll set aside the hack bioinformatics posts for now and give an update on my transformation experiments...  We’ve gotten access to a few lanes of Illumina GA2 sequencing for some preliminary studies, and right now I’m drying the genomic DNA samples that we plan to sequence.&lt;br /&gt;&lt;br /&gt;The notion is to sequence several independent transformants of &lt;span style="font-style: italic;"&gt;Haemophilus influenzae&lt;/span&gt; to get some idea of how much donor DNA taken up by cells finds its way into recipient chromosomes.  
This pilot study will go a long way toward informing our planned large-scale experiments and will give us a chance to learn how to handle the data.&lt;br /&gt;&lt;br /&gt;Here’s what I did to produce the material...&lt;span class="fullpost"&gt;&lt;br /&gt;...some transformations, of course!&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_7qRGxl6StM4/SuIfJ1GNsEI/AAAAAAAAAX8/SmO3dYaNgDo/s1600-h/Donor.png"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer; width: 122px; height: 320px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/SuIfJ1GNsEI/AAAAAAAAAX8/SmO3dYaNgDo/s320/Donor.png" alt="" id="BLOGGER_PHOTO_ID_5395909557275701314" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_7qRGxl6StM4/SuIe0YVy0LI/AAAAAAAAAX0/-I9CzY9ZrRs/s1600-h/MAP7.png"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 119px; height: 320px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/SuIe0YVy0LI/AAAAAAAAAX0/-I9CzY9ZrRs/s320/MAP7.png" alt="" id="BLOGGER_PHOTO_ID_5395909188779167922" border="0" /&gt;&lt;/a&gt;First, I PCR-amplified the &lt;span style="font-style: italic;"&gt;gyrA&lt;/span&gt; and &lt;span style="font-style: italic;"&gt;gyrB&lt;/span&gt; alleles from the MAP7 strain (which confer nalidixic acid and novobiocin resistance, respectively).  MAP7 is a derivative of our recipient strain KW20 that carries several point mutations conferring antibiotic resistance.&lt;br /&gt;&lt;br /&gt;I used these PCR products to transform our donor strain 86-028NP, giving the donor two selectable markers.  I’ve been calling this strain 1350NN.&lt;br /&gt;&lt;br /&gt;Then I extracted DNA from this strain and used it as the donor DNA to transform KW20 competent cells.  
By selecting for one or both markers, I can ensure that clones chosen for DNA extraction and sequencing were indeed derived from competent cells that got transformed.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_7qRGxl6StM4/SuIfVkA4H4I/AAAAAAAAAYE/r6Anc47G4C4/s1600-h/transf.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 278px; height: 320px;" src="http://3.bp.blogspot.com/_7qRGxl6StM4/SuIfVkA4H4I/AAAAAAAAAYE/r6Anc47G4C4/s320/transf.png" alt="" id="BLOGGER_PHOTO_ID_5395909758848343938" border="0" /&gt;&lt;/a&gt;Our baseline expectation is that there will be a large segment (10-50kb) of donor alleles in the transformants at selected sites and 2-3 additional large segments elsewhere in the genome.&lt;br /&gt;&lt;br /&gt;Originally, we were going to do this transformation with only a single marker, but we realized that having two would allow us to measure the frequency of co-transformation.&lt;br /&gt;&lt;br /&gt;Here’s what the transformation rates looked like:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_7qRGxl6StM4/SuIgc-7pfmI/AAAAAAAAAYM/1j5Dfd0WlqA/s1600-h/tfRATES.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 200px; height: 177px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/SuIgc-7pfmI/AAAAAAAAAYM/1j5Dfd0WlqA/s200/tfRATES.png" alt="" id="BLOGGER_PHOTO_ID_5395910985844883042" border="0" /&gt;&lt;/a&gt;I used MAP7 DNA as a donor as a control.  
Since MAP7 is more closely related to KW20 than 86-028NP is, it is perhaps unsurprising that transformation rates were higher with MAP7 as the donor.&lt;br /&gt;&lt;br /&gt;As for co-transformation, here’s the observed frequency of double transformants versus the frequency expected if the two markers transformed independently:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_7qRGxl6StM4/SuIginV4wfI/AAAAAAAAAYU/LufiTA1WB4w/s1600-h/doubs.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 200px; height: 187px;" src="http://3.bp.blogspot.com/_7qRGxl6StM4/SuIginV4wfI/AAAAAAAAAYU/LufiTA1WB4w/s200/doubs.png" alt="" id="BLOGGER_PHOTO_ID_5395911082591699442" border="0" /&gt;&lt;/a&gt;That corresponds to only ~25-35% of the cells in the competent cell preparation actually being competent.  I’ve been racking my brain, unsuccessfully, trying to work out a back-of-the-envelope calculation of how many independent donor molecules we expect to transform any given recipient.  I just can’t figure out a concise or reasonable way to do it.  
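&lt;br /&gt;&lt;br /&gt;The ~25-35% competent-fraction figure can be understood with a simple argument. It rests on two assumptions: cells that are not competent never transform, and within competent cells the two markers are taken up and integrated independently. Let c be the competent fraction. Let p_N and p_B be the per-competent-cell probabilities of acquiring the nalidixic acid (gyrA) and novobiocin (gyrB) markers. Let f_N, f_B and f_NB be the observed single and double transformant frequencies among all cells. These symbols are introduced here and are not taken from the figures.

$$f_N = c\,p_N, \qquad f_B = c\,p_B, \qquad f_{NB} = c\,p_N\,p_B$$

$$\frac{f_{NB}}{f_N\,f_B} = \frac{c\,p_N\,p_B}{c^2\,p_N\,p_B} = \frac{1}{c} \quad\Longrightarrow\quad c = \frac{f_N\,f_B}{f_{NB}}$$

So if double transformants turn up roughly three to four times more often than the product of the single-marker frequencies, c is roughly 25-35%.&lt;br /&gt;&lt;br /&gt;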
Suffice it to say, I estimate a minimum of 20 kb of donor DNA in each transformant (1% of the genome), up to perhaps 100 kb (5% of the genome).&lt;br /&gt;&lt;br /&gt;There’s only one way to find out…&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-4269680324365177318?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/4269680324365177318/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/10/transformants-produced.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/4269680324365177318'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/4269680324365177318'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/10/transformants-produced.html' title='Transformants Produced!'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_7qRGxl6StM4/SuIeloQ7kbI/AAAAAAAAAXk/huBcWtwQeLk/s72-c/DelphiGenetics-services-sheep-2.gif' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-8931273483630756753</id><published>2009-10-15T11:48:00.001-07:00</published><updated>2009-10-15T12:01:58.703-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='computers'/><title type='text'>Computing 
Bootcamp</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://ontap.wordpress.com/2007/03/"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 190px; height: 320px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/StduqIogfRI/AAAAAAAAAXE/0mSZORERjrM/s320/lagunitas.jpg" alt="" id="BLOGGER_PHOTO_ID_5392900748950404370" border="0" /&gt;&lt;/a&gt;Whew, I’ve really fallen behind on my blogging...  Last week, a good friend of mine came into town for a “northern retreat”, in which he hoped to get work done on a paper.  Instead, he and I drank enormous amounts of beer and did an enormous amount of computing (at least by my standards) with the &lt;span style="font-style: italic;"&gt;Haemophilus influenzae&lt;/span&gt; genome.  While the beer probably didn’t help anything, the computing did.&lt;br /&gt;&lt;br /&gt;I’ll go over some of what we did in future posts, but here I just want to outline some of the computing lessons I learned looking over his shoulder during the week.  I had heard many of these lessons before, and they are likely quite basic for the real computationalists out there, but somehow I’ve emerged from the computing immersion with a lot more competence and confidence than I had before...&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class="fullpost"&gt;Here are three useful things I'm getting better at:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.codinghorror.com/blog/archives/000825.html"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 200px; height: 183px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/StdvFM7ANiI/AAAAAAAAAXM/59O6DEgkHo8/s200/no-mouse-allowed.jpg" alt="" id="BLOGGER_PHOTO_ID_5392901213958190626" border="0" /&gt;&lt;/a&gt;(1) &lt;span style="font-style: italic;"&gt;Avoid using the mouse&lt;/span&gt;.  
The more you can accomplish from the command line and with keystrokes, the better.  At the command line, &lt;a href="http://en.wikipedia.org/wiki/Command_line_completion"&gt;tab-completion&lt;/a&gt; and cursor control of the &lt;a href="http://en.wikipedia.org/wiki/Command_history"&gt;command history&lt;/a&gt; make issuing commands far more efficient.  The coolest keystroke habit I’ve picked up in Mac OS X Leopard is Cmd-Tab, which switches to the most recently active application (and pressing Tab repeatedly while holding Cmd cycles through the open applications in order of recent use).  This is perfect for toggling between the command line and a text editor where one can keep track of what one is doing.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://commons.wikimedia.org/wiki/File:Toggle-clamp_manual_vertical_3D_animated.gif"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 191px; height: 200px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/StdvtJ2io4I/AAAAAAAAAXU/bLZ2puufeZI/s200/Toggle-clamp_manual_vertical_3D_animated.gif" alt="" id="BLOGGER_PHOTO_ID_5392901900328936322" border="0" /&gt;&lt;/a&gt;(2) &lt;span style="font-style: italic;"&gt;Toggle between the command line and a text editor constantly&lt;/span&gt;.  Rather than trying to write a whole script and then run it from the command line, it was far easier and faster to simply try commands out, sending their results to standard output, and cobble the script together line by line, adding the working commands to a text document.  This has three useful effects:  (1) Bugs get worked out before they even go into a script.  (2) It forces one to document one’s work, as in a lab notebook.  This also ended up being quite useful for my lab meeting this week, in which I decided to illustrate some things directly from the terminal. 
(3) It is forcing me to work “properly”, that is, sticking with UNIX commands as much as possible.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://thisisindexed.com/"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 200px; height: 124px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/StdwSpQKIwI/AAAAAAAAAXc/J5BeI8GgSmM/s200/card2282.jpg" alt="" id="BLOGGER_PHOTO_ID_5392902544413041410" border="0" /&gt;&lt;/a&gt;(3) &lt;span style="font-style: italic;"&gt;Learn how the computer is &lt;a href="http://en.wikipedia.org/wiki/Index_%28information_technology%29"&gt;indexing&lt;/a&gt; your data&lt;/span&gt;.  This point is probably the most important, but it is also the one taking me the most effort.  I’ll illustrate with an example (which I’ll get into in more scientific detail later):&lt;br /&gt;&lt;br /&gt;The output of one of our scripts was a giant table with 3 columns and 1.8 million rows.  I wanted to look at the subset of rows in which some of the values exceeded a threshold.  At first I was doing this (in R) by writing fairly complicated loops that went through each line of the file, checked whether any cells fit my criteria, and then returned a new file that included only the rows I was interested in.  Each run of the loop took several minutes to finish, and writing the loop was somewhat cumbersome.&lt;br /&gt;&lt;br /&gt;But the extremely valuable thing I learned was that R already had all the data in RAM, indexed in a very specific way.  Built-in functions (which are extremely fast) let me pull out a subset of the data with a single simple line of code.  Not only was this dramatically faster, but it was also much more intuitive to write.  Furthermore, it let me index the large dataset in several different ways and instantly call up whichever subset I wanted to plot or whatnot.  
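&lt;br /&gt;&lt;br /&gt;The R line itself isn't shown here. If the table were read in with read.table, which names the columns V1, V2 and V3 by default, it would presumably look something like tbl[tbl$V3 > threshold, ], where tbl and threshold are hypothetical names. In keeping with tip (2), the sketch below is a command-line analogue that uses awk rather than R. It makes a single pass that prints only the rows whose third column exceeds a threshold, with no hand-written loop and no intermediate file. The toy table and the threshold of 10 are made up.

```shell
#!/bin/sh
# Sketch: keep only the rows of a 3-column, tab-separated table whose
# third column exceeds a threshold (an awk analogue of an R one-liner).
# The table and threshold below are hypothetical toy values.
tmp=$(mktemp -d "${TMPDIR:-/tmp}/subset.XXXXXX")
printf 'pos1\t0.2\t3\npos2\t0.9\t12\npos3\t0.5\t7\npos4\t0.1\t20\n' > "$tmp/table.txt"

threshold=10
# "+ 0" forces a numeric (not string) comparison on both sides.
subset=$(awk -F '\t' -v t="$threshold" '$3 + 0 > t + 0' "$tmp/table.txt")
printf '%s\n' "$subset"
rm -r "$tmp"
```

The same pattern-only awk program works unchanged on a file with millions of rows, since awk streams through the file once rather than holding it in memory.&lt;br /&gt;&lt;br /&gt;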
I ended up with a much leaner, more straightforward way of analyzing the giant table, and I didn’t need to make a bunch of intermediate files or keep track of as many variables.&lt;br /&gt;&lt;br /&gt;Next time, I’ll try to flesh out some of the details of what I was doing...&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-8931273483630756753?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/8931273483630756753/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/10/computing-bootcamp.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/8931273483630756753'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/8931273483630756753'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/10/computing-bootcamp.html' title='Computing Bootcamp'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_7qRGxl6StM4/StduqIogfRI/AAAAAAAAAXE/0mSZORERjrM/s72-c/lagunitas.jpg' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-8378022268795499034</id><published>2009-10-01T15:04:00.001-07:00</published><updated>2009-10-01T15:12:31.166-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' 
term='USS'/><category scheme='http://www.blogger.com/atom/ns#' term='degenerate'/><title type='text'>Corrected Logos</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.answers.com/topic/information-theory-2"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 310px; height: 320px;" src="http://3.bp.blogspot.com/_7qRGxl6StM4/SsUnfwpw3oI/AAAAAAAAAWs/NC3uNkQintc/s320/0198162246.information-theory.2.jpg" alt="" id="BLOGGER_PHOTO_ID_5387755955808165506" border="0" /&gt;&lt;/a&gt;Yesterday, I attempted to make some logos out of the degenerate USS simulated data that Rosie sent me.  Turns out, I was doing it wrong.  After checking out the &lt;a href="http://www.lecb.ncifcrf.gov/%7Etoms/paper/logopaper/"&gt;original logo paper&lt;/a&gt;, I was able to figure out how to make my own logos in Excel.  I wasn't supposed to plot the "information content" of each base; I was supposed to take the total information content (in bits) of each position (as determined by the equation in the last post) and then, for each base, multiply that amount by the observed frequency of that base to get the height of that element in the logo.  
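&lt;br /&gt;&lt;br /&gt;Here is a minimal sketch of that height calculation in Python. It rests on several assumptions that do not come from Rosie's data or the logo paper. First, the per-position total is computed as the relative entropy against a position-specific background. Second, that background comes from 12% degeneracy: 88% for the consensus base and 4% for each other base. Third, a pseudocount of 1 is added per base, because the post does not say which pseudocount was used. All function names are hypothetical.

```python
import math

BASES = "ACGT"

def background_for(consensus_base, degeneracy=0.12):
    """Hypothetical per-position background: the consensus base is kept with
    probability 1 - degeneracy; each other base gets degeneracy / 3."""
    return {b: (1 - degeneracy) if b == consensus_base else degeneracy / 3
            for b in BASES}

def column_frequencies(column, pseudocount=1.0):
    """Observed base frequencies in one alignment column, with a pseudocount
    added to every base so no frequency is zero (no logarithm of zero)."""
    counts = {b: pseudocount for b in BASES}
    for base in column:
        counts[base] += 1
    total = sum(counts.values())
    return {b: counts[b] / total for b in BASES}

def logo_heights(column, background, pseudocount=1.0):
    """Total information at the position (relative entropy, in bits) and the
    height of each base = total information * observed frequency."""
    freqs = column_frequencies(column, pseudocount)
    total_bits = sum(p * math.log2(p / background[b]) for b, p in freqs.items())
    heights = {b: total_bits * freqs[b] for b in BASES}
    return total_bits, heights
```

Consider a column of 100 sequences that simply mirrors its background (88 A plus 4 each of C, G and T). It comes out at roughly 0.004 bits, which is close to the "no information" result we want for an unselected set.&lt;br /&gt;&lt;br /&gt;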
So, below I put the proper logos for the selected and unselected degenerate USS sets (background corrected):&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class="fullpost"&gt;Here's the selected set:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_7qRGxl6StM4/SsUo6MW1MAI/AAAAAAAAAW0/B5cMIJ-u5Rc/s1600-h/correctSELlogo.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 170px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/SsUo6MW1MAI/AAAAAAAAAW0/B5cMIJ-u5Rc/s320/correctSELlogo.png" alt="" id="BLOGGER_PHOTO_ID_5387757509433176066" border="0" /&gt;&lt;/a&gt;Here's the unselected set:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_7qRGxl6StM4/SsUo92NtQbI/AAAAAAAAAW8/qXaex4Dko3E/s1600-h/correctUNlogo.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 165px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/SsUo92NtQbI/AAAAAAAAAW8/qXaex4Dko3E/s320/correctUNlogo.png" alt="" id="BLOGGER_PHOTO_ID_5387757572208804274" border="0" /&gt;&lt;/a&gt;Woo!&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-8378022268795499034?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/8378022268795499034/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/10/corrected-logos.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/8378022268795499034'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/3017697080484068141/posts/default/8378022268795499034'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/10/corrected-logos.html' title='Corrected Logos'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_7qRGxl6StM4/SsUnfwpw3oI/AAAAAAAAAWs/NC3uNkQintc/s72-c/0198162246.information-theory.2.jpg' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-573216967357907414</id><published>2009-09-30T17:59:00.000-07:00</published><updated>2009-09-30T19:14:06.577-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='USS'/><category scheme='http://www.blogger.com/atom/ns#' term='computers'/><category scheme='http://www.blogger.com/atom/ns#' term='degenerate'/><title type='text'>Struggling with the Background</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.wired.com/thisdayintech/2009/04/dayintech_0430/"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 200px; height: 182px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/SsP_KU23ykI/AAAAAAAAAVs/QNJEtXLdZFM/s200/shannon_f.jpg" alt="" id="BLOGGER_PHOTO_ID_5387430132127877698" border="0" /&gt;&lt;/a&gt;&lt;a href="http://nodnacontrol.blogspot.com/2009/09/more-simulated-uptake.html"&gt;Previously&lt;/a&gt;, I had shown some preliminary analysis of Rosie’s simulated uptake data of chromosomal DNA fragments.  
Rosie also sent me simulated uptake data of degenerate USS sequences (using 12% degeneracy per position around this USS consensus sequence:&lt;br /&gt;&lt;br /&gt;5’-AAAGTGCGGTCAAATTTCAGTCAATTTTT-3’).&lt;br /&gt;&lt;br /&gt;So what can I do with this dataset?  Well, first, since Rosie also provided me with the “scores” for each sequence, I could plot a histogram of the scores for the 100 selected and 100 unselected sequences, showing that the uptake algorithm seems to work pretty well.&lt;br /&gt;&lt;br /&gt;&lt;span class="fullpost"&gt;Here's the histogram:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7qRGxl6StM4/SsP_amUaspI/AAAAAAAAAV0/K23M0McdJE8/s1600-h/degenerateHIST.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 262px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/SsP_amUaspI/AAAAAAAAAV0/K23M0McdJE8/s320/degenerateHIST.png" alt="" id="BLOGGER_PHOTO_ID_5387430411693109906" border="0" /&gt;&lt;/a&gt;Notably, even the unselected sequences have rather high scores compared with the same analysis of genomic DNA fragments. 
This is unsurprising, since the sequences under selection in this degenerate USS simulation are all rather close to the consensus USS.&lt;br /&gt;&lt;br /&gt;Here’s the histogram from the genomic uptake simulation again (just to compare):&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_7qRGxl6StM4/SsP_l8fmvsI/AAAAAAAAAV8/hpT407Ao86c/s1600-h/50kbHistogram+copy.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 192px;" src="http://3.bp.blogspot.com/_7qRGxl6StM4/SsP_l8fmvsI/AAAAAAAAAV8/hpT407Ao86c/s320/50kbHistogram+copy.png" alt="" id="BLOGGER_PHOTO_ID_5387430606624177858" border="0" /&gt;&lt;/a&gt;(I think the difference between the “selected” distributions is due to a different level of stringency when Rosie produced the two simulated datasets.)&lt;br /&gt;&lt;br /&gt;That’s all fine and good, but now what?  With the genomic dataset, I could use the &lt;a href="http://microbes.ucsc.edu/cgi-bin/hgGateway?hgsid=315443&amp;amp;clade=bacteria-gammaproteobacteria&amp;amp;org=Haemophilus+influenzae+Rd+KW20&amp;amp;db=0"&gt;UCSC genome browser &lt;/a&gt;to plot the location of all the fragments I was sent, but this dataset consists of just two alignment blocks of sequences that look markedly similar.&lt;br /&gt;&lt;br /&gt;The obvious thing to do was to make Weblogos of the two different datasets…  The unselected set should have very little information in it, while the selected set should contain noticeably more.  In doing this, I discovered a rather important issue…  The on-line version of Weblogo does NOT, I repeat, does NOT account for the background distribution.&lt;br /&gt;&lt;br /&gt;This is a problem.  It means that whenever you make a Weblogo (on the webserver) from your alignment block, it assumes that each base is equally likely to occur at a random position.  
This is why the y-axis in all Weblogo plots always has a maximum of 2 bits for DNA sequences.  Why is this a problem?  First of all, if one is working with an AT-rich genome, as we are, then the information content of any G or C is underestimated and that of any A or T is overestimated.&lt;br /&gt;&lt;br /&gt;So how does Weblogo calculate the &lt;a href="http://en.wikipedia.org/wiki/Position-specific_scoring_matrix"&gt;information content&lt;/a&gt; of each position in an alignment block?  From the Weblogo paper (link found at &lt;a href="http://weblogo.berkeley.edu/"&gt;the Weblogo website&lt;/a&gt;):&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_7qRGxl6StM4/SsP_v83HXxI/AAAAAAAAAWE/YgEnLWmFsP8/s1600-h/2bits.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 65px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/SsP_v83HXxI/AAAAAAAAAWE/YgEnLWmFsP8/s320/2bits.png" alt="" id="BLOGGER_PHOTO_ID_5387430778521476882" border="0" /&gt;&lt;/a&gt;Rseq = the information at a particular position in the alignment&lt;br /&gt;Smax = the maximum possible entropy&lt;br /&gt;Sobs = the entropy of the observed distribution of symbols (bases)&lt;br /&gt;N = the number of distinct symbols (4 for DNA)&lt;br /&gt;pn = the observed frequency of symbol n&lt;br /&gt;&lt;br /&gt;The log2 is there to put everything in terms of bits.&lt;br /&gt;&lt;br /&gt;So for DNA (4 bases), the maximum entropy at a position is 2 bits.  That makes sense: 2 bits per base.  However, this only holds if each base is equally probable in a randomly drawn sequence.  
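&lt;br /&gt;&lt;br /&gt;Written out in symbols, the calculation is shown below, followed by a worked example. The symbols are my reconstruction from the definitions above, because the figure is an image, and any small-sample correction term is left out. The example is a column that simply mirrors 12% degeneracy around an A, as at the first position of the core:

```latex
R_{\mathrm{seq}} = S_{\max} - S_{\mathrm{obs}}
  = \log_2 N - \left( -\sum_{n=1}^{N} p_n \log_2 p_n \right),
  \qquad S_{\max} = \log_2 4 = 2 \text{ bits (DNA)}

p_A = 0.88,\ p_C = p_G = p_T = 0.04:\quad
S_{\mathrm{obs}} = -(0.88 \log_2 0.88 + 3 \times 0.04 \log_2 0.04) \approx 0.72,
\quad R_{\mathrm{seq}} \approx 2 - 0.72 = 1.28 \text{ bits}
```

Under the uniform-background assumption, such a position would be drawn at about 1.28 bits even though, relative to its true background, it carries essentially no information.&lt;br /&gt;&lt;br /&gt;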
Now, for the purpose of gaining intuition about different motifs, this isn’t really a big deal, although it does complicate comparing motifs between genomes.&lt;br /&gt;&lt;br /&gt;When the bases are not equally probable (probably much of the time), a different measure is used instead, namely the “&lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/10812473"&gt;relative entropy&lt;/a&gt;” of a position.  This is the sum, over the four bases, of each observed frequency times the log of the ratio between its observed and background probabilities.  Apparently, the off-line version of Weblogo can account for non-uniform base composition, but I haven’t tried installing it yet, nor any other software out there that handles variation in GC content.&lt;br /&gt;&lt;br /&gt;Why not?  Because what we need for our degenerate sequences is a different background distribution at each position!  For example, the first position in the core is 88% A, but the third position is 88% G!&lt;br /&gt;&lt;br /&gt;To illustrate the problem, here is a Weblogo of the selected set:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_7qRGxl6StM4/SsQAK-m-PcI/AAAAAAAAAWM/7DLwUv-sXnw/s1600-h/selectedWeb.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 89px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/SsQAK-m-PcI/AAAAAAAAAWM/7DLwUv-sXnw/s320/selectedWeb.png" alt="" id="BLOGGER_PHOTO_ID_5387431242847108546" border="0" /&gt;&lt;/a&gt;Here’s the unselected set:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_7qRGxl6StM4/SsQARdgrZzI/AAAAAAAAAWU/zSWVCn3umZo/s1600-h/unselectedWeb.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 89px;" src="http://3.bp.blogspot.com/_7qRGxl6StM4/SsQARdgrZzI/AAAAAAAAAWU/zSWVCn3umZo/s320/unselectedWeb.png" alt="" id="BLOGGER_PHOTO_ID_5387431354221422386" border="0" /&gt;&lt;/a&gt;Looking 
closely, it is clear that there are differences in the amount of “information” at each position.  At the strong consensus positions of the USS, the selected set has more “information” than the unselected set, while at the weak consensus positions, the difference is smaller.&lt;br /&gt;&lt;br /&gt;But the scaling of each base here is completely wrong.  There shouldn’t be anywhere near a bit of information at the first position in the unselected set: we expected 88% A there.  The fact that there are mostly As in the alignment block at the first position is NOT informative.  In fact, if all were well, we’d get zero bits at every position in the unselected set!&lt;br /&gt;&lt;br /&gt;What to do?  I tried to make my own logo, using the known true background distribution at each position.  I won’t belabor the details too much at the moment, except to say that I had to figure out what a &lt;a href="http://en.wikipedia.org/wiki/Pseudocount"&gt;“pseudocount”&lt;/a&gt; was and how to incorporate it into the weight matrix, so as to never take the logarithm of zero.&lt;br /&gt;&lt;br /&gt;Here’s the selected set of 100 sequences:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7qRGxl6StM4/SsQAcvvx8MI/AAAAAAAAAWc/cDjkrRpkiiA/s1600-h/selectedBCK.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 216px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/SsQAcvvx8MI/AAAAAAAAAWc/cDjkrRpkiiA/s320/selectedBCK.png" alt="" id="BLOGGER_PHOTO_ID_5387431548095164610" border="0" /&gt;&lt;/a&gt;Here’s the unselected set of 100 sequences:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_7qRGxl6StM4/SsQAjI71IpI/AAAAAAAAAWk/ltFMQFnSCyE/s1600-h/unselectedBCK.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 210px;" 
src="http://4.bp.blogspot.com/_7qRGxl6StM4/SsQAjI71IpI/AAAAAAAAAWk/ltFMQFnSCyE/s320/unselectedBCK.png" alt="" id="BLOGGER_PHOTO_ID_5387431657935807122" border="0" /&gt;&lt;/a&gt;(Note that I somehow lost the first position when I did this, so the motif starts at the first position of the core.)&lt;br /&gt;&lt;br /&gt;This actually looks quite a bit better, or at least more sensible.  A few things worth noting:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;I think if we used thousands of unselected sequences, we’d get essentially zero information from that alignment, which is what we want, since those sequences just reflect the background distribution.&lt;/li&gt;&lt;li&gt;Some values are negative.  This is expected: since these are log-odds ratios, whenever a base is observed less often than its expected background frequency, a negative number emerges.&lt;/li&gt;&lt;li&gt;The scale is extremely reduced.  Every position is worth less than 0.3 bits.  This is also expected. One way I’ve seen information content described is as how &lt;a href="http://en.wikipedia.org/wiki/Self-information"&gt;“surprised”&lt;/a&gt; one should be when making an observation (there’s even a name for this quantity: the “surprisal”!).  Since we are drawing from an extremely non-uniform distribution that already favors the base that cells are expected to take up best, we are basically squashing our surprise way down.  That is, getting an A at the first position of the core is highly favored, but it’s the most likely base to get anyway, even in the absence of selection.&lt;/li&gt;&lt;li&gt;The unimportant positions in the USS have the most information content in the selected set.  At first this bothered me, but then I realized it was utterly expected for the same reason as above. For example, at position #18 above (sorry, it’s position 19 in the Weblogos), the selection algorithm doesn’t really care what base is there.  
That means that the selected set will let mutations at that position (from A to something else) come through, which will be surprising compared with the background distribution!&lt;/li&gt;&lt;/ul&gt;(ADDED LATER:  Actually, this last point is wrong.  The reason for so much information at the weak positions is related to the matrix that was used to select the sequences, not to surprise.  I'll try to get a proper dataset later and redo this analysis.  To some extent, those positions will still carry some information, as partially explained in my erroneous explanation above, but not nearly as much.)&lt;br /&gt;&lt;br /&gt;Whew!  I’ve gotta quit now.  There’s a lot more to think about here.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-573216967357907414?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/573216967357907414/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/09/struggling-with-background.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/573216967357907414'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/573216967357907414'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/09/struggling-with-background.html' title='Struggling with the Background'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' 
src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_7qRGxl6StM4/SsP_KU23ykI/AAAAAAAAAVs/QNJEtXLdZFM/s72-c/shannon_f.jpg' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-2730139997334039000</id><published>2009-09-28T12:45:00.000-07:00</published><updated>2009-09-28T12:55:24.004-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='recombination'/><category scheme='http://www.blogger.com/atom/ns#' term='plans'/><category scheme='http://www.blogger.com/atom/ns#' term='lab'/><title type='text'>Mismatch repair versus Segregation</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.esrf.eu/UsersAndScience/Publications/Highlights/2000/life-sci/LS2.html"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 320px; height: 207px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/SsESlwIGnQI/AAAAAAAAAVM/gKdcEC9Etoc/s320/LSFig-2.gif" alt="" id="BLOGGER_PHOTO_ID_5386607069095173378" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Things have gone swimmingly with my strain construction plans, and indeed today I am extracting DNA that will presumably be sequenced.  To recap, I made a couple of clinical isolates (86-028NP and PittGG) resistant to novobiocin (NovR) by transforming them with a bit of leftover NovR allele DNA from the former postdoc.  I then isolated the new strains’ DNA and used it to transform the standard KW20 Rd strain.  
By selecting for NovR, we can be reasonably sure that the clones I pick took up DNA and recombined it into their genomes.&lt;br /&gt;&lt;br /&gt;One technical issue arose, however, which required a little bit of thought:  Should I have streaked for single colonies?  &lt;span style="font-style: italic;"&gt;I.e.&lt;/span&gt; once I had my transformants, it might be a good idea to streak out individual colonies to make sure I purified them away from any background or broke apart any doublet colonies.  No big deal, but after talking it out with Rosie, we decided to skip it.  Why?  So that we might get lucky and distinguish recombination followed by mismatch repair from recombination followed by segregation.  In the following figures, I illustrate what I mean by this…&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;In this first one, the donor DNA is shown in red, and the recipient chromosome is shown in two colors, blue and green, to distinguish the strands.  The lowercase letters indicate polymorphic sites in the donor genome.  
Little &lt;span style="font-style: italic; font-weight: bold;"&gt;a&lt;/span&gt; is meant to be the selectable marker, in this case an allele of &lt;span style="font-style: italic;"&gt;gyrB&lt;/span&gt;:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_7qRGxl6StM4/SsESv9NUm8I/AAAAAAAAAVU/QAvqMWFzhBQ/s1600-h/mmr1.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 192px;" src="http://3.bp.blogspot.com/_7qRGxl6StM4/SsESv9NUm8I/AAAAAAAAAVU/QAvqMWFzhBQ/s320/mmr1.png" alt="" id="BLOGGER_PHOTO_ID_5386607244405414850" border="0" /&gt;&lt;/a&gt;Donor DNA is incubated with competent recipient cells, and recombination of single-stranded DNA leaves patches of heteroduplex in the genome, shown as small red patches on either the blue or the green strand.&lt;br /&gt;&lt;br /&gt;After this, the cells have a chance to perform mismatch correction to fix any heteroduplex.  I select for cells that have little a by plating on novobiocin plates, so only cells that end up &lt;span style="font-style: italic; font-weight: bold;"&gt;a/a&lt;/span&gt; will survive and make colonies.  
(I am not going to show any examples of restoration repair, in which donor alleles are repaired back into recipient alleles… this will be invisible in our analysis.)&lt;br /&gt;&lt;br /&gt;In the example below, I show the &lt;span style="font-style: italic; font-weight: bold;"&gt;A/a&lt;/span&gt; and &lt;span style="font-style: italic; font-weight: bold;"&gt;B/b&lt;/span&gt; heteroduplexes getting mismatch repaired into &lt;span style="font-style: italic; font-weight: bold;"&gt;a/a&lt;/span&gt; and &lt;span style="font-style: italic; font-weight: bold;"&gt;b/b&lt;/span&gt;, whereas the &lt;span style="font-weight: bold; font-style: italic;"&gt;C/c&lt;/span&gt; and &lt;span style="font-weight: bold; font-style: italic;"&gt;D/d&lt;/span&gt; heteroduplexes remain unrepaired (they escape correction).  What happens in such a case is the generation of a sectored colony, in which (in principle) half the cells would have one genotype and the other half a different genotype:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7qRGxl6StM4/SsESznVMajI/AAAAAAAAAVc/9Rn7VSnplAE/s1600-h/mmr2.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 252px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/SsESznVMajI/AAAAAAAAAVc/9Rn7VSnplAE/s320/mmr2.png" alt="" id="BLOGGER_PHOTO_ID_5386607307252329010" border="0" /&gt;&lt;/a&gt;In the above example, the original transformant segregates the &lt;span style="font-weight: bold; font-style: italic;"&gt;c&lt;/span&gt; and &lt;span style="font-style: italic; font-weight: bold;"&gt;d&lt;/span&gt; alleles into different cells, while &lt;span style="font-style: italic; font-weight: bold;"&gt;a&lt;/span&gt; and &lt;span style="font-weight: bold; font-style: italic;"&gt;b&lt;/span&gt; end up in all cells.  
If the whole resulting colony is grown up and sequenced, the a and b alleles will be the only ones observed, while at the other two loci, there will be a mix of &lt;span style="font-weight: bold; font-style: italic;"&gt;C&lt;/span&gt; and &lt;span style="font-weight: bold; font-style: italic;"&gt;c&lt;/span&gt;, along with a mix of &lt;span style="font-weight: bold; font-style: italic;"&gt;D&lt;/span&gt; and &lt;span style="font-weight: bold; font-style: italic;"&gt;d&lt;/span&gt;.  We wouldn’t be able to tell “phase”, &lt;span style="font-style: italic;"&gt;i.e.&lt;/span&gt; whether &lt;span style="font-weight: bold; font-style: italic;"&gt;c&lt;/span&gt; and &lt;span style="font-weight: bold; font-style: italic;"&gt;d&lt;/span&gt; were on the same or different chromosomes, unless we did streak for single colonies and then sequenced several clones.  But as a first pass, this could be a really interesting analysis.  It will also serve as an excellent proof of principle for our more intense sequencing plans.&lt;br /&gt;&lt;br /&gt;There is a caveat, however, which means we need to get a little bit lucky to be able to distinguish these phenomena (mismatch repair versus segregation).  
We won’t see two different genotypes if the &lt;span style="font-weight: bold; font-style: italic;"&gt;A/a&lt;/span&gt; heteroduplex isn’t mismatch corrected:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_7qRGxl6StM4/SsES3PWs5fI/AAAAAAAAAVk/GF_yDNYASVI/s1600-h/mmr3.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 252px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/SsES3PWs5fI/AAAAAAAAAVk/GF_yDNYASVI/s320/mmr3.png" alt="" id="BLOGGER_PHOTO_ID_5386607369535677938" border="0" /&gt;&lt;/a&gt;The issue isn’t that segregation didn’t happen; the problem is that one of the segregants dies under selection for little &lt;span style="font-weight: bold; font-style: italic;"&gt;a&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;Thus, if we see a pure genotype, then either all mismatches were corrected, or the heteroduplex at our selectable marker wasn’t mismatch corrected.&lt;br /&gt;&lt;br /&gt;When I pre-screen my transformants to make sure they’re not spontaneous mutants, I might be able to pick a colony where I think segregation is occurring.  
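&lt;br /&gt;&lt;br /&gt;To make the bookkeeping explicit, here is a toy model in Python of what bulk sequencing of one colony would report at each locus. It rests on my own simplifying assumption, not on anything measured: each unrepaired heteroduplex resolves at the first replication, so the colony descends in equal numbers from two daughter lineages, one per recipient strand (blue or green, as in the figures). Restoration repair is ignored, as above. The function name and data layout are hypothetical.

```python
def colony_donor_fractions(loci, marker):
    """Expected fraction of the donor allele at each locus in one colony.

    loci maps a locus name to (repaired, strand). repaired=True means the
    heteroduplex was mismatch-corrected to the donor allele on both strands;
    otherwise the donor patch sits only on strand "blue" or "green".
    marker is the selected locus: daughter lineages lacking its donor allele die.
    """
    lineages = ["blue", "green"]  # one daughter lineage per recipient strand

    def has_donor(locus, lineage):
        repaired, strand = loci[locus]
        return repaired or strand == lineage

    # Selection for the marker (e.g. NovR) keeps only lineages that carry it.
    survivors = [lin for lin in lineages if has_donor(marker, lin)]
    return {
        locus: sum(has_donor(locus, lin) for lin in survivors) / len(survivors)
        for locus in loci
    }
```

Take the sectored example: a and b repaired, c unrepaired on blue, d unrepaired on green. The model gives 1.0 for a and b and 0.5 for c and d. If instead the marker heteroduplex itself escapes repair, only one lineage survives and every locus comes out pure (0 or 1), which is the caveat described above.&lt;br /&gt;&lt;br /&gt;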
If I get the standard sequencing traces back and see mixed bases in the chromatograms that correspond to donor and recipient alleles, I’ll pick that kind of clone for sequencing…&lt;br /&gt;&lt;br /&gt;One sort of sad note here, in terms of the more distant future, is that mismatch repair mutants, which should be quite useful for understanding transformation, will need to be transformed without selection if we hope to recover isolated segregants from individual transformants.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-2730139997334039000?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/2730139997334039000/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/09/mismatch-repair-versus-segregation.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/2730139997334039000'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/2730139997334039000'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/09/mismatch-repair-versus-segregation.html' title='Mismatch repair versus Segregation'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_7qRGxl6StM4/SsESlwIGnQI/AAAAAAAAAVM/gKdcEC9Etoc/s72-c/LSFig-2.gif' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-5701491689334646953</id><published>2009-09-25T17:33:00.001-07:00</published><updated>2009-09-25T17:40:57.543-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='USS'/><category scheme='http://www.blogger.com/atom/ns#' term='computers'/><category scheme='http://www.blogger.com/atom/ns#' term='periplasm'/><title type='text'>More simulated uptake</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://voiceactingalliance.com/board/showthread.php?t=32813"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 200px; height: 184px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/Sr1hfClpTaI/AAAAAAAAAUk/ADWov_be9QI/s200/va_integer.jpg" alt="" id="BLOGGER_PHOTO_ID_5385567915303587234" border="0" /&gt;&lt;/a&gt;Thanks to &lt;a href="http://rrresearch.blogspot.com/2009/09/why-would-we-have-used-intrand-rather.html"&gt;Rosie eliminating the &lt;span style="font-style: italic;"&gt;int&lt;/span&gt; function from her Perl model&lt;/a&gt;, I got to take a look at some more simulated uptake data.  &lt;a href="http://nodnacontrol.blogspot.com/2009/09/fake-periplasmic-data.html"&gt;Last time&lt;/a&gt;, there were several issues, which now seem to be solved.  This time, to model uptake, she used the real genomic USS position weight matrix to stochastically select 500 bp fragments from the first 50 kb of the Haemophilus genome.  I got 200 fragments from the forward strand and 200 from the reverse-complement strand, along with a set of random fragments.  
This is 4X spanning coverage of the 50 kb…&lt;br /&gt;&lt;br /&gt;Below is how the data looked in the UCSC genome browser, added as custom tracks (click on the figures to enlarge).&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_7qRGxl6StM4/Sr1hll_PfII/AAAAAAAAAUs/VwIiCDHO8Ok/s1600-h/50kbselected.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 91px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/Sr1hll_PfII/AAAAAAAAAUs/VwIiCDHO8Ok/s400/50kbselected.png" alt="" id="BLOGGER_PHOTO_ID_5385568027885403266" border="0" /&gt;&lt;/a&gt;From the top, the tracks are:&lt;br /&gt;(1) Chromosome position.&lt;br /&gt;(2) 400 random fragments (shades of brown).&lt;br /&gt;(3) 400 selected fragments (shades of blue).&lt;br /&gt;(4) Positions of “perfect core” USS motifs (5’-AAGTGCGGT-3’) on either strand.&lt;br /&gt;(5) RefSeq gene annotations.&lt;br /&gt;&lt;br /&gt;Here’s a zoomed-in view of part of the 50 kb:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7qRGxl6StM4/Sr1hppOA1GI/AAAAAAAAAU0/1F2dVurXwx8/s1600-h/50kbselectedZoom.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 90px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/Sr1hppOA1GI/AAAAAAAAAU0/1F2dVurXwx8/s400/50kbselectedZoom.png" alt="" id="BLOGGER_PHOTO_ID_5385568097472140386" border="0" /&gt;&lt;/a&gt;That looks pretty good for such low coverage!  (In our real experiment, we expect to get several hundred times more data.)  It’s starting to look like a real model of how uptake might look!  
The random fragments look roughly randomly distributed, and the selected fragments clearly show a punctate distribution around the “perfect core” USS sites.&lt;br /&gt;&lt;br /&gt;Here’s a histogram of the scores of the best site on a given fragment for the random and selected datasets:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_7qRGxl6StM4/Sr1hts7j0nI/AAAAAAAAAU8/77v7B0j6bto/s1600-h/50kbHistogram.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 192px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/Sr1hts7j0nI/AAAAAAAAAU8/77v7B0j6bto/s320/50kbHistogram.png" alt="" id="BLOGGER_PHOTO_ID_5385568167187960434" border="0" /&gt;&lt;/a&gt;Indeed, the distributions are quite distinct, though notably the distribution of random fragments looks bimodal.  This may simply be a feature of the genome, since there are so many USS sites…  Worth thinking about though.&lt;br /&gt;&lt;br /&gt;There are other details obscured in the browser figures: the shading indicates the relative score of the best site on the fragment (on a log scale), and each fragment also has orientation shown as a small arrowhead within the box.  I’ve also associated each fragment with its score.  
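&lt;br /&gt;&lt;br /&gt;For reference, here is a minimal sketch in Python of one way to compute a best-site score per fragment: slide a position weight matrix along both strands and keep the maximum. The matrix is a made-up placeholder. It is built from the perfect-core consensus (5’-AAGTGCGGT-3’) with hypothetical probabilities, so it is not the real genomic USS matrix Rosie used, and its scores will not match hers. Fragments are assumed to be uppercase A/C/G/T only.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def consensus_pwm(consensus, p_match=0.7, background=0.25):
    """Hypothetical log-odds matrix: p_match for the consensus base, the rest
    split evenly among the other three bases, scored against a flat background."""
    p_other = (1 - p_match) / 3
    return [{b: math.log2((p_match if b == c else p_other) / background) for b in BASES}
            for c in consensus]

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def best_site(fragment, pwm):
    """Highest window score over both strands, as (score, strand, start).
    start is 0-based in the coordinates of the strand that was scanned."""
    width = len(pwm)
    best = (-math.inf, None, None)
    for strand, seq in (("+", fragment), ("-", reverse_complement(fragment))):
        for start in range(len(seq) - width + 1):
            window = seq[start:start + width]
            score = sum(col[base] for col, base in zip(pwm, window))
            if score > best[0]:
                best = (score, strand, start)
    return best
```

Running this over every fragment and histogramming the scores would give plots analogous to the one above, but the numbers only become comparable once the real matrix is swapped in.&lt;br /&gt;&lt;br /&gt;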
So in the browser, I can easily inspect things more carefully:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_7qRGxl6StM4/Sr1hxoFHfrI/AAAAAAAAAVE/eJsxtvMYSS4/s1600-h/50kbselectedPack.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 124px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/Sr1hxoFHfrI/AAAAAAAAAVE/eJsxtvMYSS4/s400/50kbselectedPack.png" alt="" id="BLOGGER_PHOTO_ID_5385568234605346482" border="0" /&gt;&lt;/a&gt;In this zoom, it’s clear that there is an excellent site to the left (just under 18,000), a weaker site to its right (~18,400, with no perfect core; its fragments overlap those of the left-most site), and a pair of sites on different strands with different scores further right (~19,250).  I can also retrieve the sequence associated with a given fragment to see if I can spot the USS site within it.  At the highest zoom, the DNA sequence is shown at the top.&lt;br /&gt;&lt;br /&gt;It’s hard to see what’s going on with all these overlapping fragments, so my next task will be to convert this data from fragment positions to spanning coverage per chromosomal position (though I’ll probably bin positions, perhaps every 100 bp, to keep things reasonably small for now).  I will take a stab at doing this properly (with a script) but may wimp out and do it in Excel.  
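&lt;br /&gt;&lt;br /&gt;One way to do the binning with a script rather than Excel: accumulate per-base spanning coverage with a difference array, average it over 100 bp bins, and format the result as a fixedStep WIG track. This is a sketch under stated assumptions: fragments are 0-based, half-open (BED-style) intervals on a single chromosome, and the function names, track name and chromosome name are placeholders (the chromosome name has to match whatever the browser uses for the KW20 assembly):&lt;br /&gt;&lt;pre&gt;
```python
# Minimal sketch: convert fragment intervals into mean spanning coverage
# per fixed-size bin, then format the result as a fixedStep WIG track.
# Assumes 0-based, half-open (BED-style) intervals on a single chromosome;
# function, track and chromosome names are placeholders.


def binned_coverage(fragments, chrom_length, bin_size=100):
    """Return the mean per-base coverage in each bin.

    fragments: iterable of (start, end) pairs, 0-based half-open.
    """
    n_bins = (chrom_length + bin_size - 1) // bin_size
    # Difference array: +1 where a fragment starts, -1 just past its end.
    delta = [0] * (chrom_length + 1)
    for start, end in fragments:
        delta[start] += 1
        delta[end] -= 1
    totals = [0] * n_bins
    depth = 0
    for pos in range(chrom_length):
        depth += delta[pos]
        totals[pos // bin_size] += depth
    # Average over the bin width; the last bin may be shorter than bin_size.
    return [
        total / min(bin_size, chrom_length - i * bin_size)
        for i, total in enumerate(totals)
    ]


def to_fixed_step_wig(values, chrom, bin_size=100, name="coverage"):
    """Format binned values as fixedStep WIG text (WIG starts are 1-based)."""
    lines = [
        f"track type=wiggle_0 name={name}",
        f"fixedStep chrom={chrom} start=1 step={bin_size} span={bin_size}",
    ]
    lines.extend(f"{value:.2f}" for value in values)
    return "\n".join(lines)
```
&lt;/pre&gt;For example, binned_coverage([(0, 150), (50, 250)], 250) gives [1.5, 1.5, 1.0], and to_fixed_step_wig turns such a list into a two-line header followed by one value per bin.&lt;br /&gt;&lt;br /&gt;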
If I can then muscle this data into a &lt;a href="https://cgwb.nci.nih.gov/goldenPath/help/wiggle.html"&gt;WIG formatted file&lt;/a&gt;, then I’ll be able to plot the data in a way where “good” sites will look like peaks in coverage…&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-5701491689334646953?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/5701491689334646953/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/09/more-simulated-uptake.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/5701491689334646953'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/5701491689334646953'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/09/more-simulated-uptake.html' title='More simulated uptake'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_7qRGxl6StM4/Sr1hfClpTaI/AAAAAAAAAUk/ADWov_be9QI/s72-c/va_integer.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-9057023882830919503</id><published>2009-09-24T16:13:00.000-07:00</published><updated>2009-09-24T16:24:58.336-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='recombination'/><category 
scheme='http://www.blogger.com/atom/ns#' term='linkage'/><category scheme='http://www.blogger.com/atom/ns#' term='lab'/><title type='text'>E-Z Strain Construction</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://my.qoop.com/store/galleria/tag/mosaic/"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 285px; height: 214px;" src="http://3.bp.blogspot.com/_7qRGxl6StM4/Srv-kJAZPAI/AAAAAAAAAUc/RYzKMAi-Aqg/s320/mosaic-in-torcello-cathedral-qpps_295317775180969.MD.jpg,285.jpeg" alt="" id="BLOGGER_PHOTO_ID_5385177676297878530" border="0" /&gt;&lt;/a&gt;As preliminary data for our genome-wide recombination analysis (outlined in &lt;a href="http://rrresearch.blogspot.com/2009/09/preliminary-data-goals-for-nih-proposal.html"&gt;this post&lt;/a&gt; from Rosie), we want to sequence the whole genome of a single transformed clone in the next couple of months.  The idea is to transform our standard &lt;a href="ftp://ftp.ncbi.nih.gov/genbank/genomes/Bacteria/Haemophilus_influenzae/"&gt;KW20 Rd strain &lt;/a&gt;with DNA from one of the other completely sequenced strains (probably &lt;a href="ftp://ftp.ncbi.nih.gov/genbank/genomes/Bacteria/Haemophilus_influenzae_86_028NP"&gt;86-028NP&lt;/a&gt;, possibly &lt;a href="ftp://ftp.ncbi.nih.gov/genbank/genomes/Bacteria/Haemophilus_influenzae_PittGG"&gt;PittGG&lt;/a&gt;), select a single transformed colony, and sequence its genome.&lt;br /&gt;&lt;br /&gt;This will provide us with all sorts of useful preliminary results:&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Show that we can indeed handle the type (and amount) of data we’ll be obtaining.&lt;/li&gt;&lt;li&gt;Estimate the total amount of donor DNA a single recipient recombines (and fixes) into its genome.&lt;/li&gt;&lt;li&gt;Estimate the length of recombination tracts (gene conversions) / the strength of “linkage”.&lt;/li&gt;&lt;li&gt;Estimate mosaicism of donor and recipient 
sequences (mismatch repair).&lt;/li&gt;&lt;li&gt;Estimate the transformation rates for different classes of single-nucleotide differences (for example, the number of A-&gt;T transformation events observed versus the total A-&gt;T differences between the strains).&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;In particular, item (2) will be crucial for estimating the total amount of sequencing we would need to measure transformation rates per polymorphism across the genome.  Simple transformation assays with DNA from the multi-antibiotic-resistant MAP7 strain suggest that 20-50 kb of DNA may be replaced in a single transformant, but this type of analysis is restricted to only a few sites in the genome and gives only a very rough estimate.&lt;br /&gt;&lt;br /&gt;The analysis of a single transformed genome will still be preliminary with regard to (3)-(5), for which we will want genome sequences for several independent clones.  In the future we are likely to barcode and pool independent transformants, since we expect that a single lane of Illumina sequencing will be overkill for a single Haemophilus genome of less than 2 Mb (250X sequence coverage).&lt;br /&gt;&lt;br /&gt;Anyway, one issue with producing the material for this first sequencing experiment is that we need to make sure that the clone we select comes from a cell that was indeed competent and did indeed get transformed.  &lt;a href="http://nodnacontrol.blogspot.com/2009/05/congression-versus-linkage.html"&gt;Since only a fraction of cells in a competent culture are competent&lt;/a&gt;, we would be wasting a lot of time and money if we accidentally just re-sequenced our recipient genome.&lt;br /&gt;&lt;br /&gt;In order for this to work, we need our donor strain to carry an antibiotic resistance marker.  By selecting for recipients that become resistant, we can be sure the clone we select took up DNA that got recombined into the genome.  
(This may also create a bias for donor alleles near the selected site, due to “linkage”.)&lt;br /&gt;&lt;br /&gt;To this end, I am doing the following:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Make three strains (KW20, 86-028NP, and PittGG) resistant to novobiocin.  I just did this.  It worked like a charm, thanks to the former postdoc’s well-organized lab notebook and well-organized freezer box, which held a tube with a NovR allele of gyrB already prepared for me.  This was also my first time doing overnight transformations.  I couldn’t believe how easy it was:  Add a frozen aliquot of cells and some DNA to some sBHI medium, let the cells grow overnight, and plate them the next day.  There were plenty of resistant colonies this morning.&lt;/li&gt;&lt;li&gt;Prepare DNA from the newly produced 86-028NP NovR and PittGG NovR strains.  I’ll do this tomorrow from the overnight cultures I just inoculated.&lt;/li&gt;&lt;li&gt;Transform KW20 with this DNA.  I’ll do this tomorrow after my DNA prep, using competent cells I already have.&lt;/li&gt;&lt;li&gt;Saturday, assuming I have NovR transformants, I’ll pick and grow up some transformed colonies overnight.  &lt;/li&gt;&lt;li&gt;Sunday, I can prepare this DNA, and that’s what we’ll send for sequencing!&lt;/li&gt;&lt;/ol&gt;So if all goes well, we should have our material in a few days!  Then we wait.  Then the real work begins…&lt;br /&gt;&lt;br /&gt;(As a side note, 86-028NP indeed appears to already be resistant to another antibiotic, nalidixic acid.  
I will check to see if this resistance is transformable when I have the 86-028NP NovR DNA in hand.)&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-9057023882830919503?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/9057023882830919503/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/09/e-z-strain-construction.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/9057023882830919503'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/9057023882830919503'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/09/e-z-strain-construction.html' title='E-Z Strain Construction'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_7qRGxl6StM4/Srv-kJAZPAI/AAAAAAAAAUc/RYzKMAi-Aqg/s72-c/mosaic-in-torcello-cathedral-qpps_295317775180969.MD.jpg,285.jpeg' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-6795179377002317029</id><published>2009-09-23T11:48:00.000-07:00</published><updated>2009-09-23T13:46:56.771-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='periplasm'/><category scheme='http://www.blogger.com/atom/ns#' term='browser'/><title type='text'>Fake 
Periplasmic Data</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://images.google.ca/imgres?imgurl=http://www.slmetalworks.com/tree%2520bed.jpg&amp;amp;imgrefurl=http://www.stylehive.com/tag/bedroom/recent/grid/0&amp;amp;usg=__hfN0C6bDOXj7fCXb4K7EjRoozWo=&amp;amp;h=360&amp;amp;w=350&amp;amp;sz=74&amp;amp;hl=en&amp;amp;start=2&amp;amp;um=1&amp;amp;tbnid=W5z-BQXannE_MM:&amp;amp;tbnh=121&amp;amp;tbnw=118&amp;amp;prev=/images%3Fq%3Dbed%26hl%3Den%26client%3Dfirefox-a%26rls%3Dorg.mozilla:en-US:official%26sa%3DG%26um%3D1"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 194px; height: 200px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/SrpvgTZk3nI/AAAAAAAAAT8/UYejEeQKiOE/s200/tree+bed.jpg" alt="" id="BLOGGER_PHOTO_ID_5384738905228893810" border="0" /&gt;&lt;/a&gt;The &lt;a href="http://microbes.ucsc.edu/"&gt;UCSC Microbial Genomes Database&lt;/a&gt; was a nice find for me, since they &lt;a href="http://microbes.ucsc.edu/cgi-bin/hgGateway?hgsid=308260&amp;amp;clade=bacteria-gammaproteobacteria&amp;amp;org=Haemophilus+influenzae+Rd+KW20&amp;amp;db=0"&gt;host the &lt;span style="font-style: italic;"&gt;Haemophilus influenzae&lt;/span&gt; KW20 genome&lt;/a&gt;.  For the moment, it has pretty much made me forget about my own plans to build a custom browser.  That will change, though: once we have our own data, we’ll absolutely need some off-line way to browse our datasets, since they will be so large…&lt;br /&gt;&lt;br /&gt;As a first fake experiment to explore how our periplasmic DNA pools might look, Rosie sent me two sets of 200 sequences.  One set was 200 randomly chosen 100mers from the first 10 kb of the &lt;span style="font-style: italic;"&gt;Haemophilus&lt;/span&gt; genome, and the other set was 200 sequences (100mers) stochastically selected for the presence of a USS using her Perl scripts.  
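&lt;br /&gt;&lt;br /&gt;The pools themselves came from Rosie. Purely as a hypothetical stand-in for the random control, the sketch below draws n fixed-length fragments uniformly from the first part of a chromosome and writes them as four-column BED lines (chromosome, 0-based start, end, name); the function names, chromosome name and fragment names are placeholders, and the USS-selected pool is not modeled:&lt;br /&gt;&lt;pre&gt;
```python
import random

# Minimal sketch of a "random" control pool: n fragments of fixed length
# drawn uniformly from the first region_len bases, written as BED lines.
# This is a hypothetical stand-in, not the script that produced the pools.


def random_fragments(n, frag_len, region_len, seed=None):
    """Return n sorted (start, end) pairs, 0-based half-open, inside the region."""
    rng = random.Random(seed)
    # randrange(k) returns 0..k-1, so every fragment ends at or before region_len.
    starts = [rng.randrange(region_len - frag_len + 1) for _ in range(n)]
    return sorted((start, start + frag_len) for start in starts)


def to_bed(fragments, chrom, prefix="frag"):
    """Format intervals as tab-separated BED lines: chrom, start, end, name."""
    return "\n".join(
        f"{chrom}\t{start}\t{end}\t{prefix}{i}"
        for i, (start, end) in enumerate(fragments, 1)
    )
```
&lt;/pre&gt;For example, print(to_bed(random_fragments(200, 100, 10000, seed=1), "chr")) prints 200 BED lines for 100mers within the first 10 kb; "chr" would need to be replaced by the chromosome name the browser expects.&lt;br /&gt;&lt;br /&gt;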
All I had to do was turn her data into &lt;a href="http://genome.ucsc.edu/FAQ/FAQformat#format1"&gt;a BED-formatted file&lt;/a&gt;, which only took a few minutes.  As usual, I made the BED file using Microsoft Office, rather than in a more savvy command-line way, which would probably have used &lt;a href="http://en.wikipedia.org/wiki/Grep"&gt;Grep&lt;/a&gt; or something.&lt;br /&gt;&lt;br /&gt;Here’s what her data looks like plotted as a custom track (squished) in the UCSC genome browser:&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;RANDOM&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_7qRGxl6StM4/SrpwAO-mMGI/AAAAAAAAAUM/91A02wX50B0/s1600-h/random1.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 39px;" src="http://3.bp.blogspot.com/_7qRGxl6StM4/SrpwAO-mMGI/AAAAAAAAAUM/91A02wX50B0/s400/random1.png" alt="" id="BLOGGER_PHOTO_ID_5384739453797806178" border="0" /&gt;&lt;/a&gt;SELECTED&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_7qRGxl6StM4/SrpwInwgvAI/AAAAAAAAAUU/a63svkpFQzo/s1600-h/selected1.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 202px;" src="http://3.bp.blogspot.com/_7qRGxl6StM4/SrpwInwgvAI/AAAAAAAAAUU/a63svkpFQzo/s400/selected1.png" alt="" id="BLOGGER_PHOTO_ID_5384739597888568322" border="0" /&gt;&lt;/a&gt;It looks like it sort of worked!  There’s a prominent peak containing nearly half the sequences in the selected pool, while the random fragments look just as they ought to.&lt;br /&gt;&lt;br /&gt;One issue here is that we know there are two other perfect matches to the core USS motif in the first 10 kb, and these weren’t captured by the selection algorithm.  
It’s not entirely clear why, but it might have something to do with the USS position-weight matrix that was used.  (Actually, there are six USS in the interval, but we were only searching one strand this time...)&lt;br /&gt;&lt;br /&gt;A beginning!&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-6795179377002317029?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/6795179377002317029/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/09/fake-periplasmic-data.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/6795179377002317029'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/6795179377002317029'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/09/fake-periplasmic-data.html' title='Fake Periplasmic Data'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_7qRGxl6StM4/SrpvgTZk3nI/AAAAAAAAAT8/UYejEeQKiOE/s72-c/tree+bed.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-3743854389770757981</id><published>2009-09-17T23:06:00.000-07:00</published><updated>2009-09-18T09:48:45.185-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='USS'/><category 
scheme='http://www.blogger.com/atom/ns#' term='degenerate'/><category scheme='http://www.blogger.com/atom/ns#' term='plans'/><category scheme='http://www.blogger.com/atom/ns#' term='lab'/><title type='text'>The Last Straw</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.samstoybox.com/toys/LastStraw.html"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 320px; height: 185px;" src="http://3.bp.blogspot.com/_7qRGxl6StM4/SrMlqFYRKCI/AAAAAAAAAT0/4a6pzDopCck/s320/LastStraw.jpg" alt="" id="BLOGGER_PHOTO_ID_5382687384566573090" border="0" /&gt;&lt;/a&gt;Yesterday, Rosie kindly ran &lt;a href="http://rrresearch.blogspot.com/2009/09/better-analysis-of-data.html"&gt;her Perl script&lt;/a&gt; over the USS construct I designed.  My last remaining worry was whether my design had any USS or USS-like sequences other than the one it's supposed to have.  I'd checked the construct for any core USS motifs (5'-AAGTGCGGT-3'), but since we think that the motif is more complex than this, it was important to make sure that no other sequences scored highly with the USS position-weight matrix.&lt;br /&gt;Fortunately, the construct looks good, so I can go ahead and order the control oligos and have high expectations that they'll work...&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;Here's how every 32 bp window over the 199mer looks when scored with the USS PWM:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7qRGxl6StM4/SrMldABclqI/AAAAAAAAATs/zrEuTfTSoog/s1600-h/fromRRussSCORES.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 177px;" 
src="http://2.bp.blogspot.com/_7qRGxl6StM4/SrMldABclqI/AAAAAAAAATs/zrEuTfTSoog/s320/fromRRussSCORES.png" alt="" id="BLOGGER_PHOTO_ID_5382687159790376610" border="0" /&gt;&lt;/a&gt;There's a single prominent high-scoring site right where it should be, and all of the surrounding area scores near background.  The USS in the construct has a score (~10^-8) more than 10 orders of magnitude better than the next best sites.  There's a slight increase for windows immediately adjacent to the USS, presumably because the AT-tracts in the USS are still contained in those windows.  The rest of the construct only has scores at background.&lt;br /&gt;&lt;br /&gt;Just to show that these other sites really do represent background levels of USS score, Rosie also ran a randomized version of the sequence:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_7qRGxl6StM4/SrMlPMinnZI/AAAAAAAAATk/d5C5n1M2c9U/s1600-h/fromRRrandom.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 177px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/SrMlPMinnZI/AAAAAAAAATk/d5C5n1M2c9U/s320/fromRRrandom.png" alt="" id="BLOGGER_PHOTO_ID_5382686922632568210" border="0" /&gt;&lt;/a&gt;Nothing better than 10^-18.  
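&lt;br /&gt;&lt;br /&gt;For anyone wanting to make this kind of plot, a sliding-window PWM score can be sketched as below. It assumes that a window’s score is simply the product of the matrix’s per-position base probabilities; that scoring rule is an assumption, and Rosie’s Perl script may compute something different. No real USS matrix values are included, and the function name is hypothetical:&lt;br /&gt;&lt;pre&gt;
```python
import math

# Minimal sketch: score every window of a sequence with a position-weight
# matrix, using the product of per-position base probabilities as the score.
# This scoring rule is an assumption for illustration; the matrix must be
# supplied by the caller (no real USS PWM values are included here).


def window_scores(seq, pwm):
    """Return one score per window of len(pwm) bases, in window-start order.

    pwm: list of dicts, one per motif position, mapping "A", "C", "G", "T"
    to probabilities. The sequence is assumed to contain only A, C, G, T.
    """
    width = len(pwm)
    seq = seq.upper()
    return [
        math.prod(column[base] for column, base in zip(pwm, seq[i:i + width]))
        for i in range(len(seq) - width + 1)
    ]
```
&lt;/pre&gt;With a 32-column matrix and the 199mer, this gives 199 − 32 + 1 = 168 window scores, one per start position, which could then be plotted on a log scale.&lt;br /&gt;&lt;br /&gt;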
Excellent.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-3743854389770757981?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/3743854389770757981/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/09/last-straw.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/3743854389770757981'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/3743854389770757981'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/09/last-straw.html' title='The Last Straw'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_7qRGxl6StM4/SrMlqFYRKCI/AAAAAAAAAT0/4a6pzDopCck/s72-c/LastStraw.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-425281265309116771</id><published>2009-09-17T15:40:00.000-07:00</published><updated>2009-09-17T15:47:42.092-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='USS'/><category scheme='http://www.blogger.com/atom/ns#' term='degenerate'/><category scheme='http://www.blogger.com/atom/ns#' term='periplasm'/><category scheme='http://www.blogger.com/atom/ns#' term='plans'/><title type='text'>Scale UP!</title><content type='html'>&lt;a onblur="try 
{parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.akersolutions.com/Internet/IndustriesAndServices/PharmaBiotech/IndustrySectorsServeds/LargeScaleFermentation.htm"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 200px; height: 126px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/SrK7aKRjvII/AAAAAAAAATc/Ds3pwsBD7EM/s200/haarman.jpg" alt="" id="BLOGGER_PHOTO_ID_5382570562770156674" border="0" /&gt;&lt;/a&gt;How much periplasmic DNA can I hope to get using my current protocols, and how much DNA will I need?  As promised, here are some rough calculations regarding the oligo purchases that we want to make.&lt;br /&gt;&lt;br /&gt;I used a molecular weight calculator &lt;a href="http://www.changbioscience.com/genetics/mw.html"&gt;available on-line&lt;/a&gt; to determine the molecular weight of the dsDNA I described in the last post.&lt;br /&gt;Alternatively, I could’ve used Rosie’s Universal Constants (660 g / mol of base pair and 10^-18 g / single 1 kb DNA molecule) to make this calculation, but since I’m dealing with a known sequence, I might as well get an exact molecular weight.  (I also made a minor mistake in the last post: the molecule I described is actually only 199 bp.)&lt;br /&gt;&lt;br /&gt;So, for our USS molecule, MW = 122,828.6 g / mol.  The oligo synthesis we’re planning to order is at the 1 micromole scale.  That means that if we annealed, extended, and purified the entire synthesis of both oligos, we’d end up with 0.123 grams of input DNA pool (1 micromole × 122,828.6 g / mol, assuming no losses)!  That’s really a very large amount.&lt;br /&gt;&lt;br /&gt;My previous concerns about needing to do PCR to maintain the pool are unfounded.  This scale should be sufficient for hundreds (or even thousands) of experiments...&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;What follows are my preliminary assumptions about yields from the periplasmic DNA prep.  
They are based on several different experiments, but I am erring on the side of caution, so they should be taken only as guides for future experiments.  I will address scale-up issues at the end of this post; before that, I’ll just refer to the approximate total culture volume and amount of DNA that I’d need to recover a target amount of DNA, assuming all else works perfectly.&lt;br /&gt;&lt;br /&gt;So, I’ve now done several experiments using a PCR fragment bearing the consensus USS, called USS-1.  If I add 20 ng DNA / 1 ml competent cells, ~50% is taken up.  That is, in &lt;span style="font-style: italic;"&gt;rec-2&lt;/span&gt; cells, my theoretical yield of periplasmic DNA is 10 ng.  My actual yield is considerably lower; based on my radiolabeling experiments, I estimate I get ~25% of my theoretical maximum.&lt;br /&gt;&lt;br /&gt;This means:&lt;br /&gt;&lt;br /&gt;1 ml cells + 20 ng DNA → 2.5 ng recovered.&lt;br /&gt;20 ml cells + 400 ng DNA → 50 ng.&lt;br /&gt;40 ml cells + 800 ng DNA → 100 ng.&lt;br /&gt;&lt;br /&gt;But this is only for the consensus sequence.  Our real experiments will use a mix of molecules, some of which will be taken up efficiently and some of which won’t.  For a cursory estimate, we might assume that ~50% of fragments will be “good” USS and the other half will be “bad”.  This would further reduce the yield, roughly by half if the “bad” fragments are barely taken up.&lt;br /&gt;&lt;br /&gt;That means I am likely to need ~80 ml cultures and ~1600 ng of input DNA just to get back a mere 100 ng of DNA!&lt;br /&gt;&lt;br /&gt;Most Illumina sequencing centers seem to want ~1 µg of DNA to make libraries, but a lot of ChIP-seq experiments seem to call for only ~100 ng.  
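&lt;br /&gt;&lt;br /&gt;The yield arithmetic above fits in one small function. It multiplies the input concentration by the uptake fraction, the periplasmic recovery and (optionally) the fraction of “good” fragments, and assumes everything scales linearly with culture volume; the function and parameter names are made up for this sketch:&lt;br /&gt;&lt;pre&gt;
```python
# Back-of-the-envelope sketch of the periplasmic yield arithmetic.
# The default fractions are the rough estimates given in this post;
# the function and parameter names are made up for illustration.


def required_input(target_ng, dna_ng_per_ml=20.0, uptake=0.5,
                   recovery=0.25, good_fraction=1.0):
    """Return (culture_ml, input_ng) needed to recover target_ng of DNA.

    Recovered DNA per ml of culture is modeled as
    dna_ng_per_ml * uptake * recovery * good_fraction, where good_fraction
    assumes that "bad" fragments are not taken up at all.
    """
    yield_per_ml = dna_ng_per_ml * uptake * recovery * good_fraction
    culture_ml = target_ng / yield_per_ml
    return culture_ml, culture_ml * dna_ng_per_ml
```
&lt;/pre&gt;required_input(100) returns (40.0, 800.0), and required_input(100, good_fraction=0.5) returns (80.0, 1600.0), matching the USS-1 estimates above.&lt;br /&gt;&lt;br /&gt;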
In our case, there will be no downstream library construction, so we can likely get away with small amounts of DNA, as long as it is quite pure and accurately quantified.&lt;br /&gt;&lt;br /&gt;Regardless, this is going to take fairly large cultures, fairly large amounts of DNA, and a good scaled-up periplasmic prep.&lt;br /&gt;&lt;br /&gt;BUT, one important thing to note is that our degenerate oligo preparation will be more than sufficient for a large number of experiments, even at this large scale.  For the controls, I can simply buy long oligos at the minimum synthesis scale for ~$200 a pop.  Since I can safely PCR amplify these, I will be able to make a replenishable stock for use in scale-up experiments.&lt;br /&gt;&lt;br /&gt;More on this in the future, but while I’m at it, I might as well estimate what it will take to get a microgram of chromosomal DNA fragments out of competent cell periplasms.&lt;br /&gt;&lt;br /&gt;My previous experiments with sonicated DNA gave pretty consistent DNA uptake measurements:&lt;br /&gt;&lt;br /&gt;~50% of 200 ng 1-10 kb DNA / 1 ml cells → 100 ng max. yield.&lt;br /&gt;~10% of 200 ng 0.2-0.4 kb DNA / 1 ml cells → 20 ng max. yield.&lt;br /&gt;&lt;br /&gt;Given a 25% recovery rate from the periplasm, this means that for a microgram of DNA, I will need:&lt;br /&gt;&lt;br /&gt;1-10 kb DNA:         8 micrograms in a 40 ml culture&lt;br /&gt;0.2-0.4 kb DNA:    40 micrograms in a 200 ml culture (!)&lt;br /&gt;&lt;br /&gt;This last is really asking a lot.  That size of scale-up will require special thought…&lt;br /&gt;&lt;br /&gt;Appendix on Scale-up Issues:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Purity&lt;/span&gt;:  I have not been adding RNase.  I need to remove all the RNA in order to quantify the DNA accurately.  I am also concerned about salt.  
The CsCl in my DNA precipitates may not be getting washed out adequately by a single 80% ethanol wash.&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Cell&lt;/span&gt; &lt;span style="font-style: italic;"&gt;concentration&lt;/span&gt;:  For technical reasons, it would help if I could concentrate the cells quite a bit before doing the organic extractions.  I have used a 1:1 ratio of cells to organic solvent, so a 1 ml competent cell prep (~a billion cells) gets mixed with 1 ml solvent.  But I might be able to resuspend 10 ml of cells in 1 ml and then use 1 ml solvent.  I just don’t know.&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;DNA&lt;/span&gt; &lt;span style="font-style: italic;"&gt;concentration&lt;/span&gt;:  I want to make sure that I am saturating with DNA for my initial experiments, but I haven’t yet done a proper saturation curve to know what I should be using.  Saturating will decrease the fractional efficiency of DNA uptake, but my total yields will be higher, and I will be biasing things towards the best uptake sequences (which is a good place to start).&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;DNase&lt;/span&gt;:  I have not been treating cells with DNase prior to isolation.  From what I can tell, this is not a problem, and the free DNA is washed away.  But if I use very high DNA concentrations, I will probably want to use DNase, just to be sure I’m eliminating free DNA completely.&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Details, details:&lt;/span&gt;  Scale-up is never quite as simple as just increasing the volume of everything.  I will need to make sure that there are appropriate centrifuges, shakers, tubes, and everything else.  Cell growth and competence induction may be poorer in larger-volume cultures.  I am also concerned about scaling up the organic extractions.  
It turns out that not all conicals are created equal; I’ve had disasters where the phenol has torn through the bottom of 50 ml conicals during large-scale organic extractions, depending on the brand of conical and the rotor used.  I’ll need to test for problems like this in advance, before I mess up somebody else’s equipment!&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-425281265309116771?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/425281265309116771/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/09/scale-up.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/425281265309116771'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/425281265309116771'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/09/scale-up.html' title='Scale UP!'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_7qRGxl6StM4/SrK7aKRjvII/AAAAAAAAATc/Ds3pwsBD7EM/s72-c/haarman.jpg' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-9171366685234925887</id><published>2009-09-17T10:45:00.000-07:00</published><updated>2009-09-17T11:15:01.261-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='USS'/><category scheme='http://www.blogger.com/atom/ns#' term='degenerate'/><category scheme='http://www.blogger.com/atom/ns#' term='lab'/><title type='text'>Reverse engineering</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.imagenes-bio.de/services/ings/specifications"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 220px; height: 194px;" src="http://3.bp.blogspot.com/_7qRGxl6StM4/SrJ2tHUycjI/AAAAAAAAASc/uyCdaCDb9OQ/s320/illumina_FlowCell_optics_220_194.jpg" alt="" id="BLOGGER_PHOTO_ID_5382495022093595186" border="0" /&gt;&lt;/a&gt;Okay, so now that we’ve exposited all the brilliant experiments we’re planning to do while writing proposals, the actual reality of doing the experiments is starting to sink in.  We’ve also managed to put down some fairly concrete goals for the next several months.&lt;br /&gt;&lt;br /&gt;One of our experiments involves measuring the specificity of DNA uptake by naturally competent &lt;span style="font-style: italic;"&gt;H. influenzae&lt;/span&gt; for fragments containing “the genomic USS motif”.  The &lt;span style="font-style: italic;"&gt;H. influenzae&lt;/span&gt; genome contains an abundant sequence motif, and fragments bearing it are taken up better than fragments that don’t.  This “uptake signal sequence” was originally defined by its functional role in DNA uptake, but has since been characterized mostly by bioinformatics, with no direct uptake specificity data.  
The limited data from previous lab members suggest only an imperfect correspondence between the properties of the genomic motif and the specificity of DNA uptake.&lt;br /&gt;&lt;br /&gt;The idea, then, is to feed competent cells small DNA fragments bearing a degenerate (highly mutated) version of the USS consensus sequence, recover those that are preferentially taken up, and sequence the resulting pool.  USSs are ~32 bases, well within the reach of single-end Illumina reads, if they are positioned properly next to a sequencing primer.&lt;br /&gt;&lt;br /&gt;I’ve &lt;a href="http://nodnacontrol.blogspot.com/2009/05/rosie-and-i-have-been-working-our-way.html"&gt;previously&lt;/a&gt; &lt;a href="http://nodnacontrol.blogspot.com/2009/05/degenerate-oligos-ii.html"&gt;discussed&lt;/a&gt; the expected properties of a degenerate USS pool.  Although I think we need to consider this further, I will focus this post on the design of the other parts of the construct, which will allow us to skip the usual sequencing library construction steps.  Illumina sequencing uses specific sequences added to the ends of molecules to capture and sequence DNA of interest...&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;Properties needed for a USS-containing construct whose USS can be sequenced directly on the Illumina platform:&lt;br /&gt;&lt;br /&gt;(1)    &lt;span style="font-style: italic;"&gt;SIZE&lt;/span&gt;: ≥200 base pairs, dsDNA.  
200 bp fragments containing a USS are taken up efficiently by cells, and this size is sufficient for efficient cluster synthesis and sequencing on Illumina’s Genetic Analyzer.&lt;br /&gt;(2)    &lt;span style="font-style: italic;"&gt;CAPTURE SEQUENCES&lt;/span&gt;:  One end of a strand of each fragment needs to be able to anneal to one of the two “Flow Cell Primers” (FP) in the Illumina flow cell, while the other end of the same molecule needs to contain the reverse complement of the other FP.&lt;br /&gt;(3)    &lt;span style="font-style: italic;"&gt;SEQUENCING PRIMER BINDING SITE&lt;/span&gt;:  The reverse complement of Illumina’s sequencing primer needs to be immediately downstream of the reverse complement of the USS.  (This could work the other way, but getting the “sense” USS directly from the sequencing reads seems optimal.)&lt;br /&gt;(4)    &lt;span style="font-style: italic;"&gt;TAG&lt;/span&gt; &lt;span style="font-style: italic;"&gt;SEQUENCE&lt;/span&gt;:  The first few (four) bases of each read should be a fixed, non-degenerate sequence to facilitate the alignment of the degenerate USS reads.&lt;br /&gt;(5)    &lt;span style="font-style: italic;"&gt;CONSTRUCTION&lt;/span&gt;:  After consulting several oligo makers, we learned that we wouldn’t be able to get our degenerate constructs built into an oligo longer than 130 nt.  This means that I will need to anneal two oligos together and extend with polymerase to generate a full-length construct.&lt;br /&gt;&lt;br /&gt;The first trick was to find out what the standard Illumina adapter and primer sequences were.  &lt;a href="http://seqanswers.com/forums/showthread.php?t=198"&gt;They were available on-line&lt;/a&gt;, and I think I’ve mostly reverse-engineered what the different bits do.  And I think I have a reasonable design:&lt;br /&gt;&lt;br /&gt;I’ll order two oligos, one 130 nt and the other 106 nt.  (At the end of this post, I will list the exact sequences of each part and some notes.)  
They’ll have 36 bp of reverse complementarity at their 3’-ends, so that I can anneal them and extend them to produce the full-length construct.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_7qRGxl6StM4/SrJ3dOJ_wHI/AAAAAAAAASk/3AkMWO3gcNw/s1600-h/oligo1.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 181px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/SrJ3dOJ_wHI/AAAAAAAAASk/3AkMWO3gcNw/s320/oligo1.png" alt="" id="BLOGGER_PHOTO_ID_5382495848561098866" border="0" /&gt;&lt;/a&gt;To illustrate what all the different parts of the construct are for, here’s a color-coded version, which I’ll use to diagram the Illumina cluster synthesis and sequence priming schematically.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_7qRGxl6StM4/SrJ30YHLkDI/AAAAAAAAASs/Ju63fy6YZCk/s1600-h/oligo2.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 88px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/SrJ30YHLkDI/AAAAAAAAASs/Ju63fy6YZCk/s320/oligo2.png" alt="" id="BLOGGER_PHOTO_ID_5382496246370635826" border="0" /&gt;&lt;/a&gt;The key features are that the flow cell primers (FP) are on opposite strands at opposite ends, and the sequencing primer (SP) sits adjacent to the USS (with the 4-bp tag at the beginning).  I am using plasmid sequence present in the lab’s other USS constructs for the Gaps (1 and 2).&lt;br /&gt;&lt;br /&gt;To sequence the 200mer (either before or after recovery from competent cell periplasms), the DNA would be melted and annealed to an Illumina flow cell.  
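The length bookkeeping for the annealed product can be checked with a few lines of arithmetic. This minimal sketch uses the part lengths listed in the appendix for the 130-nt oligo, splitting the 36-nt USS segment into the ATGC tag and the 32-base motif. It assumes that the 36-nt overlap is counted only once after extension.

```python
# Part lengths (nt) of the 130-nt oligo, from the appendix of this post;
# "tag" and "USS" split the 36-nt USS segment into ATGC + 32-base motif.
OLIGO_1_PARTS = [("FP1", 25), ("SP1", 33), ("tag", 4), ("USS", 32), ("G1", 36)]
OLIGO_1_LENGTH = 130
OLIGO_2_LENGTH = 106
OVERLAP = 36        # G1: where the 3' ends of the two oligos anneal


def full_length(len_a, len_b, overlap):
    """Product length after annealing two oligos at their 3' ends and
    extending both: the shared overlap is counted only once."""
    return len_a + len_b - overlap


def part_coordinates(parts, start=1):
    """1-based, inclusive (start, end) coordinates of consecutive parts."""
    coords, pos = {}, start
    for name, length in parts:
        coords[name] = (pos, pos + length - 1)
        pos += length
    return coords


assert sum(length for _, length in OLIGO_1_PARTS) == OLIGO_1_LENGTH
print(full_length(OLIGO_1_LENGTH, OLIGO_2_LENGTH, OVERLAP))   # 200
print(part_coordinates(OLIGO_1_PARTS)["USS"])                  # (63, 94)
```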
Below are two different regions of a flow cell surface, where the two strands of a single molecule might anneal.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_7qRGxl6StM4/SrJ4P4pXMxI/AAAAAAAAAS0/ufZEOho70eI/s1600-h/oligo3.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 186px;" src="http://3.bp.blogspot.com/_7qRGxl6StM4/SrJ4P4pXMxI/AAAAAAAAAS0/ufZEOho70eI/s320/oligo3.png" alt="" id="BLOGGER_PHOTO_ID_5382496718960407314" border="0" /&gt;&lt;/a&gt;DNA synthesis from FP1 or FP2 generates a covalently attached version of each strand.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7qRGxl6StM4/SrJ5Fz9C93I/AAAAAAAAAS8/4cUGOsYOkEI/s1600-h/oligo4.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 139px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/SrJ5Fz9C93I/AAAAAAAAAS8/4cUGOsYOkEI/s320/oligo4.png" alt="" id="BLOGGER_PHOTO_ID_5382497645413726066" border="0" /&gt;&lt;/a&gt;The original molecule is melted off and washed out of the flow cell, and a special in situ PCR method generates clusters of single strands covalently bound to the flow cell surface.  In each cluster, the strands are oriented in both directions.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_7qRGxl6StM4/SrJ5QdbHv_I/AAAAAAAAATE/hV4N4zulGLw/s1600-h/oligo5.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 168px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/SrJ5QdbHv_I/AAAAAAAAATE/hV4N4zulGLw/s320/oligo5.png" alt="" id="BLOGGER_PHOTO_ID_5382497828344414194" border="0" /&gt;&lt;/a&gt;Sequencing then proceeds from SP binding sites.  
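Sequencing starts at the SP site and runs into the adjacent USS, so each read should begin with the fixed ATGC tag followed by the 32-base motif. The toy simulation below illustrates both the strand logic and the role of the tag. It is a minimal sketch built on several assumptions. Reads are plain strings. The 32-base "consensus" is a made-up placeholder, not the real Gibbs consensus. Degeneracy is modeled as a hypothetical per-position substitution rate.

```python
import random

TAG = "ATGC"         # fixed 4-base tag at the start of each read (from this post)
USS_LENGTH = 32      # length of the degenerate USS segment (from this post)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def degenerate_copy(consensus, rate, rng):
    """Substitute each position with one of the other three bases with
    probability `rate` (a hypothetical per-position degeneracy level)."""
    out = []
    for base in consensus:
        if rng.random() >= rate:
            out.append(base)
        else:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
    return "".join(out)


def read_from_template(template_segment, read_length):
    """Extension from the sequencing primer copies the template strand, so the
    base calls are the reverse complement of the template segment (written
    5'->3') that lies just 5' of the primer-binding site."""
    return reverse_complement(template_segment)[:read_length]


def extract_uss(read):
    """Return the 32 bases after the tag, or None if the tag is missing or
    the read is too short."""
    if not read.startswith(TAG) or len(TAG) + USS_LENGTH > len(read):
        return None
    return read[len(TAG):len(TAG) + USS_LENGTH]


def mismatches(a, b):
    return sum(x != y for x, y in zip(a, b))


# Toy run: the consensus is a made-up placeholder, not the real USS.
rng = random.Random(1)
consensus = "ACGT" * 8
uss = degenerate_copy(consensus, rate=0.1, rng=rng)
template = reverse_complement(TAG + uss + "GGGGGG")  # strand the primer anneals to
read = read_from_template(template, read_length=36)
print(extract_uss(read) == uss, mismatches(uss, consensus))
```

Reads that fail the tag check are simply discarded here. A real pipeline might tolerate one tag mismatch, but that choice is not part of the design described in this post.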
In this design, the primer bound at the SP site will be extended across the complement of the USS (starting with the four fixed bases), so the sequence generated will be the USS contained in the construct.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_7qRGxl6StM4/SrJ5TfChvDI/AAAAAAAAATM/pwQ1I3gvDXg/s1600-h/oligo6.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 214px;" src="http://3.bp.blogspot.com/_7qRGxl6StM4/SrJ5TfChvDI/AAAAAAAAATM/pwQ1I3gvDXg/s320/oligo6.png" alt="" id="BLOGGER_PHOTO_ID_5382497880317738034" border="0" /&gt;&lt;/a&gt;There are several small details to go over to make sure that this design will work.  Because the oligos are so expensive, and the degenerate oligo will be precious, I also plan to buy several non-degenerate oligos corresponding to perfect consensus, randomized, and mutant USSs.  These will act as controls for the annealing/extension step that generates the uptake substrates and for measuring saturation curves to optimize DNA uptake conditions.  I will also be able to regenerate the control constructs by PCR, but I should probably avoid amplifying the degenerate USS construct for fear of strongly biasing the representation of different sequences.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;NEXT UP:&lt;/span&gt;  Uh oh… What about yields?  
Dimensional analysis…&lt;br /&gt;&lt;br /&gt;APPENDIX:&lt;br /&gt;&lt;br /&gt;The different parts of the two oligos:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7qRGxl6StM4/SrJ6w17mmPI/AAAAAAAAATU/R_UcRgB9m70/s1600-h/oligos7.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 165px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/SrJ6w17mmPI/AAAAAAAAATU/R_UcRgB9m70/s400/oligos7.png" alt="" id="BLOGGER_PHOTO_ID_5382499484190546162" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Notes on my reverse engineering:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;FP1 (25 nt): Composed of putative 20mer FP1 + first 5 bases of one adaptor (calling it A)&lt;/li&gt;&lt;li&gt;SP1 (33 nt): Sequencing primer for single-end Illumina runs.  Includes the 13 bases of the standard adaptor that ordinarily form a 13 bp inverted-repeat palindrome on either side of adapted DNA fragments.&lt;/li&gt;&lt;li&gt;USS (36 nt):  Includes 4-base tag (ATGC) upstream of a 32-base genomic Gibbs consensus sequence with a set level of degeneracy at each position.&lt;/li&gt;&lt;li&gt;G1 (36 nt): Additional sequence from pGEM7f, corresponding to the portion of the spacer region where the two oligos are intended to anneal.&lt;/li&gt;&lt;li&gt;G2 (46 nt): More sequence from pGEM7f, corresponding to the part of the spacer region present on only one of the two oligos.&lt;/li&gt;&lt;li&gt;FP2’ (23 nt): Composed of the complement to the 20mer FP2 + first 3 bases of the other adaptor (calling it B).&lt;/li&gt;&lt;li&gt;Total length after annealing and extension is 200 bases, where the USS is located from position 63 (after the 4-base tag) to position 94.  
In the flow cell, priming with SP1 should copy the complement of the USS, so the sequence obtained will correspond to the USS itself (with the first four bases always ATGC).&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-9171366685234925887?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/9171366685234925887/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/09/reverse-engineering.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/9171366685234925887'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/9171366685234925887'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/09/reverse-engineering.html' title='Reverse engineering'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_7qRGxl6StM4/SrJ2tHUycjI/AAAAAAAAASc/uyCdaCDb9OQ/s72-c/illumina_FlowCell_optics_220_194.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-3367070539771576091</id><published>2009-09-09T15:59:00.000-07:00</published><updated>2009-09-09T16:15:11.475-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' 
term='periplasm'/><category scheme='http://www.blogger.com/atom/ns#' term='lab'/><title type='text'>Eating chromosomal DNA fragments</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.molecularstation.com/molecular-biology-images/502-dna-pictures/109-dna.html"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 240px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/Sqg2BwAsyiI/AAAAAAAAASU/7MBB2xMpnZE/s320/dna.jpg" alt="" id="BLOGGER_PHOTO_ID_5379609158589663778" border="0" /&gt;&lt;/a&gt;&lt;span style="font-style: italic;"&gt;Haemophilus influenzae&lt;/span&gt; cells will take up closely related DNA from the environment quite efficiently when they are made naturally competent by resource limitation.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://nodnacontrol.blogspot.com/2009/08/uptake-and-transformation-with.html"&gt;Previously&lt;/a&gt;, I did some experiments using sonicated chromosomal DNA of two different size distributions.  The takeaway lesson was that, for a fixed DNA concentration, larger fragments were taken up better than smaller fragments.  
This could be due to two non-exclusive reasons:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Larger fragments are more likely to contain an uptake signal sequence.&lt;/li&gt;&lt;li&gt;The uptake machinery was saturated when I used the smaller fragments, since there are more fragments per unit mass.&lt;/li&gt;&lt;/ol&gt;I am not certain of the best way to measure the relative contributions of these two factors to the observed disparity in uptake, though I’m pretty sure a saturation curve would be the way to start, that is, measuring the amount of uptake over a wide range of DNA concentrations.&lt;br /&gt;&lt;br /&gt;But first, a more practical concern for our sequencing plans...&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;I repeated this experiment, but also prepared total DNA and periplasmic DNA (by the slick method of Kahn et al.) to make sure that I could cleanly separate chromosomal DNA fragments trapped in the periplasm from the bulk chromosome, &lt;a href="http://nodnacontrol.blogspot.com/2009/07/standing-upon-shoulders-of-giants.html"&gt;as I previously showed for a small USS-containing PCR fragment&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Here are the results of that experiment (in which I provided ~0.5 billion competent cells with 200 ng of end-labeled DNA fragments of two different size distributions, either 1-10 kb or 200-400 bp, for 30 minutes):&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7qRGxl6StM4/Sqg0OUH6s-I/AAAAAAAAASE/ZEbYC36sUqQ/s1600-h/chromUPTAKE.png"&gt;&lt;img src="http://2.bp.blogspot.com/_7qRGxl6StM4/Sqg0OUH6s-I/AAAAAAAAASE/ZEbYC36sUqQ/s400/chromUPTAKE.png" alt="" id="BLOGGER_PHOTO_ID_5379607175418786786" style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 389px;" border="0" /&gt;&lt;/a&gt;In (a), the % uptake is clearly higher for the larger size distribution than for the smaller one.  
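A little dimensional analysis puts the 200 ng dose and the two explanations above on a per-cell footing. This minimal sketch makes several assumptions. It uses about 650 g/mol per base pair of dsDNA. It picks 300 bp and 5 kb as representative sizes for the two pools. For explanation (1), it assumes USSs are scattered at random (a Poisson idealization) with a hypothetical mean spacing of 1,200 bp, chosen for illustration and not measured here. The cell number and DNA mass come from this post.

```python
import math

AVOGADRO = 6.022e23   # molecules per mole
BP_MASS = 650.0       # approx. g/mol per base pair of dsDNA (assumed average)


def molecules(mass_ng, length_bp):
    """Number of dsDNA molecules in mass_ng nanograms of fragments of length_bp."""
    moles = mass_ng * 1e-9 / (length_bp * BP_MASS)
    return moles * AVOGADRO


def p_contains_uss(length_bp, mean_spacing_bp):
    """Chance that a fragment holds at least one USS if USSs were scattered
    at random (Poisson) with the given mean spacing; an idealization."""
    return 1.0 - math.exp(-length_bp / mean_spacing_bp)


CELLS = 5e8                  # ~0.5 billion competent cells (from this post)
DNA_NG = 200.0               # 200 ng per uptake (from this post)
MEAN_USS_SPACING = 1200.0    # hypothetical spacing in bp, for illustration only

for label, size in [("small (~300 bp)", 300), ("large (~5 kb)", 5000)]:
    per_cell = molecules(DNA_NG, size) / CELLS
    p_uss = p_contains_uss(size, MEAN_USS_SPACING)
    print(f"{label}: {per_cell:.0f} fragments/cell, P(>=1 USS) = {p_uss:.2f}")
```

At equal mass, the number of fragments scales inversely with length. With these illustrative sizes, the small-fragment pool therefore offers each cell about 17 times as many molecules, which is the quantity that matters for explanation (2).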
In (b) and (c), I show that I can purify periplasmic chromosomal fragments away from the cell’s chromosome: (b) shows the results using the larger fragments, and (c) the results using the smaller fragments.  (I ran gels with two different agarose concentrations to optimize the separation for the two different input pools.)&lt;br /&gt;&lt;br /&gt;One thing to note is that the size distributions of the input and periplasmic DNA were effectively indistinguishable.  I looked at traces of these lanes in the Molecular Dynamics ImageQuant software, and they looked essentially identical.  This is a little confusing, given the two models discussed above and the fact that fewer small fragments were taken up compared with larger fragments.  I might have expected a bias towards the larger fragments in the periplasm relative to the input, but this was not the case.&lt;br /&gt;&lt;br /&gt;Another thing to note is that, unlike when I previously did this experiment with USS-containing PCR fragments, there is still evidence of periplasmic DNA in wild type after 30 minutes.  I don’t think this is due to poor washing of free DNA away from the cells; rather, I think there had been insufficient time to translocate all of the periplasmic DNA into the cytosol.  
It is also possible that some of the non-chromosomal DNA in the wild-type samples is in fact cytosolic, which I can’t determine without a way to distinguish ssDNA from dsDNA.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-3367070539771576091?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/3367070539771576091/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/09/eating-chromosomal-dna-fragments.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/3367070539771576091'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/3367070539771576091'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/09/eating-chromosomal-dna-fragments.html' title='Eating chromosomal DNA fragments'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_7qRGxl6StM4/Sqg2BwAsyiI/AAAAAAAAASU/7MBB2xMpnZE/s72-c/dna.jpg' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-7300234675284468967</id><published>2009-08-28T16:07:00.000-07:00</published><updated>2009-08-28T16:22:07.540-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='recombination'/><category 
scheme='http://www.blogger.com/atom/ns#' term='alignment'/><title type='text'>Homoplasy versus Recombination</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.blackwellpublishing.com/ridley/a-z/Homoplasies.asp"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 266px; height: 231px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/SphjgPLOj2I/AAAAAAAAARM/0IgGEJYnkZs/s400/analogies.jpg" alt="" id="BLOGGER_PHOTO_ID_5375155560747274082" border="0" /&gt;&lt;/a&gt;Making up for lost blogging time!&lt;br /&gt;&lt;br /&gt;Just to get started doing something with this alignment, I grabbed a single 68,580 bp alignment block out of the XMFA file I generated (as described in the last post); the block itself is a simple FastA-formatted alignment.  That way I could use any program that handles a simple FastA alignment, without worrying yet about converting between formats or coping with contig and rearrangement boundaries.&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;Here’s a picture from MAUVE of the aligned block I used (KW20 Rd reference coordinates 401,890 to 467,839).  It’s the orange block shown below.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_7qRGxl6StM4/SphjxESuqqI/AAAAAAAAARU/6Ot8tJQ6cNc/s1600-h/test1alignmentblockORANGE.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 225px; height: 320px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/SphjxESuqqI/AAAAAAAAARU/6Ot8tJQ6cNc/s320/test1alignmentblockORANGE.jpg" alt="" id="BLOGGER_PHOTO_ID_5375155849883724450" border="0" /&gt;&lt;/a&gt;Based on some recent reading, I tried a couple of different programs designed to detect signals of recombination in multiple alignments.  
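Pulling one block out of an XMFA file can be scripted rather than done by hand. Below is a minimal sketch based on my understanding of the usual Mauve XMFA layout. That layout has '#' comment lines at the top and '>' headers such as "> 1:401890-467839 + genome.fas". Each block ends with a line starting with '='. Check these details against your own files before relying on the parser. Choosing the longest block is just one hypothetical selection rule.

```python
def parse_xmfa(lines):
    """Yield alignment blocks from XMFA text.

    Each block is a list of (header, aligned_sequence) pairs.  Assumes the
    usual Mauve layout: '#' comment lines, '>' entry headers, and a line
    starting with '=' closing each block.
    """
    block, header, chunks = [], None, []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">") or line.startswith("="):
            if header is not None:            # finish the previous entry
                block.append((header, "".join(chunks)))
            header, chunks = None, []
            if line.startswith(">"):
                header = line[1:].strip()
            elif block:                        # '=' ends the current block
                yield block
                block = []
        else:
            chunks.append(line)
    if header is not None:                     # tolerate a missing final '='
        block.append((header, "".join(chunks)))
    if block:
        yield block


def block_to_fasta(block):
    """Render one block as a FastA-formatted alignment."""
    return "".join(f">{header}\n{seq}\n" for header, seq in block)


def longest_block(blocks):
    """Pick the block with the most alignment columns (hypothetical rule)."""
    return max(blocks, key=lambda block: len(block[0][1]))


# Usage (file name is hypothetical):
# with open("alignment.xmfa") as handle:
#     print(block_to_fasta(longest_block(list(parse_xmfa(handle)))))
```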
There is a serious caveat here: I do not understand exactly how these programs work, and since they were designed for use on sexual eukaryotes, it’s quite possible that their assumptions will not hold for Haemophilus.  Nevertheless, I sallied forth, just to see if they’d at least work with my sequences.  I can come back later and work through the meaning of the output more slowly...&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.stats.ox.ac.uk/%7Emcvean/LDhat/"&gt;&lt;span style="font-weight: bold;"&gt;LDhat&lt;/span&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;First, I tried LDhat, which is composed of &lt;a href="http://www.stats.ox.ac.uk/%7Emcvean/LDhat/instructions.html"&gt;a suite of programs&lt;/a&gt; aimed at detecting recombination in a population sample.  I didn’t have any difficulty downloading and compiling the program, which was nice.  I wish I could tell you more about it, but I never actually got it to work, so there’s not much point yet.  I got the “Convert” program to make the requisite files from an only slightly modified FastA, and I eventually got the “Pairwise” program to work.  This program generates a “look-up table”, which the other programs require.&lt;br /&gt;&lt;br /&gt;At first, the program kept choking on my data and was unable to produce the requisite look-up table.  So I fed it a pre-made file that can be found at LDhat’s website.  The look-up table I used was first converted to a 15-taxon file using their “LKgen” utility.  Then “Pairwise” worked and generated a new look-up table that should theoretically have worked with the other programs.  I next used the “Interval” program, but I could never get past this step.  It wouldn’t do the analysis, because the look-up table was somehow lacking. It was not “exhaustive”, whatever that means, and there was apparently a disparity between “npt” (which I assume stands for Nonparametric test) and the data.&lt;br /&gt;&lt;br /&gt;No clue.  
Moving on.  (I’ll try to return to this some other time, since the program “Rhomap” looks like what I want.)&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.math.auckland.ac.nz/%7Ebryant/software.html"&gt;&lt;span style="font-weight: bold;"&gt;PhiTest&lt;/span&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Next I turned to PhiTest, which was also easily downloaded and compiled.  This program uses a principle that seems pretty sensible: it examines “incompatibilities” in phylogenetic signals.  If two lineages of bacteria diverge and never recombine, then adjacent polymorphisms will most likely be “compatible”, that is, both polymorphic sites will have the same phylogenetic signal (i.e., they will support the same tree topology).  On the other hand, “incompatible” sites could have two possible histories: one in which there was recurrent mutation in different lineages, and one in which there was recombination between lineages.  This is illustrated nicely in a figure from &lt;a href="http://www.genetics.org/cgi/content/full/172/4/2665"&gt;the paper reporting PhiTest&lt;/a&gt;:&lt;br /&gt;&lt;br /&gt;"Figure 1:&lt;br /&gt;The dual nature of incompatibility. Two possible histories for a pair of incompatible sites are shown: (a) two incompatible sites explained by a recombination event and (b) two incompatible sites explained by a convergent mutation. Mutations in the first site are indicated by open circles and mutations in the second site are indicated by solid circles. 
To explain the incompatibility between the pair of sites either a recombination event must be invoked or a homoplasy must have occurred in the history of one of the sites."&lt;br /&gt;&lt;br /&gt;Figure 1a:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7qRGxl6StM4/Sphkk3DUmAI/AAAAAAAAARc/sfqyVX-XcOI/s1600-h/2665fig1a.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 183px; height: 200px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/Sphkk3DUmAI/AAAAAAAAARc/sfqyVX-XcOI/s200/2665fig1a.jpg" alt="" id="BLOGGER_PHOTO_ID_5375156739682637826" border="0" /&gt;&lt;/a&gt;Figure 1b:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7qRGxl6StM4/SphkvL_3Z9I/AAAAAAAAARk/LG1YqinUZNk/s1600-h/2665fig1b.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 182px; height: 200px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/SphkvL_3Z9I/AAAAAAAAARk/LG1YqinUZNk/s200/2665fig1b.jpg" alt="" id="BLOGGER_PHOTO_ID_5375156917103978450" border="0" /&gt;&lt;/a&gt;So in (a) the black mutation occurred only once and was subsequently shared between the central lineages, whereas in (b) the black mutation occurred independently in the two lineages.&lt;br /&gt;&lt;br /&gt;They more rigorously define “compatibility” thusly: “Two sites i and j are compatible if and only if there is a genealogical history that can be inferred parsimoniously that does not involve any recurrent or convergent mutations (known as homoplasies as in Figure 1b). If the two sites are not compatible, they are termed incompatible. 
Under an infinite-sites model (KIMURA 1969) of sequence evolution, the possibility of a homoplasy does not exist, and so incompatibility for a pair of sites implies that at least one recombination event must have occurred, as in Figure 1a.”&lt;br /&gt;&lt;br /&gt;So recombination detection algorithms exploit the notion of incompatibility, but it is extremely important to note that there are really two possibilities for any incompatibility:  (a) it is a true “homoplasy”, i.e., a recurrent mutation; or (b) it indicates recombination.  The basic idea of the NSS method (which is apparently what LDhat is using) is to look at neighboring pairs of sites and find regions where incompatibilities are clustered together, perhaps suggesting a hotspot.&lt;br /&gt;&lt;br /&gt;The other, possibly more eukaryotic, aspect is that more distant sites will tend to be less compatible, since they are more likely to have had crossovers between them.  Our provisional model for natural transformation would not necessarily have this feature, however, since we do not expect our recombination events to be crossover-like.&lt;br /&gt;&lt;br /&gt;The authors of PhiTest devised a measure different from NSS, the “Pairwise Homoplasy Index” (PHI), based on pairs of SNPs in an alignment.  It gives a test statistic based on the minimum number of homoplasies in the alignment, essentially using a parsimony criterion.  They then apply permutation tests to measure the significance of a given PHI for an interval.  Happily, the program also reports significance under the NSS and MaxChi tests.&lt;br /&gt;&lt;br /&gt;For my alignment, PhiTest reported p-values of 0 (against the null hypothesis of no recombination) under all three tests.  So it looks like there’s quite a bit of homoplasy in my alignment, which may be evidence of recombination.  
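The compatibility idea and the logic of a permutation test on site order fit in a few lines. This is a simplified sketch, not PhiTest’s actual PHI statistic. It assumes gap-free, biallelic sites given as strings, with one character per taxon. It scores incompatibility with the four-gamete test, and its statistic is the fraction of incompatible adjacent pairs. Without recombination, shuffling site order should not systematically change that statistic. With recombination, neighboring sites tend to be more compatible than random pairs, so the observed value comes out unusually low.

```python
import random


def incompatible(site_i, site_j):
    """Four-gamete test for two biallelic sites (strings, one character per
    taxon): the pair is incompatible when all four allele combinations occur."""
    return len(set(zip(site_i, site_j))) == 4


def neighbor_incompatibility(sites):
    """Fraction of adjacent site pairs that are incompatible. A simplified
    stand-in for PHI, which averages a refined incompatibility score over
    several nearby sites rather than just the adjacent one."""
    pairs = list(zip(sites, sites[1:]))
    if not pairs:
        return 0.0
    return sum(incompatible(a, b) for a, b in pairs) / len(pairs)


def permutation_p_value(sites, n_perm=1000, seed=0):
    """P-value for 'no recombination': how often a random site order gives a
    statistic as low as, or lower than, the observed order."""
    rng = random.Random(seed)
    observed = neighbor_incompatibility(sites)
    shuffled = list(sites)
    as_low = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if observed >= neighbor_incompatibility(shuffled):
            as_low += 1
    return (as_low + 1) / (n_perm + 1)


# Toy data: two runs of identical sites whose patterns conflict.
a, b = "0011", "0101"
sites = [a] * 5 + [b] * 5
print(neighbor_incompatibility(sites))   # 1/9: only the a|b boundary conflicts
print(permutation_p_value(sites, n_perm=200))
```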
(It’s worth noting that PhiTest and LDhat both explicitly ignore gaps in the alignment.)&lt;br /&gt;&lt;br /&gt;But that’s not good enough; what I really want is a plot of recombination rates across the alignment block.  &lt;a href="http://nodnacontrol.blogspot.com/2009/08/measuring-recombination.html"&gt;The figure I made for this post&lt;/a&gt; was the output of that analysis.  It shows only the ~9500 polymorphic sites, ignoring all gaps and invariant sites.  For each pair of SNPs, it shows the probability of recombination (assuming that all homoplasies are due to recombination and not recurrent mutation).&lt;br /&gt;&lt;br /&gt;It pretty much looks like there’s evidence of recombination almost everywhere!  There are just a couple of areas that appear to have extensive compatibility, so it may actually be a lot easier for us to detect places where there’s been little recombination than places where there’s been lots.&lt;br /&gt;&lt;br /&gt;I’ll need to go back to the raw data, figure out which SNPs show evidence of low recombination, and see where those spots are in the alignment.  Could they be horizontally transferred segments with few USSs?&lt;br /&gt;&lt;br /&gt;I’m not sure what to make of this.  It complicates things.  To what extent can I trust these algorithms to find recombination and not recurrent mutation?  The divergence between my sequences isn’t that high (only ~3%), so recurrent mutation doesn’t seem likely to be a huge problem, but still...&lt;br /&gt;&lt;br /&gt;Anyways, I went ahead and used the other utility that comes with PhiTest, called Profile.  This runs PhiTest on sliding windows, providing a P-value for each segment.  
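Both the ~9500-site plot and the 500 bp Profile rely on two bookkeeping steps. The first discards gapped and invariant columns. The second bins the remaining sites into non-overlapping windows of alignment columns. Below is a minimal sketch of those steps. It assumes the alignment block is held as a list of equal-length strings with '-' for gaps. Dropping columns that contain N is an extra assumption of this sketch, not something stated for PhiTest.

```python
def informative_columns(alignment):
    """Return (position, column) for gap-free polymorphic columns.

    `alignment` is a list of equal-length aligned sequences; positions are
    0-based alignment columns.  Columns containing '-' or 'N' are skipped
    (the 'N' rule is an assumption of this sketch).
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    sites = []
    for pos in range(length):
        column = "".join(seq[pos] for seq in alignment).upper()
        if "-" in column or "N" in column:
            continue
        if len(set(column)) > 1:          # keep only polymorphic columns
            sites.append((pos, column))
    return sites


def windows(sites, window=500):
    """Group sites into non-overlapping windows of `window` alignment
    columns, keyed by window start; windows with no sites are absent."""
    grouped = {}
    for pos, column in sites:
        start = (pos // window) * window
        grouped.setdefault(start, []).append(column)
    return grouped


toy = ["ACGT-A", "ACGTTA", "AGGTCA"]      # made-up three-taxon alignment
print(informative_columns(toy))           # [(1, 'CCG')]
```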
Here’s what it looked like when I ran “Profile” on non-overlapping 500 bp chunks of my alignment:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_7qRGxl6StM4/SphlYnQulLI/AAAAAAAAARs/kO4Uq2bEVA0/s1600-h/phiprofile.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 144px;" src="http://3.bp.blogspot.com/_7qRGxl6StM4/SphlYnQulLI/AAAAAAAAARs/kO4Uq2bEVA0/s400/phiprofile.png" alt="" id="BLOGGER_PHOTO_ID_5375157628797097138" border="0" /&gt;&lt;/a&gt;The x-axis is the position in the alignment, and the y-axis is the p-value of the PhiTest for that particular 500 bp chunk.  The line indicates a P-value threshold of 0.05, so any segment below that line has evidence of recombination, based on PHI. Segments with higher P-values may have compatible SNPs or, for the very high P-values, very few SNPs.&lt;br /&gt;&lt;br /&gt;This seems promising, if we can trust that we’re measuring something meaningful.  I have a strange feeling that using homoplasy to measure recombination is going to be fraught with difficulties, but it’s pretty much the best principle we have for detecting recombination in these population-level samples.  Luckily, PhiTest and the other methods are sophisticated enough to evaluate the homoplasy index for a given pair of SNPs relative to other surrounding SNPs, so when incompatibilities are detected, they stand out from other possible pairings.  This should help with the problem of true homoplasy, but then again, maybe it doesn't.  I'll need to talk to my friend Corbin about this stuff.&lt;br /&gt;&lt;br /&gt;Whew!  
Okay, enough on this for the moment...&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-7300234675284468967?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/7300234675284468967/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/08/homoplasy-versus-recombination.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/7300234675284468967'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/7300234675284468967'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/08/homoplasy-versus-recombination.html' title='Homoplasy versus Recombination'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_7qRGxl6StM4/SphjgPLOj2I/AAAAAAAAARM/0IgGEJYnkZs/s72-c/analogies.jpg' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-8153697827781151279</id><published>2009-08-28T12:27:00.000-07:00</published><updated>2009-08-28T13:03:08.338-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='alignment'/><title type='text'>More alignments!</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" 
href="http://sitemaker.umich.edu/mc13/bacterial_meningitis_causative_organism"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 400px; height: 349px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/Spg2CoTHhWI/AAAAAAAAARE/Q70gGBwu_hM/s400/haemophilus_20influenzae.jpg" alt="" id="BLOGGER_PHOTO_ID_5375105574071928162" border="0" /&gt;&lt;/a&gt;I realized something disturbing a couple of days ago, when I crashed my computer by running out of hard-drive space...  The &lt;span style="font-style: italic;"&gt;Haemophilus influenzae&lt;/span&gt; genome is actually replicating inside my computer!  Not just on my lab bench!  Several of the programs I've been using (namely the alignment programs) have been spitting out files containing the complete genome, and I think I must've had several hundred copies before I got rid of a bunch of intermediate files.  &lt;span style="font-style: italic;"&gt;In silico&lt;/span&gt; infection!  A scourge!&lt;br /&gt;&lt;br /&gt;Anyway, I guess that last entry deserves more explanation, but it'll take me more than one post to do it.  First off, the motivation for doing these kinds of analyses:&lt;br /&gt;&lt;br /&gt;There are 4 complete &lt;span style="font-style: italic;"&gt;Haemophilus influenzae&lt;/span&gt; genomes, 11 with draft assemblies (in 4-50 contigs), and 17 for which sequencing is in progress.  On top of that, hundreds of strains have been sequenced at seven housekeeping genes (MLST studies).&lt;br /&gt;&lt;br /&gt;Within this “population genomic” data, we should be able to estimate recombination rates between isolates, as well as carry out some sophisticated analyses of genetic variation at USS motifs.  But how?  
Here’s the basic idea:&lt;br /&gt;&lt;br /&gt;(1) Align the complete and draft genomes.&lt;br /&gt;(2) Use programs written to detect recombination signals in population genetic data.&lt;br /&gt;(3) Examine genetic variation in and around USS motifs (scored for information content across each genome) and see if it correlates with the recombination signals detected in step 2.&lt;br /&gt;&lt;br /&gt;There are several issues at each step, but I’ll take the first one first.  I’ll start with a post describing how I produced the multiple alignment file that I fed to the PhiTest program...&lt;br /&gt;&lt;br /&gt;The first trick was downloading the files...   &lt;span class="fullpost"&gt; &lt;br /&gt;I already had the four complete sequences, but hadn’t had the easiest time getting the contigs of the draft assemblies.  Do they really want me to download each contig by clicking one at a time?  My problem was solved when I found the whole-genome shotgun (WGS) FTP site at NCBI.  Of course, these files were named by their WGS code, so I hand-searched for the correct four-letter code based on this list and downloaded the corresponding file.  If I’d been more clever, I would’ve done this from the command line with a script containing the list of WGS data I wanted.&lt;br /&gt;&lt;br /&gt;I used the search term "txid727[orgn]" in the &lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=Genome&amp;amp;itool=toolbar"&gt;Genome Database&lt;/a&gt; or the &lt;a href="http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html/"&gt;Taxonomy Browser&lt;/a&gt; to call up &lt;span style="font-style: italic;"&gt;Haemophilus influenzae&lt;/span&gt; genomes.  The Genome database also returned some extra entries that were plasmid sequences, and the Taxonomy Browser returned many isolates whose sequencing is still in progress.  Anyway, I could get to a Genome page for each sequenced isolate, but when I clicked to get the contigs, it provided a separate accession for each contig.  
&lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&amp;amp;Cmd=Retrieve&amp;amp;dopt=Contig+Table&amp;amp;list_uids=6485"&gt;Boo&lt;/a&gt;!   There had to be a better way; finding it took some digging, but it was &lt;a href="ftp://ftp.ncbi.nih.gov/genbank/wgs/"&gt;here&lt;/a&gt;.  This directory holds all the WGS data at NCBI.  Whew!  Luckily, I had found the requisite four-letter codes in my other searches, so I downloaded the appropriate *.gbff.gz files and expanded them with gunzip.&lt;br /&gt;&lt;br /&gt;Anyway, this gave me a set of GenBank files, each containing all the contigs for one isolate.  The contigs themselves were simply ordered by size, largest first and smallest last, and the coordinate system followed this ordering.  While this is fine, the alignments would probably be easier to look at if I ordered the contigs based on the reference KW20 Rd genome.  It would be a meaningless gesture in some ways, but it would certainly make the pictures easier to read.&lt;br /&gt;&lt;br /&gt;I did this using &lt;a href="http://gel.ahabs.wisc.edu/mauve/"&gt;MAUVE&lt;/a&gt;’s &lt;a href="http://asap.ahabs.wisc.edu/mauve-aligner/mauve-user-guide/reording-contigs-in-draft-genomes.html"&gt;MoveContig&lt;/a&gt; function, reordering each set of WGS contigs pairwise against the reference KW20 Rd.  (Again, I should've been using the command line, but found it easier to just click my way through... shameful.)  
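&lt;br /&gt;&lt;br /&gt;Checking what each downloaded file contains is also easy to script. Here is a minimal Python sketch (standard library only) that reads a *.gbff.gz file without gunzipping it first and lists each contig with its length, a quick way to confirm the largest-first ordering described above. It assumes the usual GenBank flat-file layout, where each record starts with a LOCUS line whose second and third fields are the record name and its length in bp; the function names are hypothetical.
&lt;pre&gt;
```python
import gzip


def contig_lengths(path):
    """Return [(record_name, length_bp), ...] in file order, read from the
    LOCUS lines of a gzipped GenBank flat file (e.g. a *.gbff.gz download)."""
    contigs = []
    with gzip.open(path, 'rt') as handle:
        for line in handle:
            if line.startswith('LOCUS'):
                fields = line.split()
                # Assumed layout: LOCUS  NAME  LENGTH  bp  DNA  ...
                contigs.append((fields[1], int(fields[2])))
    return contigs


def is_largest_first(contigs):
    """True if the contigs are sorted from largest to smallest."""
    lengths = [length for _, length in contigs]
    return lengths == sorted(lengths, reverse=True)
```
&lt;/pre&gt;
Calling contig_lengths on a downloaded file returns a list of (name, length) pairs in file order; looping over a list of WGS file names would cover all 11 draft genomes.&lt;br /&gt;&lt;br /&gt;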
Here’s an example of what MoveContigs did when comparing PittII to KW20 Rd:&lt;br /&gt;&lt;br /&gt;Alignment #1: Keeps the original order of the contigs (from largest to smallest).&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7qRGxl6StM4/Spgv2Po7vrI/AAAAAAAAAQU/7Jh96_2_bVI/s1600-h/align1lcb.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 168px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/Spgv2Po7vrI/AAAAAAAAAQU/7Jh96_2_bVI/s320/align1lcb.jpg" alt="" id="BLOGGER_PHOTO_ID_5375098764224347826" border="0" /&gt;&lt;/a&gt;KW20 Rd is on top; PittII is on the bottom.  The colored blocks indicate contiguous aligned blocks.  The red lines indicate contig boundaries (KW20 has only two, marking the start and end of the chromosome).  The diagonal lines between the two genomes show which blocks belong together.  It’s quite obvious from all the crossing lines that the order of the contigs is off.&lt;br /&gt;&lt;br /&gt;The program then reordered the contigs and did another alignment.  Then it did this a third time (it keeps iterating until it can’t improve the order any further), at which point it decided this was as good as it could do and stopped.&lt;br /&gt;&lt;br /&gt;Alignment #3: Reordered contigs that maximize “synteny” between the reference and PittII.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7qRGxl6StM4/Spgv_nP9gfI/AAAAAAAAAQc/fE4JXjPjuzA/s1600-h/align3lcb.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 168px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/Spgv_nP9gfI/AAAAAAAAAQc/fE4JXjPjuzA/s320/align3lcb.jpg" alt="" id="BLOGGER_PHOTO_ID_5375098925180879346" border="0" /&gt;&lt;/a&gt;Clearly, reordering the contigs has made the picture much easier to read.  There are far fewer alignment blocks and fewer diagonal connector lines crossing the middle.  
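&lt;br /&gt;&lt;br /&gt;The core idea of the reordering can be illustrated with a far simpler procedure than MAUVE’s: give each contig the reference (KW20 Rd) coordinate where its best alignment anchor lands, sort the contigs by that coordinate, and push contigs with no match to the end. This minimal Python sketch assumes those anchor coordinates have already been extracted from an alignment. The input dictionary and the function name are hypothetical, and this is not the MoveContig algorithm, which realigns and reorders iteratively.
&lt;pre&gt;
```python
def order_by_reference(anchors):
    """Order draft contigs by where they align on the reference genome.

    anchors maps contig name -> reference coordinate (int) of the contig's
    best alignment anchor, or None if the contig aligned to nothing.
    Aligned contigs are sorted by that coordinate; unaligned contigs keep
    their original (file) order and go at the end.
    """
    aligned = [(pos, name) for name, pos in anchors.items() if pos is not None]
    unaligned = [name for name, pos in anchors.items() if pos is None]
    return [name for _, name in sorted(aligned)] + unaligned


# Hypothetical anchors for five contigs listed largest-first.
example = {'contig1': 910000, 'contig2': 15000, 'contig3': None,
           'contig4': 1500000, 'contig5': None}
print(order_by_reference(example))
```
&lt;/pre&gt;
A real tool also has to decide each contig’s orientation and cope with the circular chromosome, neither of which this sketch attempts.&lt;br /&gt;&lt;br /&gt;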
Of course, some of the contigs did span breakpoints, so there are obviously rearrangements between the genomes.  At each contig boundary, there is always the possibility of some rearrangement between the strains, so this contig order is at best provisional.  Another thing I had to keep in mind while tooling around with these alignment programs is that the genome is circular, but the programs don’t treat the sequences that way, so there are often regions at the right or left ends that are obviously still syntenic but just don’t look like it in these linear visualizations.&lt;br /&gt;&lt;br /&gt;Here are the alignments again, without the colored blocks, so that the reordering of the contigs is obvious:&lt;br /&gt;&lt;br /&gt;Alignment #1:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_7qRGxl6StM4/SpgwKdM04hI/AAAAAAAAAQk/T7NR_Q-oWUU/s1600-h/align1plain.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 168px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/SpgwKdM04hI/AAAAAAAAAQk/T7NR_Q-oWUU/s320/align1plain.jpg" alt="" id="BLOGGER_PHOTO_ID_5375099111461937682" border="0" /&gt;&lt;/a&gt;Alignment #3:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7qRGxl6StM4/SpgwRwfkHyI/AAAAAAAAAQs/HnzSpYdd5Cg/s1600-h/align3plain.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 168px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/SpgwRwfkHyI/AAAAAAAAAQs/HnzSpYdd5Cg/s320/align3plain.jpg" alt="" id="BLOGGER_PHOTO_ID_5375099236899888930" border="0" /&gt;&lt;/a&gt;Notably, the tiny contigs that failed to find anything in the reference to align with remained at the right end.&lt;br /&gt;&lt;br /&gt;Alright, so running 11 of these MoveContig steps gave me 11 new files that had the 
contigs in a different order (based on KW20 Rd) from the one I got from GenBank.  The only unfortunate thing about this was that MoveContigs only output FastA, not GenBank, so I lost all the annotations.  Oh well.  I would still be able to use the KW20 Rd annotations.  I think the annotations can be added back, but I haven’t figured out how yet.&lt;br /&gt;&lt;br /&gt;To show that this exercise wasn’t a waste, below are stripped-down views of the big 15-way multiple alignment (I hid the 3 other completely sequenced strains from view, and only the top several contig reorderings are visible).&lt;br /&gt;&lt;br /&gt;Before Contig Reordering:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7qRGxl6StM4/SpgwbLY3MUI/AAAAAAAAAQ0/cIQZkqEfwW0/s1600-h/4AlignAll.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 192px; height: 320px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/SpgwbLY3MUI/AAAAAAAAAQ0/cIQZkqEfwW0/s320/4AlignAll.jpg" alt="" id="BLOGGER_PHOTO_ID_5375099398738358594" border="0" /&gt;&lt;/a&gt;After Contig Reordering:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_7qRGxl6StM4/SpgwiV_cM3I/AAAAAAAAAQ8/Cn0YeCVPt_U/s1600-h/5AlignAll.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 195px; height: 320px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/SpgwiV_cM3I/AAAAAAAAAQ8/Cn0YeCVPt_U/s320/5AlignAll.jpg" alt="" id="BLOGGER_PHOTO_ID_5375099521843606386" border="0" /&gt;&lt;/a&gt;Much easier to look at... But still sort of meaningless.&lt;br /&gt;&lt;br /&gt;Alright!  So that’s a 15-way multiple genome alignment.  
And indeed, it comes with a gigantic alignment file in XMFA format: an extension of the FastA format that can describe alignments with rearrangements, contig breakpoints, etc. between them.&lt;br /&gt;&lt;br /&gt;The most difficult challenges:&lt;br /&gt;(1) Handling breakpoints: contig boundaries and real rearrangement breakpoints alike.&lt;br /&gt;(2) Handling indels.  Those within an aligned block should be scorable, but how?&lt;br /&gt;&lt;br /&gt;So really, the first things I can work with without much difficulty are the contiguous aligned blocks.  I extracted one of these from the XMFA file into a single FastA file, test1.fasta, which I planned to use for some preliminary analyses...&lt;br /&gt;&lt;br /&gt;TO BE CONTINUED&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-8153697827781151279?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/8153697827781151279/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/08/more-alignments.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/8153697827781151279'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/8153697827781151279'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/08/more-alignments.html' title='More alignments!'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' 
src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_7qRGxl6StM4/Spg2CoTHhWI/AAAAAAAAARE/Q70gGBwu_hM/s72-c/haemophilus_20influenzae.jpg' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-1521913711295730746</id><published>2009-08-26T22:18:00.000-07:00</published><updated>2009-08-28T13:13:43.234-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='alignment'/><title type='text'>Measuring recombination?</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_7qRGxl6StM4/SpYYuxyo38I/AAAAAAAAAQM/HFkRq1IO8g4/s1600-h/matrix.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 400px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/SpYYuxyo38I/AAAAAAAAAQM/HFkRq1IO8g4/s400/matrix.png" alt="" id="BLOGGER_PHOTO_ID_5374510397232046018" border="0" /&gt;&lt;/a&gt;Whew!  I've been slacking on the blog posts!  Above: probability of recombination between pairs of SNPs in a 15-strain alignment block of ~65 kb.  Yellow indicates low probability of recombination... &lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;The figure above was generated by &lt;a href="http://www.math.auckland.ac.nz/~bryant/software.html"&gt;PhiTest&lt;/a&gt;, a program that tries to identify regions of recombination between different taxa in a multiple alignment.  Last night, I used &lt;a href="http://gel.ahabs.wisc.edu/mauve/"&gt;MAUVE&lt;/a&gt; to generate a 15-way multiple alignment of the genome sequences of 15 different &lt;span style="font-style: italic;"&gt;Haemophilus influenzae&lt;/span&gt; isolates.  
(This took a while, but was still impressively quick.)&lt;br /&gt;&lt;br /&gt;I grabbed a reasonably large alignment block (65,480 bp, including gaps) and stuck it into PhiTest.  It ran reasonably quickly and gave me the unsurprising result of p = 0E+00 (effectively zero), i.e., overwhelming evidence against the null hypothesis of no recombination on the interval.  So I decided to use the -g switch to produce a graphical output showing the chance of recombination between every pairwise combination of SNPs (of which there were 9422, giving about 44 million pairs; the program ignored indels).&lt;br /&gt;&lt;br /&gt;This was fun, because it wrote such a gigantic file that my computer died from lack of hard-drive space!  That doesn't happen often!  But it was mainly because I have several dozen gigabytes of stuff on my laptop that doesn't belong there.  I nudged some of it aside and repeated the analysis to get the plot above. &lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-1521913711295730746?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/1521913711295730746/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/08/measuring-recombination.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/1521913711295730746'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/1521913711295730746'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/08/measuring-recombination.html' title='Measuring recombination?'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_7qRGxl6StM4/SpYYuxyo38I/AAAAAAAAAQM/HFkRq1IO8g4/s72-c/matrix.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-6907037121807511923</id><published>2009-08-13T12:15:00.000-07:00</published><updated>2009-08-13T13:10:13.331-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='lab'/><title type='text'>Uptake and Transformation with "Biorupted" samples</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.jpk.com/dna-fragments.117.en.html"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 200px; height: 198px;" src="http://3.bp.blogspot.com/_7qRGxl6StM4/SoRpnMfsFPI/AAAAAAAAAQE/kTOIeUz05J4/s200/dna-fragments.thumb.e7c8b0723030fdaf150251a461a0781bv2_max_300x297.png" alt="" id="BLOGGER_PHOTO_ID_5369532777822426354" border="0" /&gt;&lt;/a&gt;To get an idea of how uptake and transformation would work with differently sized donor chromosome fragments, I took MAP7 DNA and sheared it in a “&lt;a href="http://www.diagenode.com/pages/bioruptor200.html"&gt;bioruptor&lt;/a&gt;”.  
I ended up with several samples with different size distributions, three of which I used for a pair of experiments:&lt;br /&gt;&lt;br /&gt;LARGE: &gt;40 kb (unsonicated)&lt;br /&gt;MEDIUM: 1-10 kb (1 X 10 min sonication)&lt;br /&gt;SMALL: 100-400 bp (5 X 10 min sonication)&lt;br /&gt;&lt;br /&gt;My naïve assumption was that % uptake would go down as fragment size decreased, since fewer fragments would contain “uptake signal sequences” (USS), which occur in the genome on average about once every ~1 kb.&lt;br /&gt;&lt;br /&gt;I also thought that transformation rates would go down for smaller fragments, but would not necessarily correlate that well with uptake, since the additional steps of translocation and recombination could also influence transformation efficiency.  For example, transformation might drop off more quickly than uptake if degradation affected smaller fragments more than larger ones.  (This seemed to be the case in &lt;a href="http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&amp;amp;pubmedid=2987941"&gt;Pifer and Smith, 1985&lt;/a&gt;.)&lt;br /&gt;&lt;br /&gt;Keeping in mind that these are just one-off experiments and need to be repeated (like pretty much every experiment I’ve reported in this blog), the results seem to bear out the predictions above, but I’m not certain my reasons are correct...&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;First, I’ll show the uptake data.  I end-labeled the MEDIUM and SMALL donor fragments using Klenow and did a simple uptake experiment (comparing total radiolabel to the radiolabel in cell pellets after 30 min of uptake), using either MEDIUM or SMALL chromosomal fragments as donors at either saturating (500 ng) or sub-saturating (100 ng) amounts of input DNA per 0.5 ml of wild-type competent cells.  
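&lt;br /&gt;&lt;br /&gt;The USS-density argument can be put in rough numbers. Assuming USS are scattered independently along the chromosome at about one per kb, the number of USS on a fragment of length L is roughly Poisson with mean L/1000, so the expected fraction of fragments carrying at least one USS is 1 - exp(-L/1000). This minimal Python sketch ignores the length of the motif itself and any clustering of USS, so it is only a back-of-the-envelope guide.
&lt;pre&gt;
```python
import math

USS_SPACING_BP = 1000  # assumed mean spacing: about one USS per kb


def fraction_with_uss(length_bp, spacing_bp=USS_SPACING_BP):
    """Poisson estimate of the fraction of fragments of length_bp that
    carry at least one USS: 1 - exp(-length / spacing)."""
    return 1.0 - math.exp(-length_bp / spacing_bp)


for label, size in [('SMALL, 100 bp', 100), ('SMALL, 400 bp', 400),
                    ('MEDIUM, 1 kb', 1000), ('MEDIUM, 10 kb', 10000)]:
    print(f'{label}: {fraction_with_uss(size):.2f}')
```
&lt;/pre&gt;
Under these assumptions, roughly 10% to 33% of SMALL fragments (100-400 bp) would carry a USS, versus about 63% to nearly 100% of MEDIUM fragments (1-10 kb).&lt;br /&gt;&lt;br /&gt;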
Here’s the data:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_7qRGxl6StM4/SoRnuwtG7RI/AAAAAAAAAPk/8QZbHoMpSUo/s1600-h/uptakeBiorupt.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 274px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/SoRnuwtG7RI/AAAAAAAAAPk/8QZbHoMpSUo/s320/uptakeBiorupt.png" alt="" id="BLOGGER_PHOTO_ID_5369530708778216722" border="0" /&gt;&lt;/a&gt;So clearly, several-fold less chromosomal DNA is taken up with smaller fragments (100-400 bp) than with larger fragments (1-10 kb).  As suggested above, one reasonable explanation is that fewer fragments in the SMALL sample contain a USS: maybe only 1/10 to 1/2 of them, whereas in the MEDIUM sample most fragments will contain at least one USS.&lt;br /&gt;&lt;br /&gt;But there is an alternative explanation that I’d like to be able to distinguish from this one (though I’m pretty sure I can’t with this one experiment).  It could be that a given competent cell takes up only a fixed number of DNA fragments, independent of fragment size.  Since the SMALL sample contains ~10-100X more fragments per unit mass, I may simply have saturated the system with fragments in the case of SMALL, but not in the case of MEDIUM.  
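&lt;br /&gt;&lt;br /&gt;The “more fragments per unit mass” point is simple arithmetic: for a fixed mass of double-stranded DNA, the number of molecules is inversely proportional to fragment length. This minimal Python sketch converts a DNA mass into a molecule count, assuming ~650 g/mol per base pair; the 250 bp and 5 kb lengths are just representative points within the SMALL and MEDIUM ranges, not measured averages.
&lt;pre&gt;
```python
AVOGADRO = 6.022e23          # molecules per mole
BP_MASS_G_PER_MOL = 650.0    # approximate mass of one base pair of dsDNA


def molecules(mass_ng, length_bp):
    """Number of dsDNA molecules of length_bp contained in mass_ng nanograms."""
    grams = mass_ng * 1e-9
    moles = grams / (length_bp * BP_MASS_G_PER_MOL)
    return moles * AVOGADRO


small = molecules(500, 250)      # representative SMALL fragment (assumed)
medium = molecules(500, 5000)    # representative MEDIUM fragment (assumed)
print(f'SMALL:  {small:.2e} molecules in 500 ng')
print(f'MEDIUM: {medium:.2e} molecules in 500 ng')
print(f'ratio:  {small / medium:.0f}x')
```
&lt;/pre&gt;
Because the mass cancels out, the ratio is just the length ratio (20x for these two points); across the full SMALL and MEDIUM ranges it spans roughly 2.5x (400 bp vs 1 kb) to 100x (100 bp vs 10 kb).&lt;br /&gt;&lt;br /&gt;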
This issue was addressed by &lt;a href="http://www.springerlink.com/content/m1mq7123685571w7/"&gt;Deich and Smith, 1980&lt;/a&gt;, who concluded that this was indeed the case (the number of molecules taken up was independent of fragment size); although they do mention USS, they do not raise USS density as a possible explanation for their data.&lt;br /&gt;&lt;br /&gt;I’d hoped that testing a second DNA concentration (100 ng) might give me hints as to which of the two models above is correct (or whether both contribute to the observation), but I don’t think I can say much without repeating this several times and getting some error bars on that graph.  Furthermore, I’m not entirely sure what each model predicts.  I’ll have to think on this some more....&lt;br /&gt;-----&lt;br /&gt;Okay, what about transformation rates using my “biorupted” fragments?  Below are two graphs reporting the transformation rates of the KanR and NovR alleles from MAP7 into KW20 for the three different DNA pools (LARGE, MEDIUM, and SMALL):&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_7qRGxl6StM4/SoRoAtRrxyI/AAAAAAAAAP0/yAPI__v-_pA/s1600-h/kanRUPT.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 238px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/SoRoAtRrxyI/AAAAAAAAAP0/yAPI__v-_pA/s320/kanRUPT.png" alt="" id="BLOGGER_PHOTO_ID_5369531017095530274" border="0" /&gt;&lt;/a&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7qRGxl6StM4/SoRoFLjfQCI/AAAAAAAAAP8/ZTF53c7Hg6Q/s1600-h/novRUPT.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 239px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/SoRoFLjfQCI/AAAAAAAAAP8/ZTF53c7Hg6Q/s320/novRUPT.png" alt="" 
id="BLOGGER_PHOTO_ID_5369531093942747170" border="0" /&gt;&lt;/a&gt;(Note:  For the SMALL sample, the NovR/CFU value shown is actually the limit of detection, so the true NovR/CFU for SMALL fragments is below 4.4e-6.  I didn't observe any NovR transformants with the SMALL fragments, despite having a decent limit of detection.)&lt;br /&gt;&lt;br /&gt;First, I'll look at the difference between the LARGE and MEDIUM fragments.  Both markers showed a ~6-fold decrease in transformation rate with the briefly sonicated sample compared with the large intact fragments.  Possible explanations:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Fewer USS per fragment:  &lt;/span&gt;I doubt this is a major reason for the difference.  While a larger percentage of fragments in the MEDIUM sample are expected to contain no USS, the difference shouldn’t be that large, since USS motifs occur on average about once every ~1 kb.&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Degradation by translocation or cytosolic nucleases&lt;/span&gt;:  This seems to be a reasonable explanation.  From an old set of experiments using a defined plasmid donor, Pifer and Smith, 1985, estimated that an average of ~1.5 kb of a leading 3’ end is degraded during translocation.  Maybe the medium-sized fragments simply don’t survive translocation as well as the large fragments.&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Recombination efficiency:&lt;/span&gt;  Maybe both the LARGE and MEDIUM fragments make it into the cytosol, but homology search and recombination are much more efficient for larger fragments.  &lt;/li&gt;&lt;/ol&gt;Things look a little more interesting for the change between MEDIUM and SMALL fragments:  while the KanR rate changed only modestly (less than 2-fold), the NovR rate dropped below my limit of detection.  
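&lt;br /&gt;&lt;br /&gt;Since no NovR transformants appeared with the SMALL fragments, the plotted value is an upper bound rather than a measurement. Here is a minimal Python sketch of that arithmetic, assuming the detection limit is defined as one transformant divided by the total number of viable cells (CFU) represented on the selective plates. It also computes the more conservative one-sided 95% upper bound for zero observed transformants, which is approximately 3/CFU (the “rule of three”). The CFU value is hypothetical, chosen only so that the limit lands near the reported 4.4e-6; the actual plate counts are not given here.
&lt;pre&gt;
```python
def transformation_frequency(transformants, viable_cells):
    """Transformants per viable cell (CFU).

    Returns (value, is_upper_bound). With zero transformants the value is
    the detection limit, taken here as 1 / viable_cells (an assumption).
    """
    if transformants == 0:
        return 1.0 / viable_cells, True
    return transformants / viable_cells, False


def upper_bound_95(viable_cells):
    """Exact one-sided 95% upper bound on the frequency when zero
    transformants are seen among viable_cells; roughly 3 / viable_cells."""
    return 1.0 - 0.05 ** (1.0 / viable_cells)


cfu = 227000  # hypothetical CFU, chosen so the limit is near 4.4e-6
freq, is_limit = transformation_frequency(0, cfu)
print(f'NovR/CFU = {freq:.1e} (upper bound: {is_limit})')
print(f'95% upper bound: {upper_bound_95(cfu):.1e}')
```
&lt;/pre&gt;
The rule-of-three bound is about three times the simple one-colony limit, which is worth keeping in mind when comparing a “below detection” value against measured rates.&lt;br /&gt;&lt;br /&gt;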
Possible explanations:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Distance to USS:&lt;/span&gt; Perhaps the nearest USS is more than 400 bp from the NovR allele but less than 400 bp from the KanR allele.  I like this explanation.  I really, really need to figure out the identity of these antibiotic resistance alleles.  We’re pretty sure NovR is a mutation in the &lt;span style="font-style: italic;"&gt;gyrB&lt;/span&gt; gene, but I don’t know the actual change.  I looked at &lt;span style="font-style: italic;"&gt;gyrB&lt;/span&gt;, and it does have a USS core motif and two other core motifs a few hundred bases before the start codon, but the gene is ~2.5 kb, so the actual &lt;span style="font-style: italic;"&gt;gyrB&lt;/span&gt; mutation could easily be too far away from these USS.  When I looked at the putative gene responsible for KanR (the ribosomal S7 gene), there was a single USS near the start, but none within the gene.   Again, without knowing the causative lesion, I can’t tell whether its distance from a USS falls within the size range of the SMALL fragments.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Differences in degradation rates at the two loci:&lt;/span&gt; This is possible.  These data also suggest that, despite the ~1.5 kb average degradation reported by Pifer and Smith, 1985, plenty of small fragments can still recombine, since the KanR rates for MEDIUM and SMALL are not dramatically different.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Recombination signals:&lt;/span&gt; Also possible, and probably the hardest to test, since the other effects would need to be ruled out first.  &lt;/li&gt;&lt;/ol&gt;The main take-away (assuming that the results replicate) is that not only do different markers transform at different rates, but the change in transformation rate with fragment size also differs between markers.  
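&lt;br /&gt;&lt;br /&gt;The distance-to-USS explanation comes down to one geometric question: how often does a randomly sheared fragment that covers one site (a USS) also cover a second site (the marker) d bp away? Assuming break points fall uniformly at random, a fragment of length L that covers the first site covers the second with probability (L - d)/L when d is shorter than L, and never otherwise. This minimal Python sketch applies that to a crude stand-in for the SMALL size range; the marker-to-USS distances are hypothetical, since the actual NovR and KanR lesions are not yet known. The same calculation, with d as the distance between two markers, bears on whether two alleles can share a fragment.
&lt;pre&gt;
```python
def co_carried(length_bp, distance_bp):
    """Probability that a fragment of length_bp covering one site also
    covers a second site distance_bp away, assuming every placement of the
    fragment that covers the first site is equally likely."""
    return max(0, length_bp - distance_bp) / length_bp


def mean_co_carried(lengths_bp, distance_bp):
    """Same probability averaged over a set of fragment lengths (assumed
    equally abundant). Longer fragments are proportionally more likely to
    cover the first site, so each length is weighted by its length."""
    covered = sum(max(0, size - distance_bp) for size in lengths_bp)
    return covered / sum(lengths_bp)


small_sizes = range(100, 401, 50)  # crude stand-in for the 100-400 bp SMALL pool
for d in (50, 200, 500):           # hypothetical marker-to-USS distances
    print(d, round(mean_co_carried(small_sizes, d), 2))
```
&lt;/pre&gt;
With these toy numbers, a marker 50 bp from a USS would share a SMALL fragment with it about 80% of the time, one 200 bp away under 30% of the time, and one 500 bp away never: the kind of threshold effect that could separate KanR from NovR if their distances to the nearest USS differ this way.&lt;br /&gt;&lt;br /&gt;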
The underlying reasons for this marker-to-marker variation are probably interesting.  So what next?  I need to repeat this, but next time I’d like to: &lt;ol&gt;&lt;li&gt;Extend this to additional markers to see whether this variability also applies to other loci.&lt;/li&gt;&lt;li&gt;Measure linkage between Kan and Nov.  I’ve previously seen the known linkage between Kan and Nov using large fragments, but would expect linkage to vanish for small fragments if the KanR and NovR alleles never share the same fragment.&lt;/li&gt;&lt;/ol&gt;And again, I really need to know which lesions are responsible for the MAP7 antibiotic resistances.  I’ve looked around a fair amount, but it seems that many antibiotic resistances can be produced by mutations in more than one gene, so narrowing it down isn’t that straightforward.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-6907037121807511923?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/6907037121807511923/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/08/uptake-and-transformation-with.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/6907037121807511923'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/6907037121807511923'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/08/uptake-and-transformation-with.html' title='Uptake and Transformation with &quot;Biorupted&quot; samples'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_7qRGxl6StM4/SoRpnMfsFPI/AAAAAAAAAQE/kTOIeUz05J4/s72-c/dna-fragments.thumb.e7c8b0723030fdaf150251a461a0781bv2_max_300x297.png' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-8662444974473612848</id><published>2009-08-10T11:52:00.001-07:00</published><updated>2009-08-10T11:59:19.029-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='plans'/><title type='text'>Back to the grind</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://loxosceles.org/crafty/bacterium.html"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 200px; height: 150px;" src="http://3.bp.blogspot.com/_7qRGxl6StM4/SoBsqQeXOOI/AAAAAAAAAPU/EFBXf4Zp5pM/s200/four_bacteria.jpg" alt="" id="BLOGGER_PHOTO_ID_5368410229058255074" border="0" /&gt;&lt;/a&gt;Whew!  Getting that albatross (= grant application) off my neck feels good, but now I need to formulate a plan for the next several weeks of lab work.  What are my priorities?  There are quite a lot of things I could be doing, so I might as well make a list...&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;(A) Purify donor DNA from 86-028NP, PittEE, and PittGG:  Should be pretty easy.  I tried this a couple of times in the past couple of weeks, but had a few problems:&lt;br /&gt;&lt;ol&gt;&lt;li&gt; Got a goopy mess.  I don’t know if it was my lysis or something about my extractions, but whenever I tried to pull the aqueous layer off my phenol/chloroform extractions, goopy junk at the interface kept me from completing the preps.  
I will try, try again, but next time I will do a few things differently: (i) start with fewer cells per volume, (ii) do a straight phenol extraction first, and (iii) let lysis proceed longer.&lt;/li&gt;&lt;li&gt; PittEE didn’t grow.  I’d gone to the lab stocks, streaked for single colonies, grown up overnight cultures from single colonies, and stored them in glycerol at -80.  But when I went back to the PittEE frozen stocks, they didn’t grow!  Weird.  Maybe PittEE is particularly bad at surviving outside log phase, so I had merely frozen down a pile of dead cells.  I’ll have to return to the lab stocks and restreak for single colonies.&lt;/li&gt;&lt;li&gt; I haven’t yet checked the putative nalidixic acid-resistant phenotype of 86-028NP.  I really hope that marker will work for selecting transformants from 86-028NP into KW20, as this would help several steps of the project.  In our strain database, 86-028NP is listed as NalR, but this might only be a clinically relevant phenotype.  I need to check it on plates.&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;(B) Check the uptake and transformation phenotypes of WT, &lt;span style="font-style: italic;"&gt;rec-2&lt;/span&gt;, &lt;span style="font-style: italic;"&gt;rec-1&lt;/span&gt;, and &lt;span style="font-style: italic;"&gt;pilB&lt;/span&gt;:  I’ve already gone through this exercise with WT and &lt;span style="font-style: italic;"&gt;rec-2&lt;/span&gt;.  As a negative control, &lt;span style="font-style: italic;"&gt;pilB&lt;/span&gt; mutants should neither take up DNA nor be transformed by it, and for my translocation experiments, &lt;span style="font-style: italic;"&gt;rec-1&lt;/span&gt; should take up DNA normally but fail to transform.  I have all these strains frozen down as M-IV competent cells, so I just need to do the experiments to verify that my mutants have the expected phenotypes.  
The &lt;span style="font-style: italic;"&gt;rec-1&lt;/span&gt; mutant in particular is important, since I plan to use it for cytosolic DNA purifications, but it could be somewhat leaky, and I need to know whether it is.  (The only &lt;span style="font-style: italic;"&gt;rec-1&lt;/span&gt; mutant available is the original mutant isolated back in 1972.  It’s older than me!)&lt;br /&gt;&lt;br /&gt;(C) Quantify uptake and transformation for different-sized donor DNA:  A couple of weeks ago, I used a “Bioruptor” to sonicate MAP7 DNA to different size distributions, and I tested how well some of these size fractions transform competent cells to antibiotic resistance.  I also want to check how well they’re taken up.  Once I’ve done this, I’ll report back the full set of results.  The expectation is that while even small fragments will be taken up well, only large fragments will transform well.  There’s also an interesting expected relationship between the amount of uptake and the size distribution of the DNA.  If the DNA fragments are quite small, then only a fraction of them are expected to contain uptake signals, but many fragments containing such signals will be taken up.  Conversely, nearly all large DNA fragments will contain uptake signals, but only a few molecules will be taken up, although each will be larger... Hmm....&lt;br /&gt;&lt;br /&gt;(D) Design the degenerate USS construct.  This will require its own post, but suffice it to say that we have a pretty good plan for designing this construct so that we can directly sequence the degenerate USS motif by short single-end Illumina sequencing without any upfront processing steps.&lt;br /&gt;&lt;br /&gt;(E) Transformation experiment:  If 86-028NP does grow on nalidixic acid plates, then I will go ahead and do half of the experimental side of one of my specific aims.  
All I need to do is transform KW20 competent cells (which I have) with DNA taken from 86-028NP (which I’ll make), select for NalR colonies, pick several of them, grow them up, and extract their DNA.  This would provide us with plenty of material to produce sequencing libraries and send them for sequencing.  Then we can really get this project off the ground.&lt;br /&gt;&lt;br /&gt;(F) Find somewhere that will do sequencing with a short turn-around:  The local Illumina sequencers (at UBC’s Genome Center) offer what looks like excellent sequencing services for a reasonable cost, but their turn-around time is a bit too slow for us to hope to have any preliminary data for Rosie’s grant applications.  If we want to get anything done more quickly, we will have to find another provider to get started.&lt;br /&gt;&lt;br /&gt;(G) Design and produce a large DNA fragment from the 86-028NP genome to use for developing a cytosolic ssDNA prep:  Since the most difficult experimental part of our plans is likely to be purifying (and cloning) ssDNA from the cytosol, I’d like to start with a defined construct I can use for purification experiments.  I think that a good choice would be a large clone from 86-028NP, because this might be useful for other experiments as well.  I want an insert of at least several kb on a plasmid.  I will need to think about whether it matters what this fragment contains.  
As a first guess, I would want it to contain the putative NalR allele and several kb of flank.&lt;br /&gt;&lt;br /&gt;What else should I be doing?&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-8662444974473612848?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/8662444974473612848/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/08/back-to-grind.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/8662444974473612848'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/8662444974473612848'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/08/back-to-grind.html' title='Back to the grind'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_7qRGxl6StM4/SoBsqQeXOOI/AAAAAAAAAPU/EFBXf4Zp5pM/s72-c/four_bacteria.jpg' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-8783844312249355357</id><published>2009-08-06T08:45:00.000-07:00</published><updated>2009-08-06T09:41:51.103-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='plans'/><title type='text'>Almost...</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" 
href="http://images.google.ca/imgres?imgurl=http://www.anu.edu.au/BoZo/Double/images/Shy%2520albatross.jpg&amp;amp;imgrefurl=http://www.anu.edu.au/BoZo/Double/&amp;amp;usg=__k5X8TmWI2i7n-yvx_arqff14t00=&amp;amp;h=580&amp;amp;w=439&amp;amp;sz=180&amp;amp;hl=en&amp;amp;start=3&amp;amp;um=1&amp;amp;tbnid=llFKKNXCxI_MNM:&amp;amp;tbnh=134&amp;amp;tbnw=101&amp;amp;prev=/images%3Fq%3Dalbatross%26hl%3Den%26client%3Dfirefox-a%26rls%3Dorg.mozilla:en-US:official%26sa%3DN%26um%3D1"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 303px; height: 400px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/Snr7VSbLeUI/AAAAAAAAAPE/8xd6sZjSy_A/s400/Shy+albatross.jpg" alt="" id="BLOGGER_PHOTO_ID_5366878249107028290" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;...there...  However it turns out, I'll finally have this grant I've been working on out of my hands tomorrow.  I'm actually thrilled to get back into the lab when it's all over...&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-8783844312249355357?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/8783844312249355357/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/08/almost.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/8783844312249355357'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/8783844312249355357'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/08/almost.html' 
title='Almost...'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_7qRGxl6StM4/Snr7VSbLeUI/AAAAAAAAAPE/8xd6sZjSy_A/s72-c/Shy+albatross.jpg' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-6762766025528863441</id><published>2009-07-27T11:53:00.001-07:00</published><updated>2009-07-27T12:01:15.988-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='speculation'/><category scheme='http://www.blogger.com/atom/ns#' term='evolution of sex'/><title type='text'>Food-to-sex ratio?</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_7qRGxl6StM4/Sm33vQBcLFI/AAAAAAAAAO8/dgyM2Vf0Lus/s1600-h/foodsex.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 181px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/Sm33vQBcLFI/AAAAAAAAAO8/dgyM2Vf0Lus/s400/foodsex.png" alt="" id="BLOGGER_PHOTO_ID_5363215122395049042" border="0" /&gt;&lt;/a&gt;In my &lt;a href="http://nodnacontrol.blogspot.com/2009/07/building-periplasm-prep.html"&gt;recent&lt;/a&gt; &lt;a href="http://nodnacontrol.blogspot.com/2009/07/standing-upon-shoulders-of-giants.html"&gt;experiments&lt;/a&gt; using radiolabeled USS-1 donor DNAs, I was impressed by just how well naturally competent &lt;span style="font-style: italic;"&gt;H. influenzae&lt;/span&gt; will slurp up USS-containing DNA.  I’d read about it, but observing it myself was really something.  
(It reminds me a bit of the first time I “saw” Mendelian segregation when I dissected my first yeast tetrads.)&lt;br /&gt;&lt;br /&gt;The other thing that struck me, reading the earlier studies behind my uptake experiments, was that the large majority of DNA taken up is simply degraded, with its nucleotide subunits reused for DNA replication.  In my experiments, while donor DNA remained intact in &lt;span style="font-style: italic;"&gt;rec-2&lt;/span&gt; cells, there was no intact donor DNA after an hour in wild-type cells.  All the radiolabel was found in the chromosomal DNA.&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;Using linearized USS-containing plasmid donor DNA, &lt;a href="http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&amp;amp;pubmedid=3874865"&gt;Barouki and Smith (1985)&lt;/a&gt; nicely show that this chromosomal labeling is NOT dependent on recombination, as &lt;span style="font-style: italic;"&gt;rec-1&lt;/span&gt; mutant competent cells obtain similar levels of chromosomal labeling (&lt;span style="font-style: italic;"&gt;rec-1&lt;/span&gt; is the recA homolog in &lt;span style="font-style: italic;"&gt;H. influenzae&lt;/span&gt;).  Using restriction digestion, they also show that in wild-type cells, only some of the donor DNA manages to recombine into the recipient chromosome.&lt;br /&gt;&lt;br /&gt;The most important lanes in the autoradiogram above, from Figure 3 of their paper, are Lane E and Lane F, which show restriction-digested DNA after uptake in wild-type and &lt;span style="font-style: italic;"&gt;rec-1&lt;/span&gt; cells.  In Lane E (wild-type), the two arrowheads indicate the chromosomal restriction fragments indicative of transformation by the linear plasmid donor.  
In Lane F (&lt;span style="font-style: italic;"&gt;rec-1&lt;/span&gt;), there is no appreciable transformation of the donor DNA into the chromosome (those restriction fragments are gone).&lt;br /&gt;&lt;br /&gt;These results raise an interesting prospect:  Could we interpret the amount of recombination-independent radiolabeling relative to the recombination-dependent radiolabeling as a “food-to-sex” ratio?  Undeniably, DNA is taken up and used by competent cells, but it’s clearly used in two different ways:  subunit recycling (food) and recombination (sex).  By my eye, it seems like a lot more of the donor DNA, even in wild-type cells, is used for food than for sex.&lt;br /&gt;&lt;br /&gt;Of course, the amount of taken-up DNA a cell could use for “sex” would be highly dependent on what DNA was taken up.  In the experiment above, if the homologous portions of the donor plasmid are removed, then the “sex” fragments disappear (Lanes I and J for wt and &lt;span style="font-style: italic;"&gt;rec-1&lt;/span&gt;, respectively).  Furthermore, the length of homologous DNA fragments taken up by competent cells is likely to matter, due to degradation during translocation and in the cytosol.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www3.interscience.wiley.com/journal/122212032/abstract?CRETRY=1&amp;amp;SRETRY=0"&gt;Maughan and Redfield (2009)&lt;/a&gt; show extensive natural variation among &lt;span style="font-style: italic;"&gt;H. influenzae&lt;/span&gt; strains in how much DNA their competent cells take up and how well they are transformed.  Do strains that can take up DNA well but fail to transform have a high food-to-sex ratio?&lt;br /&gt;&lt;br /&gt;The assay that Barouki and Smith use doesn’t seem like the best way to measure a food-to-sex ratio, since the “sex” signal is somewhat buried behind the “food” signal.  I wonder if there’s an experimental scheme that would allow one to measure such a ratio more accurately.  
Is there a way I could feed one strain’s chromosomes to another recipient strain and figure out how much of the DNA incorporated into chromosomes is recombination-dependent and how much is recombination-independent?&lt;br /&gt;&lt;br /&gt;Anyway, thinking about this has given me a slightly clearer idea of the food versus sex hypotheses for the maintenance of natural competence by natural selection.  Things are rarely black-and-white, so perhaps both models have their merits, but it seems like it might be possible to experimentally measure how much naturally competent cells use DNA for food or sex.  Would this help in understanding these arguments?&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-6762766025528863441?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/6762766025528863441/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/07/food-to-sex-ratio.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/6762766025528863441'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/6762766025528863441'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/07/food-to-sex-ratio.html' title='Food-to-sex ratio?'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://1.bp.blogspot.com/_7qRGxl6StM4/Sm33vQBcLFI/AAAAAAAAAO8/dgyM2Vf0Lus/s72-c/foodsex.png' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-8230136173262358731</id><published>2009-07-24T16:31:00.000-07:00</published><updated>2009-07-24T16:38:18.089-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sequencing'/><category scheme='http://www.blogger.com/atom/ns#' term='plans'/><title type='text'>Pile-ups</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_7qRGxl6StM4/SmpFzmIJASI/AAAAAAAAAO0/1IalgZ-fhEY/s1600-h/interference.jpg"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 200px; height: 150px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/SmpFzmIJASI/AAAAAAAAAO0/1IalgZ-fhEY/s200/interference.jpg" alt="" id="BLOGGER_PHOTO_ID_5362175059048333602" border="0" /&gt;&lt;/a&gt;As I try to address the critiques of my original NIH postdoc fellowship application in my resubmission, I have started fleshing out ways in which our planned periplasmic donor DNA sequencing experiments will investigate the mechanism of DNA uptake.&lt;br /&gt;&lt;br /&gt;I’m playing around with different figures that illustrate what I’ll do and what it might reveal.  
Here’s the basic idea:&lt;br /&gt;&lt;br /&gt;(1) Incubate sheared chromosomal donor DNA with recipient competent cell preparations.&lt;br /&gt;(2) Recover the DNA that is taken up into the periplasm.&lt;br /&gt;(3) Obtain paired-end sequence data for periplasmic and input DNA libraries.&lt;br /&gt;(4) Compare the abundance of different sequences between the periplasmic and input DNA libraries to calculate the periplasmic uptake efficiency for sequences across the genome.&lt;br /&gt;&lt;br /&gt;What would this data look like?&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;Once mapped back to the genome, each paired-end read will define the span of an individual fragment from the sequenced pool.&lt;br /&gt;&lt;br /&gt;Below, I illustrate what a few dozen of these spans would look like mapped to a short stretch of chromosome containing an uptake signal sequence in the center.&lt;br /&gt;&lt;br /&gt;The first diagram shows what the input DNA pool would look like.  The blue spans do not contain a USS, while the red spans do.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_7qRGxl6StM4/SmpE_ZIF7-I/AAAAAAAAAOE/MJU6Kcx-314/s1600-h/Slide3.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 108px;" src="http://3.bp.blogspot.com/_7qRGxl6StM4/SmpE_ZIF7-I/AAAAAAAAAOE/MJU6Kcx-314/s400/Slide3.png" alt="" id="BLOGGER_PHOTO_ID_5362174162205274082" border="0" /&gt;&lt;/a&gt;The second diagram shows what a similar amount of sequencing of the periplasmic uptake DNA pool might look like.  I assume that the presence of a USS motif is necessary and sufficient to strongly stimulate uptake (the “all-or-nothing” model of uptake).  
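&lt;br /&gt;&lt;br /&gt;To get a feel for the shapes the all-or-nothing model predicts, here is a minimal Python sketch of a toy version of it.  The simplifications are assumptions made only for illustration, not properties of real libraries: every fragment has the same length, exactly one fragment starts at each position of a short stretch of chromosome, the USS is treated as a single position, and only fragments containing it are taken up.  The function name and parameters are hypothetical.  It tallies spanning coverage (how many taken-up fragments cover each position) and end coverage (how many fragment ends fall at each position).
&lt;pre&gt;
```python
def all_or_nothing_profiles(region_len, uss_pos, frag_len):
    """Toy all-or-nothing uptake model (hypothetical, for illustration only).

    One fragment of length frag_len starts at every possible position in a
    region of region_len bp, and a fragment is taken up only if it contains
    uss_pos.  Returns per-position spanning coverage and end coverage of
    the taken-up fragments.
    """
    span = [0] * region_len
    ends = [0] * region_len
    for start in range(region_len - frag_len + 1):
        stop = start + frag_len - 1  # last base covered by this fragment
        if uss_pos not in range(start, stop + 1):
            continue  # no USS on this fragment, so it is not taken up
        for pos in range(start, stop + 1):
            span[pos] += 1  # this fragment spans pos
        ends[start] += 1  # left end of the fragment
        ends[stop] += 1  # right end of the fragment
    return span, ends


span, ends = all_or_nothing_profiles(region_len=21, uss_pos=10, frag_len=5)
print(span)  # [0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0, 0, 0, 0, 0, 0]
print(ends)  # [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
```
&lt;/pre&gt;
In this fixed-length toy, spanning coverage climbs linearly to a peak equal to the fragment length at the USS, while end coverage is a flat plateau 2L - 1 positions wide (L = fragment length) centered on the USS and doubled at the USS itself.  A spread of fragment lengths, or some background uptake of fragments without a USS, would change both shapes.&lt;br /&gt;&lt;br /&gt;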
Thus, spans containing USS would be much more abundant than spans that do not.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7qRGxl6StM4/SmpFC3XKuGI/AAAAAAAAAOM/Rc5eFGjNJck/s1600-h/Slide4.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 256px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/SmpFC3XKuGI/AAAAAAAAAOM/Rc5eFGjNJck/s400/Slide4.png" alt="" id="BLOGGER_PHOTO_ID_5362174221861173346" border="0" /&gt;&lt;/a&gt;These sequence data could be plotted in several ways:&lt;br /&gt;&lt;br /&gt;(1) Spanning coverage at each genomic position:  The input DNA is expected to have roughly equal spanning coverage of each genomic position, but spanning coverage in the periplasmic uptake library is expected to be higher closer to USS motifs.  As the distance between a genomic position and a USS increases, fewer spans will contain both.  Peaks will indicate USS, and peak height will indicate the effectiveness of individual USS loci.  I estimate that one lane of sequencing the input will provide ~2500X spanning coverage per nucleotide for 500 bp donor DNA fragments.&lt;br /&gt;&lt;br /&gt;(2) End coverage at each genomic position:  If the position of the USS in a fragment is irrelevant to the uptake mechanism, then the end-coverage plot around a USS would have a different shape from the spanning-coverage plot.  Since any fragment containing a USS will be effectively taken up, I expect a more sawtooth-shaped distribution of end reads around each USS, determined by the spanning fragment length.  
I estimate that one lane of sequencing the input will provide 100X end coverage per nucleotide for 500 bp donor DNA fragments.&lt;br /&gt;&lt;br /&gt;Below, I illustrate an idealized case of spanning and end coverage.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_7qRGxl6StM4/SmpFMIZoY2I/AAAAAAAAAOc/8BnPuIJSdaM/s1600-h/Slide5A.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 275px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/SmpFMIZoY2I/AAAAAAAAAOc/8BnPuIJSdaM/s320/Slide5A.png" alt="" id="BLOGGER_PHOTO_ID_5362174381053731682" border="0" /&gt;&lt;/a&gt;Different USS loci may behave in different ways.  Here is what two other scenarios might look like:&lt;br /&gt;&lt;br /&gt;(1) If uptake is polarized by USS, such that the position of the USS on a fragment is important, or if there were an uptake-blocking sequence near a USS, the distribution might be skewed:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_7qRGxl6StM4/SmpFPlg4gGI/AAAAAAAAAOk/J9QRdrE7kHA/s1600-h/Slide5B.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 300px; height: 320px;" src="http://3.bp.blogspot.com/_7qRGxl6StM4/SmpFPlg4gGI/AAAAAAAAAOk/J9QRdrE7kHA/s320/Slide5B.png" alt="" id="BLOGGER_PHOTO_ID_5362174440408383586" border="0" /&gt;&lt;/a&gt;(2) If a fragment’s uptake is equally efficient with one or two USS motifs (a USS interference model), then coverage around two nearby USS might look like this:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_7qRGxl6StM4/SmpFSgHQTDI/AAAAAAAAAOs/GK-3C9hbH3s/s1600-h/Slide5C.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 308px;" 
src="http://3.bp.blogspot.com/_7qRGxl6StM4/SmpFSgHQTDI/AAAAAAAAAOs/GK-3C9hbH3s/s320/Slide5C.png" alt="" id="BLOGGER_PHOTO_ID_5362174490498321458" border="0" /&gt;&lt;/a&gt;I wonder what other mechanistic details might be found in the data...&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-8230136173262358731?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/8230136173262358731/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/07/pile-ups.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/8230136173262358731'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/8230136173262358731'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/07/pile-ups.html' title='Pile-ups'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_7qRGxl6StM4/SmpFzmIJASI/AAAAAAAAAO0/1IalgZ-fhEY/s72-c/interference.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-4095020365571288362</id><published>2009-07-17T14:54:00.000-07:00</published><updated>2009-07-17T15:29:17.325-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='lab'/><title type='text'>Dose Response</title><content 
type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://neil.fraser.name/news/2009/03/23/"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 150px; height: 200px;" src="http://3.bp.blogspot.com/_7qRGxl6StM4/SmDzMcjf97I/AAAAAAAAANU/AY4a26MegbA/s200/rice_dna1.jpg" alt="" id="BLOGGER_PHOTO_ID_5359550951719958450" border="0" /&gt;&lt;/a&gt;How hungry are competent cells for DNA?  I know that about a billion cells will consume ~65% of 20 nanograms of tasty USS-1 fragment, but what if I offer the cells different amounts of USS-1?&lt;br /&gt;&lt;br /&gt;To get a better hands-on feel for the DNA uptake process in wild-type and &lt;span style="font-style: italic;"&gt;rec-2&lt;/span&gt; mutant competent cells, I did a dose-response experiment, in which I incubated competent cells with different amounts of USS-1 DNA.&lt;br /&gt;&lt;br /&gt;For this first experiment, I used 0.5 ml of competent cell culture for each sample and tested 6 different amounts of USS-1 DNA (12 samples total for wt and &lt;span style="font-style: italic;"&gt;rec-2&lt;/span&gt;).  I didn’t have enough radiolabeled fragment for all of my desired concentrations, so I mixed in some cold USS-1 DNA to make up the difference.  
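&lt;br /&gt;&lt;br /&gt;Because some samples mix cold USS-1 with the labeled fragment, the counts have to be turned into an uptake fraction before they can be converted into nanograms.  Here is a minimal Python sketch of that arithmetic, assuming that labeled and unlabeled USS-1 are taken up equally well and that the cell pellet plus all the washes account for all of the counts added.  The function name and the counts in the example are hypothetical, not data from this experiment.
&lt;pre&gt;
```python
def uptake_from_counts(pellet_cpm, wash_cpm, hot_ng, cold_ng=0.0):
    """Hypothetical helper: turn radioactive counts into DNA uptake.

    pellet_cpm: counts remaining in the washed cell pellet
    wash_cpm:   counts summed over all washes (DNA not taken up)
    hot_ng:     ng of radiolabeled USS-1 added
    cold_ng:    ng of unlabeled USS-1 mixed in to reach the target dose
    Assumes hot and cold fragment are taken up equally well, so the labeled
    fraction reports on the whole DNA mass.
    Returns (percent uptake, ng of DNA taken up).
    """
    recovered_cpm = pellet_cpm + wash_cpm
    if recovered_cpm == 0:
        raise ValueError("no counts recovered")
    fraction = pellet_cpm / recovered_cpm
    total_ng = hot_ng + cold_ng
    return 100.0 * fraction, fraction * total_ng


# Hypothetical counts chosen to mirror ~65% uptake of 20 ng of labeled USS-1.
pct, ng = uptake_from_counts(pellet_cpm=6500, wash_cpm=3500, hot_ng=20)
print(f"{pct:.0f}% uptake, {ng:.1f} ng taken up")  # 65% uptake, 13.0 ng taken up

# Hypothetical high-dose sample: 20 ng labeled plus 180 ng cold USS-1.
pct, ng = uptake_from_counts(pellet_cpm=2000, wash_cpm=8000, hot_ng=20, cold_ng=180)
print(f"{pct:.0f}% uptake, {ng:.1f} ng taken up")  # 20% uptake, 40.0 ng taken up
```
&lt;/pre&gt;
Leaving the cold DNA out of the total would under-report the second (hypothetical) sample as 4 ng taken up rather than 40 ng.&lt;br /&gt;&lt;br /&gt;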
I let the DNA and cells incubate for 30 min, then washed the cells several times and measured the total radioactive counts in the cell pellet and washes to determine the % uptake and total uptake.&lt;br /&gt;&lt;br /&gt;Here are the results:&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;Total Uptake:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_7qRGxl6StM4/SmD6l4keYQI/AAAAAAAAANs/siGqpgGBJ5c/s1600-h/totalDOSE.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 227px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/SmD6l4keYQI/AAAAAAAAANs/siGqpgGBJ5c/s320/totalDOSE.png" alt="" id="BLOGGER_PHOTO_ID_5359559085318365442" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Percent uptake:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_7qRGxl6StM4/SmD03jwy44I/AAAAAAAAANk/YcFxuLyryGQ/s1600-h/percentDose.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 227px;" src="http://3.bp.blogspot.com/_7qRGxl6StM4/SmD03jwy44I/AAAAAAAAANk/YcFxuLyryGQ/s320/percentDose.png" alt="" id="BLOGGER_PHOTO_ID_5359552791900775298" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Interestingly, &lt;span style="font-style: italic;"&gt;rec-2&lt;/span&gt; takes up more DNA than wild-type at low DNA concentrations, but less at high concentrations.  The latter could be due to the periplasm getting clogged with DNA, such that the outer membrane uptake machinery has to work too hard to get more DNA through, while in wild-type cells, translocation of DNA out of the periplasm frees up space.  But the former (higher uptake in &lt;span style="font-style: italic;"&gt;rec-2&lt;/span&gt; at low DNA concentrations) doesn't really make much sense to me.  
Maybe not all free nucleotides created during degradation at the inner membrane remain in the cell, so that at low concentrations, &lt;span style="font-style: italic;"&gt;rec-2&lt;/span&gt;  simply holds more label?&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-4095020365571288362?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/4095020365571288362/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/07/dna-as-food.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/4095020365571288362'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/4095020365571288362'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/07/dna-as-food.html' title='Dose Response'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_7qRGxl6StM4/SmDzMcjf97I/AAAAAAAAANU/AY4a26MegbA/s72-c/rice_dna1.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-5221648857243016734</id><published>2009-07-12T12:33:00.001-07:00</published><updated>2009-07-12T12:54:24.653-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='periplasm'/><category scheme='http://www.blogger.com/atom/ns#' 
term='lab'/><title type='text'>Standing upon the shoulders of giants</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.flickr.com/photos/mushon/282287572/"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 200px; height: 200px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/Slo6qM7jVQI/AAAAAAAAAM8/BwVcW4jsm0I/s200/282287572_6b64a90b50.jpg" alt="" id="BLOGGER_PHOTO_ID_5357659203410547970" border="0" /&gt;&lt;/a&gt;It really is gratifying to have things work the way they're supposed to.  Some kind of bug bit me on Saturday and I came in to see if the periplasmic DNA preparation reported by &lt;a href="http://www.pnas.org/content/80/22/6927.abstract"&gt;Kahn &lt;span style="font-style: italic;"&gt;et al&lt;/span&gt; 1983&lt;/a&gt; would work in my hands.  And sure enough it did!&lt;br /&gt;&lt;br /&gt;The experiment was much the same as before.  I added radiolabeled USS-1 fragments to either wild-type or &lt;span style="font-style: italic;"&gt;rec-2&lt;/span&gt; competent cell preps, incubated for 5 minutes, and then either prepared total DNA or did the periplasmic extraction (TE/1.5M CsCl + phenol/acetone, 1:1).&lt;br /&gt;&lt;br /&gt;Since wild-type cells will take up the fragment, but also incorporate labeled subunits from degradation of taken up DNA, I can tell if the periplasmic DNA prep managed to exclude chromosomal DNA.  But first, I counted the radiolabel present in the different cellular fractions...&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;This time, about a quarter of the USS-1 fragment added was taken up within five minutes (wild-type: 26%; &lt;span style="font-style: italic;"&gt;rec-2&lt;/span&gt;: 28%).  
I suspect these numbers are lower than the last time I did it, because my five minutes was really five minutes (whereas the first time, I think I was 2-3 minutes late).&lt;br /&gt;&lt;br /&gt;The extraction:  When I collected the aqueous phase, I also collected the organic phase, and the interface between the phases (which should contain the cells minus their outer membranes).  I counted the radiolabel in these different fractions as before:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_7qRGxl6StM4/Slo9BE5ed-I/AAAAAAAAANE/HPPaJeY1-kA/s1600-h/extractCounts.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 210px;" src="http://3.bp.blogspot.com/_7qRGxl6StM4/Slo9BE5ed-I/AAAAAAAAANE/HPPaJeY1-kA/s320/extractCounts.png" alt="" id="BLOGGER_PHOTO_ID_5357661795414603746" border="0" /&gt;&lt;/a&gt;Wild-type cells had label in both the aqueous extract, as well as in the interface containing the cells, while &lt;span style="font-style: italic;"&gt;rec-2&lt;/span&gt; had nearly all the label in the aqueous extract.  
For &lt;span style="font-style: italic;"&gt;rec-2&lt;/span&gt;, the organic phase held less than 1% of the label.&lt;br /&gt;&lt;br /&gt;But here's the important bit:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7qRGxl6StM4/Slo9r38GspI/AAAAAAAAANM/ZkYolV0JjnM/s1600-h/extract1.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 242px; height: 174px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/Slo9r38GspI/AAAAAAAAANM/ZkYolV0JjnM/s320/extract1.png" alt="" id="BLOGGER_PHOTO_ID_5357662530670342802" border="0" /&gt;&lt;/a&gt;Lane 1: Input (1/3, or 4 ng)&lt;br /&gt;Lane 2: Total DNA, wild-type + USS-1 for 5 min.&lt;br /&gt;Lane 3: Total DNA, &lt;span style="font-style: italic;"&gt;rec-2&lt;/span&gt; + USS-1 for 5 min.&lt;br /&gt;Lane 4: Peri DNA, wild-type + USS-1 for 5 min.&lt;br /&gt;Lane 5: Peri DNA, &lt;span style="font-style: italic;"&gt;rec-2&lt;/span&gt; + USS-1 for 5 min.&lt;br /&gt;&lt;br /&gt;The important point here is that in the total DNA extract of wild-type, both intact donor USS-1 and chromosomal labeling are evident, while in the peri-extract of wild-type, there is no chromosomal label.&lt;br /&gt;&lt;br /&gt;This means that the extraction successfully separated periplasmic DNA from chromosomal DNA.  Fabulous!&lt;br /&gt;&lt;br /&gt;Now I need to scale this protocol up and get cleaner DNA (i.e., treat with RNase), so that hopefully I can see this without using radiolabel.  
If I can really get clean periplasmic DNA with little or no chromosomal contamination, I will move on to the "real" experiment, with donor DNA made from sheared genomic DNA of another isolate.&lt;br /&gt;&lt;br /&gt;Yay!&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-5221648857243016734?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/5221648857243016734/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/07/standing-upon-shoulders-of-giants.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/5221648857243016734'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/5221648857243016734'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/07/standing-upon-shoulders-of-giants.html' title='Standing upon the shoulders of giants'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_7qRGxl6StM4/Slo6qM7jVQI/AAAAAAAAAM8/BwVcW4jsm0I/s72-c/282287572_6b64a90b50.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-4909222761080407619</id><published>2009-07-10T14:52:00.000-07:00</published><updated>2009-07-10T17:56:57.562-07:00</updated><category
scheme='http://www.blogger.com/atom/ns#' term='lab'/><title type='text'>Building  a periplasm prep...</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.exploratorium.edu/ti/human_body/dna.html"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 200px; height: 133px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/SlfgN-WLFoI/AAAAAAAAAMA/yvCMLA1YJrI/s200/dna.jpg" alt="" id="BLOGGER_PHOTO_ID_5356996812459415170" border="0" /&gt;&lt;/a&gt;After my failed attempts at doing a large-scale periplasm prep right off the bat, I decided to spend this week going a bit more slowly.  I repeated what others have already done successfully, using radiolabeled DNA fragments as donors.  This means that I can do smaller-scale experiments and don't need particularly pure DNA.&lt;br /&gt;&lt;br /&gt;And this time the experiments all worked.  Here's what I did:&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;(1) I made competent cells of KW20 (RR722) and KW20 rec-2 (RR622).  I confirmed that the wild-type strain transformed normally and that the &lt;span style="font-style: italic;"&gt;rec-2 &lt;/span&gt;strain did not transform at all (or at least not above my limit of detection).  This showed that my competent cell preps were okay, and that the &lt;span style="font-style: italic;"&gt;rec-2&lt;/span&gt; strain seems to be correct.&lt;br /&gt;&lt;br /&gt;(2) Following the DNA uptake assay protocol of &lt;a href="http://www.bioone.org/doi/abs/10.1111/j.1558-5646.2009.00658.x"&gt;Maughan and Redfield, 2009&lt;/a&gt;, I showed that USS-1 is taken up very well, but USS-R only poorly.
To do this, I simply incubated ~12 ng of either radiolabeled USS-1 or USS-R with 0.5 ml of competent cells for 20 minutes, and then compared the radioactive counts in a washed cell pellet to the total counts:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_7qRGxl6StM4/SlfAbDs1K4I/AAAAAAAAALg/G8epNOjD5VU/s1600-h/USS-1vs-R.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 215px;" src="http://3.bp.blogspot.com/_7qRGxl6StM4/SlfAbDs1K4I/AAAAAAAAALg/G8epNOjD5VU/s320/USS-1vs-R.png" alt="" id="BLOGGER_PHOTO_ID_5356961852862835586" border="0" /&gt;&lt;/a&gt;Wild type and &lt;span style="font-style: italic;"&gt;rec-2&lt;/span&gt; both take up USS-1 well but USS-R poorly, as expected.  However, &lt;span style="font-style: italic;"&gt;rec-2&lt;/span&gt; seems to take up USS-1 slightly better than wild type, which is also true in the next experiment.  This may be significant, but it could also reflect slight differences between the competent cell preps of the two strains.&lt;br /&gt;&lt;br /&gt;Possibly the coolest part of this for me was that my numbers were spot-on with the former post-doc's numbers (found in her notebook) and with older papers describing % uptake.  That is:  ~65% uptake for ~20 ng of DNA per ml of cells.  This was very encouraging to me.&lt;br /&gt;&lt;br /&gt;(3) I repeated the uptake assay described above with ~12 ng USS-1 donor DNA, incubating wild-type and &lt;span style="font-style: italic;"&gt;rec-2&lt;/span&gt; cells for either 5 or 60 minutes.
This gave results similar to those shown above:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_7qRGxl6StM4/SlfAyyuWV4I/AAAAAAAAALw/adYytsAu0Es/s1600-h/tc1.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 216px;" src="http://3.bp.blogspot.com/_7qRGxl6StM4/SlfAyyuWV4I/AAAAAAAAALw/adYytsAu0Es/s320/tc1.png" alt="" id="BLOGGER_PHOTO_ID_5356962260622661506" border="0" /&gt;&lt;/a&gt;Most uptake was complete after only 5 min, though additional incubation increased the level of uptake.  The results for 60 min incubation were nearly identical to those for 20 min incubation, so I don't need to incubate for so long.  The rec-2 strain again showed slightly more uptake at all time points.&lt;br /&gt;&lt;br /&gt;After this, I took it a step further:  I also extracted the DNA from the cell pellets and ran it out on a gel, including the input donor DNA as a control.  I dried down the gel and exposed it to a phosphor screen.  This is what the gel looked like:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_7qRGxl6StM4/SlfAp-j2mTI/AAAAAAAAALo/h1xMD8hWE5E/s1600-h/wtVSrec-2.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 231px; height: 186px;" src="http://3.bp.blogspot.com/_7qRGxl6StM4/SlfAp-j2mTI/AAAAAAAAALo/h1xMD8hWE5E/s320/wtVSrec-2.png" alt="" id="BLOGGER_PHOTO_ID_5356962109181040946" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Lane 1: Donor DNA (50% of input; ~6 ng)&lt;br /&gt;Lane 2: Wild-type + USS-1 for 5 min.  Total DNA.&lt;br /&gt;Lane 3: Wild-type + USS-1 for 60 min. Total DNA.&lt;br /&gt;Lane 4: rec-2 + USS-1 for 5 min. Total DNA.&lt;br /&gt;Lane 5: rec-2 + USS-1 for 60 min. Total DNA.&lt;br /&gt;&lt;br /&gt;Alright!  That's exactly what I hoped for!
(Well, not quantitatively between lanes: this was a sloppy first experiment.)  The gel shows the natural competence phenotypes of the two strains, wild-type and &lt;span style="font-style: italic;"&gt;rec-2&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;Intact uptake DNA is the smaller (lower) band, while chromosomal DNA is the high molecular weight species.  In wild type, donor DNA gets degraded, and its nucleotides can be incorporated into the chromosome over time.  (Importantly, the chromosomal label does NOT come from transformation, but from incorporation of nucleotides from degraded donor DNA into the genome during DNA replication.)  In &lt;span style="font-style: italic;"&gt;rec-2&lt;/span&gt;, the radiolabeled donor DNA is trapped in the periplasm and isn't degraded, so there is no chromosomal labeling.&lt;br /&gt;&lt;br /&gt;This is effectively a repeat of an experiment from &lt;a href="http://jb.asm.org/cgi/content/abstract/163/2/629"&gt;Barouki and Smith, 1985&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Next, I'll repeat exactly the same experiment, but also try the extraction from &lt;a href="http://www.pnas.org/content/80/22/6927.abstract"&gt;Kahn &lt;span style="font-style: italic;"&gt;et al.&lt;/span&gt;, 1983.&lt;/a&gt;  If this successfully yields pure periplasmic DNA, then I expect the extract to contain no radiolabeled chromosome, even for the wild-type sample.
If that works, I can work on scaling the protocol up to do a real purification of uptake DNA.&lt;br /&gt;&lt;br /&gt;Onward!&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-4909222761080407619?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/4909222761080407619/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/07/building-periplasm-prep.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/4909222761080407619'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/4909222761080407619'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/07/building-periplasm-prep.html' title='Building  a periplasm prep...'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_7qRGxl6StM4/SlfgN-WLFoI/AAAAAAAAAMA/yvCMLA1YJrI/s72-c/dna.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-5340681259648570048</id><published>2009-07-07T12:43:00.000-07:00</published><updated>2009-07-07T13:32:53.233-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='plans'/><category scheme='http://www.blogger.com/atom/ns#' term='cytosol'/><category scheme='http://www.blogger.com/atom/ns#' 
term='lab'/><title type='text'>Imagine it exists, and maybe it does!</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_7qRGxl6StM4/SlOn1NS6NNI/AAAAAAAAALQ/Ivoe0jTXjz4/s1600-h/Slide1.png"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 214px; height: 278px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/SlOn1NS6NNI/AAAAAAAAALQ/Ivoe0jTXjz4/s400/Slide1.png" alt="" id="BLOGGER_PHOTO_ID_5355808914417530066" border="0" /&gt;&lt;/a&gt;Our proposed experiments involve capturing DNA molecules at the different stages of natural transformation.  One of the technical challenges we face will be producing a library of DNA molecules that have been translocated into the cytosol.  We have some schemes for purifying donor DNA from the cytosol, but even assuming that this works wonderfully, we would still need to convert these single-stranded molecules into double-stranded DNA.  We can’t use a primer complementary to the 3’ ends of translocated ssDNAs, because (a) we don’t know the exact 3’ ends and (b) the ends will be a complex mixture.&lt;br /&gt;&lt;br /&gt;What to do?&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;Until now the only approach that had occurred to me was random priming, to convert cytosolic ssDNA into dsDNA (shown schematically above), but this approach has several limitations.  The biggest problem is that we would only be able to accurately identify the 5’ end of translocated DNA: with random primers, the 3’ end of the final dsDNA would not correspond to the 3’ end of the original ssDNA molecule.  Furthermore, we would not know which end of our dsDNA was the original ssDNA’s 5’ or 3’ end.  And finally, we would end up with a highly heterogeneous size distribution, which might complicate sequencing.&lt;br /&gt;&lt;br /&gt;How can I circumvent this, get both ends, and know which is which?
I need a strategy like &lt;a href="http://en.wikipedia.org/wiki/Rapid_Amplification_of_cDNA_Ends"&gt;RACE&lt;/a&gt;.  I decided to imagine an enzyme that might help me in this endeavor, and then see whether it actually existed and was already commercially available.  This strategy has worked for me in the past:  Once, I wanted to know if there were restriction enzymes that only nicked at their recognition sites, so I typed “nickase neb” into Google, and &lt;a href="http://www.neb.com/nebecomm/EnzymeFinderSearchByNicking.asp"&gt;sure enough NEB carries nickases&lt;/a&gt;!  Go biotechnologists!&lt;br /&gt;&lt;br /&gt;This time, I want to tack some type of single-stranded adaptor sequence onto the 3’ ends of my putative cytosolic ssDNAs, so I typed “ssDNA ligase” into Google, and &lt;a href="http://www.epibio.com/item.asp?ID=445"&gt;Presto&lt;/a&gt;!... Epicentre produces a single-stranded DNA ligase that they call CircLigase.  Sweet!&lt;br /&gt;&lt;br /&gt;This doesn’t fully solve the problem, since the ligase will normally circularize an ssDNA (intramolecular ligation is usually favored).  That is useful to plenty of folks interested in rolling circle amplification and rolling circle transcription, but rather than circularize my ssDNAs, I would like to favor ligation of an ssDNA adaptor specifically to their 3’ ends.  This will require a couple of bells and whistles.&lt;br /&gt;&lt;br /&gt;If we treat our ssDNA with a phosphatase, we can strip the terminal phosphate from its 5’ end, blocking both circularization and ligation of our adaptor to the 5’ end.  If our adaptor oligonucleotide also has a protected 3’ end (not sure how to do this...
an oligo with a terminal dideoxy nucleotide?), then we’d block ligation of the adaptors to each other and force ligation only in the orientation we want (5’ end of the adaptor to 3’ end of the target).&lt;br /&gt;&lt;br /&gt;Then, using a primer complementary to the adaptor, we can convert full-length ssDNA into dsDNA.  Furthermore, the adaptor marks the original 3’ end of the fragment, so we can assign a polarity to our cytosolic fragments.  Here’s the scheme:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_7qRGxl6StM4/SlOntRuEdpI/AAAAAAAAALI/wcanLkmQ6TY/s1600-h/Slide2.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 245px; height: 271px;" src="http://3.bp.blogspot.com/_7qRGxl6StM4/SlOntRuEdpI/AAAAAAAAALI/wcanLkmQ6TY/s400/Slide2.png" alt="" id="BLOGGER_PHOTO_ID_5355808778166236818" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Afterwards, of course, we’d need to either amplify this product or de-protect both ends, so that we could ligate sequencing adaptors to the mixture.  &lt;br /&gt;&lt;br /&gt;This plan just might work, and I could test it on defined substrates rather than on precious (as well as non-existent) cytosolic DNA fractions.  The main thing I can’t think of off the top of my head is how to get hold of an oligo with a protected 3’ end (preferably reversibly so).&lt;br /&gt;&lt;br /&gt;UPDATE: Looks like at least some oligo companies can include dideoxy bases in oligos.  Awesome.  This is not quite as good as reversible protection of the 3' end...&lt;br /&gt;&lt;br /&gt;UPDATE 2:  Uh oh.  How to amplify the product?  There's no primer sequence at one end...  Perhaps a phosphorylation step, followed by ligation of a normal adaptor to that end?  That's an unfortunate extra step.
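&lt;br /&gt;&lt;br /&gt;The blocking logic of this scheme can be checked with a toy model that tracks only end chemistry. This Python sketch makes a simplifying assumption: the ligase joins a 3’-hydroxyl on one molecule to a 5’-phosphate on another molecule, or on the same molecule (which is circularization). Everything else about the reaction is ignored, and the molecule names and dictionaries are hypothetical.&lt;br /&gt;&lt;pre&gt;
```python
from itertools import product


def can_ligate(upstream, downstream):
    """Toy rule: join the 3'-OH of `upstream` to the 5'-phosphate of `downstream`."""
    return upstream["three_prime"] == "OH" and downstream["five_prime"] == "P"


def possible_junctions(molecules):
    """Every (upstream, downstream) join allowed by end chemistry.

    A pair like ("target", "target") stands for circularization or
    target-target concatemers; ("adaptor", "adaptor") likewise for adaptors.
    """
    return [
        (up, down)
        for up, down in product(molecules, repeat=2)
        if can_ligate(molecules[up], molecules[down])
    ]


# Proposed design: phosphatase-treated target, 5'-P / 3'-dideoxy adaptor.
designed = {
    "target": {"five_prime": "OH", "three_prime": "OH"},
    "adaptor": {"five_prime": "P", "three_prime": "dd"},
}
# Comparison: untreated target (5'-P) and an unblocked adaptor (3'-OH).
untreated = {
    "target": {"five_prime": "P", "three_prime": "OH"},
    "adaptor": {"five_prime": "P", "three_prime": "OH"},
}

print(possible_junctions(designed))
print(possible_junctions(untreated))
```
&lt;/pre&gt;With the designed ends, the only join the model permits is target 3’ end to adaptor 5’ end. With untreated ends, every pairing is allowed, including circularization and adaptor concatemers.&lt;br /&gt;&lt;br /&gt;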
Also, the adaptor sequences will eat into the sequence read length, but that should be acceptable, since we only need to get tag sequences.&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-5340681259648570048?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/5340681259648570048/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/07/imagine-it-exists-and-maybe-it-does.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/5340681259648570048'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/5340681259648570048'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/07/imagine-it-exists-and-maybe-it-does.html' title='Imagine it exists, and maybe it does!'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_7qRGxl6StM4/SlOn1NS6NNI/AAAAAAAAALQ/Ivoe0jTXjz4/s72-c/Slide1.png' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-882372910744573407</id><published>2009-07-02T18:19:00.000-07:00</published><updated>2009-07-02T18:33:08.197-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sequencing'/><category scheme='http://www.blogger.com/atom/ns#' 
term='plans'/><title type='text'>Mutation versus Transformation:  Structural Variation</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7qRGxl6StM4/Sk1dNtVEEEI/AAAAAAAAAKo/dicR6D9c6kQ/s1600-h/hindiideletion.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 111px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/Sk1dNtVEEEI/AAAAAAAAAKo/dicR6D9c6kQ/s400/hindiideletion.jpg" alt="" id="BLOGGER_PHOTO_ID_5354038022101012546" border="0" /&gt;&lt;/a&gt;Here's an idea:  Compare structural changes (indels and rearrangements) in transformed and untransformed cultures.&lt;br /&gt;&lt;br /&gt;One major goal of our planned experiments is to measure the transformation rate across the genome.  This is a fairly ambitious prospect, mainly because of the amount of sequencing it will require.&lt;br /&gt;&lt;br /&gt;If we assume that the average donor allele transforms recipient chromosomes 1% of the time, then we would need to sequence the average locus 100 times to see the donor allele just once.  A reliable measurement of the transformation rate would require considerably more depth... perhaps 10,000 times, which would give us an average of 100 donor alleles per 10,000 alleles sequenced.  Even on the Illumina platform, measuring the transformation rate for every SNP this way would get fairly expensive.&lt;br /&gt;&lt;br /&gt;One way to do a preliminary sequencing experiment would be to ignore single-nucleotide differences between donor and recipient and focus solely on indel and rearrangement differences (a.k.a. &lt;a href="http://nodnacontrol.blogspot.com/2009/06/joint-molecules.html"&gt;structural variation&lt;/a&gt;).
These data would speak to &lt;a href="http://nodnacontrol.blogspot.com/2009/06/supragenomicisticexpialidocious.html"&gt;interesting hypotheses&lt;/a&gt; regarding the role of natural competence in maintaining the core genome and diversifying the accessory genome, and they would require considerably less sequencing.&lt;br /&gt;&lt;br /&gt;So the goal of the preliminary experiment would be to compare structural variation due to mutation with structural variation due to transformation.&lt;br /&gt;&lt;br /&gt;How would this work, and why would it be cheap and easy?&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;First of all, the experimental side is extremely easy.  Naturally competent recipient cultures would be split in two.  One part would be used to prepare untransformed recipient chromosomes; the other part would be incubated with donor DNA for a while and allowed to recover, and then the transformed chromosomes would be purified.  (We could also select for a donor marker at this point to increase the relative transformation rate.)  That’s it.&lt;br /&gt;&lt;br /&gt;The sequencing side would also be comparatively easy.  Untransformed and transformed chromosome preparations would be sheared, end-repaired, and size-fractionated by gel to 500 bp (as precisely as possible).  This DNA would then be ready for sequencing library construction and paired-end sequencing.&lt;br /&gt;&lt;br /&gt;For a 500 bp library, I &lt;a href="http://nodnacontrol.blogspot.com/2009/05/sequencing-ho.html"&gt;previously estimated&lt;/a&gt; ~2500X “spanning coverage” in one lane of Illumina sequencing, using conservative estimates of the sequencing parameters.  “Spanning coverage” is the number of times a particular genomic position falls between the two mapped reads of a pair.
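&lt;br /&gt;&lt;br /&gt;The depth arithmetic in this post can be sketched in a few lines of Python. The sketch rests on several assumptions. The recipient genome is taken as roughly 1.83 Mb (Kw20). Spanning coverage is approximated as mapped read pairs times fragment length divided by genome length. The ordinary binomial standard error is used to show why 100X depth is too shallow to estimate a 1% donor-allele frequency while 10,000X is not. None of these numbers are measurements, and the constants are assumptions.&lt;br /&gt;&lt;pre&gt;
```python
import math

# Assumed constants (not from the post): Kw20 genome length (~1.83 Mb) and the
# 500 bp fragment size targeted by the gel size selection.
GENOME_LEN = 1_830_000
FRAGMENT_LEN = 500


def spanning_coverage(n_pairs, fragment_len=FRAGMENT_LEN, genome_len=GENOME_LEN):
    """Approximate spanning coverage: mean number of sequenced fragments
    covering each genomic position."""
    return n_pairs * fragment_len / genome_len


def pairs_needed(target_coverage, fragment_len=FRAGMENT_LEN, genome_len=GENOME_LEN):
    """Mapped read pairs required to reach a target spanning coverage."""
    return math.ceil(target_coverage * genome_len / fragment_len)


def relative_standard_error(p, depth):
    """Binomial standard error of an allele frequency p estimated from
    `depth` independent observations, expressed as a fraction of p."""
    return math.sqrt(p * (1 - p) / depth) / p


DONOR_FREQ = 0.01  # the post's assumed 1% transformation rate
for depth in (100, 10_000):
    print(f"{depth}X: expected donor alleles = {DONOR_FREQ * depth:g}, "
          f"relative SE = {relative_standard_error(DONOR_FREQ, depth):.2f}")

print("pairs for 2,500X:", pairs_needed(2_500))
print("pairs for 10,000X:", pairs_needed(10_000))
```
&lt;/pre&gt;Under these assumptions, the relative standard error of the donor-allele frequency falls from roughly 100% at 100X to roughly 10% at 10,000X.&lt;br /&gt;&lt;br /&gt;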
So we’d reach 10,000X spanning coverage in about half a full run.&lt;br /&gt;&lt;br /&gt;How would this help us measure the transformation of donor structural variants into the recipient?&lt;br /&gt;&lt;br /&gt;Let’s use an example to illustrate.  In the alignment shown above (made with GenomeMatcher), the donor genome (86-028NP) is shown on top and the recipient genome (Kw20) on the bottom.  As is pretty clear, genes Hi_0512 and Hi_0513 are absent from the donor genome, indicating a deletion of those genes from the donor (or possibly an insertion into the recipient).  These genes happen to be the HindII restriction enzyme and methylase.  The flanking genes are syntenic (so Hi_0511 = tchA, etc.).&lt;br /&gt;&lt;br /&gt;Since we know the size distribution of the library (500 bp), it would be quite simple to spot the deletion allele.  Read pairs with one end in tchA and the other in rpoC would define the deletion.  By contrast, the insertion allele would always place tchA and rpoC mappings on different fragments (with the other ends in the HindII methylase or restriction enzyme).&lt;br /&gt;&lt;br /&gt;Here’s a way of illustrating what paired-end reads from different kinds of alleles would look like relative to the recipient:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7qRGxl6StM4/Sk1drbHBp5I/AAAAAAAAAKw/ugrb83C9jdg/s1600-h/spans.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 129px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/Sk1drbHBp5I/AAAAAAAAAKw/ugrb83C9jdg/s400/spans.png" alt="" id="BLOGGER_PHOTO_ID_5354038532606371730" border="0" /&gt;&lt;/a&gt;So for our deletion, we’d see read pairs that map farther apart than expected.  With extremely high spanning coverage, we could count how often these mappings occur relative to the normal recipient mappings.
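&lt;br /&gt;&lt;br /&gt;Here is a minimal Python sketch of that counting step. It assumes each read pair has already been mapped to the recipient (Kw20) reference and reduced to its outer coordinates. The breakpoint coordinates, size tolerance, and toy pairs are all hypothetical, chosen only to illustrate the logic. Pairs are classified as follows:&lt;br /&gt;&lt;br /&gt;- Deletion alleles: pairs that straddle the whole missing interval and whose span is about the fragment size plus the deletion length. These are the donor-like pairs.&lt;br /&gt;- Recipient alleles: normally spaced pairs that cross a breakpoint.&lt;br /&gt;&lt;pre&gt;
```python
# Hypothetical coordinates on the recipient (Kw20) reference; the real
# breakpoints of the Hi_0512/Hi_0513 region are not given in this post.
DEL_START = 10_000     # first recipient base missing from the donor allele
DEL_END = 12_500       # last recipient base missing from the donor allele
EXPECTED_INSERT = 500  # target fragment size of the library
TOLERANCE = 150        # assumed allowance for spread in fragment size


def classify_pair(left, right):
    """Classify one read pair by its outer mapped coordinates (inclusive).

    Returns "deletion" if the pair straddles the whole missing interval and
    its apparent span is about EXPECTED_INSERT plus the deletion length,
    "recipient" if a normally spaced pair crosses either breakpoint, and
    None if the pair says nothing about this locus.
    """
    span = right - left + 1
    del_len = DEL_END - DEL_START + 1
    straddles_deletion = DEL_START > left and right > DEL_END
    crosses_breakpoint = (DEL_START > left and right >= DEL_START) or (
        DEL_END >= left and right > DEL_END
    )
    if straddles_deletion and TOLERANCE >= abs(span - (EXPECTED_INSERT + del_len)):
        return "deletion"
    if crosses_breakpoint and TOLERANCE >= abs(span - EXPECTED_INSERT):
        return "recipient"
    return None


def deletion_frequency(pairs):
    """Fraction of informative pairs that support the deletion allele."""
    calls = [classify_pair(left, right) for left, right in pairs]
    n_del = calls.count("deletion")
    n_rec = calls.count("recipient")
    informative = n_del + n_rec
    return n_del / informative if informative else float("nan")


# Made-up pairs: two span the deletion, three are normally spaced across a
# breakpoint, and one maps far from the locus.
toy_pairs = [
    (9_800, 12_800),
    (9_700, 12_700),
    (9_700, 10_199),
    (12_300, 12_799),
    (9_900, 10_399),
    (5_000, 5_499),
]
print(deletion_frequency(toy_pairs))
```
&lt;/pre&gt;The returned value is the share of informative pairs that support the deletion allele at this locus.&lt;br /&gt;&lt;br /&gt;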
This would give us the rate of deletion.&lt;br /&gt;&lt;br /&gt;In our untransformed library, if we saw the deletion allele, we’d be seeing mutation, while in the transformed library we’d be seeing mutation and/or transformation.  Since we know the donor genome sequence, we can distinguish read pairs that match the donor sequence from &lt;span style="font-style: italic;"&gt;de novo&lt;/span&gt; mutations that arose while we grew out the cells.  Better still, we could spot transformation-induced &lt;span style="font-style: italic;"&gt;de novo&lt;/span&gt; mutations by comparing to the untransformed library.&lt;br /&gt;&lt;br /&gt;Why else sequence the untransformed chromosomes at all?  Well, in many cases structural mutations are likely to occur at a much higher rate than single-nucleotide mutations.  Independent of our interest in natural transformation, the control experiment may reveal unstable regions of the genome, along with the mutation rates of different types of structural variation.  This last part is non-trivial.
If, for example, we see a particular deletion that occurred on 50% of the untransformed chromosomes we looked at, it could be that this is an extremely frequent mutation, but it could also have simply occurred early in the grow-out of the culture.&lt;br /&gt;&lt;br /&gt;As a control for transformation rates, we don’t have to worry about that,  but doing the untransformed control would make us confident that changes we saw were induced by transformation and not simply due to such kinds of mutation.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-882372910744573407?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/882372910744573407/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/07/mutation-versus-transformation.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/882372910744573407'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/882372910744573407'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/07/mutation-versus-transformation.html' title='Mutation versus Transformation:  Structural Variation'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_7qRGxl6StM4/Sk1dNtVEEEI/AAAAAAAAAKo/dicR6D9c6kQ/s72-c/hindiideletion.jpg' height='72' 
width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-3380880196240908225</id><published>2009-07-02T16:07:00.000-07:00</published><updated>2009-07-02T16:15:22.941-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='plans'/><category scheme='http://www.blogger.com/atom/ns#' term='lab'/><title type='text'>Happy (Belated) Canada Day!</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7qRGxl6StM4/Sk09uTClrEI/AAAAAAAAAKg/17PXQEgrlzE/s1600-h/s28_5e009194.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 264px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/Sk09uTClrEI/AAAAAAAAAKg/17PXQEgrlzE/s400/s28_5e009194.jpg" alt="" id="BLOGGER_PHOTO_ID_5354003397607795778" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;(Image: The Canadian-built robot arm attaching the space shuttle docked to the Hubble Space Telescope with the Earth in the background.)&lt;br /&gt;&lt;br /&gt;Oh yeah, and my second attempt at preparing periplasm DNA was... &lt;span class="fullpost"&gt;inconclusive.    But it was pretty interesting to try out.  In particular, the TE/CsCl/phenol/acetone extraction was quite compelling visually, involving small bubbles breaking up and reforming.  I need to start these experiments out at a smaller scale.&lt;br /&gt;&lt;br /&gt;And luckily, we've received radiolabeled dATP, so I can do some more sensitive and controlled experiments next week.  
The use of radiolabeled uptake fragments will be significantly more sensitive and allow me to use small cultures and follow small amounts of uptake DNA.&lt;br /&gt;&lt;br /&gt;Once I've got a functioning uptake assay, I can work out the best purification method and scale up from there.&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-3380880196240908225?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/3380880196240908225/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/07/happy-belated-canada-day.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/3380880196240908225'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/3380880196240908225'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/07/happy-belated-canada-day.html' title='Happy (Belated) Canada Day!'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_7qRGxl6StM4/Sk09uTClrEI/AAAAAAAAAKg/17PXQEgrlzE/s72-c/s28_5e009194.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-7547134680235548192</id><published>2009-06-30T19:13:00.001-07:00</published><updated>2009-06-30T19:35:50.218-07:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='plans'/><category scheme='http://www.blogger.com/atom/ns#' term='lab'/><title type='text'>Periplasm Prep Planning II</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.popgadget.net/2007/03/lamplamp_light.php"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 200px; height: 197px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/SkrJrguBYyI/AAAAAAAAAKI/OgaXLue1L9w/s200/lamplamp.jpg" alt="" id="BLOGGER_PHOTO_ID_5353312856437777186" border="0" /&gt;&lt;/a&gt;I tried adapting a periplasmic protein prep to purify uptake DNA, and it didn't work.  There are several possible reasons why the experiment might have failed, but one simple explanation could be that I didn't dissociate DNA from the membranes and cells during the chloroform extraction.&lt;br /&gt;&lt;br /&gt;I know!  Maybe I should try an extraction that has already been used for purifying uptake DNA...&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&amp;amp;pubmedid=6316334"&gt;Kahn, Barany, and Smith (1983) PNAS 80:6927&lt;/a&gt;.  Rather than describe the paper at length here, I just want to show Table 1 and Figure 4B, which relate to my extraction plans:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_7qRGxl6StM4/SkrJ6hO1RDI/AAAAAAAAAKQ/iAQnFDxPQXY/s1600-h/KahnTable1.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 360px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/SkrJ6hO1RDI/AAAAAAAAAKQ/iAQnFDxPQXY/s400/KahnTable1.jpg" alt="" id="BLOGGER_PHOTO_ID_5353313114273432626" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;The first two columns describe the extraction conditions (rows 1-5).
Competent cell cultures were incubated with a radiolabeled plasmid and pelleted after DNA uptake (5 or 60 min).  Cell pellets were then resuspended in the indicated aqueous and organic solutions (columns A and B) in a 1:1 mixture, gently mixed, centrifuged to separate the phases, and the radioactive counts in each fraction were measured.&lt;br /&gt;&lt;br /&gt;The remaining columns indicate the relative amount of uptake in the different fractions and the identity of the radiolabeled DNA, either integrated into the chromosome (C) or still a double-stranded donor DNA molecule (D).&lt;br /&gt;&lt;br /&gt;In the fourth condition (row 4), the aqueous phase consists mostly of donor DNA!  So chromosomal contamination stays in the pellet, and the desired donor molecules end up in the aqueous phase.  Sounds like a scheme.  That’s what I’ll proceed with tomorrow.&lt;br /&gt;&lt;br /&gt;Why not condition 1, just TE and phenol?  Looks good, right?  Because of Figure 4B:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_7qRGxl6StM4/SkrKCw8V8mI/AAAAAAAAAKY/nBtymnD4Fgg/s1600-h/KahnFig4B.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 241px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/SkrKCw8V8mI/AAAAAAAAAKY/nBtymnD4Fgg/s400/KahnFig4B.jpg" alt="" id="BLOGGER_PHOTO_ID_5353313255929803362" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;The bottom line is that the phenol condition degraded the donor molecules (lane D), whereas the phenol/acetone condition did not (lane I).  
Here are the gory details:&lt;br /&gt;&lt;br /&gt;Lanes A-D describe phenol extraction of intact donor DNA molecules (Table 1, row 1):&lt;br /&gt;A: input plasmid donor molecule.&lt;br /&gt;B: total DNA from cells 4 min into uptake.&lt;br /&gt;C: DNA extracted and dialyzed out of the pellet.&lt;br /&gt;D: DNA extracted into the aqueous layer (TE).&lt;br /&gt;&lt;br /&gt;Lanes E-J describe the phenol/acetone extraction of intact donor DNA molecules:&lt;br /&gt;E: input again.&lt;br /&gt;F: input cut with &lt;span style="font-style:italic;"&gt;HindIII&lt;/span&gt;.&lt;br /&gt;G: total DNA after 8 min uptake.&lt;br /&gt;H: same as G, but digested with &lt;span style="font-style:italic;"&gt;HindIII&lt;/span&gt;.&lt;br /&gt;I: phenol/acetone extracted DNA after 8 min.&lt;br /&gt;J: same as I, but digested with &lt;span style="font-style:italic;"&gt;HindIII&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;The &lt;span style="font-style:italic;"&gt;HindIII&lt;/span&gt; digestion served as an additional test of whether molecules were donor or chromosomal.  Chromosomal DNA is resistant to &lt;span style="font-style:italic;"&gt;HindIII&lt;/span&gt; (this being &lt;span style="font-style:italic;"&gt;Haemophilus influenzae&lt;/span&gt; after all, whose own methylase protects its chromosome from the enzyme), while donor DNA is not.  
Also, the &lt;span style="font-style:italic;"&gt;HindIII&lt;/span&gt; digests show that the recovered DNA is double-stranded, since ssDNA won’t get cut.&lt;br /&gt;&lt;br /&gt;Condition 4 it is, then:&lt;br /&gt;Aqueous: TE/1.5 M CsCl&lt;br /&gt;Organic: phenol/acetone, 1:1&lt;br /&gt;&lt;br /&gt;That's what I'll try next.&lt;br /&gt;&lt;br /&gt;Hmm, I’ll have to remember how to clean the DNA of CsCl after the extraction...I seem to remember doing something in particular at one point...&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-7547134680235548192?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/7547134680235548192/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/06/periplasm-prep-planning-ii.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/7547134680235548192'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/7547134680235548192'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/06/periplasm-prep-planning-ii.html' title='Periplasm Prep Planning II'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_7qRGxl6StM4/SkrJrguBYyI/AAAAAAAAAKI/OgaXLue1L9w/s72-c/lamplamp.jpg' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-521892019716124906</id><published>2009-06-30T08:32:00.000-07:00</published><updated>2009-06-30T17:00:24.874-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='plans'/><category scheme='http://www.blogger.com/atom/ns#' term='lab'/><title type='text'>Periplasm Prep</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://en.wikipedia.org/wiki/Chloroform"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 163px; height: 320px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/SkowGqtCWzI/AAAAAAAAAJg/A7oKwRvGi6M/s320/chloroform.gif" alt="" id="BLOGGER_PHOTO_ID_5353143998183791410" border="0" /&gt;&lt;/a&gt;UPDATED BELOW&lt;br /&gt;&lt;br /&gt;Experimental plan for the day:  Medium-scale periplasm prep test, to purify double-stranded DNA in the “protected state” (the periplasm).&lt;br /&gt;&lt;br /&gt;I have two PCR products: (1) a “good” uptake sequence, USS-1, and (2) a “bad” uptake sequence, USS-R.  I want to compare their uptake into &lt;span style="font-style: italic;"&gt;rec-2&lt;/span&gt; cells, which can bring DNA through the outer membrane, but not the inner membrane.  This means that if I can specifically enrich USS-1 but not USS-R (and can see the difference on a gel), then I’ve got a functioning periplasmic DNA prep.  Naturally, it’ll probably take several attempts to get it working...&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;I will use a modification of &lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/7968534"&gt;this paper&lt;/a&gt; and see if it nets some DNA where it should be.  In outline, I’ll:  Add chloroform to washed cell pellets.  Soak.  Extract periplasm with TE.  Clean and concentrate.  
Run on a gel.&lt;br /&gt;&lt;br /&gt;Based on &lt;a href="http://www.biomedcentral.com/1471-2148/6/82"&gt;other studies&lt;/a&gt; with radiolabeled USS-1 uptake, I expect that for 20 ng added to a 1 ml culture, ~50% will be taken up.  To see uptake DNA without radiolabel on a gel and for reasonable controls, I will need larger cultures of competent cells than I aliquoted and froze last week.&lt;br /&gt;&lt;br /&gt;Here’s my protocol so far:&lt;br /&gt;1) Defrost two tubes of  &lt;span style="font-style: italic;"&gt;rec-2&lt;/span&gt; (0.3 OD/ml aliquot) into fresh sBHI@37 (2X25 ml); wait ~2-2.5 hrs.&lt;br /&gt;2) At OD600=0.3 / ml, transfer cells to M-IV by filtration.&lt;br /&gt;3) Incubate 100 min @37 to induce natural competence. (negative control: frozen tube of &lt;span style="font-style: italic;"&gt;rec-2&lt;/span&gt; (0.1 OD/ml) into fresh sBHI (25 ml).)&lt;br /&gt;4) Split cultures 2X and incubate with 20ng 222bp PCR fragments (USS-1, USS-R, none) / 1 ml M-IV culture (~10^9 cells) for 15-30 min @37, DNase I, EDTA to kill DNase I and other nucleases. Also add USS-1 to non-competents.&lt;br /&gt;5) Spin, wash pellet 3X PBS, chloroform (20-40ul), incubate 20 min @RT (chloroform pellet DNA extraction?, save washes).&lt;br /&gt;6) Extract with 100-200ul cold TE, proteinase? RNase?, p/c extraction, PCR clean-up column (or ppt?) to concentrate.&lt;br /&gt;7) 1.2% agarose gel. Lanes:&lt;br /&gt;&lt;br /&gt;Size standard&lt;br /&gt;USS-1 input (2X dilution)&lt;br /&gt;USS-R input (2X dilution)&lt;br /&gt;rec-2 + USS-1 -&gt; chloroform extract&lt;br /&gt;rec-2 + USS-R -&gt; chloroform extract&lt;br /&gt;rec-2 + no dna -&gt; chloroform extract&lt;br /&gt;non-competent rec-2 + USS-1 -&gt; chloroform extract&lt;br /&gt;&lt;br /&gt;UPDATE:&lt;br /&gt;&lt;br /&gt;Didn't work.  A few little mishaps aside (mainly that the chloroform and cells really didn't mix well), I got no USS out of the prep, but did have a fair amount of chromosomal contamination.  
So clearly, I didn't really get the periplasm specifically, but since I didn't see any USS come through, it may also be that my competent cells weren't really competent.&lt;br /&gt;&lt;br /&gt;I'll try to go through this again tomorrow, but instead of going straight for the periplasm preparation, I'll just lyse the cells, extract the DNA, run it over a mini-prep column, and run it on a gel.  I'm not going to worry about the periplasm specifically, just about whether the cells are taking up DNA at all.  When the radiolabel shows up, I can repeat the uptake assay the lab has typically done.&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-521892019716124906?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/521892019716124906/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/06/periplasm-prep.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/521892019716124906'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/521892019716124906'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/06/periplasm-prep.html' title='Periplasm Prep'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://1.bp.blogspot.com/_7qRGxl6StM4/SkowGqtCWzI/AAAAAAAAAJg/A7oKwRvGi6M/s72-c/chloroform.gif' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-2490642549904806033</id><published>2009-06-26T11:41:00.001-07:00</published><updated>2009-06-26T11:53:02.929-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='plans'/><category scheme='http://www.blogger.com/atom/ns#' term='lab'/><title type='text'>Plan for the next few weeks</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_7qRGxl6StM4/SkUWmZPw8_I/AAAAAAAAAJY/qDCdVWtMnS8/s1600-h/matchRdNP.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 417px; height: 87px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/SkUWmZPw8_I/AAAAAAAAAJY/qDCdVWtMnS8/s400/matchRdNP.jpg" alt="" id="BLOGGER_PHOTO_ID_5351708581067551730" border="0" /&gt;&lt;/a&gt;One proposal down, one to go...  The next one isn’t due until August 8th, so I’ve got just over a month to get it done.  This time, however, I am going to manage my time better, since I need to get some preliminary data and still keep learning how to use a computer.&lt;br /&gt;&lt;br /&gt;So here’s my plan for the next several weeks:&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;(1) Work on the proposal for a limited time each day (~1-2 hrs).  I’ll start by developing a detailed outline of what I want to say and the order I want to say it, rather than leaping straight into writing.  Based on my experience with this last one and in the past, I find I am an extremely inefficient writer (both with my time and with my words), so hopefully I can improve by having more focused daily goals.&lt;br /&gt;&lt;br /&gt;(2) Work on the computational stuff only 1-2 hrs per day.  Still need to fix the browser to display the Hin genome.  
Still want to work out the best way to align the genomes and report differences... particularly enumerating structural variation (non-SNPs).  I also need to keep in mind what file formats I can expect to get from sequencing.  It might be particularly useful to try to simulate the kinds of results I might expect from sequencing periplasmic uptake DNA, etc.&lt;br /&gt;&lt;br /&gt;(3) The rest of the time will be dedicated to lab work.  The priority is to use defined fragments (USS-1 and USS-R) to work out a periplasmic DNA purification protocol.  I’ve got cleaned, amplified USS-1 and USS-R fragments, and I’m making competent cells of wild-type and rec-2 today.  I tried to grow up a pilA mutant to use as a no-uptake control, but something was wonky with the strain.  All I need now is label.  And a calibrated Geiger counter.  I’ll get these things done today.&lt;br /&gt;&lt;br /&gt;(The image above was made using the Mac-specific application &lt;a href="http://www.ige.tohoku.ac.jp/joho/gmProject/gmhome.html"&gt;GenomeMatcher&lt;/a&gt;.  It represents a BLAST alignment between KW20 and 086-28NP across an interval containing several inversions.  GenomeMatcher also has a bunch of useful-seeming utilities that I'd like to figure out.  
Now if I could just get it to use my MUMmer program, like it’s supposed to...)&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-2490642549904806033?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/2490642549904806033/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/06/one-proposal-down-one-to-go.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/2490642549904806033'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/2490642549904806033'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/06/one-proposal-down-one-to-go.html' title='Plan for the next few weeks'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_7qRGxl6StM4/SkUWmZPw8_I/AAAAAAAAAJY/qDCdVWtMnS8/s72-c/matchRdNP.jpg' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-3444270827550327154</id><published>2009-06-25T16:34:00.000-07:00</published><updated>2009-06-25T16:39:26.003-07:00</updated><title type='text'>But which would win in a fight?</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" 
href="http://4.bp.blogspot.com/_7qRGxl6StM4/SkQKHV6YIBI/AAAAAAAAAJA/VbPrBJOk_oA/s1600-h/yeast.jpg"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 200px; height: 172px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/SkQKHV6YIBI/AAAAAAAAAJA/VbPrBJOk_oA/s200/yeast.jpg" alt="" id="BLOGGER_PHOTO_ID_5351413378480349202" border="0" /&gt;&lt;/a&gt;The budding and fission yeasts provide for several interesting comparisons.  Some of these somewhat illustrate my considerable confusion about the “evolution of sex”.  Here’s one:&lt;br /&gt;&lt;br /&gt;While &lt;span style="font-style: italic;"&gt;Saccharomyces cerevisiae&lt;/span&gt; (the budding yeast) prefers to spend time as a diploid in the G1 phase of the cell cycle (with two unreplicated genomes), &lt;span style="font-style: italic;"&gt;Schizosaccharomyces pombe&lt;/span&gt; (the fission yeast) prefers being a G2 haploid (with one replicated genome).  These “preferences” are inferred from two pieces of evidence from each of the yeasts:&lt;br /&gt;&lt;br /&gt;(1) Budding yeast cells mate under rich conditions (become diploid), and sporulate under &lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_7qRGxl6StM4/SkQKMrMfNVI/AAAAAAAAAJI/8YfKvBBwSqQ/s1600-h/pombe1.jpg"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer; width: 200px; height: 160px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/SkQKMrMfNVI/AAAAAAAAAJI/8YfKvBBwSqQ/s200/pombe1.jpg" alt="" id="BLOGGER_PHOTO_ID_5351413470092801362" border="0" /&gt;&lt;/a&gt;starvation (become haploid).  Fission yeast cells remain haploids in rich conditions, while mating and immediately sporulating under starvation (zygotic meiosis).&lt;br /&gt;&lt;br /&gt;(2) In cycling cells, budding yeast spends most of the cycle in G1.  S-phase and mitosis are separated by only a short G2.  
By contrast, fission yeast cells reside in G2 most of the time, with only a short gap between mitosis and S-phase.&lt;br /&gt;&lt;br /&gt;The interesting thing here is that, while the two yeasts prefer to maintain different ploidy levels (2N versus 1N), they both prefer to have two copies of the genome present (2C).  Why might this be?&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;One suggestion is that there will then always be a template for recombination in the event of a DNA double-strand break (DSB).  DSBs are a particularly challenging form of DNA damage for cells.  Cells must rejoin the correct broken ends, and doing so with high accuracy requires an intact homologous (identical-by-descent) template.&lt;br /&gt;&lt;br /&gt;Another possibility relates to newly introduced recessive mutations.  In both yeasts, a recessive mutation will not normally affect the phenotype of the cell it occurs in, since there are two copies of the genome.  But cell division has different consequences in diploid versus haploid cells.  In a diploid, mitosis will maintain the heterozygosity of the locus sustaining the new mutation, while in a haploid the wild-type and mutant alleles will immediately segregate in mitosis.  So natural selection will act differently on populations that are predominantly diploid or haploid.&lt;br /&gt;&lt;br /&gt;An important point regarding the G2 preference of fission yeast: Even a new lethal recessive mutation will not usually kill a particular cell.  A cell sustaining such a mutation in G2 segregates the lethal allele to only one of its progeny sister cells, maintaining the wild-type allele in the other.&lt;br /&gt;&lt;br /&gt;Diploid cells can still segregate the wild-type and mutant alleles, even in the absence of meiosis.  “Loss-of-heterozygosity” (LOH) can occur if there is a crossover between the heterozygous locus and the centromere of homologous chromosomes.  
Such crossovers, though rare, occur at measurable rates due to recombinational repair of DSBs and collapsed replication forks.  So even in the absence of proper sex, a diploid can still expose its new mutations to natural selection, just at a substantially lower rate than haploids (in budding yeast, only some new mutations will undergo LOH even over many cell generations, whereas new fission yeast mutations segregate immediately).&lt;br /&gt;&lt;br /&gt;This comparison illustrates contrasting lifestyle choices.  While being haploid or diploid has distinct population genetic consequences, the preference of both yeasts to exist with two genomes in a cell says something interesting...  I’ll need to think a little more about what that something is before articulating it clearly...&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-3444270827550327154?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/3444270827550327154/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/06/but-which-would-win-in-fight.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/3444270827550327154'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/3444270827550327154'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/06/but-which-would-win-in-fight.html' title='But which would win in a fight?'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' 
src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_7qRGxl6StM4/SkQKHV6YIBI/AAAAAAAAAJA/VbPrBJOk_oA/s72-c/yeast.jpg' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-3688680137690919349</id><published>2009-06-23T20:42:00.001-07:00</published><updated>2009-06-23T20:45:25.024-07:00</updated><title type='text'>Proposaling continues</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_7qRGxl6StM4/SkGg4l7ynJI/AAAAAAAAAIw/bp_S3Nff4Eo/s1600-h/uptake.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 163px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/SkGg4l7ynJI/AAAAAAAAAIw/bp_S3Nff4Eo/s400/uptake.png" alt="" id="BLOGGER_PHOTO_ID_5350734726408871058" border="0" /&gt;&lt;/a&gt;&lt;span class="fullpost"&gt;&lt;br /&gt;I continue the saga of writing this soon-to-be-due proposal.  I keep struggling with things that I think I understand in my head but can't describe succinctly in the text.  
Luckily, when I hit a wall, I can do some referencing and figure-making...&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-3688680137690919349?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/3688680137690919349/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/06/proposaling-continues.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/3688680137690919349'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/3688680137690919349'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/06/proposaling-continues.html' title='Proposaling continues'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_7qRGxl6StM4/SkGg4l7ynJI/AAAAAAAAAIw/bp_S3Nff4Eo/s72-c/uptake.png' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-1181706365364264031</id><published>2009-06-18T18:07:00.001-07:00</published><updated>2009-06-18T18:28:24.193-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='human USS'/><title type='text'>A taste of DNA</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" 
href="http://www.flickr.com/photos/40103951@N00/258234227"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 200px; height: 199px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/SjrlDfcafTI/AAAAAAAAAIg/sScP3DT1HOs/s200/258234227_3ecc78778c.jpg" alt="" id="BLOGGER_PHOTO_ID_5348839355598929202" border="0" /&gt;&lt;/a&gt;I gave Rosie a draft of my grant application, so I decided to dither with human USS motifs again.&lt;br /&gt;&lt;br /&gt;First, for fun, I looked at where several of them were located... to see which genes taste best... and ran across USS in all sorts of random genes.  (For example: a kinesin, an adducin, a phosphatidic acid phosphatase, a phosphatidic acid kinase(!), a cadherin-associated protein, a few hypothetical genes and transcription factors, &lt;span style="font-style: italic;"&gt;etc. etc.&lt;/span&gt;)  There were also several located outside genes and in gene-poor regions.  It would be funny to do a &lt;a href="http://www.geneontology.org/GO.annotation.shtml"&gt;GO annotation&lt;/a&gt; analysis, but most certainly a waste of time...&lt;br /&gt;&lt;br /&gt;But to address Rosie’s comment:  I found 762 10mer USS motifs in the human genome using a Tagscan search.  This seems consistent with random expectation, since using Tagscan to count arbitrary 10mer motifs with similar base composition (and a CpG) gave similar numbers.&lt;br /&gt;&lt;br /&gt;However, my analytical calculation of the expected number seemed way off.  Where did I go wrong?  
(One reason I might actually care about this is so that I could do a more precise analytical calculation of the number of USS motifs I’d expect in the Haemophilus genome.)&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;I had calculated the expected number of USS motifs in a random sequence as G * (25%)^10, assuming a 25% chance of drawing the correct base at each of the 10 positions (where G is genome size = 3.16e9 bases):&lt;br /&gt;&lt;br /&gt;So G * (25%)^10 = G * 1 / 1,048,576 = G * 9.537e-7 ≈ 3,014 instances.&lt;br /&gt;&lt;br /&gt;But since the human genome has only 41% GC content, I might adjust this calculation to G * (20.5%)^5 * (29.5%)^5, for a 20.5% chance of drawing the correct base at each of the 5 GC positions and a 29.5% chance at each of the 5 AT positions:&lt;br /&gt;&lt;br /&gt;So G * (20.5%)^5 * (29.5%)^5 = G * 8.09e-7 ≈ 2,556 instances.&lt;br /&gt;&lt;br /&gt;These are for only a single strand of DNA.  The reverse complement of the USS (ACCGCACTTT) is a different sequence, so it has to be counted too, and these values must be multiplied by 2.  So...&lt;br /&gt;&lt;br /&gt;50% GC : 6,027&lt;br /&gt;41% GC : 5,112&lt;br /&gt;&lt;br /&gt;So even accounting for %GC failed to bring this calculation down to what Tagscan found.&lt;br /&gt;&lt;br /&gt;What about dinucleotide composition?&lt;br /&gt;&lt;br /&gt;It’s well known that mammalian genomes have a dearth of CpG dinucleotides, since these are used as sites of gene regulation by cytosine DNA methylation.  
Methylated cytosines tend to deaminate into thymines, so there is a mutational pressure on CpG to become TpG dinucleotides.&lt;br /&gt;&lt;br /&gt;Anyway, I managed to find &lt;a href="http://www.sciencedirect.com/science?_ob=MiamiCaptionURL&amp;amp;_method=retrieve&amp;amp;_udi=B6T39-4C7DGDJ-4&amp;amp;_image=tbl1&amp;amp;_ba=&amp;amp;_user=1022551&amp;amp;_rdoc=1&amp;amp;_fmt=full&amp;amp;_orig=search&amp;amp;_cdi=4941&amp;amp;view=c&amp;amp;_isTablePopup=Y&amp;amp;_acct=C000050484&amp;amp;_version=1&amp;amp;_urlVersion=0&amp;amp;_userid=1022551&amp;amp;md5=7070d9fbcd968e686e7328f3ed32f013"&gt;a nice table&lt;/a&gt; in &lt;a href="http://www.sciencedirect.com/science?_ob=ArticleURL&amp;amp;_udi=B6T39-4C7DGDJ-4&amp;amp;_user=1022551&amp;amp;_rdoc=1&amp;amp;_fmt=&amp;amp;_orig=search&amp;amp;_sort=d&amp;amp;view=c&amp;amp;_acct=C000050484&amp;amp;_version=1&amp;amp;_urlVersion=0&amp;amp;_userid=1022551&amp;amp;md5=91665498530acf22e4c7227b4128b10d"&gt;this paper&lt;/a&gt; reporting dinucleotide composition in several genomes.  The source the table’s authors used for humans dates from 1962, so these numbers are probably not very precise, but they are sufficient for my purposes.  It looks like the CpG dinucleotide is ~4-fold less frequent than would be expected for a random genome with a base composition like that of humans.&lt;br /&gt;&lt;br /&gt;This paucity of CpG dinucleotides in the human genome could account for the discrepancy.  So just for giggles, I ran a few additional Tagscans for 10mers with the correct base composition but lacking a CpG dinucleotide, giving the following numbers: 9143, 6698, and 4343 motifs.  Those numbers are much closer to the calculated expectation (not terribly precise, but more accurate).&lt;br /&gt;&lt;br /&gt;But how can I actually use the known distribution of dinucleotides in the human genome to arrive at an even more accurate estimate of how many USS motifs I’d expect to see?  
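One possible route is a first-order Markov chain built from a dinucleotide table. In that model, the probability of a word is the frequency of its first dinucleotide multiplied by P(next base | current base) for each later step. The Python sketch below works under stated assumptions and does not use measured human data. It builds a hypothetical table from independent bases at a chosen %GC and scales only the CpG cell by 1/4, to mimic the ~4-fold deficit mentioned above. It then counts both strands over the G = 3.16e9 genome size used in the calculation above. The helper names (dinucleotide_table, word_probability, expected_count) are made up for this example.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def dinucleotide_table(gc, cpg_weight=1.0):
    """Hypothetical joint dinucleotide frequencies f[xy], summing to 1.

    Starts from independent, strand-symmetric bases at the given GC
    fraction, multiplies the CpG cell by cpg_weight (0.25 mimics a
    ~4-fold CpG deficit), then renormalizes. Not measured human data.
    """
    p = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    f = {}
    for x, y in product(BASES, repeat=2):
        weight = cpg_weight if x + y == "CG" else 1.0
        f[x + y] = p[x] * p[y] * weight
    total = sum(f.values())
    return {pair: value / total for pair, value in f.items()}


def word_probability(word, f):
    """Probability of word at one position under a first-order Markov chain.

    P(word) = f[w0 w1] * product of P(next | current) over later steps,
    with P(next | current) = f[current next] / sum over y of f[current y].
    """
    row = {x: sum(f[x + y] for y in BASES) for x in BASES}
    prob = f[word[:2]]
    for i in range(1, len(word) - 1):
        current, nxt = word[i], word[i + 1]
        prob *= f[current + nxt] / row[current]
    return prob


def expected_count(word, genome_size, f):
    """Expected occurrences on both strands of a random genome.

    Finding the reverse complement on the modeled strand is the same as
    finding the word on the opposite strand (the USS is not palindromic,
    so the two counts do not overlap).
    """
    positions = genome_size - len(word) + 1
    forward = word_probability(word, f)
    reverse = word_probability(reverse_complement(word), f)
    return positions * (forward + reverse)


USS = "AAAGTGCGGT"
G = 3.16e9  # genome size used in the hand calculation

for label, gc, cpg in [("50% GC", 0.50, 1.0),
                       ("41% GC", 0.41, 1.0),
                       ("41% GC, CpG x 1/4", 0.41, 0.25)]:
    print(label, round(expected_count(USS, G, dinucleotide_table(gc, cpg))))
```

With cpg_weight left at 1, the chain collapses to independent bases. The first two printed values (6027 and 5112) are therefore just the independent-base expectations for 50% and 41% GC. The CpG-scaled line comes out a little over 2,000. Because that scaling is invented here, the third number only illustrates the size of the effect; it is not an estimate of the human count. Passing a measured 16-entry dinucleotide table in place of dinucleotide_table() would be the real test.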
I thought up some crummy ways to sorta-kinda account for the CpG deficit, but would ideally like to be able to use arbitrary dinucleotide frequencies and arbitrary %GC to produce an expected value.  I have a terrible feeling I need to use Markov or Ising models and just don’t have the heart for it right now...&lt;br /&gt;----&lt;br /&gt;As an aside, UNIX is pretty awesome.  To figure out the number of motifs Tagscan was finding (their output is a list of matches), I simply typed:&lt;br /&gt;&lt;blockquote&gt;&gt; wc -l filename&lt;/blockquote&gt;&lt;br /&gt;And that gave me the number of lines in the file.  The first row was the header, and the last row was blank, so subtracting 2 from the number gave me the total number of motifs.  &lt;br /&gt;&lt;br /&gt;If I had Tagscan (and the human genome) on my computer, I could fairly easily set up a script to iterate through a whole bunch of 10mers with specified parameters and draw up a distribution.  If I was really cool, I'd exclusively use UNIX commands to do this.  This would then allow me to ask what the significance of the number of USSs would be.  
(Obviously p &gt; 0.05, but that would be one way to do a real statistical test, even if I never figured out how to work out the expected value by an analytical method.)&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-1181706365364264031?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/1181706365364264031/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/06/taste-of-dna.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/1181706365364264031'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/1181706365364264031'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/06/taste-of-dna.html' title='A taste of DNA'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_7qRGxl6StM4/SjrlDfcafTI/AAAAAAAAAIg/sScP3DT1HOs/s72-c/258234227_3ecc78778c.jpg' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-5827321818251068768</id><published>2009-06-16T21:58:00.000-07:00</published><updated>2009-06-17T12:49:32.257-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='USS'/><category scheme='http://www.blogger.com/atom/ns#' term='human USS'/><category 
scheme='http://www.blogger.com/atom/ns#' term='pattern matching'/><title type='text'>They're eating our DNA!!!</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://mongreldesigns.com/blog/cartooning/"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 200px; height: 179px;" src="http://3.bp.blogspot.com/_7qRGxl6StM4/SjiCuxiVSLI/AAAAAAAAAIQ/Gb84kyzFv2I/s200/081.jpg" alt="" id="BLOGGER_PHOTO_ID_5348168297585395890" border="0" /&gt;&lt;/a&gt;I am in the midst of proposal-writing, which makes blogging tougher, but when I hit a writing-block at the end of the day, I decided to dally with a random thing I'd been meaning to figure out.&lt;br /&gt;&lt;br /&gt;Several friends of mine, when I tell them about my plans with the naturally competent bacteria, have said, "Dude, you should feed them human DNA!"&lt;br /&gt;&lt;br /&gt;Sound silly?  Maybe it is, but I went ahead and used &lt;a href="http://www.isrec.isb-sib.ch/tagger/tagscan.html"&gt;TAGSCAN&lt;/a&gt; to search the human genome for the 10-bp core USS motif:  AAAGTGCGGT and found 762 instances of the USS core.  &lt;a href="http://www.ncbi.nlm.nih.gov/genome/seq/BlastGen/BlastGen.cgi?taxid=9606"&gt;BLAST&lt;/a&gt; and &lt;a href="http://genome.ucsc.edu/cgi-bin/hgBlat?command=start"&gt;BLAT&lt;/a&gt; didn't want to deal with me due to how short the query USS motif is.&lt;br /&gt;&lt;br /&gt;That means &lt;span style="font-style: italic;"&gt;Haemophilus influenzae&lt;/span&gt; might find nearly 800 bits of our genomes tasty! I remember from my reading that there's nearly 200 micrograms of DNA per milliliter of our lung mucus, which seems like a heck of a lot.  Much of this DNA is probably human... 
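&lt;br /&gt;&lt;br /&gt;For the curious, the core of a search like this is simple enough to sketch, and unlike BLAST or BLAT an exact-match scan doesn't mind a short query.  This is a minimal Python sketch, not how TAGSCAN is implemented: I'm assuming it reports exact matches on either strand, and the choices to count overlapping matches and to count a palindromic motif once per position are mine.  A pure-Python loop like this is fine for toy sequences but would be slow on a 3-gigabase genome.
&lt;pre&gt;
```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_motif_both_strands(genome, motif):
    """Count exact, possibly overlapping matches on both strands.

    Scanning the forward strand for the motif and for its reverse
    complement is equivalent to scanning both strands for the motif.
    A palindromic motif is counted once per position.
    """
    genome = genome.upper()
    motif = motif.upper()
    targets = {motif, reverse_complement(motif)}
    k = len(motif)
    return sum(1 for i in range(len(genome) - k + 1)
               if genome[i:i + k] in targets)


if __name__ == "__main__":
    uss = "AAAGTGCGGT"
    print(reverse_complement(uss))  # ACCGCACTTT
    toy = "TT" + uss + "CC" + reverse_complement(uss) + "GG"
    print(count_motif_both_strands(toy, uss))  # 2
```
&lt;/pre&gt;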
&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;(For context, &lt;span style="font-style:italic;"&gt;Haemophilus&lt;/span&gt; genomes have, on average, a USS more often than once every 2 kilobases, whereas the human genome has only about one per 4 megabases (3 billion bp / 762).  So while humans have about half as many USSs, the density is at least ~2,000 times lower.)&lt;br /&gt;&lt;br /&gt;It isn't surprising that the human genome contains at least some "USS".  A random 10mer has a 1/(4^10), or a little less than 1 in 1,000,000, chance of being the USS motif.  But the human genome is more than 3 billion base pairs.  So by chance alone we might expect to find roughly 6000 USS (3 billion / ~1 million × 2; more precisely, 2 × 3×10^9 / 4^10 ≈ 5,700).  (The factor of 2 counts the reverse complement as well.)&lt;br /&gt;&lt;br /&gt;If I did that right, then the observed number of USS motifs in the human genome (762) is about 7- to 8-fold less than expected.  That's interesting...&lt;br /&gt;&lt;br /&gt;However, that was a crude estimate of the expected number of USS motifs.  The USS core sequence has 5 GC bases and 5 AT bases, giving a GC content of 50%, whereas the human genome has only 41% GC.  There's also the issue of dinucleotide frequencies.  For example, the CpG dinucleotide is underrepresented in the human genome, but happens to appear in the USS.  I've been trying to figure out a rational way to incorporate this type of information into my expected value, but so far have failed to do so properly.  Regardless, my expected value is certainly too high.  The question is: how much so?&lt;br /&gt;&lt;br /&gt;To produce control numbers, maybe tomorrow I'll run a few other TAGSCANs for arbitrary 10mers with the same GC content and a CpG and see how many come up.  I haven't really looked at the distribution of USS, except that the number of USS per chromosome is highly correlated with chromosome size (R^2 = 0.91).  
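&lt;br /&gt;&lt;br /&gt;To make the expected-value arithmetic easier to redo under different assumptions, here's a minimal Python sketch of three models: (1) all bases equally likely, (2) independent bases at a given %GC (assuming A = T and G = C), and (3) a first-order Markov chain whose base and dinucleotide frequencies are estimated from a training sequence, which is one standard way to fold in something like the CpG deficit.  The genome size (3 billion bp) and 41% GC come from the text; the function names are my own hypothetical ones, and the Markov version would still need real human sequence to say anything about the actual genome.
&lt;pre&gt;
```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def uniform_prob(motif):
    """Probability of the motif at one position if all bases are equally likely."""
    return 0.25 ** len(motif)


def gc_prob(motif, gc):
    """Independent bases with P(G) = P(C) = gc/2 and P(A) = P(T) = (1 - gc)/2."""
    p = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    prob = 1.0
    for base in motif.upper():
        prob *= p[base]
    return prob


def markov_prob(motif, training_seq):
    """First-order Markov chain: P(w) = p(w1) * product of P(next | previous),
    both estimated from training_seq (so a CpG deficit lowers P(G | C))."""
    motif = motif.upper()
    seq = training_seq.upper()
    base_counts = Counter(seq)
    pair_counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    first_counts = Counter()
    for pair, n in pair_counts.items():
        first_counts[pair[0]] += n
    prob = base_counts[motif[0]] / len(seq)
    for prev, nxt in zip(motif, motif[1:]):
        if first_counts[prev] == 0:
            return 0.0
        prob *= pair_counts[prev + nxt] / first_counts[prev]
    return prob


def expected_both_strands(motif, genome_length, prob_fn):
    """Expected number of hits on both strands of a genome of this length."""
    positions = genome_length - len(motif) + 1
    strands = {motif, reverse_complement(motif)}  # palindromes counted once
    return positions * sum(prob_fn(m) for m in strands)


if __name__ == "__main__":
    uss = "AAAGTGCGGT"
    genome = 3_000_000_000
    print(round(expected_both_strands(uss, genome, uniform_prob)))  # 5722
    print(round(expected_both_strands(uss, genome, lambda m: gc_prob(m, 0.41))))  # 4853
```
&lt;/pre&gt;
With these inputs the uniform model gives about 5,700 expected hits and the 41%-GC model about 4,850, both far above the 762 observed, which suggests base composition alone explains only a small part of the shortfall.  Feeding markov_prob real human sequence would put a number on how much the dinucleotide frequencies contribute.&lt;br /&gt;&lt;br /&gt;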
I might also predict that USSs will fall into more GC-rich regions of the genome.&lt;br /&gt;&lt;br /&gt;But for fun, assuming that the result held, what might it mean if the USS motif is significantly underrepresented in the human genome?  I can hardly imagine that &lt;span style="font-style: italic;"&gt;Haemophilus&lt;/span&gt; could be responsible ("it ate them!!!"), but maybe it could work in the other direction?  Perhaps the USS motif is only coincidentally somewhat rare in humans, and as such makes a good sequence to use for preferring conspecific DNA uptake?  If the USS motif were extremely abundant in the human genome, then naturally competent &lt;span style="font-style: italic;"&gt;Haemophilus&lt;/span&gt; might not take up conspecific DNA as easily?  Hmmm...&lt;br /&gt;&lt;br /&gt;Regardless, think about that when you fall asleep, folks...  There are bacteria inside you, and THEY'RE EATING YOUR DNA...&lt;br /&gt;&lt;br /&gt;UPDATE:&lt;br /&gt;&lt;br /&gt;Work on my proposal continues, but in a bout of procrastination I did go ahead and run a few shuffled versions of the USS (same base composition, each with a CG dinucleotide) through TAGSCAN, and it looks like the observed number of USS motifs is about what you'd expect for 10mers of similar composition.&lt;br /&gt;&lt;br /&gt;The USS motif (10mer, hits in the human genome):&lt;br /&gt;AAAGTGCGGT  762&lt;br /&gt;&lt;br /&gt;Randomized USS motifs (same bases, shuffled):&lt;br /&gt;GTACGTAAGG  414&lt;br /&gt;CGAGAGAGTT  761&lt;br /&gt;GAATACGTGG  1112&lt;br /&gt;AGATAGCGTG  714&lt;br /&gt;GTAACGAGTG  458&lt;br /&gt;AGACGTTAGG  713&lt;br /&gt;&lt;br /&gt;Nothing to see here... 
Move along...&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-5827321818251068768?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/5827321818251068768/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/06/eating-our-dna.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/5827321818251068768'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/5827321818251068768'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/06/eating-our-dna.html' title='They&apos;re eating our DNA!!!'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_7qRGxl6StM4/SjiCuxiVSLI/AAAAAAAAAIQ/Gb84kyzFv2I/s72-c/081.jpg' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-1750647576899501769</id><published>2009-06-09T22:19:00.000-07:00</published><updated>2009-06-10T01:24:06.899-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='computers'/><category scheme='http://www.blogger.com/atom/ns#' term='alignment'/><category scheme='http://www.blogger.com/atom/ns#' term='supragenome'/><title type='text'>Supragenomicisticexpialidocious</title><content type='html'>&lt;a 
onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7qRGxl6StM4/Si9Es-9lboI/AAAAAAAAAIA/NnXGUFSBECs/s1600-h/clusters4.jpg"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 84px; height: 400px;" src="http://2.bp.blogspot.com/_7qRGxl6StM4/Si9Es-9lboI/AAAAAAAAAIA/NnXGUFSBECs/s400/clusters4.jpg" alt="" id="BLOGGER_PHOTO_ID_5345566822318304898" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Howzabout that “&lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=pubmed&amp;amp;term=supragenome"&gt;supragenome&lt;/a&gt;”?&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold;"&gt;Polymorphic gene content&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Genome sequencing efforts have captured substantial variation in gene content among closely related bacteria.  For example, &lt;a href="http://genomebiology.com/2007/8/6/R103"&gt;this nice study by Hogg &lt;span style="font-style: italic;"&gt;et al.&lt;/span&gt; 2007&lt;/a&gt; compared the genome sequences of thirteen &lt;span style="font-style: italic;"&gt;Haemophilus influenzae&lt;/span&gt; isolates: they found a “core genome” of ~1500 genes, along with an accessory (or contingency) genome of ~1300 genes.  Any given isolate had a subset of the accessory genome, a few hundred genes beyond the core.&lt;br /&gt;&lt;br /&gt;So any two isolates have substantial amounts of unshared DNA.  For example, Hogg &lt;span style="font-style: italic;"&gt;et al.&lt;/span&gt; report that the Kw20/Rd and 86-028NP isolates differ by nearly 400,000 bp spread across only ~250 indels (Table 5; mean size ~1.5 kb, median size ~300 bp).   
The genomes are &lt; 2 Megabases long, so roughly 20% of each chromosome is non-homologous.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;(As an aside, the Hogg &lt;span style="font-style: italic;"&gt;et al.&lt;/span&gt; paper’s methods section introduced me to MUMmer, which I &lt;a href="http://nodnacontrol.blogspot.com/2009/05/enumerating-their-differences.html"&gt;previously discussed&lt;/a&gt;.  The indels and other rearrangements are essentially defined by breaks in the alignment produced by the nucmer utility.  More on this in the future...)&lt;br /&gt;&lt;br /&gt;There are several possible arguments related to uptake specificity and the "supragenome":&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;&lt;br /&gt;Uptake specificity for variation?&lt;/span&gt; &lt;br /&gt;&lt;br /&gt;Large variation in orthologous gene content suggests to Hogg &lt;span style="font-style:italic;"&gt;et al.&lt;/span&gt; and others a “distributed genome hypothesis”, in which natural transformation can shuffle (or re-assort, or segregate) the accessory genes between isolates.  This would presumably allow for the rapid acquisition and loss of different genes (diversification) within a given genetic background, and thus perhaps rapid adaptation to environmental changes (or shifting host defenses).  The “distributed genome hypothesis” is therefore implicitly related to the “sex hypothesis” for the maintenance of natural competence.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold;"&gt;Uptake specificity for conservation?&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;On the other hand, natural transformation could also maintain the “core genome”.  That is, if there is plenty of conspecific DNA uptake, any bit of “core genome” taken up could be replaced in a cell that had lost it.  
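&lt;br /&gt;&lt;br /&gt;If uptake really does help maintain the core genome, one testable expectation is that uptake sequences should be denser in core segments than in accessory ones.  Here's a minimal Python sketch of that comparison, assuming the core and accessory segments have already been defined as half-open (start, end) intervals on one chromosome (for example, by hand from breaks in a nucmer alignment; that step isn't shown).  A hit is assigned to an interval by its start position, the function names are hypothetical, and the 10-bp USS core is just the default motif; any other uptake motif could be passed in instead.
&lt;pre&gt;
```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def motif_starts(genome, motif="AAAGTGCGGT"):
    """0-based start positions of the motif on either strand."""
    genome = genome.upper()
    targets = {motif.upper(), reverse_complement(motif.upper())}
    k = len(motif)
    return [i for i in range(len(genome) - k + 1) if genome[i:i + k] in targets]


def density_per_kb(starts, intervals):
    """Motif hits per kilobase within half-open (start, end) intervals."""
    total_bp = sum(end - start for start, end in intervals)
    if total_bp == 0:
        return 0.0
    # any() keeps a hit from being counted twice if intervals overlap
    hits = sum(1 for pos in starts
               if any(end > pos >= start for start, end in intervals))
    return hits * 1000 / total_bp


if __name__ == "__main__":
    # Toy chromosome: one USS in a 100-bp "core" segment, none in the
    # 100-bp "accessory" segment (hypothetical coordinates).
    toy = "AAAGTGCGGT" + "C" * 90 + "G" * 100
    starts = motif_starts(toy)
    print(density_per_kb(starts, [(0, 100)]))    # 10.0
    print(density_per_kb(starts, [(100, 200)]))  # 0.0
```
&lt;/pre&gt;
On real data the interesting comparison is core density versus accessory density, though a difference alone wouldn't say why the densities differ.&lt;br /&gt;&lt;br /&gt;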
&lt;a href="http://genomebiology.com/2008/9/3/R60"&gt;This nice study by Treangen &lt;span style="font-style: italic;"&gt;et al.&lt;/span&gt; 2008&lt;/a&gt; using several neisserial genome sequences showed that DNA uptake sequences (DUS, the neisserial equivalent of USS) occur at a higher density in “core” regions of the genome than in the substantial alignment gaps between isolates (which contain indel polymorphisms).&lt;br /&gt;&lt;br /&gt;Again, this points to the “sex hypothesis” for the maintenance of natural competence, but in the opposite direction: maintaining the core rather than shuffling the accessory.  I think the argument goes like this: (1) The core genome likely defines the more essential portions of the genome, since by definition any accessory genes are not required to live.  (2) DUS could have been selected for within this partition of the genome, since they would help to maintain the more essential gene functions within a population.  (3) Therefore the high density of DUS in the core could be a product of natural selection to maintain the integrity of the “core genome”.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold;"&gt;Uptake specificity for no reason in particular?&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;However, there’s another possibility the authors partially explore that does not involve selection for DUS distributed throughout the genome, but represents almost the opposite model.  Instead of selection, perhaps DUS accumulate through a neutral molecular drive arising from happenstance intrinsic biases in the uptake and/or recombination machinery.  Sequence variants that happen to have a higher chance of being taken up are then more likely to spread through populations than variants with a lower chance of uptake.  Thus the “core genome” could partially be that way, i.e. 
conserved across isolates not only because of essentiality or usefulness, but also by virtue of containing lots of DUS.  So rather than DUS being selected for in order to maintain the core genome, segments of DNA containing DUS are simply more easily replaced in lineages that lost them.&lt;br /&gt;&lt;br /&gt;A related idea is that if some accessory genes came from distant relatives by horizontal transfer through some mechanism other than natural competence, these sequences would not yet have had time to accumulate uptake sequences.  Thus the paucity of DUS in the accessory genome might partly reflect the more recent arrival of that sequence in the genome (so the effects of drive are not yet evident), rather than a specific selection pressure to maintain DUS in more important segments of the genome.&lt;br /&gt;&lt;br /&gt;(The Treangen &lt;span style="font-style: italic;"&gt;et al. &lt;/span&gt;paper introduced me to another genome alignment tool called &lt;a href="http://alggen.lsi.upc.es/recerca/align/mgcat/intro-mgcat.html"&gt;M-GCAT&lt;/a&gt;.  I’ve played with it a bit and managed to produce figures essentially the same as those in their supplementary data (the picture above), along with alignment files resembling multi-FASTA format.  Unfortunately, I can't re-load analyses I’ve performed later due to some kind of Python error.  More on this later as well...)&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold;"&gt;How to analyze the core and accessory genomes myself?&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I’ve clearly got a lot more thinking to do regarding these core and accessory genomes...  
Especially in light of the horizontal gene transfer issue.&lt;br /&gt;&lt;br /&gt;But first I’d better figure out simply how to define the core and accessory genomes more specifically.&lt;br /&gt;&lt;br /&gt;I’ve begun this by examining the gaps in the .rdiff and .qdiff output of dnadiff (a pairwise comparison of two genomes) to do some basic analysis myself.  In a future post, I’ll report on my progress with this, but for now, I’ll just mention that most of the gaps are not strictly insertions or deletions, but are rather insertional deletions: that is, most alignment gaps include both reference and query bases.  But I still need to understand how dnadiff produced its .report output before I can get much further...&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-1750647576899501769?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/1750647576899501769/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/06/supragenomicisticexpialidocious.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/1750647576899501769'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/1750647576899501769'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/06/supragenomicisticexpialidocious.html' title='Supragenomicisticexpialidocious'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' 
src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_7qRGxl6StM4/Si9Es-9lboI/AAAAAAAAAIA/NnXGUFSBECs/s72-c/clusters4.jpg' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-1746124736029279663</id><published>2009-06-05T15:20:00.001-07:00</published><updated>2009-06-05T15:40:04.373-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rearrangements'/><category scheme='http://www.blogger.com/atom/ns#' term='plans'/><title type='text'>Joint molecules...</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_7qRGxl6StM4/SimaNexj2aI/AAAAAAAAAHw/kg4PTvkUvxM/s1600-h/transf-types.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 278px; height: 274px;" src="http://4.bp.blogspot.com/_7qRGxl6StM4/SimaNexj2aI/AAAAAAAAAHw/kg4PTvkUvxM/s400/transf-types.png" alt="" id="BLOGGER_PHOTO_ID_5343971989241715106" border="0" /&gt;&lt;/a&gt;There are several classes of transformation event that could be mediated by natural competence and recombination.  The first thing that I had to wrap my head around (after having come from the double-stranded break world of recombination) was to realize that uptake DNA recombining into a host chromosome is single-stranded.  
I’m more used to drawing recombination models involving two broken ends of a double-stranded molecule.&lt;br /&gt;&lt;br /&gt;So to kick things off, in the above figure, I’ve drawn what I think the joint molecule intermediates look like between polymorphic donor ssDNA invaders and recipient dsDNA substrates for the four major classes of transformation products I can envision.&lt;br /&gt;&lt;br /&gt;The red/pink lines indicate an ssDNA that’s found its way from outside the cell to the host chromosome (in blue/light blue).  Aligned regions are base-paired.  Unaligned regions are not... &lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;The amount of transformation will depend not only on how many of a given joint molecule are formed, but also on how they are resolved and on the rate and directionality of “mismatch” correction.&lt;br /&gt;&lt;br /&gt;One important aspect of the strand invasions depicted above will be the extent of heterology.  The more polymorphisms in the donor sequence, the less stable the joint molecule will be.&lt;br /&gt;&lt;br /&gt;Each class of polymorphism deserves special mention:&lt;br /&gt;&lt;br /&gt;(1) Single nucleotide polymorphisms:  There are 12 possible single-nucleotide heteroduplexes if we keep track of which base came from the donor and which from the recipient: any of the 4 donor bases could replace any of the other 3 recipient bases, so 4 × 3 = 12.  In my preliminary analysis, I found 39,029 SNPs between our reference Rd strain and the clinical isolate 86-028NP, spread across all 12 classes (with a paucity of G-&gt;C and C-&gt;G changes).  Even without precisely measuring the frequency of transformation for each individual SNP, we may be able to estimate the relative transformation frequencies for these different classes of SNPs.&lt;br /&gt;&lt;br /&gt;(2) Insertions:  If the donor has an insertion relative to the recipient, recombination could yield an insertion.  In the case of insertions, the uptake sequence can be anywhere on the fragment.  
Base pairing of the substrate will require the formation of a loop.  The size distribution of the input DNA will also limit the size of insertions that can be recombined.  How the mismatch correction machinery handles insertional joint molecules like this is unclear to me.  My preliminary analysis indicated 149 insertions in 86-028NP relative to Rd (mean size = 1162 bp, but a median of only a few hundred).&lt;br /&gt;&lt;br /&gt;(3) Deletions:  I made this distinct from insertions for two reasons: (i) The uptake sequence (obviously) must flank the deleted segment, since an absent piece of DNA can’t be used for uptake.  This imposes a directionality on deletions that’s distinct from insertions.  (ii) It’s unclear how the mismatch machinery would act on the two central joint molecules depicted...  Is restoration repair more likely in one case than in the other?  My preliminary analysis indicates 137 deletions in 86-028NP relative to Rd (mean size 1904 bp, again with a much lower median).&lt;br /&gt;&lt;br /&gt;(4) Rearrangements:  I’ll give this very short shrift for now.  If a particular piece of ssDNA spans a rearrangement breakpoint, then strand invasion of each end into two separate chromosomal positions could mediate an inversion or several other kinds of rearrangement.  My preliminary analysis with MUMmer suggests ~9 inversions and 26 transpositions between the two strains.&lt;br /&gt;&lt;br /&gt;There are many, many ways in which rearrangements in the donor might produce fragments that could mediate a rearrangement of the recipient.  It’s going to take some time to enumerate these.  
Indeed, even if the donor were perfectly colinear with the recipient, some fragments could still mediate rearrangements.&lt;br /&gt;&lt;br /&gt;For example, a close-up of the MAUVE alignment between Rd and 86-028NP shows that the rRNA gene clusters (23S and 16S, in red) often span the rearrangement breakpoints between the two strains:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_7qRGxl6StM4/SimacG8_WEI/AAAAAAAAAH4/inGsiHvjmfc/s1600-h/rrnaBREAKS.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 170px;" src="http://3.bp.blogspot.com/_7qRGxl6StM4/SimacG8_WEI/AAAAAAAAAH4/inGsiHvjmfc/s400/rrnaBREAKS.jpg" alt="" id="BLOGGER_PHOTO_ID_5343972240545241154" border="0" /&gt;&lt;/a&gt;This could arise in a traditional mutational way (e.g. recombination between inverted repeats would cause an inversion), but could also be mediated by transformation.  For example, if a DNA fragment containing a bit of rDNA and a bit of flank first invaded the wrong rDNA copy, the other end could then grab the original flank... causing a rearrangement.&lt;br /&gt;&lt;br /&gt;That is, when repeats are involved, transformation could produce rearrangements distinct from both the donor and the recipient arrangements.&lt;br /&gt;&lt;br /&gt;This is going to take a while to flesh out, as there are several possible outcomes to this sort of thing.  It doesn't make things any easier that I also now have to consider that the chromosome is circular...  sigh... 
I'd better try and remember what plectonemic and paranemic mean...&lt;br /&gt;&lt;br /&gt;Nevertheless, the upside is that it will be possible to measure indel and rearrangement transformation rates with considerably more ease than the SNP class, due to the use of “spanning coverage”.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3017697080484068141-1746124736029279663?l=nodnacontrol.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nodnacontrol.blogspot.com/feeds/1746124736029279663/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://nodnacontrol.blogspot.com/2009/06/joint-molecules.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/1746124736029279663'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3017697080484068141/posts/default/1746124736029279663'/><link rel='alternate' type='text/html' href='http://nodnacontrol.blogspot.com/2009/06/joint-molecules.html' title='Joint molecules...'/><author><name>Chang</name><uri>http://www.blogger.com/profile/12291718994939895064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='28' src='http://1.bp.blogspot.com/_7qRGxl6StM4/SfeXyDzb2eI/AAAAAAAAAAg/VwaH4NSOK1s/S220/boy-tremoctopus.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_7qRGxl6StM4/SimaNexj2aI/AAAAAAAAAHw/kg4PTvkUvxM/s72-c/transf-types.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3017697080484068141.post-5134588000615816995</id><published>2009-06-05T13:09:00.000-07:00</published><updated>2009-06-05T13:33:03.678-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' 
term='plans'/><category scheme='http://www.blogger.com/atom/ns#' term='grants'/><title type='text'>Searching for Cash (How I'll spend my summer vacation)</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://images.google.ca/images?q=grants"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 200px; height: 142px;" src="http://1.bp.blogspot.com/_7qRGxl6StM4/Sil9LmCvLnI/AAAAAAAAAHo/vz5kMdTMEEM/s200/grant-money.jpg" alt="" id="BLOGGER_PHOTO_ID_5343940070995865202" border="0" /&gt;&lt;/a&gt;So it’s time to outline my plans for the next several weeks...  Grant-writing and preliminary data gathering!&lt;br /&gt;&lt;br /&gt;Two grant applications to do:&lt;br /&gt;(1) &lt;a href="http://www.msfhr.org/funding/individual_awards/research_trainee"&gt;Michael Smith Foundation for Health Research&lt;/a&gt; (due June 25 to Office of Research)&lt;br /&gt;(2) &lt;a href="http://grants.nih.gov/training/nrsa.htm"&gt;National Institutes of Health&lt;/a&gt; (due August 8)&lt;br /&gt;&lt;br /&gt;I’ll find out whether or not to submit a full application to (1) in the next several days (if my “letter of intent” was sufficiently cool-sounding, I guess).  The proposal itself is short (only 3 pages), so I should be able to focus on having a well-written piece in the next couple of weeks.  This will help me a lot with the other proposal too.  (I’d also better solicit letters for that one soon, but I want to wait until they tell me to apply first!)&lt;br /&gt;&lt;br /&gt;And for (2) much editing, writing, and analysis to do!  Rosie and I had begun tackling my rejected NIH proposal after I first got here, but enough time has passed that we’re both out-of-the-loop on our own editing.  
There are several things to do besides polishing the writing:&lt;span class="fullpost"&gt;&lt;br /&gt;&lt;br /&gt;First off, the basics of the proposal are these:  (a) I want to feed genomic DNA from one &lt;span style="font-style: italic;"&gt;Haemophilus influenzae&lt;/span&gt; strain to a competent cell culture of another strain.  (b) I’ll purify the donor DNA from various cellular compartments (representing different stages in the natural transformation pathway).  (c) Then, I’ll use sequencing to measure the relative abundance of different DNA sequences along the pathway.  (d) This should give me a comprehensive view of the transformation potential of one genome into another.&lt;br /&gt;&lt;br /&gt;There are a huge number of possible analyses, each of which may or may not be interesting.  One major issue is how to present the kinds of analysis I plan to do, and how to convince the reviewers both that this is an important set of experiments and that I am capable of doing them.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Proposal writing&lt;/span&gt;:  Alongside simply improving the writing, I need to specify in more detail how I will conduct the data analysis and provide whatever preliminary data I can.  These will also help us with the larger DNA uptake proposals we plan to submit in the Fall.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Background&lt;/span&gt;:  I need to explain the importance of the research plan and the specific questions I’ll answer much more clearly and accessibly.  I also need to better understand the history of uptake signal sequences and how they were discovered, i.e. what’s already known versus what I’m going to learn.  Rosie and I have talked this section through pretty well.  It just needs to be re-written now.&lt;br /&gt;&lt;b
