
One of our experiments involves measuring the specificity of DNA uptake by naturally competent H. influenzae for fragments containing “the genomic USS motif”. The H. influenzae genome contains an abundant sequence motif, and fragments bearing it are taken up better than fragments that don’t. This “uptake signal sequence” was originally defined by its functional role in DNA uptake, but has since been characterized mostly by bioinformatics, with no direct uptake specificity data. The limited data from previous lab members suggests only an imperfect correspondence between the properties of the genomic motif and the specificity of DNA uptake.
The idea, then, is to feed competent cells small DNA fragments bearing a degenerate (highly mutated) version of the USS consensus sequence, recover those that are preferentially taken up, and sequence the resulting pool. USSs are ~32 bases, well within the reach of single-end Illumina reads, if they are positioned properly next to a sequencing primer.
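To make "degenerate" concrete, here is a minimal sketch of how such a pool could be simulated. It assumes a simple doping model in which each position keeps the consensus base with probability 1 − d and otherwise switches to one of the other three bases with equal chance. The consensus string, the value d = 0.1, and the function names are all placeholders for illustration, not the actual design values.

```python
import random

BASES = "ACGT"

# Placeholder 32-base "consensus"; NOT the real USS consensus (assumption).
PLACEHOLDER_CONSENSUS = "ACGT" * 8

# Hypothetical per-position degeneracy; the real design value is not given here.
DEGENERACY = 0.1


def degenerate_copy(consensus, d, rng):
    """Return one pool member: each base is mutated with probability d."""
    out = []
    for base in consensus:
        if rng.random() < d:
            # Replace with one of the three non-consensus bases, equally likely.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def mismatches(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


rng = random.Random(1)
pool = [degenerate_copy(PLACEHOLDER_CONSENSUS, DEGENERACY, rng) for _ in range(10000)]
mean_mm = sum(mismatches(s, PLACEHOLDER_CONSENSUS) for s in pool) / len(pool)
print(f"mean mismatches per fragment: {mean_mm:.2f}")
```

Under this model the expected number of mismatches per fragment is simply length × d (32 × 0.1 = 3.2 for the placeholder values), which the simulated mean should approach.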
I’ve previously discussed the expected properties of a degenerate USS pool. And though I think we need to consider this more, I will focus this post on the design of other parts of the construct that will allow us to circumvent subsequent sequencing library construction steps. Illumina sequencing uses specific sequences added to the ends of molecules to capture and sequence DNA of interest...
Properties needed for a USS-containing construct, where Illumina sequencing can be directly performed to sequence the USS:
(1) SIZE: ≥200 base pairs, dsDNA. 200-base fragments with a USS are efficiently taken up by cells, and that size is sufficient for efficient cluster synthesis and sequencing on Illumina’s Genome Analyzer.
(2) CAPTURE SEQUENCES: One end of a strand of each fragment needs to be able to anneal to one of the two “Flow Cell Primers” (FP) in the Illumina flow cell, while the other end of the same molecule needs to contain the reverse complement of the other FP.
(3) SEQUENCING PRIMER BINDING SITE: The reverse complement of Illumina’s sequencing primer needs to be immediately downstream of the reverse complement of the USS. (This could work the other way, but getting the “sense” USS directly from the sequencing reads seems optimal).
(4) TAG SEQUENCE: The first few (four) bases of each read should be a fixed, non-degenerate sequence to facilitate alignment of the degenerate USS reads.
(5) CONSTRUCTION: After consulting several oligo makers, we learned that we wouldn’t be able to get our degenerate constructs built into an oligo longer than 130 nt. This means that I will need to anneal two oligos together and extend with polymerase to generate a full-length construct.
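The strand logic in property (3) is easy to get backwards, so here is a small sketch of it. The primer, tag-plus-USS, and spacer strings below are short made-up placeholders (not Illumina's primer or the real USS), and `simulate_read` is a hypothetical helper: it finds where the primer would anneal on the template strand and returns what extension from that primer would produce.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def simulate_read(template, primer, read_len):
    """Anneal primer to the template and extend; return the first read_len bases."""
    site = template.find(reverse_complement(primer))
    if site < 0:
        raise ValueError("primer binding site not found on template")
    # The polymerase copies the template 5' of the binding site, so the
    # read is the reverse complement of that upstream template region.
    return reverse_complement(template[:site])[:read_len]


# All sequences below are placeholders for illustration (assumptions).
PRIMER = "GATTACAGGTC"                            # stands in for SP1
TAG = "ATGC"                                      # fixed 4-base tag
USS = "AACCGGTTACGATCGATTGCACTGAATCCGTA"          # stands in for the 32-base USS
SPACER = "CCGGAATTCC"                             # stands in for spacer sequence

top_strand = PRIMER + TAG + USS + SPACER
# The template carries the reverse complement of the USS immediately
# upstream of the reverse complement of the primer.
template = reverse_complement(top_strand)

read = simulate_read(template, PRIMER, len(TAG) + len(USS))
print(read)
```

With this arrangement the read starts with the tag and continues straight into the "sense" USS, which is the outcome property (3) is after.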
The first trick was to actually find out what the normal Illumina adapter and primer sequences were. They were available online, and I think I’ve mostly reverse-engineered what the different bits do. I think I have a reasonable design:
I’ll order two oligos, one 130 nt and the other 106 nt. (At the end of this post, I will list the exact sequences of each part and some notes.) They’ll have 36 bp of reverse complementarity at their 3’-ends, so that I can anneal them and extend to produce full-length construct.
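As a sanity check on the lengths, the anneal-and-extend step can be sketched in silico. The oligo sequences here are random placeholders with the stated lengths (130 nt, 106 nt, 36-base overlap), and `anneal_and_extend` is a hypothetical helper, not a model of the real reaction conditions.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def anneal_and_extend(top_oligo, bottom_oligo, overlap):
    """Return the top strand of the duplex made by 3'-overlap extension."""
    if top_oligo[-overlap:] != reverse_complement(bottom_oligo[-overlap:]):
        raise ValueError("3' ends are not reverse complements over the overlap")
    # Extending the top oligo copies the non-overlapping part of the bottom oligo.
    return top_oligo + reverse_complement(bottom_oligo)[overlap:]


rng = random.Random(0)
OVERLAP = 36

# Random placeholder sequences with the lengths from the post (assumption).
top_oligo = "".join(rng.choice("ACGT") for _ in range(130))
unique_tail = "".join(rng.choice("ACGT") for _ in range(106 - OVERLAP))
bottom_oligo = reverse_complement(top_oligo[-OVERLAP:] + unique_tail)

full_length = anneal_and_extend(top_oligo, bottom_oligo, OVERLAP)
print(len(top_oligo), len(bottom_oligo), len(full_length))
```

The product length is just 130 + 106 − 36 = 200, since the overlap is counted once.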


To sequence the 200mer (either before or after recovery from competent cell periplasms), the DNA would be melted and annealed to an Illumina flow cell. Below are shown two different parts of a flow cell surface, where the two different strands of a single molecule might anneal.




NEXT UP: Uh oh… What about yields? Dimensional analysis…
APPENDIX:
The different parts of the two oligos:

Notes on my reverse engineering:
- FP1 (25 nt): Composed of putative 20mer FP1 + first 5 bases of one adaptor (calling it A)
- SP1 (33 nt): Sequencing primer for single-end Illumina runs. Includes the 13 bases of the standard adaptor that normally produce a 13 bp inverted repeat on either side of adapted DNA fragments.
- USS (36 nt): Includes 4-base tag (ATGC) upstream of a 32-base genomic Gibbs consensus sequence with a set level of degeneracy at each position.
- G1 (36 nt): Additional sequence from pGEM7f, corresponding to the portion of the spacer region where the two oligos are intended to anneal.
- G2 (46 nt): More sequence from pGEM7f, corresponding to the part of the spacer region present on only one of the two oligos.
- FP2’ (23 nt): Composed of the complement to the 20mer FP2 + first 3 bases of the other adaptor (calling it B).
- Total length after annealing and extension is 200 bases, with the USS located from position 63 (after the 4-base tag) to position 94. In the flow cell, SP1 should prime synthesis across the complement of the USS, so the sequence obtained will correspond to the USS itself (with the first four bases always ATGC).
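The coordinates in these notes can be checked by laying the parts end to end. This sketch uses only the part lengths listed above; the `layout` helper and variable names are made up for illustration. One thing it turns up: the listed lengths sum to 199 rather than 200, and the second oligo's parts (G1 + G2 + FP2’) sum to 105 rather than 106, so one of the listed lengths is presumably off by one. The sketch does not try to guess which.

```python
# Part lengths (nt) exactly as listed in the notes, in 5'->3' order.
PARTS = [
    ("FP1", 25),
    ("SP1", 33),
    ("USS", 36),   # 4-base tag + 32-base degenerate consensus
    ("G1", 36),
    ("G2", 46),
    ("FP2'", 23),
]
TAG_LEN = 4


def layout(parts):
    """Return {name: (start, end)} with 1-based inclusive coordinates."""
    coords = {}
    start = 1
    for name, length in parts:
        coords[name] = (start, start + length - 1)
        start += length
    return coords


coords = layout(PARTS)
uss_start, uss_end = coords["USS"]
tag_span = (uss_start, uss_start + TAG_LEN - 1)
consensus_span = (uss_start + TAG_LEN, uss_end)

lengths = dict(PARTS)
oligo1_len = sum(lengths[p] for p in ("FP1", "SP1", "USS", "G1"))
oligo2_len = sum(lengths[p] for p in ("G1", "G2", "FP2'"))
total_len = sum(lengths.values())

for name, span in coords.items():
    print(name, span)
print("tag:", tag_span, "consensus:", consensus_span)
print("oligo 1:", oligo1_len, "oligo 2:", oligo2_len, "total:", total_len)
```

The tag lands at positions 59 to 62 and the 32-base consensus at 63 to 94, matching the positions given above, and the first oligo comes out at the expected 130 nt.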