
Repository for my contemplations of transmission genetics and natural transformation in Haemophilus influenzae, as well as notes on becoming a hack bioinformaticist
> wc -l K12s11.snp O57s11.snp
6051 K12s11.snp
64217 O57s11.snp
Indeed, there are about 10X more SNPs running s11 against O57 than against K12. The several thousand SNPs called against K12 are likely mostly errors. Maq doesn’t always call the consensus base as A, C, G, or T; it sometimes assigns one of the other IUPAC nucleotide codes, so many of these “SNPs” are probably quite low confidence.
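One way to gauge how many of these calls are ambiguous is to keep only the rows whose consensus base is a plain A, C, G, or T. The sketch below is not part of the original analysis: it assumes the consensus base sits in column 4 of the tab-separated .snp file, and it runs on a made-up toy file (toy.snp, a hypothetical name) rather than on the real data.

```shell
# Toy stand-in for a Maq .snp file (tab-separated); all values are made up.
tmp=$(mktemp -d)
printf 'chr\t101\tA\tG\t60\n' >  "$tmp/toy.snp"
printf 'chr\t205\tC\tY\t12\n' >> "$tmp/toy.snp"   # Y is an ambiguous IUPAC code
printf 'chr\t310\tT\tC\t45\n' >> "$tmp/toy.snp"
printf 'chr\t422\tG\tR\t8\n'  >> "$tmp/toy.snp"   # R is an ambiguous IUPAC code

# Keep only rows whose consensus base (assumed to be column 4) is A, C, G, or T.
awk -F '\t' '$4 ~ /^[ACGT]$/' "$tmp/toy.snp" > "$tmp/toy.acgt.snp"

# Compare the number of surviving calls with the total.
total=$(wc -l < "$tmp/toy.snp" | tr -d ' ')
kept=$(wc -l < "$tmp/toy.acgt.snp" | tr -d ' ')
echo "kept $kept of $total calls"
```

On the real files, the same awk filter could be pointed at K12s11.snp and O57s11.snp to see what fraction of each set is made up of ambiguous calls.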
> cut -f 2,3,4 O57s11.snp > maqSnp.txt
> cut -f 1,2,3 O57vsK12.snps > mumSnp.txt
This provided me with my two sets, one called maqSnp.txt and the other mumSnp.txt. Here are their lengths:
> wc -l maqSnp.txt mumSnp.txt
64217 maqSnp.txt
76111 mumSnp.txt
Notably, Mummer called many more SNPs than Maq. I think this is largely because the SNP output from Mummer includes single-nucleotide indels, which Maq misses since it does an ungapped alignment. I’m not sure how to deal with this, but in our real experiments they should still be discoverable, since we’ll map our reads to both donor and recipient genomes. Also, there are numerous “SNPs” in the Maq file with non-A, C, G, T consensus bases, which will largely be absent from the Mummer comparison.
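One possible way to handle the indels, sketched here rather than taken from the original analysis, is to split the Mummer calls into substitutions and single-nucleotide indels before comparing them with Maq. This assumes the Mummer SNP output marks the missing base of an indel with a "." in one of the two base columns, and that the three columns are position, reference base, and query base. The file toyMum.txt is a made-up stand-in for mumSnp.txt.

```shell
# Toy stand-in for mumSnp.txt (position, reference base, query base); values are made up.
tmp=$(mktemp -d)
printf '101\tA\tG\n' >  "$tmp/toyMum.txt"
printf '150\t.\tT\n' >> "$tmp/toyMum.txt"   # insertion relative to the reference
printf '205\tC\tT\n' >> "$tmp/toyMum.txt"
printf '260\tG\t.\n' >> "$tmp/toyMum.txt"   # deletion relative to the reference

# Substitutions: neither base column holds the "." indel placeholder (assumed convention).
awk -F '\t' '$2 != "." && $3 != "."' "$tmp/toyMum.txt" > "$tmp/toyMum.subs.txt"
# Indels: every other row, kept in a separate file rather than discarded.
awk -F '\t' '$2 == "." || $3 == "."' "$tmp/toyMum.txt" > "$tmp/toyMum.indels.txt"

subs=$(wc -l < "$tmp/toyMum.subs.txt" | tr -d ' ')
indels=$(wc -l < "$tmp/toyMum.indels.txt" | tr -d ' ')
echo "$subs substitutions, $indels indels"
```

Comparing only the substitution file against the Maq calls would show how much of the gap between the two counts is really due to indels, while the indel file stays available for later.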
> grep -xF -f maqSnp.txt mumSnp.txt | wc -l
57179
So: