Wednesday, June 3, 2009

There's a lesson here somewhere

A recent email exchange with a local computer expert, explaining one reason I normally avoid directly seeking help until I really need it (this also applies to statisticians):
Me (paraphrased): Er... help... with things...
Me (a little later): It worked like a charm! Soon after I asked you for help, I worked it out... Figures.
Local Expert (soon thereafter): Glad to be of service!
Nevertheless this method is tried and true! Thanks, Alistair!

Here's another tool I recently checked out...

CGView made a rather nice figure for default settings: (from out to in) genes, %GC base composition, GC skew. There seems to be several other ways to further configure the file. In particular, I was interested in the plot of GC skew. Changes in sign (between purple and green) can indicate origins and termini of replication, due to mutation rate differences between the leading and lagging strands of replication. It's not immediately obvious what position should be the starting coordinate for a particular genome, but for PittEE, I suppose I'd guess the origin as the hour hand a little before 6:30...


Details on how I made that:
I didn't have too many issues running the package. It needed a bit of BioPerl (Bio::SeqIO) to interpret the GenBank file, which I knew I'd downloaded in my GBrowse installation quest, but for some reason, I needed to specify the correct Perl sublibrary to a variable called $PERL5LIB.
> export PERL5LIB=$PERL5LIB:/sw/lib/perl5/5.8.6/

I still have a lot more to learn about working at the command line, since each time I start a new terminal, I need run this command again, if I want to use CGView.

After that, I went to the directory containing the GenBank file for one of the complete genomes (86-028NP.gbk), and typed:
> perl /path/cgview/cgview_xml_builder/cgview_xml_builder.pl -sequence PittEE.gbk -size small -output PittEE.xml
(/path defines is the place I put the CGView directory that downloaded. Mine was in a directory /path = /Users/my_name/bin)

This converted the GenBank file into an XML file suitable for reading by cgview.jar. Pretty cool! (I didn't have to think about JAVA being there, since it was already installed.)

Then invoking java:
> java -jar /path/cgview/cgview.jar -i PittEE.xml -o PittEE-map.png -f png
Presto! The figure above.

2 comments:

  1. That's why I try to get my biology students to post questions about their confusions to the course discussion board. The act of clarifying their confusion enough to explain it very often leads to solving the problem on their own. (I think blogging does much of the same thing, at least for me.)

    ReplyDelete
  2. That's awesome, man. All's I got to say is that 1) I'm really impressed and 2) I have no real idea how to even figure out BioPerl in the first place..

    ReplyDelete