Thursday, May 28, 2009

GBrowse Update (boring, except the bit about the fish)

I finally succeeded in my attempt to put a functioning GBrowse installation on my computer. My problem turned out to be extremely trivial and way upstream of the actual GBrowse installation. Now I’ve got to figure out how to correctly produce and configure a basic Haemophilus influenzae genome database...

My first big problem--now solved--was that I was unable to view webpages from localhost; that is, I couldn’t view webpages served by Apache2 on my own computer. All kinds of other people have had such problems, but none of the advice I found on forums seemed to apply to my case. Nevertheless, I did learn a lot more about the way that files are organized in Mac’s UNIX.

Luckily, in my forum perusal, I stumbled across a handy option for the apachectl command, so that at the command line I could type: "sudo apachectl -t" or "sudo apachectl configtest". This gave me a syntax check of my Apache webserver configuration, which returned a convoluted syntax error. After checking the several files the errors pointed to, I figured out that I’d failed to add a space between two separate statements in the configuration file I'd made that set my own permissions. Gah. Anyway, it works now. The problem WAS actually covered in several of the forums I'd searched (because Apple moved some things around in their latest Leopard upgrade), but that's no help when you mistype the very file you need...
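
For anyone who wants to script that syntax check rather than type it each time, here is a minimal Python sketch. It only wraps the same "apachectl configtest" command mentioned above; the function name run_config_check is made up for illustration, and it assumes apachectl is on the PATH and can be run without sudo (prefix the command with sudo if your setup needs it, as mine did).

```python
import subprocess


def run_config_check(command=("apachectl", "configtest")):
    """Run a config syntax check and return (ok, output).

    `command` defaults to the apachectl check from the post; pass a
    different tuple if your system needs sudo or another binary name.
    """
    try:
        result = subprocess.run(
            list(command),
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr so all messages are captured
            universal_newlines=True,
        )
    except FileNotFoundError:
        # The command itself is missing (not installed or not on PATH).
        return False, "command not found: " + command[0]
    # Exit status 0 is treated as a passing check.
    return result.returncode == 0, result.stdout.strip()
```

Calling run_config_check() gives back a pass/fail flag plus whatever the command printed, so the error text can be read or logged before restarting the server.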

After that I had one additional problem, which was that I used all the default settings with the GBrowse install script, which put everything into the wrong or non-existent directories. When I repeated the install with the correct paths for Mac OS X 10.5.7 and Apache2, I suddenly had the Generic Genome Browser on my computer!

I found this site, which told me to redirect the installer to the following paths when prompted:
Apache conf directory? [/usr/local/apache/conf] /etc/apache2/
Apache htdocs directory? [/usr/local/apache/htdocs] /var/www/localhost/htdocs/
Apache cgibin directory? [/usr/local/apache/cgi-bin] /var/www/localhost/cgi-bin/
Presto! A working web server with a working genome browser!
I went to http://localhost/gbrowse/, and got a page announcing:
Welcome to the Generic Genome Browser!
A happier moment of web surfing, I've not had since I found out about the fish with a transparent head.
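
Since my install failed the first time because the default paths were wrong or did not exist, a quick check before answering the installer prompts might have saved me a round. This is just a sketch: the three paths are the ones quoted above and may well differ on your system, and the names INSTALLER_PATHS and missing_directories are invented for the example.

```python
import os

# Paths quoted in the post; treat them as examples and substitute your own.
INSTALLER_PATHS = {
    "conf": "/etc/apache2/",
    "htdocs": "/var/www/localhost/htdocs/",
    "cgibin": "/var/www/localhost/cgi-bin/",
}


def missing_directories(paths):
    """Return the sorted labels whose path is not an existing directory."""
    return sorted(
        label for label, path in paths.items() if not os.path.isdir(path)
    )
```

Calling missing_directories(INSTALLER_PATHS) returns an empty list when all three directories exist; any label it does return is a path worth fixing before running the install script.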

Okay, nonetheless, I still haven’t gotten the Haemophilus influenzae KW20 genome properly working in the browser. I’ve gone through the tutorial pretty thoroughly and have correctly configured their tutorial Volvox database using a MySQL backend. It works fine.
My KW20 database seems to be correctly imported into MySQL, but nothing shows up on the webpage. I originally thought it had something to do with my configuration file, but now I suspect some kind of import defect. I’ve tried it two ways.

(1) Doing it in memory: Getting the GenBank file from NCBI, converting it with BioPerl’s conversion program, bp_genbank2gff3.pl, then loading it with BioPerl’s bp_bulk_load_gff.pl

(2) Doing it in MySQL with the GFF and FASTA files from TIGR’s homepage and using BioPerl’s MySQL loader, bp_seqfeature_load.pl (for which I can find no good link or man page).

Neither of these worked. I’ve been tweaking the configuration file and trying to reload the database in several ways. So far with no luck. But progress! I'm fairly certain I need to understand the Adaptors better...
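
One cheap thing to rule out while chasing an import defect is a mismatch between the sequence names in the GFF file and those in the FASTA file. To be clear, this is a guess at something worth checking, not a diagnosis of my actual problem. The sketch below assumes tab-delimited GFF3 with the sequence name in the first column and FASTA headers whose first word is the sequence name; the function names are made up for illustration.

```python
def gff3_seqids(lines):
    """Collect the distinct column-1 names from GFF3 feature lines."""
    seqids = set()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more features
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        seqids.add(line.split("\t", 1)[0])
    return seqids


def fasta_ids(lines):
    """Collect the first word of every FASTA header line."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def unmatched_seqids(gff_lines, fasta_lines):
    """Return GFF3 sequence names that have no matching FASTA record."""
    return sorted(gff3_seqids(gff_lines) - fasta_ids(fasta_lines))
```

Both arguments can be open file handles, for example unmatched_seqids(open("kw20.gff"), open("kw20.fasta")), where the file names are placeholders. An empty list means every feature refers to a sequence that is present in the FASTA file.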

3 comments:

  1. Hi Joshua,

    I didn't find it boring :-)

    I hope by now you have this working; if not, I'd suggest the GBrowse mailing list.

    The problem you had with the installer trying to use /usr/local/apache seems to happen to people sometimes. Most of the time on Mac systems, it does guess the correct directories. I've yet to figure out why this happens but I'm glad you sorted that out.

  2. Thanks Scott! Glad you didn't find it boring, though you've got to admit that the bit about the fish IS more exciting.

    I ended up not persisting in making the GBrowse database, because I found UCSC's microbial genome database, and that has sufficed at least temporarily. They're only hosting one of the complete genome sequences of H. influenzae, so it's not ideal for me.

    Regardless, I will need to get something of my own working for my upcoming data-deluge, which I'm sure UCSC will not want to deal with. So, I'll be returning to GBrowse, with a fresh computer (the other challenge I'd had was having waayy too many versions of the same things installed in different places on my computer).

    Thanks for the tip on signing up for the mailing list! When I get back to it, I certainly will!

  3. Hello Joshua Mell,
    It was great reading about the steps and the problems you faced; I have gone through them too.

    This is utpalHandique. I have also installed GBrowse and everything is working fine.
    I have only tried the browser with the sample data, but when I feed it a whole genome, the system is unable to provide the resources for execution. Can you suggest how much RAM is needed, or is there any third party or community that hosts such databases at low cost or for free?

    Let me know as early as possible at moneybabymoney@gmail.com
