天涯海角

My Web Home

Monthly Archives: 11月 2015

Full_lengther_NEXT v0.0.8 database setup and run

Full_lengther_next is an excellent program to test if you EST assembly are full-length or not, or test how much percentage it can get.

for the 1st step of analysis, you probably needs a database for the program to blast+. the built-in script ‘download_fln_dbs.rb’ is designed to setup the database for you. Basically, the script would download taxonomy database from uniprot, splice database from uniprot, and non-coding RNA database from SCBI, filter out the non-full-length sequences, and make BLAST+ database for BLAST+ program. Unfortunately, the script would report NET::FTP or gzip problems.

I ran ‘download_fln_dbs.rb’, and got error like:

550 Permission denied. (Net::FTPPermError)

This is probably duo to errors either from ruby NET::FTP and uniprot FTP sever. (Sorry I am a PERLer, NOT a RUBYer).

and if you get a gzip error,

gzip *.gz not found

probably you need to change you environment variable BLASTDB in ~/.bashrc to a format like:

BLASTDB=/path/to/your/blastdb/

The last letter ‘/’is needed to find /path/to/your/blastdb/*.gz files downloaded from uniprot, or else if will find /path/to/your/blastdb*.gz files.

 

OK, let’s start to setup half-manually:

 

1. Setup environment variable BLASTDB if you did NOT do this

to test if you already had one:

$ echo $BLASTDB

If it returns a path like /path/to/your/blastdb, ignore this step

$ vim ~/.bashrc

#add one more line at the end of file

#press letter i or a

#NOTE: change /path/to/your/blastDB/ to a specified path depending on your machine, like: $HOME/blastdb

export BLASTDB=/path/to/your/blastDB/

#Esc and :wq in vim to save the changes

#log out-and-in or use source ~/.bashrc to take effect

#It seems no effect just using shell ‘export BLASTDB=/path/to/blastdb, no idea why. I knew it will detect $ENV[‘BLASTDB’], but it still  shows the path in ./.bashrc

2. Download necessary files from uniprot:

$ mkdir –p $BLASTDB

$ cd $BLASTDB

$ wget ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/taxonomic_divisions/uniprot_trembl_plants.dat.gz

$ wget ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/taxonomic_divisions/uniprot_sprot_plants.dat.gz

$ wget ftp://ftp.uniprot.org//pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot_varsplic.fasta.gz

 

#setup non-coding database

$ mkdir $BLASTDB/nc_rna_db

$ cd $BLASTDB/nc_rna_db

$ wget http://www.scbi.uma.es/downloads/FLNDB/ncrna_fln_100.fasta.zip

$ unzip ncrna_fln_100.fasta.zip

$ mv ncrna_fln_100.fasta ncrna_fln_100.fasta.oldID

#change seqids

$ perl -ne ‘BEGIN {$seqid=ncrna0000000001;} if (/^>/) {chomp; s/^>//; $_=”>”.$seqid.” “.$_; print $_, “\n”;$seqid++;}else {print;}’ ncrna_fln_100.fasta.oldID > ncrna_fln_100.fasta

# make blast DB

$ makeblastdb -in ncrna_fln_100.fasta -dbtype nucl -parse_seqids

3. find download_fln_dbs.rb  and edit:

# The script should not be detected using which cmd

#it either locates in /var/lib/gems/*/gems/full_lengther_next-0.0.8/bin/ or $HOME/.gem/ruby/*/gems/full_lengther_next-0.0.8/bin/

#open it with editor:

#go to line 202

my_array = [“human”,”fungi”,”invertebrates”,”mammals”,”plants”,”rodents”,”vertebrates”]

#Add # from left, to make it inactive

#my_array = [“human”,”fungi”,”invertebrates”,”mammals”,”plants”,”rodents”,”vertebrates”]

#and edit the following line:

# my_array = [“plants”,”human”]

#Remove # and unnecessary species, like:

my_array = [“plants”]

# I used PLANTS and download PLANTS uniprot database as showed in DOWNLOAD section

# comment the following line to:

#conecta_uniprot(my_array, formatted_db_path)

#This line it used to down load the uniprot database, which usually report some NET::FTP error

#and edit following line to:

system(‘gunzip ‘File.join(formatted_db_path, ‘uniprot*.gz’))

#Haha, Just learning RUBY. Ruby looks like python. That will fix the gzip uncompress error and avoid to uncompress all the GZ files in your $BLASTDB

#

#find line below:

download_ncrna(formatted_db_path)

#and comment with #, to

#download_ncrna(formatted_db_path)

#this will inactivate the downloading of NON-CODING RNA database, which can not successfully create the BLAST+ database using makeblastdb, guess some special letters in seqids

And then execute download_fln_dbs.rb

$ download_fln_dbs.rb

You will have 3 folders:

$ ls $BLASTDB/tr_plants

tr_plants.fasta         tr_plants.fasta.00.pog  tr_plants.fasta.00.psq  tr_plants.fasta.01.pog  tr_plants.fasta.01.psq
tr_plants.fasta.00.phr  tr_plants.fasta.00.psd  tr_plants.fasta.01.phr  tr_plants.fasta.01.psd  tr_plants.fasta.pal
tr_plants.fasta.00.pin  tr_plants.fasta.00.psi  tr_plants.fasta.01.pin  tr_plants.fasta.01.psi

 

$BLASTDB/sp_plants

sp_plants.fasta      sp_plants.fasta.pin  sp_plants.fasta.psd  sp_plants.fasta.psq
sp_plants.fasta.phr  sp_plants.fasta.pog  sp_plants.fasta.psi

 

$BLASTDB/nc_rna_db

ncrna_fln_100.fasta      ncrna_fln_100.fasta.nin  ncrna_fln_100.fasta.nsd  ncrna_fln_100.fasta.nsq
ncrna_fln_100.fasta.nhr  ncrna_fln_100.fasta.nog  ncrna_fln_100.fasta.nsi

Run full_length_next as normal:

$ full_lengther_next -fasta $fastafile -taxon_group plants

I am still testing, at least I had database successfully created. Waiting for further error report. Will Update this post once have any updates

Aha… Have fun….Hope it helps

Plus: I want help from a RUBYer to figure out how full_length_next filters out non-full-length proteins. I would like to revise it in PERL, so Everyone can run it without any error. IF YOU WANT TO HELP, LET ME KNOW PLEASE.

 

Dr Fu-Hao Lu

Post-Doctoral Scientist

Jogn Innes Centre, Colney Lane

Norwich NR4 7ES, UK

Email: Fu-Hao.Lu@jic.ac.uk