Question: FASTA sequences not blasting.
gbernard0 wrote (3 months ago):

Hello Great Minds,

My FASTA sequences (in tabular format) are not blasting. I downloaded the UniProt fasta.gz file to use as a database. Please provide some direction.

Best regards. GCB

gb60 wrote (3 months ago):

Not sure what you mean but this is already wrong:

My FASTA sequences ( in tabular format) are not blasting

FASTA is a specific format: each record starts with a header line beginning with ">" followed by a description, and the next line (or lines) holds the sequence. See https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=BlastHelp
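To make the format concrete, here is a minimal Python sketch of a FASTA reader. The function name `read_fasta` and the example records are made up for illustration, and the sketch assumes the common convention that a sequence may wrap over several lines.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is None:
            # Sequence-like text before any ">" line means this is not FASTA.
            raise ValueError("not FASTA: data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical example records, not taken from the thread.
example = [
    ">seq1 hypothetical example record",
    "ATGGCGTACG",
    "TTAGC",
    ">seq2",
    "ATGCCC",
]
records = list(read_fasta(example))
```

A tabular file (for example an ID and a sequence separated by a tab) has no ">" line, so this reader raises `ValueError` on it instead of returning records.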

So your file with sequences needs to be in FASTA format, not tabular. And you would need to run blastp.


Thank you so much! You are correct. The sequences are in FASTA format, with Trinity headers from the assembly. I downloaded the UniProt protein database and unzipped the file to compare against my sequences. Both are in FASTA format according to Galaxy. The issue now is to figure out what the input should be for the subject/database sequences and protein database sections in NCBI BLASTx. The protein database section reads "no Blast dbp available". I tried BLASTing my assembled reads against the UniProt database, which is saved in my history, and got no results. Any suggestions? Is there a database I can import?

Best regards,

written 3 months ago by gbernard0

I am not familiar with the public servers, so I cannot help you more. But you need to mention which Galaxy server you are using. Also check the input: before you can BLAST against a reference, you need to index the reference FASTA file. So basically you convert a FASTA file to a BLAST database.
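On the command line, that indexing step is done with `makeblastdb` from NCBI BLAST+. Below is a minimal Python sketch that only builds the command. The file and database names are placeholders, and actually running it assumes BLAST+ is installed and on your PATH.

```python
import subprocess


def makeblastdb_command(fasta_path, db_name, dbtype="prot"):
    """Build the argument list that indexes a FASTA file as a BLAST database."""
    if dbtype not in ("prot", "nucl"):
        raise ValueError("dbtype must be 'prot' or 'nucl'")
    return ["makeblastdb", "-in", fasta_path, "-dbtype", dbtype, "-out", db_name]


# Placeholder names: an unzipped UniProt FASTA file and the database to create.
cmd = makeblastdb_command("uniprot.fasta", "uniprot_db")
print(" ".join(cmd))

# Uncomment to run it for real (requires BLAST+ to be installed):
# subprocess.run(cmd, check=True)
```

UniProt sequences are proteins, hence `-dbtype prot`; a nucleotide reference would use `nucl` instead.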

EDIT:

I just checked https://usegalaxy.org and went to the tool named "NCBI BLAST+ blastx". You need to change the setting "Subject database/sequences" to "fasta file from history". You also need to check the genetic code setting.
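For reference, a rough command-line equivalent of those two settings is sketched below in Python. It only builds the argument list. The file names are placeholders, the default genetic code of 1 (the standard code) is an assumption about your organism, and the e-value cutoff is an arbitrary illustrative choice.

```python
def blastx_command(query_fasta, subject_fasta, out_path, gencode=1, evalue=1e-5):
    """Build a blastx call that searches a FASTA subject file directly."""
    return [
        "blastx",
        "-query", query_fasta,            # nucleotide sequences, e.g. Trinity output
        "-subject", subject_fasta,        # protein FASTA file, no prebuilt database
        "-query_gencode", str(gencode),   # genetic code used to translate the query
        "-evalue", str(evalue),
        "-outfmt", "6",                   # tabular output
        "-out", out_path,
    ]


# Placeholder file names for illustration only.
cmd = blastx_command("Trinity.fasta", "uniprot.fasta", "hits.tsv")
print(" ".join(cmd))
```

With a subject as large as UniProt, searching a prebuilt database (`-db` instead of `-subject`) is likely to be the faster route.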

And another thing: I am not sure this is the best approach. UniProt is a large database and your Trinity output probably contains a lot of sequences, so I think it will take very long. Maybe you can reduce your input, for example by removing duplicates. You can also try DIAMOND.
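One simple way to reduce the input is to drop exact duplicate sequences before searching. The Python sketch below assumes the records are already loaded as (header, sequence) pairs; the function name and the Trinity-style headers are made up for illustration.

```python
def drop_duplicate_sequences(records):
    """Keep the first record for each distinct sequence, ignoring case."""
    seen = set()
    unique = []
    for header, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            unique.append((header, seq))
    return unique


# Hypothetical records; the second one repeats the first in lower case.
records = [
    ("TRINITY_DN1_c0_g1_i1", "ATGGCGTACG"),
    ("TRINITY_DN2_c0_g1_i1", "atggcgtacg"),
    ("TRINITY_DN3_c0_g1_i1", "ATGCCC"),
]
unique = drop_duplicate_sequences(records)
```

This only removes identical sequences. Near-identical isoforms would need a clustering tool, which is outside this sketch.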

modified 3 months ago • written 3 months ago by gb60