Question: Galaxy mirdeep2 error
10 months ago, sajib.mahfuz.bau wrote:

Hi, I am here to seek your help. I am trying to run miRDeep2 within Galaxy to identify novel and known miRNAs, after collapsing my reads and mapping them to produce an ARF file. But when I run the module with these files, it fails with an error. The report says the fault is in the genome file, which contains whitespace in its first identifier. However, I had already removed all whitespace from my genome sequence (an excerpt is shown below). I have been trying to do this analysis for a long time. I am a biologist and don't know much about bioinformatics, so I would be very grateful for your help. Any comments or suggestions would be much appreciated. Thanks in advance.

Data error report:
Fatal error: Exit code 1 ()

Starting miRDeep2

/usr/local/tools/_conda/envs/__mirdeep2@2.0.0.8/bin/miRDeep2.pl /data/1/galaxy_db/files/005/643/dataset_5643981.dat /data/7/galaxy_db/files/005/643/dataset_5643978.dat /data/1/galaxy_db/files/005/643/dataset_5643982.dat /data/7/galaxy_db/files/005/643/dataset_5643973.dat /data/7/galaxy_db/files/005/643/dataset_5643976.dat /data/7/galaxy_db/files/005/643/dataset_5643974.dat -t cfa -g 50000 -b 0

miRDeep2 started at 10:55:06

mkdir mirdeep_runs/run_28_01_2018_t_10_55_06

Error: Genome file /data/7/galaxy_db/files/005/643/dataset_5643978.dat has not allowed whitespaces in its first identifier

The dog genome sequence looks like this:

chr1:101-122678685 TATGTGAGAAGATAGCTGAACGCCTTGTCCACATCATCTTACTGCTGAGAGTTGAGCTCA CCCTCAGTCCCTCACAGTTCCACACTGCCTGCAGAGTGAGTTTCCCATGTCTTCACCAGA GACTTTTGCCAGAGGCTTCTGAGACGCAAGTTAACAATGCAGACCTGGAGGGTATCTCCA GGTGCAGTAGAGTGGTAATCTCGGAACCTCCTGACTCAGAATACTGCTACCTTCACACTG TCATAAGAATGCAGCGAGTTGAGAGCTGGCTTCTAGGCATGCTTCCTTTTGAGAGCTGAG GACAGGACAGAACCCTCCCGCATCCTGCCTGACTGTAGACGTACCTGCTAACCTCCTCAT GTTAGTGGCTGGGATAGATTGTGGGAAAAGCATGTGTAAGCATTGGGCCTGAACTCCCGT GTATCTGAGTTGAATACAGCGATTTCCAACATCCTTCTTCAATAGGAGTGTAGCTAGGTT CCAACTCCCATGTCCGAGTGGGTAGCAGACATCTGCCTTCCATGCATACACACTTCTGAG AGTTGAGCTTATGGCCTGTAACCCTACCTCCTGCCTGCAGCTACCTTTTGCTTCCAAAAG TCCTAGGCTCGCTGCTTCACCAAAGTGTTGGGAGAGGTAACTGTTGTCTCCCGGCACACA AGACTAGTGCCTCCAAGCTCAATCCAGCGATTTCCCAGTAATTCCTGGGTTAGACTGGTG CTACATACTAAGTTCCATACGTGAGTAGGTAGTTGAAAGCCTTGTCCAAAAACATCTTAC TTCTGAGAGTTGAGCTCACCCTCAGTCCCTCACAGTTCCACACTGCCTGCAGAGTGAGTT TCCCACGTCTTCATCAGAGACTTTTGCCAGAGGCTTCTGAGACGCAAGTTAACAATGCAA ACAGGAGGGTATACCCAGGTGCAGTAGATTGGTTATCTGGGAACCTCCTTACTCAGAATA CTGTTACCTTCACACTGTCATAAGAATGCAGCTAGTTGAGAGCTGGCTTCTAGGCATGCT TCCCTGTGAGAGCTGAGGACAGGGCAGAACCCTCCCGCATCCTGCCTGACTGTAGACGTA CCTGCTAACCTCCTCATGTTAGTGGCTCGGATAGGTTGTGGGAAAAGCATGTGTAAGCAT TGGGCCTGATCTCCCGTGTATCTGAGTTGAATACAGCGATTTCCAACATCCTTCTTCAAT AGGAGTGTAGCTAGGTTCCAACTCCCATGTCCGAGTGGGTAGCAGACATCTGCCTCCCAT GCATACCCACTTCTGAGAGTTGAGCTTATGGCCTGTAACCCTACCTCCTGCCTGCAGCTA CCTTTCGCTTCCAAAAGGCCTAGGCTCGCTGCTACACCGAAGTGTTGGGAGAGGTAACTG GAATCTCCCGGCACACAAGACTAGTGCCTCCAAGCTCAATCCAGCGATTTCCCAGTAATT CCTGGGGTAGACTGGTGCTACATACTAAGTTCCATATGTGAGAAGATAGCTGAACGCCTT GTCCAAAATCATCTTACTGCTGAGAGTTGAGCTCACCCTCAGTCCCTCACAGTTCCACAC TGCCTGCAGAGTGAGTTTCCCATGTCTTCACCAGAGACTTTTGCCAGAGGCTTCTGAGAC GCAAGTTAACAATGCAGACCTGGAGGGTATCTCCAGGTGCAGTAGAGTGGTAATCTCGGA ACCTCCTGACTCAGAATACTGCTACCTTCACACTGTCATAAGAATGCAGCGAGTTGAGAG
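The error message names a concrete condition: whitespace in the genome's first identifier. A minimal Python sketch of that check is below. It assumes miRDeep2 looks only at the first `>` header line and flags any whitespace character in it; the script's exact rule is not shown in this thread, so treat this as an approximation, and the function name is hypothetical.

```python
def first_identifier_has_whitespace(fasta_text: str) -> bool:
    """Return True if the first '>' header line contains whitespace.

    Approximates the condition in the miRDeep2 error message; the real
    script's exact rule is not reproduced here.
    """
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            identifier = line[1:]
            # Any space, tab, or other whitespace in the header counts.
            return any(ch.isspace() for ch in identifier)
    # No header line found at all: nothing to flag.
    return False


if __name__ == "__main__":
    good = ">chr1:101-122678685\nTATGTGAGAAGATAGCTGAACG\n"
    bad = ">chr1 dog genome\nTATGTGAGAAGATAGCTGAACG\n"
    print(first_identifier_has_whitespace(good))  # False
    print(first_identifier_has_whitespace(bad))   # True
```

If this returns `False` for the genome dataset, the whitespace the tool is complaining about is probably in a different input, which is where the answer below points.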

Modified 9 months ago by Jennifer Hillman Jackson • written 10 months ago by sajib.mahfuz.bau
9 months ago, Jennifer Hillman Jackson (United States) wrote:

Hello,

The error is likely caused by some of the FASTQ sequence identifiers in your mapped inputs containing spaces, not by the reference genome's ">" lines.
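One way to test this explanation is to scan a reads dataset for identifiers that contain whitespace. The sketch below assumes plain four-line FASTQ records (header, sequence, `+`, qualities) with no line wrapping; the function name is hypothetical and not part of Galaxy or miRDeep2.

```python
def fastq_ids_with_whitespace(fastq_text: str) -> list[str]:
    """List read identifiers (header line minus '@') that contain whitespace.

    Assumes unwrapped 4-line FASTQ records; a trailing incomplete record
    is ignored.
    """
    lines = fastq_text.splitlines()
    flagged = []
    for i in range(0, len(lines) - 3, 4):
        header = lines[i]
        if not header.startswith("@"):
            raise ValueError(f"line {i + 1} is not a FASTQ header: {header!r}")
        identifier = header[1:]
        if any(ch.isspace() for ch in identifier):
            flagged.append(identifier)
    return flagged


if __name__ == "__main__":
    reads = "@read1 length=22\nACGT\n+\nIIII\n@read2\nACGT\n+\nIIII\n"
    print(fastq_ids_with_whitespace(reads))  # ['read1 length=22']
```

A non-empty result means some identifiers carry extra text after a space, which is the situation the steps below are meant to clean up.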

One of these should resolve the error; let us know if it does not:

  1. Filter the SAM/BAM dataset to remove unmapped reads. Mapped reads will already have their FASTQ identifiers trimmed (the first whitespace and everything after it removed).

  2. If the input is in SAM format, convert to BAM and rerun to see if that gets around the problem (it does for other tools, but I am not sure about this one).
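Step 1 relies on the SAM FLAG field: bit `0x4` marks a read as unmapped, and dropping those records is the same filter `samtools view -F 4` applies. The minimal sketch below works on plain-text SAM lines only (BAM is compressed and would need conversion or a library first); in Galaxy you would use a filtering tool rather than this code, and the function name is hypothetical.

```python
UNMAPPED_FLAG = 0x4  # SAM FLAG bit 0x4: "segment unmapped"


def drop_unmapped(sam_lines):
    """Yield SAM header lines plus only those alignment lines that are mapped."""
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines (@HD, @SQ, @PG, ...) pass through unchanged.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # FLAG is the second tab-separated column
        if not flag & UNMAPPED_FLAG:
            yield line


if __name__ == "__main__":
    sam = [
        "@HD\tVN:1.6\n",
        "r1\t0\tchr1\t100\t42\t4M\t*\t0\t0\tACGT\tIIII\n",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
    ]
    for kept in drop_unmapped(sam):
        print(kept, end="")
```

Only the header and `r1` survive; `r2` has FLAG 4 and is dropped.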

I know that you have checked the custom genome FASTA, but what about the other FASTA inputs? If you have not done this already, it is a good idea to run the tool NormalizeFasta on all FASTA inputs to make sure their formatting is cleaned up. I don't think this is the source of the current error, but it can't hurt and is a recommended formatting step. This FAQ focuses on custom genome FASTA formatting, but the methods can be applied to any FASTA dataset to prepare it for use with tools: https://galaxyproject.org/learn/custom-genomes/#format
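To illustrate what "cleaning up" a FASTA usually involves, here is a rough Python stand-in for a normalization step: it truncates each header at the first whitespace, strips blank lines and stray whitespace inside sequence lines, and rewraps the sequence to a fixed width. This is an assumption about the relevant operations, not the NormalizeFasta tool itself; the function name and the default width of 60 are arbitrary choices for the example.

```python
def normalize_fasta(fasta_text: str, line_length: int = 60) -> str:
    """Trim headers at the first whitespace and rewrap sequences.

    A rough stand-in for a FASTA normalization step, not the
    NormalizeFasta tool itself.
    """
    out = []
    seq_chunks = []

    def flush():
        # Emit the accumulated sequence in fixed-width lines, then reset.
        seq = "".join(seq_chunks)
        for start in range(0, len(seq), line_length):
            out.append(seq[start:start + line_length])
        seq_chunks.clear()

    for raw in fasta_text.splitlines():
        line = raw.strip()
        if not line:
            continue  # drop blank lines
        if line.startswith(">"):
            flush()
            name = line[1:].split()
            # Keep only the text before the first whitespace.
            out.append(">" + (name[0] if name else ""))
        else:
            # Remove any whitespace embedded inside sequence lines.
            seq_chunks.append("".join(line.split()))
    flush()
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    messy = ">chr1 dog assembly\nACGTAC GT\n\nACG\n>chr2\nTT\n"
    print(normalize_fasta(messy, line_length=4), end="")
```

With a width of 4, the example prints `>chr1`, `ACGT`, `ACGT`, `ACG`, `>chr2`, `TT` on separate lines. Trimming headers this way is what removes the "whitespace in the first identifier" condition from a FASTA input.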

Thanks! Jen, Galaxy team

Modified 9 months ago • written 9 months ago by Jennifer Hillman Jackson