Hello Galaxy Team,
I am a complete novice at this, and have been unable to find the answer to my question on the message boards, so I'm hoping you can help me. I have RNAseq data from an Illumina HiSeq. Part of the results I obtained were BAM files generated by the sequencing facility, aligned to the Arabidopsis genome (TAIR 10). I have uploaded these to Galaxy.
I am trying to use cufflinks to analyze differential expression in my samples. I uploaded the Arabidopsis genome GTF file from Illumina (http://support.illumina.com/sequencing/sequencing_software/igenome.html) and attempted to use cufflinks on the BAM file. However, I get this error:
Error running cufflinks.
return code = 1
Command line: cufflinks -q --no-update-check -I 300000 -F 0.100000 -j 0.150000 -p 8 -G /galaxy-repl/main/files/009/370/dataset_9370819.dat -u -N -b /galaxy/data/Arabidopsis_thaliana_TAIR10/sam_index/Arabidopsis_thaliana_TAIR10.fa /galaxy-repl/main/files/009/319/dataset_9319592.dat
[14:48:46] Loading reference annotation and sequence.
Error parsing strand (2) from GFF line:
2 AT1G71695.1 1 + 26964248 26966688 26964358 26966557 3 26964248,26965590,26965942, 26964625,26965785,26966688, 0 AT1G71695 unk unk 0,
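A note on the line quoted in the error: as far as I understand the format, a GTF feature line has nine tab-separated fields, with the strand (+, - or .) in field 7, whereas the quoted line seems to have sixteen fields. Below is a minimal Python sketch of that check. It assumes the fields in the uploaded file are tab-separated (the error message shows them with spaces), and the function name gtf_problems and the comparison GTF line are made up for illustration. It is not the check Cufflinks itself performs.

```python
GTF_STRANDS = {"+", "-", "."}


def gtf_problems(line):
    """Return a list of reasons why `line` does not look like a GTF feature line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        # Without nine fields the remaining checks are meaningless, so stop here.
        return [f"expected 9 tab-separated fields, found {len(fields)}"]

    problems = []
    start, end, strand = fields[3], fields[4], fields[6]
    if not (start.isdigit() and end.isdigit()):
        problems.append("start and end (fields 4 and 5) must be integers")
    if strand not in GTF_STRANDS:
        problems.append(f"strand (field 7) must be +, - or ., found {strand!r}")
    return problems


# The line from the error message, ASSUMING its fields are tab-separated in the file.
error_line = "\t".join([
    "2", "AT1G71695.1", "1", "+", "26964248", "26966688", "26964358", "26966557",
    "3", "26964248,26965590,26965942,", "26964625,26965785,26966688,",
    "0", "AT1G71695", "unk", "unk", "0,",
])

# A hypothetical GTF line for comparison (source and attribute values are made up).
gtf_line = (
    "1\tTAIR10\texon\t26964248\t26964625\t.\t+\t.\t"
    'gene_id "AT1G71695"; transcript_id "AT1G71695.1";'
)

print(gtf_problems(error_line))
print(gtf_problems(gtf_line))
```

If the first call reports a field-count problem and the second reports none, that would suggest the file I uploaded is some other gene table rather than a GTF, but I am not certain of this.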
However, if I run cufflinks without the reference genome, it works just fine.
I have tried downloading different versions of the reference genome. I have also tried converting the BAM to a SAM file, sorting it for cufflinks using the published workflow (https://usegalaxy.org/workflow/display_by_username_and_slug?username=jeremy&slug=sort-sam-file-for-cufflinks), and using that file. I always get the same error.
I have also tried running the published workflow on the GTF genome from Ensembl (https://usegalaxy.org/workflow/display_by_username_and_slug?username=jeremy&slug=make-ensembl-gtf-compatible-with-cufflinks). The result was again an error, but this time the error output contained random characters.
In what may be a related problem, I cannot see any assembled reads when I try to view the file in IGV through Galaxy. However, when I use IGV on my local copy of the file, with the .bai index file that was generated with the sequence, the transcripts show up just fine. When I download the BAM file I previously uploaded to Galaxy, together with the corresponding index generated by Galaxy, I once again see no reads.
If someone can point out the rookie mistake I'm making, I would be very grateful!
Darron