Question: BAM file not working with Cufflinks or IGV
dluesse (United States) wrote, 3.9 years ago:

Hello Galaxy Team,

I am a complete novice at this and have been unable to find the answer to my question on the message boards, so I'm hoping you can help me. I have RNA-seq data from an Illumina HiSeq. Part of the results I received were BAM files generated by the sequencing facility, aligned to the Arabidopsis genome (TAIR10). I have uploaded these to Galaxy.

I am trying to use Cufflinks to analyze differential expression in my samples. I uploaded the Arabidopsis genome GTF file from Illumina's iGenomes page (http://support.illumina.com/sequencing/sequencing_software/igenome.html) and ran Cufflinks on the BAM file. However, I get this error:

Error running cufflinks.
return code = 1
Command line:
cufflinks -q --no-update-check -I 300000 -F 0.100000 -j 0.150000 -p 8 -G /galaxy-repl/main/files/009/370/dataset_9370819.dat -u -N -b /galaxy/data/Arabidopsis_thaliana_TAIR10/sam_index/Arabidopsis_thaliana_TAIR10.fa /galaxy-repl/main/files/009/319/dataset_9319592.dat 
[14:48:46] Loading reference annotation and sequence.
Error parsing strand (2) from GFF line:
2	AT1G71695.1	1	+	26964248	26966688	26964358	26966557	3	26964248,26965590,26965942,	26964625,26965785,26966688,	0	AT1G71695	unk	unk	0,

However, if I run Cufflinks without the reference annotation, it works just fine.

I have tried downloading different versions of the reference genome. I have also tried converting the BAM file to SAM, sorting it for Cufflinks with the published workflow (https://usegalaxy.org/workflow/display_by_username_and_slug?username=jeremy&slug=sort-sam-file-for-cufflinks), and using that file. I always get the same error.
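In the error above, Cufflinks reads the annotation as GFF/GTF, where every record has 9 tab-separated columns and the strand (`+`, `-` or `.`) is column 7. The quoted line has 16 columns with the `+` in column 4, a layout that resembles a UCSC-style gene table rather than a GTF, so the problem appears to lie in the annotation file rather than the BAM. A minimal Python sketch of a sanity check, assuming the annotation is an uncompressed text file; the function name `check_gtf_lines` is hypothetical:

```python
VALID_STRANDS = {"+", "-", "."}


def check_gtf_lines(lines):
    """Return (line_number, reason) pairs for lines that do not look like GTF records.

    Assumes the usual GTF layout: 9 tab-separated columns with the strand
    in column 7. Blank lines and comment lines starting with '#' are skipped.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # A GTF record always has exactly 9 columns.
        if len(fields) != 9:
            problems.append((number, f"expected 9 columns, found {len(fields)}"))
            continue
        # Column 7 (index 6) holds the strand.
        if fields[6] not in VALID_STRANDS:
            problems.append((number, f"invalid strand {fields[6]!r} in column 7"))
    return problems
```

For example, `check_gtf_lines(open("genes.gtf"))` (file name hypothetical) should return an empty list for a well-formed GTF, while a file in another tabular format is reported line by line.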

I have also tried running the published workflow on the Ensembl GTF file (https://usegalaxy.org/workflow/display_by_username_and_slug?username=jeremy&slug=make-ensembl-gtf-compatible-with-cufflinks). The result was an error, and the error output contained only random characters.

In what may be a related problem, I cannot see any reads when I view the file in IGV through Galaxy. However, when I open my local file in IGV, using the .bai index file delivered with the sequence data, the transcripts show up just fine. When I download the BAM file I uploaded to Galaxy together with the index Galaxy generated, I again see no reads.

If someone can point out the rookie mistake I'm making, I would be very grateful!

Darron

 

Tags: rna-seq, cufflinks, galaxy, bam
Jennifer Hillman Jackson (United States) wrote, 3.8 years ago:

Hello,

I would start by confirming that all of your inputs are based on the same reference genome build, and that exactly the same build is used in any later run. Help with that is available here:
http://wiki.galaxyproject.org/Support#Reference_genomes

Thanks, Jen, Galaxy team
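One quick, partial way to follow this advice is to compare the sequence names in the BAM header with those in the first column of the GTF. If the BAM says `1` but the annotation says `Chr1` (or vice versa), the inputs were not prepared against identical references, and tools may fail to match annotation records or a genome build to the reads. The sketch below uses only the Python standard library and the BAM header layout from the SAM/BAM specification; the function names are hypothetical, and matching names do not prove that the underlying sequences are identical.

```python
import gzip
import struct


def bam_reference_names(path):
    """Read the reference sequence names from a BAM header.

    BAM is BGZF-compressed, which Python's gzip module can read as a
    multi-member gzip stream. Header layout (SAM/BAM spec): magic "BAM"
    plus byte 1, int32 l_text, header text, int32 n_ref, then for each
    reference an int32 l_name, a NUL-terminated name and an int32 l_ref.
    """
    with gzip.open(path, "rb") as handle:
        if handle.read(4) != b"BAM\x01":
            raise ValueError(f"{path} does not look like a BAM file")
        (l_text,) = struct.unpack("<i", handle.read(4))
        handle.read(l_text)  # skip the plain-text SAM header
        (n_ref,) = struct.unpack("<i", handle.read(4))
        names = []
        for _ in range(n_ref):
            (l_name,) = struct.unpack("<i", handle.read(4))
            names.append(handle.read(l_name).rstrip(b"\x00").decode("ascii"))
            handle.read(4)  # l_ref: sequence length, not needed here
        return names


def gtf_sequence_names(lines):
    """Collect the distinct values of column 1 (seqname) in a GTF file."""
    names = set()
    for line in lines:
        if line.strip() and not line.startswith("#"):
            names.add(line.split("\t", 1)[0])
    return names


def report_mismatch(bam_path, gtf_path):
    """Return (names only in the GTF, names only in the BAM), both sorted."""
    bam_names = set(bam_reference_names(bam_path))
    with open(gtf_path) as handle:
        gtf_names = gtf_sequence_names(handle)
    return sorted(gtf_names - bam_names), sorted(bam_names - gtf_names)
```

Two empty lists mean the names agree; anything else points to a build or naming mismatch worth resolving before rerunning Cufflinks.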

dluesse (United States) wrote, 3.8 years ago:

OK, I discovered the problem: I was not unpacking the .tar file. I had uploaded the genome archive straight from iGenomes to Galaxy.

For anyone who comes across this problem in the future, here is what you should do:

1) Download the .tar file from iGenomes to your computer.

2) Download 7-Zip. It is free.

3) Right-click the genome file, select "7-Zip" and then "extract files." I'd pick a new folder to put them in.

4) Navigate through the maze of files to find the most recent .gtf file, and upload that to Galaxy.
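The unpacking in steps 1 to 4 can also be scripted instead of done in 7-Zip. A minimal Python sketch using the standard `tarfile` module, whose mode `"r:*"` detects gzip, bzip2 or xz compression automatically; the function name and paths are hypothetical, and deciding which .gtf is the most recent is still left to you, as in step 4:

```python
import tarfile
from pathlib import Path


def extract_and_find_gtfs(archive_path, destination):
    """Unpack a .tar / .tar.gz archive and list every .gtf file inside it."""
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:*") as archive:
        # Newer Python versions provide a "data" filter that rejects unsafe
        # member paths; fall back to plain extraction where it is missing.
        if hasattr(tarfile, "data_filter"):
            archive.extractall(destination, filter="data")
        else:
            archive.extractall(destination)
    # Search all extracted folders for annotation files.
    return sorted(destination.rglob("*.gtf"))
```

For example, `extract_and_find_gtfs("genome.tar.gz", "igenome")` returns the paths of all extracted .gtf files; pick the appropriate one from that list and upload it to Galaxy.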

Cheers,

Darron
