Hello,
I'm currently facing troubles using galaxy. I want to compare differentially expressed genes between two treatment groups. I already map my reads on my reference genome (70% remaping) and now I'm trying to obtain the differential expression matrix using Htseq count. (For information, my data are Illumina Hiseq 2500, pair end, 125pb).
I already map my reads on my reference genome thanks to Tophat2 (70%remaping), but when I tried to run Htseq on the Bam files from Htseq send me this error message:
Fatal error: Unknown error occured Error occured when processing GFF file (line 40 of file /opt/galaxy-dist/database/files/002/052/dataset_2052791.dat): Feature DS10_00012179-RA:exon:1059 does not contain a 'gene_id' attribute [Exception type: ValueError, raised in count.py:53]**
I though that maybe it could an issue due to my gff3 file, and I tried to convert it into a gtf file using the GFF to GTF converter. But I obtain the following error message:
Traceback (most recent call last): File "/opt/shed_tools/toolshed.g2.bx.psu.edu/repos/vipints/fml_gff3togtf/6e589f267c14/fml_gff3togtf/gff_to_gtf.py", line 17, in <module> import GFFParser File "/opt/shed_tools/toolshed.g2.bx.psu.edu/repos/vipints/fml_gff3togtf/6e589f267c14/fml_gff3togtf/GFFParser.py", line 20, in <module> import scipy.io as sio ImportError: No module named scipy.io
I read that it could be because my Bam files were not sorted by the gene id. So, I tried to sort my Bam files using the tool sort from the SAMtool suite, and obtain an error message again:
Tool execution generated the following error message: Error running samtools sort. mv: cannot stat `foo.bam': No such file or directory The tool produced the following additional output: [bam_sort] Use -T PREFIX / -o FILE to specify temporary and final output files Usage: samtools sort [options...] [in.bam] Options: -l INT Set compression level, from 0 (uncompressed) to 9 (best) -m INT Set maximum memory per thread; suffix K/M/G recognized [768M] -n Sort by read name -o FILE Write final output to FILE rather than standard output -T PREFIX Write temporary files to PREFIX.nnnn.bam -@, --threads INT Set number of sorting and compression threads [1] --input-fmt-option OPT[=VAL] Specify a single input file format option in the form of OPTION or OPTION=VALUE -O, --output-fmt FORMAT[,OPT[=VAL]]... Specify output format (SAM, BAM, CRAM) --output-fmt-option OPT[=VAL] Specify a single output file format option in the form of OPTION or OPTION=VALUE --reference FILE Reference sequence FASTA FILE [null]
I do not understand why I received as much error messages. Does anyone face up a similar issue? Or knows where this problems come from?
Thank you in advance,
Thomas