HTSeq-Count Error on Cloudman


3.6 years ago by

United States

madkisson • 30 wrote:

I'm working on Cloudman doing RNA-Seq transcriptome analysis. I've made BAM files using TopHat2, with a RefSeq mm10 GTF file for junctions.

I'm now trying to run them through HTSeq-Count to obtain reads per gene, but after a few successful runs [with disappointing results] I'm now getting failed runs with the following error:

Fatal error: Unknown error occured
100000 GFF lines processed.
200000 GFF lines processed.
300000 GFF lines processed.

...

16400000 SAM alignment records processed.
16497350 SAM alignments  processed.
[bam_header_read] EOF marker is absent. The input is probably truncated.
[sam_header_read2] 66 sequences loaded.
[sam_read1] reference '' is recognized as '*'.
Parse error at line 1: invalid CIGAR character

My previous successful runs may have used BAM files made with an Ensembl GTF file, so could a difference in nomenclature be the cause? Any thoughts?

rnaseq galaxy cloudman htseq-count • 1.4k views


modified 3.6 years ago by Jennifer Hillman Jackson ♦ 25k • written 3.6 years ago by madkisson • 30

I'm still getting this error message, even with BAM files made without any junction annotation file.

written 3.6 years ago by madkisson • 30

3.6 years ago by

Jennifer Hillman Jackson ♦ 25k

United States

Jennifer Hillman Jackson ♦ 25k wrote:

Hello,

It could be that the mapping run was a failure: it may have executed OK but produced poor results. Or the run may simply not have completed, so double-check that you have sufficient resources allocated (memory and available disk space). Failed runs that are "green" are uncommon, but can occur. There is also a difference between content failures (low mapping rates and other issues) and outright job failures due to technical reasons (these should always be "red", not "green"). Exactly what is going on is not clear, except that the BAM dataset is almost certainly incomplete for some reason.
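The "EOF marker is absent" line in the log points at a truncated BAM. As a rough way to confirm that outside of Galaxy, here is a minimal Python sketch that checks whether a downloaded BAM ends with the 28-byte empty BGZF block that a complete BAM file carries at its end. The function name `has_bgzf_eof` is made up for this illustration; it is not part of HTSeq, samtools, or Galaxy.

```python
import os

# The 28-byte empty BGZF block that terminates a complete BAM file
# (the "EOF marker" that the samtools message refers to).
BGZF_EOF = bytes.fromhex(
    "1f8b0804 00000000 00ff0600 42430200 1b000300 00000000 00000000"
)


def has_bgzf_eof(path):
    """Return True if the file at `path` ends with the BGZF EOF marker.

    Hypothetical helper for illustration only.
    """
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        # Jump to the last 28 bytes and compare them with the marker.
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF
```

If this returns False for a BAM, the file was most likely cut short (for example by a full disk), and the mapping step would need to be rerun. A True result only shows that the end of the file is intact, not that the alignments themselves are good.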

Also, double-checking the inputs is a very good idea. I wouldn't expect this error to result from an input mismatch, but once the technical run issues are cleared up, content issues will surface next. Make certain that all inputs are based on the same reference genome. In particular, the chromosome identifiers must be an exact match. This Galaxy wiki has advice if you need to understand how to investigate: Reference Genomes and Mismatch Issues
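To make the identifier check concrete, here is a small Python sketch that compares the sequence names in a GTF file (first tab-separated column) with the `@SQ` `SN:` names in a BAM header. It assumes you have the header as text, for example saved from `samtools view -H`. The function names are invented for this example and belong to no existing tool.

```python
def gtf_seqnames(gtf_lines):
    """Collect the sequence names from column 1 of GTF lines."""
    names = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comment and blank lines
        names.add(line.split("\t", 1)[0])
    return names


def sam_header_seqnames(header_lines):
    """Collect the SN: values from @SQ lines of a SAM/BAM header."""
    names = set()
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SN:"):
                names.add(field[3:])
    return names


def compare_seqnames(gtf_names, bam_names):
    """Report which names are shared and which appear on one side only."""
    return {
        "shared": sorted(gtf_names & bam_names),
        "gtf_only": sorted(gtf_names - bam_names),
        "bam_only": sorted(bam_names - gtf_names),
    }
```

An empty "shared" list, with names like `chr1` on one side and `1` on the other, would be the typical sign of mixing UCSC-style and Ensembl-style identifiers.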

Hopefully this helps! Jen, Galaxy team

written 3.6 years ago by Jennifer Hillman Jackson ♦ 25k
