Question: HTSeq-Count Error on Cloudman
2.6 years ago by
madkisson30
United States
madkisson30 wrote:

Hi

I'm working on Cloudman doing RNA-Seq transcriptome analysis. I've made BAM files using TopHat2, with a RefSeq mm10 GTF file for junctions.

I'm now trying to run them through HTSeq-Count to obtain reads per gene, but after a few successful runs (with disappointing results) I'm now getting failed runs with the following error:

Fatal error: Unknown error occured
100000 GFF lines processed.
200000 GFF lines processed.
300000 GFF lines processed.

...

16400000 SAM alignment records processed.
16497350 SAM alignments  processed.
[bam_header_read] EOF marker is absent. The input is probably truncated.
[sam_header_read2] 66 sequences loaded.
[sam_read1] reference '' is recognized as '*'.
Parse error at line 1: invalid CIGAR character

It may be that my previous successful runs used BAM files made with an Ensembl GTF file, so the nomenclature may be different. Any thoughts?

modified 2.6 years ago by Jennifer Hillman Jackson23k • written 2.6 years ago by madkisson30

I'm still getting this error message even with BAM files made without a junction annotation file.

written 2.6 years ago by madkisson30
2.6 years ago by
United States
Jennifer Hillman Jackson23k wrote:

Hello,

It could be that the mapping run failed. That is, it may have executed OK but produced poor results, or the run may simply not have completed. Double check that you have sufficient resources allocated (memory and available disk space). Failed runs that are "green" are uncommon, but can occur. There is also a difference between content failures (low mapping rates and other issues) and outright job failures due to technical reasons (these should always be "red", not "green"). Exactly what is going on is not clear, except that the BAM dataset is almost certainly incomplete for some reason: the log itself reports "EOF marker is absent. The input is probably truncated."
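One way to confirm the truncation independently of HTSeq-Count is to look for the end-of-file marker yourself. The sketch below is only an illustration, not part of Galaxy or HTSeq: it assumes the dataset has been downloaded to a local path, and the function name `has_bgzf_eof` is made up for this example. It compares the last 28 bytes of the file with the empty BGZF block that the SAM/BAM specification defines as the EOF marker.

```python
import os

# The 28-byte empty BGZF block used as the BAM end-of-file marker
# (as defined in the SAM/BAM format specification).
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00"
    " 1b 00 03 00 00 00 00 00 00 00 00 00"
)


def has_bgzf_eof(path):
    """Return True if the file at `path` ends with the BGZF EOF marker.

    `has_bgzf_eof` is a hypothetical helper written for this example.
    """
    size = os.path.getsize(path)
    if size < len(BGZF_EOF):
        # Too short to contain the marker at all.
        return False
    with open(path, "rb") as handle:
        # Jump to the last 28 bytes and compare them with the marker.
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF
```

If this returns False for a BAM file, the file was most likely cut short while being written or transferred, and the mapping step would need to be rerun. A True result only shows that the file ends cleanly; it says nothing about the quality of the alignments inside.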

Also, double checking the inputs is a very good idea. I wouldn't expect this error to result from a reference mismatch, but once the technical run issues are cleared up, content issues will surface next. Make certain that all inputs are based on the same reference genome. In particular, the chromosome identifiers must match exactly. This Galaxy wiki has advice on how to investigate: Reference Genomes and Mismatch Issues
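The identifier check can be sketched in a few lines. This is a rough illustration under some assumptions: the BAM header has been exported as text (for example with `samtools view -H`), the annotation is a plain tab-separated GTF, and the function names are invented for this example. It collects the `SN:` names from the `@SQ` header lines and the first column of the GTF, then reports names that appear on only one side.

```python
def sam_header_names(header_lines):
    """Collect reference names from the SN: tags of @SQ header lines."""
    names = set()
    for line in header_lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def gtf_names(gtf_lines):
    """Collect sequence names from column 1 of GTF lines, skipping comments."""
    names = set()
    for line in gtf_lines:
        if line.strip() and not line.startswith("#"):
            names.add(line.split("\t", 1)[0])
    return names


def compare_names(header_lines, gtf_lines):
    """Return (names only in the BAM header, names only in the GTF), sorted."""
    bam = sam_header_names(header_lines)
    gtf = gtf_names(gtf_lines)
    return sorted(bam - gtf), sorted(gtf - bam)


# Made-up example: UCSC-style names in the BAM, Ensembl-style names in the GTF.
header = ["@HD\tVN:1.0\tSO:coordinate", "@SQ\tSN:chr1\tLN:1000"]
gtf = ["#comment", '1\tsource\texon\t1\t100\t.\t+\t.\tgene_id "g1";']
only_bam, only_gtf = compare_names(header, gtf)
print(only_bam, only_gtf)
```

If both lists come back empty, the names agree. If every name lands in one list or the other, as in the made-up `chr1` versus `1` case above, the two files use different naming schemes and reads are unlikely to be assigned to any feature.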

Hopefully this helps! Jen, Galaxy team

written 2.6 years ago by Jennifer Hillman Jackson23k