Question: error using HTseq-count for conversion of .bam to raw counts
2.0 years ago, linda.boshans (United States) wrote:

Hello,

I am trying to convert the .bam files I got as output from the TopHat alignment into raw counts so that I can do differential expression analysis with DESeq2. I am using a genes.gtf file that I obtained from iGenomes. I picked the option of sorting the files by name for paired-end reads.


When running, I get the following error:

Fatal error: Unknown error occured
[bam_sort_core] merging from 9 files...
100000 GFF lines processed.
200000 GFF lines processed.
300000 GFF lines processed.
400000 GFF lines processed.
500000 GFF lines processed.
600000 GFF lines processed.
672343 GFF lines processed.
Warning: Read NS500402:32:H3JJJAFXX:1:11101:1020:5895 claims to have an aligned mate which could not be found in an adjacent line.
100000 SAM alignment record pairs processed.
200000 SAM alignment record pairs processed.
300000 SAM alignment record pairs processed.
400000 SAM alignment record pairs processed.
500000 SAM alignment record pairs processed.
600000 SAM alignment record pairs processed.
700000 SAM alignment record pairs processed.
800000 SAM alignment record pairs processed.
900000 SAM alignment record pairs processed.
1000000 SAM alignment record pairs processed.
1100000 SAM alignment record pairs processed.
1200000 SAM alignment record pairs processed.
1300000 SAM alignment record pairs processed.
1400000 SAM alignment record pairs processed.
1500000 SAM alignment record pairs processed.
1600000 SAM alignment record pairs processed.
1700000 SAM alignment record pairs processed.
1800000 SAM alignment record pairs processed.
1900000 SAM alignment record pairs processed.
2000000 SAM alignment record pairs processed.
2100000 SAM alignment record pairs processed.
2200000 SAM alignment record pairs processed.
2300000 SAM alignment record pairs processed.
2400000 SAM alignment record pairs processed.
2500000 SAM alignment record pairs processed.
2600000 SAM alignment record pairs processed.
2700000 SAM alignment record pairs processed.
2800000 SAM alignment record pairs processed.
2900000 SAM alignment record pairs processed.
3000000 SAM alignment record pairs processed.
3100000 SAM alignment record pairs processed.
3200000 SAM alignment record pairs processed.
3300000 SAM alignment record pairs processed.
3400000 SAM alignment record pairs processed.
3500000 SAM alignment record pairs processed.
3600000 SAM alignment record pairs processed.
Error occured when processing SAM input (record #3631535 in file name_sorted_alignment.bam):
  'pair_alignments' needs a sequence of paired-end alignments
  [Exception type: ValueError, raised in __init__.py:603]

How do I go about fixing this? I am lost as to how to troubleshoot it. Any help is greatly appreciated. Thanks

2.0 years ago, Jennifer Hillman Jackson (United States) wrote:

Hello,

Use the tool Picard: FixMateInformation to correct the flags and try a re-run.

This tool runs fastest and uses fewer resources when the input is already queryname sorted before running it (instead of using the sort option on the tool form). Use Picard: SortSam to do the sorting.
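For anyone working outside Galaxy, the sort, fix-mate, and re-run sequence above might look roughly like the sketch below. It only builds and prints the command lines. The file names (`accepted_hits.bam`, `queryname_sorted.bam`, `fixmate.bam`), the `picard.jar` location, and the helper names `picard_cmd` and `build_pipeline` are placeholders I made up for illustration; the Picard arguments use the older `KEY=VALUE` style, so check them against your installed version.

```python
import shlex


def picard_cmd(tool, in_bam, out_bam, *extra, jar="picard.jar"):
    """Build a Picard command line (legacy KEY=VALUE argument style).

    `jar` is an assumed path to the Picard jar; adjust for your install.
    """
    return ["java", "-jar", jar, tool, f"I={in_bam}", f"O={out_bam}", *extra]


def build_pipeline(in_bam, gtf, jar="picard.jar"):
    """Return the three commands: queryname sort, fix mates, count."""
    sorted_bam = "queryname_sorted.bam"  # hypothetical intermediate name
    fixed_bam = "fixmate.bam"            # hypothetical intermediate name
    return [
        # 1. Sort by read name so that mates sit on adjacent lines.
        picard_cmd("SortSam", in_bam, sorted_bam, "SORT_ORDER=queryname", jar=jar),
        # 2. Repair the mate fields and flags on the sorted file.
        picard_cmd("FixMateInformation", sorted_bam, fixed_bam, jar=jar),
        # 3. Re-run the count on the name-sorted, fixed BAM.
        ["htseq-count", "-f", "bam", "-r", "name", fixed_bam, gtf],
    ]


if __name__ == "__main__":
    for cmd in build_pipeline("accepted_hits.bam", "genes.gtf"):
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to inspect them before running anything; each list could then be passed to `subprocess.run(cmd, check=True)`.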

Thanks, Jen, Galaxy team


Hi Jennifer, your solution didn't work for me.

Reply written 4 months ago by pengchy

Hello. Yes, this older post no longer applies to current functionality and usage. TopHat is now deprecated, there are more BAM datatypes, and some tools are still being upgraded to work with queryname-sorted BAMs (SAMtools and Picard tools are in this group).

Coordinate-sorted BAMs are now the default format that tools expect as input, and the tools that have been updated will queryname sort either through options on the tool form or as part of their built-in processing. The latest version of the HTSeq tool has sorting built in, so there is no need to queryname sort beforehand or to use the now-removed option to queryname sort during the run. With the most current tools and protocols, there should be no need for the FixMateInformation tool, as far as I am aware.

First, avoid TopHat (it is deprecated) and use HISAT2 instead.

Next, please see the Galaxy RNA-seq tutorials for how to use HISAT2 for spliced alignments, how to create HTSeq/featureCounts count files, and how to use the other tools related to differential expression analysis and workflows (DESeq2, etc.).

Thanks! Jen

Reply modified 4 months ago, written 4 months ago by Jennifer Hillman Jackson

Hi Jennifer, thank you for your reply. I will give HISAT2 a try. Another possible solution to this problem is described at https://www.biostars.org/p/111221/#111727, where the paired-end reads were extracted from the BAM file and htseq-count was then re-run.
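To make the idea of extracting the paired-end reads concrete, here is a minimal sketch that works on SAM text lines and keeps only records whose FLAG has the paired bit (0x1) set. This is my own illustration, not the exact commands from the linked answer, and the function name `keep_paired` is hypothetical. I am assuming the ValueError in the original log comes from a record that is not flagged as paired, which this thread does not confirm.

```python
PAIRED = 0x1  # SAM FLAG bit: the read is part of a pair


def keep_paired(sam_lines):
    """Yield header lines and alignment records flagged as paired.

    Works on plain SAM text (one record per line); a BAM file would
    first have to be converted to SAM text.
    """
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines are passed through unchanged.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            # Not a complete alignment record (11 mandatory fields); skip it.
            continue
        if int(fields[1]) & PAIRED:
            yield line


if __name__ == "__main__":
    # Tiny made-up example: two mates of one pair plus one unpaired read.
    example = [
        "@HD\tVN:1.0\tSO:queryname",
        "r1\t99\tchr1\t100\t50\t4M\t=\t200\t104\tACGT\tIIII",
        "r1\t147\tchr1\t200\t50\t4M\t=\t100\t-104\tACGT\tIIII",
        "r2\t0\tchr1\t300\t50\t4M\t*\t0\t0\tACGT\tIIII",
    ]
    for kept in keep_paired(example):
        print(kept)
```

On a real BAM file the same filter is usually applied with SAMtools by requiring flag 1 (`samtools view -b -f 1`), and the filtered file is then passed to htseq-count again.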

Best, Pengcheng

Reply written 4 months ago by pengchy