Hello,
The tool expects a single mapped BAM dataset as input. If the data are paired-end, the mapping should be done in paired-end mode, where the forward and reverse fastq reads (in fastqsanger format) are entered into the same mapping job, producing a single BAM output.
Are you certain these data are already mapped? Sometimes fastq data is converted into unmapped BAM format. That said, the "mdup" in the name suggests that duplicates were removed using some tool, maybe Samtools, maybe not. But this is a guess - a file can be named in ways that have nothing to do with the processing.
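One way to check this rather than guess from the file name is to look at the FLAG column of the alignment records (for example, from a SAM-format view of the BAM). Below is a minimal Python sketch, assuming the FLAG values have already been extracted as integers. The helper name describe_flag is made up for illustration; the bit values come from the SAM format specification.

```python
# Selected FLAG bits from the SAM format specification.
PAIRED = 0x1       # read was sequenced as part of a pair
UNMAPPED = 0x4     # read is not aligned to the reference
DUPLICATE = 0x400  # read is marked as a PCR or optical duplicate


def describe_flag(flag):
    """Decode a few properties from a SAM FLAG integer (hypothetical helper)."""
    return {
        "paired": bool(flag & PAIRED),
        "mapped": not (flag & UNMAPPED),
        "duplicate": bool(flag & DUPLICATE),
    }


# Example: 99 is a common flag for the first read of a properly paired, mapped pair.
print(describe_flag(99))
```

If every record reports "mapped" as False, the BAM is most likely just fastq data stored in BAM format. If "paired" is False throughout, the data were mapped as single-end.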
So - even if already mapped, you'll need to remap the data as paired-end anyway to use the htseq_count tool (and most others). Extract the fastq data from the BAMs with the tool BEDTools > Convert from BAM to FastQ. From there you can follow the tutorial workflows, know what data you are working with (reference genome/build), and have more control over the analysis (better results!).
There are tools to merge BAM datasets, but I wouldn't recommend doing that in your case for a few different reasons. Remapping is best.
Fastq format FAQs are here. In short, you want to double-check that the data is really in fastqsanger format before assigning that datatype - or - use the Fastq Groomer to transform the quality scores.
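As a rough illustration of that double check: fastqsanger quality scores use an ASCII offset of 33, while older Illumina/Solexa encodings start at higher character codes. The Python sketch below is only a heuristic, not what Galaxy itself does. The function name guess_quality_encoding is made up, and the cutoff of 59 assumes the lowest character used by the older Solexa encoding.

```python
def guess_quality_encoding(quality_strings):
    """Guess the quality encoding from fastq quality lines (heuristic sketch)."""
    codes = [ord(ch) for line in quality_strings for ch in line]
    if not codes:
        return "unknown"
    lowest = min(codes)
    if lowest < 33:
        # Below '!' is not a valid fastq quality character.
        return "invalid"
    if lowest < 59:
        # Characters this low only occur with an offset of 33 (fastqsanger).
        return "phred33"
    # Could be an older offset-64 encoding, or high-quality offset-33 data.
    return "possibly phred64"


# Quality lines are every 4th line of a fastq file, starting at line 4.
print(guess_quality_encoding(["!!''*((((***+))%%%++"]))
```

A "possibly phred64" result from a large sample of reads is a hint that the data should go through the Fastq Groomer before the fastqsanger datatype is assigned.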
Hope this helps, Jen, Galaxy team