Question: HT-seq counts help

14 months ago by

lucilepain • 0 wrote:

Hi everyone,

I need to analyse .bam files (sorted and indexed) that were mapped by someone else before me. My files are named like this:

- sample.merged.mdup.forward.bam
- sample.merged.mdup.reverse.bam

1. Does anyone have an idea of what mdup means?

2. The htseq-count tutorials I found use only one .bam (plus the .gff) as input:

-> Does anyone working in paired-end mode know whether the .forward and .reverse files can both be given to htseq-count as inputs, and what that would look like?

If not, do the forward and reverse files need to be merged before running htseq-count?

Many thanks in advance :)

ht-seq rna-seq bam • 522 views

modified 14 months ago by y.hoogstrate • 460 • written 14 months ago by lucilepain • 0

14 months ago by

Jennifer Hillman Jackson ♦ 25k

United States

Jennifer Hillman Jackson ♦ 25k wrote:

Hello,

The tool expects a single mapped BAM dataset as input. For paired-end data, the mapping should be done in paired-end mode: the forward and reverse fastq reads (in fastqsanger format) are entered into the same mapping job, which produces a single BAM output.

Are you certain these data are already mapped? Fastq data is sometimes converted into (unmapped) BAM format. That said, "mdup" in the name suggests that duplicates were removed with some tool, maybe Samtools, maybe not. But this is a guess - a file can be named in ways that have nothing to do with the processing.

So - even if the data are already mapped, you'll need to remap them as paired-end to use the htseq_count tool (and most others). Extract the fastq data from the BAMs with the tool BEDTools > Convert from BAM to FastQ. From there you can follow the tutorial workflows, know exactly what data you are working with (reference genome/build), and have more control over the analysis (better results!).
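Outside Galaxy, the extraction step corresponds to `bedtools bamtofastq`. The sketch below only builds the command line and does not run it. The file names are hypothetical, and it assumes the BAM really holds both mates of each pair and has been sorted by read name first, which bedtools expects when writing paired output with `-fq2`.

```python
import shlex


def bamtofastq_command(bam_path, fq1_path, fq2_path=None):
    """Build (but do not run) a `bedtools bamtofastq` argv list."""
    argv = ["bedtools", "bamtofastq", "-i", bam_path, "-fq", fq1_path]
    if fq2_path is not None:
        # -fq2 receives the second mate of each pair; the input BAM
        # should be sorted by read name for the mates to pair up.
        argv += ["-fq2", fq2_path]
    return argv


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = bamtofastq_command(
        "sample.namesorted.bam", "sample_R1.fastq", "sample_R2.fastq"
    )
    print(shlex.join(cmd))
```

Printing the command first makes it easy to check the paths before handing the list to something like `subprocess.run`.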

There are tools to merge BAM datasets, but I wouldn't recommend doing that in your case for a few different reasons. Remapping is best.

Fastq format FAQs are here. In short, double-check that the data really is in fastqsanger format before assigning that datatype, or use the Fastq Groomer to transform the quality scores.
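One rough way to do that check by hand is to look at the range of quality characters. The sketch below is a heuristic, not what Galaxy or the Fastq Groomer does internally: it assumes plain four-line fastq records, and the function name and thresholds are illustrative. Characters below `;` (ASCII 59) cannot occur in the older +64 encodings, so they point to Sanger (Phred+33), while characters above `J` (ASCII 74) are unusual for Phred+33.

```python
def guess_phred_offset(fastq_lines):
    """Return 33, 64, or None (ambiguous) from the quality characters.

    Assumes four-line fastq records: header, sequence, '+', quality.
    """
    lowest = None
    highest = None
    for index, line in enumerate(fastq_lines):
        if index % 4 != 3:  # the quality string is the 4th line of a record
            continue
        codes = [ord(char) for char in line.strip()]
        if not codes:
            continue
        lowest = min(codes) if lowest is None else min(lowest, min(codes))
        highest = max(codes) if highest is None else max(highest, max(codes))
    if lowest is None:
        return None  # no quality lines seen
    if lowest < 59:
        return 33  # too low for a +64 encoding, so Sanger / Phred+33
    if highest > 74:
        return 64  # higher than Phred+33 data normally reaches
    return None  # the observed range fits both encodings


if __name__ == "__main__":
    record = ["@read1", "ACGT", "+", "!!II"]
    print(guess_phred_offset(record))
```

A result of `None` means the sampled reads do not settle the question, in which case checking more reads or asking whoever produced the data is safer than guessing.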

Hope this helps, Jen, Galaxy team

ADD COMMENT • link modified 14 months ago • written 14 months ago by Jennifer Hillman Jackson ♦ 25k

14 months ago by

lucilepain • 0

lucilepain • 0 wrote:

Thank you Jennifer. Unfortunately yes, the analyst told me that my forward.bam and reverse.bam files were already mapped. That's why I was asking. :/

ADD COMMENT • link written 14 months ago by lucilepain • 0

Maybe they can redo the mapping for you in paired-end mode? Or you can remap that way yourself in Galaxy.

Using exactly the same target reference genome for all steps in an analysis is critical (in Galaxy, on the command line, anywhere/any toolset), and doing your own mapping within Galaxy can prevent many headaches downstream. For example, if genome mismatch problems do come up later, the process to figure out and correct the data is fairly tedious. The following page describes how it can be done in Galaxy, and the same basic methods apply when correcting data on the command line: https://galaxyproject.org/support/chrom-identifiers/
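A quick way to spot one common kind of mismatch is to compare the sequence names in the BAM header (`@SQ` lines, `SN:` field) with the first column of the annotation file. The sketch below works on plain text that has already been extracted; the function names are illustrative, and matching names alone do not prove that the genome build is the same.

```python
def bam_header_seqnames(header_text):
    """Collect the SN: values from the @SQ lines of a SAM/BAM header."""
    names = set()
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.add(field[3:])
    return names


def gff_seqnames(gff_text):
    """Collect the first-column sequence names from GFF/GTF text."""
    names = set()
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/directive lines
        names.add(line.split("\t")[0])
    return names


def compare_seqnames(header_text, gff_text):
    """Report which sequence names are shared and which appear in one file only."""
    bam_names = bam_header_seqnames(header_text)
    gff_names = gff_seqnames(gff_text)
    return {
        "shared": sorted(bam_names & gff_names),
        "only_in_bam": sorted(bam_names - gff_names),
        "only_in_gff": sorted(gff_names - bam_names),
    }
```

If `shared` comes back empty while both other lists are populated (for example `chr1` against `1`), the identifiers need to be reconciled before counting.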

ADD REPLY • link modified 14 months ago • written 14 months ago by Jennifer Hillman Jackson ♦ 25k

14 months ago by

y.hoogstrate • 460

Netherlands

y.hoogstrate • 460 wrote:

Mdup most likely refers to http://broadinstitute.github.io/picard/command-line-overview.html#MarkDuplicates

Maybe you can extract the BAM headers. The Picard settings may be recorded there.

ADD COMMENT • link written 14 months ago by y.hoogstrate • 460

Agreed (that tool or rmdup are both candidates).

To extract the headers within Galaxy, use the tool BAM-to-SAM with the option to just output the header.
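Once the header is available as text, the `@PG` (program) lines are the place to look. Here is a minimal sketch, assuming the header has been saved or pasted as a string. The function names and keyword list are illustrative, and a header without such a record does not prove that no duplicate handling was done, since not every tool writes one.

```python
def program_records(header_text):
    """Parse the @PG lines of a SAM/BAM header into tag -> value dicts."""
    records = []
    for line in header_text.splitlines():
        if not line.startswith("@PG"):
            continue
        fields = {}
        for field in line.split("\t")[1:]:
            tag, sep, value = field.partition(":")
            if sep:
                fields[tag] = value
        records.append(fields)
    return records


def mentions_duplicate_marking(header_text):
    """True if any @PG record names a duplicate-marking or removal step."""
    keywords = ("markduplicates", "markdup", "rmdup")
    for record in program_records(header_text):
        # ID, PN and CL hold the record id, program name and command line.
        text = " ".join(record.get(tag, "") for tag in ("ID", "PN", "CL")).lower()
        if any(keyword in text for keyword in keywords):
            return True
    return False
```

When a matching record is found, its `CL` field (if present) shows the command line that was used, which is where the settings would appear.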

ADD REPLY • link written 14 months ago by Jennifer Hillman Jackson ♦ 25k
