Question: HT-seq counts help
lucilepain wrote, 14 months ago:

Hi everyone,

I need to analyse sorted and indexed .bam files that were mapped by someone else before me. My files are named like this:

- sample.merged.mdup.forward.bam
- sample.merged.mdup.reverse.bam

1. Does anyone have an idea of what mdup means?

2. The htseq-count tutorials I found use only one .bam file (plus the .gff) as input.

Does anyone working in paired-end mode know whether the .forward and .reverse files can both be used as inputs for htseq-count, and what that would look like?

If not, do the forward and reverse files need to be merged before running htseq-count?

Many thanks in advance :)

ht-seq rna-seq bam

Jennifer Hillman Jackson (United States) wrote, 14 months ago:

Hello,

The tool expects a single mapped BAM dataset as input. For paired-end data, the mapping should be done in paired-end mode: the forward and reverse fastq reads (in fastqsanger format) are entered into the same mapping job, which produces a single BAM output.
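To show what a single-BAM call looks like outside Galaxy, here is a minimal sketch that only builds the htseq-count command line; it does not run anything. The file names are placeholders, and the --stranded, --type and --idattr values are assumptions that depend on the library preparation and on the annotation file.

```python
import shlex


def htseq_count_command(bam_path, gff_path, stranded="no",
                        feature_type="exon", id_attribute="gene_id"):
    """Build an htseq-count argument list for one coordinate-sorted BAM."""
    if stranded not in ("yes", "no", "reverse"):
        raise ValueError("stranded must be 'yes', 'no' or 'reverse'")
    return [
        "htseq-count",
        "--format", "bam",        # input is BAM rather than SAM
        "--order", "pos",         # the BAM is sorted by coordinate
        "--stranded", stranded,   # assumption: depends on the library prep
        "--type", feature_type,   # feature type to count (GFF column 3)
        "--idattr", id_attribute, # attribute used as the feature ID
        bam_path,
        gff_path,
    ]


# Placeholder file names, for illustration only.
cmd = htseq_count_command("sample.bam", "annotation.gff")
print(shlex.join(cmd))
```

htseq-count writes its counts to standard output, so when the list is passed to subprocess.run the output would normally be redirected to a file.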

Are you certain these data are already mapped? Sometimes fastq data is converted into unmapped BAM format. That said, the "mdup" in the name suggests that duplicates were marked or removed with some tool, maybe Samtools, maybe not. But this is a guess: a file can be named in ways that have nothing to do with the processing.

So, even if the data are already mapped, you'll need to remap them as paired-end to use the htseq_count tool (and most others). Extract the fastq data from the BAMs with the tool BEDTools > Convert from BAM to FastQ. From there you can follow the tutorial workflows, know exactly what data you are working with (reference genome/build), and have more control over the analysis (better results!).

There are tools to merge BAM datasets, but I wouldn't recommend doing that in your case for a few different reasons. Remapping is best.

Fastq format FAQs are here. In short, double-check that the data really is in fastqsanger format before assigning that datatype, or use the Fastq Groomer to transform the quality scores.
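One rough way to double-check the quality encoding is to look at the lowest character used in the quality lines. The sketch below is a heuristic, not what the Fastq Groomer does: it assumes four-line fastq records, and the function name and return labels are made up for this example. fastqsanger uses an offset of 33, so any quality character below ASCII 59 points to it; a file whose lowest character is 64 or above is probably offset 64, but could also be offset 33 with uniformly high qualities.

```python
def guess_quality_offset(fastq_lines, max_records=10000):
    """Guess the Phred offset from the quality lines of a fastq stream.

    Assumes the common layout of exactly four lines per record.
    """
    lowest = None
    for index, line in enumerate(fastq_lines):
        if index // 4 >= max_records:
            break
        if index % 4 != 3:
            continue  # only the 4th line of each record holds qualities
        quality = line.rstrip("\r\n")
        if not quality:
            continue
        line_min = min(ord(ch) for ch in quality)
        lowest = line_min if lowest is None else min(lowest, line_min)

    if lowest is None:
        return "unknown"      # no quality lines were seen
    if lowest < 59:
        return "phred33"      # consistent with fastqsanger
    if lowest >= 64:
        return "phred64"      # probable, not certain
    return "ambiguous"        # 59-63: could be an older Solexa-style encoding


# Tiny in-memory example; a real file would be passed as open(path).
example = ["@read1\n", "ACGT\n", "+\n", "!!II\n"]
print(guess_quality_offset(example))
```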

Hope this helps, Jen, Galaxy team

lucilepain wrote, 14 months ago:

Thank you Jennifer. Unfortunately, yes: the analyst told me that my forward.bam and reverse.bam files were already mapped. That's why I was asking. :/


Maybe they can redo the mapping for you in paired-end mode? Or you can remap that way yourself in Galaxy.

Using the exact same target reference genome for all steps in an analysis is critical (in Galaxy, on the command line, anywhere, with any toolset), and doing your own mapping within Galaxy can prevent many headaches downstream. For example, if genome mismatch problems do come up later, the process of figuring out and correcting the data is fairly tedious. This is how it can be done in Galaxy, but the same basic methods would apply for correcting data on the command line: https://galaxyproject.org/support/chrom-identifiers/

Reply written 14 months ago by Jennifer Hillman Jackson
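A quick way to spot the kind of genome mismatch mentioned in the reply above is to compare the sequence names in the BAM header with those in the first column of the annotation file. This is a minimal sketch: it assumes the header text has already been extracted (for example with samtools view -H), and the function names and example data are invented for illustration.

```python
def sam_header_contigs(header_text):
    """Return the reference names (SN fields) from @SQ header lines."""
    names = set()
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.add(field[3:])
    return names


def gff_contigs(gff_text):
    """Return the sequence names found in column 1 of a GFF/GTF file."""
    names = set()
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; stop here
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def compare_contigs(header_text, gff_text):
    """Report which sequence names are shared and which appear on one side only."""
    bam = sam_header_contigs(header_text)
    gff = gff_contigs(gff_text)
    return {
        "shared": sorted(bam & gff),
        "bam_only": sorted(bam - gff),
        "gff_only": sorted(gff - bam),
    }


# Invented example: the BAM uses "chr1" while the annotation uses "1".
example_header = "@HD\tVN:1.4\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n"
example_gff = "##gff-version 3\n1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id=g1\n"
print(compare_contigs(example_header, example_gff))
```

An empty "shared" list, as in this example, usually means the two files use different naming schemes or different genome builds, and counting against that annotation would find no overlapping features.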
y.hoogstrate (Netherlands) wrote, 14 months ago:

Mdup most likely refers to Picard MarkDuplicates: http://broadinstitute.github.io/picard/command-line-overview.html#MarkDuplicates

Maybe you can extract the BAM headers. The Picard settings may be recorded there.


Agreed (that tool or rmdup are both possibilities).

To extract the headers within Galaxy, use the tool BAM-to-SAM with the option to just output the header.

Reply written 14 months ago by Jennifer Hillman Jackson
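Once the header is available as text (from the Galaxy tool mentioned above, or from samtools view -H on the command line), the @PG lines list the programs that processed the file, when those programs recorded themselves. The sketch below looks for a duplicate-marking step in those lines. The function names, the keyword list and the example header are assumptions made for illustration; a real header may record the step differently or not at all.

```python
def program_records(header_text):
    """Parse the @PG lines of a SAM header into a list of tag -> value dicts."""
    records = []
    for line in header_text.splitlines():
        if not line.startswith("@PG\t"):
            continue
        record = {}
        for field in line.split("\t")[1:]:
            tag, sep, value = field.partition(":")
            if sep:
                record[tag] = value
        records.append(record)
    return records


def duplicate_marking_steps(header_text, keywords=("markdup", "rmdup")):
    """Return the @PG records whose ID, PN or CL mentions a duplicate tool."""
    hits = []
    for record in program_records(header_text):
        searchable = " ".join(
            record.get(tag, "") for tag in ("ID", "PN", "CL")
        ).lower()
        if any(keyword in searchable for keyword in keywords):
            hits.append(record)
    return hits


# Invented header, for illustration only.
example_header = (
    "@HD\tVN:1.4\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:1000\n"
    "@PG\tID:bwa\tPN:bwa\tCL:bwa mem ref.fa r1.fq r2.fq\n"
    "@PG\tID:MarkDuplicates\tPN:MarkDuplicates\t"
    "CL:MarkDuplicates INPUT=in.bam OUTPUT=out.bam\n"
)
for step in duplicate_marking_steps(example_header):
    print(step.get("ID"), "->", step.get("CL"))
```

The CL field, when present, holds the full command line, which is where any Picard settings would show up.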