I am new to bioinformatics and to Galaxy, so please bear with me.
Soon I will have RNA-seq samples (single-end reads) for wild type and mutant, and my goal is to compare expression levels between them.
This is my pipeline:
Groomer → Filtering (clipping, quality filtering) → Alignment (TopHat) → Removal of duplicates (using Picard:MarkDuplicates) → Annotation of the result (Cufflinks) → Merging WT and mutant samples (Cuffmerge) → Differential expression (Cuffdiff).
I have taken some samples (H1hsec Rep1, H1hsec Rep2 and Cd20 Rep1, Cd20 Rep1) from Data Libraries to test my pipeline on them.
I have a few questions about the duplicates, since there seems to be no tutorial showing how to remove duplicates using Picard:MarkDuplicates.
1- Is removing duplicates required?
2- While testing, I regularly check every result with FastQC. The percentage of sequences remaining if deduplicated was 19.14% after filtering, while after using Picard it was 3.33%. Does this mean the result is good?
3- Does normalization of the reads happen automatically when running Cuffdiff?
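To make clear which number I mean in question 2, here is a minimal sketch of how I understand "percentage of sequences remaining if deduplicated". This is only my assumption: the function name and the toy reads are made up for illustration, and FastQC's own estimate may be computed differently (for example from a subset of the file).

```python
def percent_remaining_if_deduplicated(reads):
    """Percentage of reads left if every distinct sequence is kept only once.

    Hypothetical helper for illustration; not FastQC's actual implementation.
    """
    if not reads:
        return 0.0
    distinct = len(set(reads))  # each distinct sequence counted once
    return 100.0 * distinct / len(reads)


# Toy example: 6 reads, 3 distinct sequences
reads = ["ACGT", "ACGT", "ACGT", "TTGA", "TTGA", "GGCC"]
print(percent_remaining_if_deduplicated(reads))
```

With this reading, a lower percentage means that a larger share of the reads are copies of the same sequence.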
Thank you in advance