Hello,
It is true that a reference genome can contain variation. It is difficult to define the "most common allele" for a SNP, especially in organisms where different sub-populations may have a different "most common allele". The same is true for more complex sequence differences (MNPs, indels, rearrangements). An assembled genome is a standardized baseline reference sequence that data can be compared against.
What is important for most analyses, including differential expression analysis, is to compare all of your data to the exact same reference genome (or transcriptome/exome). That way, any expression differences between samples or conditions are measured against the same underlying sequence, the "reference" sequence. Data mapped to the same reference can be compared to each other and to other features mapped to that same reference.
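As a rough illustration of that point, here is a minimal Python sketch that checks whether every sample in an analysis was mapped to the same reference before comparing them. The function name, sample names, and the "hg38" build label are hypothetical and used only for the example; this is not part of Galaxy itself.

```python
def check_same_reference(sample_to_reference):
    """Return the shared reference name, or raise if samples disagree.

    sample_to_reference: dict mapping a sample name to the name of the
    reference genome build it was mapped against (hypothetical labels).
    """
    references = set(sample_to_reference.values())
    if len(references) != 1:
        raise ValueError(
            f"Expected exactly one reference, found: {sorted(references)}"
        )
    return references.pop()


# Hypothetical samples, all mapped to the same build.
samples = {"control_1": "hg38", "control_2": "hg38", "treated_1": "hg38"}
shared_reference = check_same_reference(samples)
```

If one of the samples had been mapped to a different build, the function would raise an error instead of returning, which is the signal to remap that sample before running the comparison.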
It may help to review some of the Galaxy tutorials to get a better understanding of how a reference sequence and its annotation are used in an analysis. Most tutorials include links to external publications and help guides, along with summaries of the science behind the technical steps performed during an analysis.
Thanks! Jen, Galaxy team