When estimating transcript levels from RNA-Seq data normalized as RPKM or FPKM, would results from the SAMtools slice tool (which takes a manually specified list of coordinates and extracts the data for, say, a few selected genes out of the whole transcriptome) differ from the RPKM/FPKM results obtained from the unsliced, whole-transcriptome RNA-Seq data?
Hello,
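Yes. As far as I know, it would be better to use the complete hit distribution for the input BAM/SAM datasets when performing differential expression testing. R/FPKM values are scaled by the total number of mapped reads (or fragments) in the input, so removing reads before the calculation changes the normalization. Once the full analysis is done, filter the results for just the target genes of interest. Using subsets of data is sometimes fine, but those are special use cases.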
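To make the effect of slicing concrete, here is a minimal sketch of the standard RPKM formula applied twice: once with the whole library as the denominator and once with only the reads that survive slicing. The gene names, read counts, lengths, and library size are all made-up numbers for illustration, and the sketch assumes the tool normalizes by the total mapped reads in whatever input file it is given.

```python
def rpkm(gene_reads, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical target genes: name -> (reads mapped to gene, gene length in bp)
genes = {
    "geneA": (500, 2000),
    "geneB": (1500, 3000),
}

# Hypothetical total mapped reads in the unsliced BAM
whole_library_reads = 20_000_000

# After slicing to just these genes, only their reads remain in the file
sliced_library_reads = sum(reads for reads, _ in genes.values())

whole = {g: rpkm(r, length, whole_library_reads) for g, (r, length) in genes.items()}
sliced = {g: rpkm(r, length, sliced_library_reads) for g, (r, length) in genes.items()}

for g in genes:
    print(f"{g}: whole-library RPKM = {whole[g]:.1f}, sliced RPKM = {sliced[g]:.1f}")
```

The read counts per gene are identical in both cases; only the denominator changes, and with it every reported value. That is why values from a sliced run and an unsliced run are not on the same scale.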
To see if this is true for your data, you could run both datasets through and then compare to find out exactly how different the results are. The DE calculations for R/FPKM are all relative to the specific tool run and inputs, so the values cannot be directly compared across different runs or different inputs. You could, however, compare the overall counts of significantly over- and under-expressed features, the genes and other features associated with those calls, and possibly a few target gene/transcript regions in a visualization that contains both results (in a genome browser, etc.).
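One rough way to do that comparison outside of Galaxy, once both runs have finished: reduce each result to the set of significant genes plus their direction, restrict the whole-transcriptome result to your target genes, and then look at the overlap. The sketch below uses hypothetical gene IDs and hand-typed calls; the function name `compare_significant` and the "up"/"down" labels are made up for illustration and are not the output format of any particular DE tool.

```python
def compare_significant(full_run, sliced_run):
    """Compare two dicts mapping gene ID -> direction ("up" or "down")
    for genes called significant in each run."""
    full_genes = set(full_run)
    sliced_genes = set(sliced_run)
    shared = full_genes & sliced_genes
    return {
        "shared": sorted(shared),
        "only_full": sorted(full_genes - sliced_genes),
        "only_sliced": sorted(sliced_genes - full_genes),
        # Significant in both runs, but with opposite direction
        "direction_flipped": sorted(g for g in shared if full_run[g] != sliced_run[g]),
    }


# Hypothetical significant calls, already filtered to the target genes
full_run = {"geneA": "up", "geneB": "down", "geneC": "up"}
sliced_run = {"geneA": "up", "geneB": "up", "geneD": "down"}

summary = compare_significant(full_run, sliced_run)
for label, gene_ids in summary.items():
    print(f"{label}: {len(gene_ids)} {gene_ids}")
```

This compares the calls rather than the R/FPKM values themselves, since the values from the two runs are not on a common scale.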
If I have misunderstood your question, please explain what you are doing step-by-step to clarify.
Galaxy tutorials: https://galaxyproject.org/learn/
Thanks! Jen, Galaxy team