RNA-seq analysis for a small number of genes


3.2 years ago by

kr81 • 10

Australia

kr81 • 10 wrote:

Hi all,

I am new to RNA-seq analysis and want to look at the expression of a small number of genes (8) in some publicly available RNA-seq datasets.

I came up with a simple method that avoids aligning the RNA-seq reads to the genome and running a full TopHat/Cufflinks analysis (or similar). Briefly: download and QC-filter the SRA dataset > map the reads with bowtie2 to a multi-FASTA file containing the exonic sequences of my genes of interest > filter out hits with MAPQ < 30 > obtain FPKM values using eXpress.
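For readers who want to see what the MAPQ step does, here is a minimal sketch in plain Python, assuming uncompressed SAM text as input. The function name `filter_sam_by_mapq` and the toy records are hypothetical and only for illustration; in practice `samtools view -q 30` applies the same threshold.

```python
def filter_sam_by_mapq(lines, min_mapq=30):
    """Yield SAM header lines plus mapped alignments with MAPQ >= min_mapq."""
    for line in lines:
        if line.startswith("@"):
            # Header lines are passed through unchanged.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])   # SAM column 2: bitwise FLAG
        mapq = int(fields[4])   # SAM column 5: mapping quality
        if flag & 0x4:
            # Bit 0x4 marks an unmapped read; drop it.
            continue
        if mapq >= min_mapq:
            yield line


# Toy records (hypothetical): one confident hit, one ambiguous hit, one unmapped read.
toy_sam = [
    "@SQ\tSN:geneA\tLN:2000",
    "r1\t0\tgeneA\t10\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tgeneA\t50\t1\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]

kept = list(filter_sam_by_mapq(toy_sam))
for record in kept:
    print(record)
```

Note that MAPQ only reflects ambiguity among the sequences present in the reference, so a read that really originates elsewhere in the genome can still receive a high MAPQ against a reference of 8 genes.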

Does anyone have any thoughts on whether my method is valid? My main concern is that mapping reads to such a low-complexity reference may artificially inflate the number of reads that map to my genes of interest and bias the FPKM values.

Thanks in advance :)

rna-seq bowtie2 fpkm express • 1.0k views


modified 3.2 years ago by Jennifer Hillman Jackson ♦ 25k • written 3.2 years ago by kr81 • 10

3.2 years ago by

Jennifer Hillman Jackson ♦ 25k

United States

Jennifer Hillman Jackson ♦ 25k wrote:

Hello,

The statistics will be different from what would be obtained by performing a genome-wide analysis, but the relative values within the experiment should still be comparable (and informative). Just keep that in mind.
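To illustrate why the absolute values shift while the relative values can stay comparable, here is a small sketch using the textbook FPKM formula (fragments × 10^9 / (transcript length × total mapped fragments)). All counts, lengths, and totals are made up, and eXpress itself uses effective lengths and a probabilistic assignment, so treat this only as a simplified picture. It also assumes the same fragments are assigned to each gene under both references, which is exactly the assumption the original question raises doubts about.

```python
def fpkm(fragments, length_bp, total_mapped):
    """Textbook FPKM: fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped)


# Hypothetical fragment counts and transcript lengths for two genes of interest.
genes = {
    "geneA": {"fragments": 500, "length_bp": 2000},
    "geneB": {"fragments": 100, "length_bp": 1000},
}

# Hypothetical library totals: only fragments hitting the small reference
# versus all fragments mapped genome-wide.
total_targeted = 600
total_genome_wide = 20_000_000

targeted = {name: fpkm(g["fragments"], g["length_bp"], total_targeted)
            for name, g in genes.items()}
genome_wide = {name: fpkm(g["fragments"], g["length_bp"], total_genome_wide)
               for name, g in genes.items()}

ratio_targeted = targeted["geneA"] / targeted["geneB"]
ratio_genome_wide = genome_wide["geneA"] / genome_wide["geneB"]

print("targeted FPKM:", targeted)
print("genome-wide FPKM:", genome_wide)
print("geneA/geneB ratio:", ratio_targeted, ratio_genome_wide)
```

The denominator is shared by every gene in a sample, so it cancels in within-sample ratios, while the absolute FPKM values differ by several orders of magnitude between the two references.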

A slightly better approach would be to map to the entire genome (a "canonical" version might be a good choice, since it eliminates the noise from unmapped sequences, haplotypes, and other fragments), and then restrict the analysis at the stage where a reference annotation dataset is introduced. Cuffdiff will only report results for the transcripts (representing genes) that it is informed about through this annotation. You do not need to include the entire transcriptome (or exome, for that matter) if the background is not needed. You can also restrict by chromosome or region. And you do not need to do discovery, which means skipping Cufflinks and Cuffmerge and moving directly from Tophat to Cuffdiff using a reference annotation GTF/GFF3 dataset of just those genes of interest.
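One way to build that restricted annotation is to subset an existing GTF down to the genes of interest. The sketch below is a rough illustration in plain Python; the function name `filter_gtf`, the gene IDs, and the toy records are all hypothetical, and it assumes the standard GTF layout with a `gene_id "..."` attribute in column 9.

```python
import re

# GTF column 9 holds attributes such as: gene_id "X"; transcript_id "Y";
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def filter_gtf(lines, wanted_gene_ids):
    """Yield comment lines plus GTF records whose gene_id is in wanted_gene_ids."""
    wanted = set(wanted_gene_ids)
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            # Skip malformed records that lack the attribute column.
            continue
        match = GENE_ID_RE.search(fields[8])
        if match and match.group(1) in wanted:
            yield line


# Toy annotation (hypothetical gene and transcript IDs).
toy_gtf = [
    "#!toy annotation",
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "GENE1"; transcript_id "GENE1.1";',
    'chr1\tsrc\texon\t500\t900\t.\t-\t.\tgene_id "GENE2"; transcript_id "GENE2.1";',
    'chr2\tsrc\texon\t50\t150\t.\t+\t.\tgene_id "GENE3"; transcript_id "GENE3.1";',
]

subset = list(filter_gtf(toy_gtf, ["GENE1", "GENE3"]))
for record in subset:
    print(record)
```

The resulting subset would then serve as the reference annotation for the quantification step, so that only the listed genes are reported.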

Short answer: some values will be skewed when inputs are deliberately limited, but it may not matter for certain types of analysis.

Best, Jen, Galaxy team

written 3.2 years ago by Jennifer Hillman Jackson ♦ 25k
