Hi all,
I have been trying to initiate a protocol to call SNPs in RNA-Seq data, but have had a few problems. I have carried out the main steps below:
Import known SNPs from hg19 and RNA-Seq data
Convert to FASTQ
FastQC on RNA-Seq data
Convert to FastQSanger
Split paired end reads into forward and reverse reads
CollectInsertSizeMetrics (Mean insert size = 149.53 SD = 37.691812)
FASTQ to FASTA on forward and reverse reads
Compute sequence length on forward and reverse reads
Summary Statistics on forward and reverse reads (Mean = 37.6992, SD = 1.79589)
TopHat on forward and reverse reads (Mean inner distance = 300-(38+38) SD = 38)
MPileup on TopHat data
Varscan on MPileup data
Sort into chromosomal order
Filter for all mutations from an 'A' to 'G'
However, when I carry out VarScan it only returns 39 reads, far too few for me to be expecting from a 4.5GB file of over 128 million reads. Can anybody see if I've gone obviously wrong somewhere in my protocol?
Many thanks,
Frankie.