I have been running the pipeline below to try to call SNPs from RNA-Seq data, but I have run into problems with the RealignerTargetCreator tool in Galaxy. Can anyone see any obvious problems in the pipeline?
Import ucsc.hg19.fasta, ucsc.hg19.dict, ucsc.hg19.fasta.fai, the UCSC hg19 SNPs, the 1000G indels and the RNA-Seq data.
Convert RNA-Seq data into BED
Convert RNA-Seq data into FASTQ
FastQC on RNA-Seq data
FASTQ Groomer on RNA-Seq data
FASTQ Splitter to separate forward and reverse reads (the RNA-Seq data were originally paired-end)
Map with BWA for Illumina on forward and reverse reads
IdxStats on BWA output
Sort by chromosomal coordinate
RmDup on RNA-Seq data
Filter RNA-Seq data for mapped reads and reads in proper pairs
ValidateSamFile to check for errors (no read groups assigned)
AddOrReplaceReadGroups on RNA-Seq data
ReorderSam to replace the lexicographical contig order with the contig order of the reference
Filter for chromosome 1 to narrow down data size
ValidateSamFile to check for further errors (reported: NM tag for nucleotide differences does not match reality, and mate not found for paired reads)
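ReorderSam is meant to make the reads' header list its contigs in the same order as the reference's sequence dictionary. One way to confirm that this step actually took effect on the file passed to RealignerTargetCreator is to compare the @SQ lines of the final BAM header (for example as printed by `samtools view -H`) with those in ucsc.hg19.dict, which uses the same SAM header format. Below is a minimal sketch, assuming both have been saved as plain text; the helper names `sq_names` and `first_order_mismatch` are hypothetical.

```python
"""Compare contig order in a SAM/BAM header with a Picard .dict file.

Hypothetical helpers: both inputs are plain header text, e.g. the output of
`samtools view -H reads.bam` and the contents of ucsc.hg19.dict.
"""


def sq_names(header_text: str) -> list[str]:
    """Return the SN: (sequence name) values of @SQ lines, in file order."""
    names = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs after the record type.
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
                break
    return names


def first_order_mismatch(reads_header: str, reference_dict: str):
    """Return (index, read_contig, ref_contig) at the first difference, or None if the orders match."""
    reads = sq_names(reads_header)
    ref = sq_names(reference_dict)
    for i, (a, b) in enumerate(zip(reads, ref)):
        if a != b:
            return i, a, b
    if len(reads) != len(ref):
        # One list is a prefix of the other; report the first extra contig.
        n = min(len(reads), len(ref))
        return (
            n,
            reads[n] if n < len(reads) else None,
            ref[n] if n < len(ref) else None,
        )
    return None


# Toy headers for illustration only (not the real hg19 dictionary).
reads = "@HD\tVN:1.4\tSO:coordinate\n@SQ\tSN:chr1\tLN:249250621\n@SQ\tSN:chr10\tLN:135534747\n"
ref = "@HD\tVN:1.4\n@SQ\tSN:chrM\tLN:16571\n@SQ\tSN:chr1\tLN:249250621\n"
print(first_order_mismatch(reads, ref))  # (0, 'chr1', 'chrM')
```

A `None` result means the two files list the same contigs in the same order; anything else points at the first contig where they diverge.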
I have tried to run RealignerTargetCreator as a prerequisite for IndelRealigner, but it fails with the error "Lexicographically sorted human genome sequence detected in reads". I would have thought the ReorderSam step had already solved this. Could my reference genome be the problem somewhere? When running RealignerTargetCreator I can only select the imported hg19 FASTA file, because no locally cached references are offered; could that be relevant?
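The error refers to the order of contigs in the reads' sequence dictionary (the @SQ lines of the BAM header). A plain string sort puts chr10 before chr2 (chr1, chr10, chr11, ...), whereas numeric order lists chr1, chr2, ..., chr10. A quick way to see which order a list of contig names follows is a heuristic like the one below. It is a minimal sketch in the same spirit as the error message, not GATK's own check, and `looks_lexicographic` is a hypothetical name.

```python
"""Heuristic check for lexicographically sorted human contig names.

Illustrative only; this is not GATK's implementation of the check.
"""


def looks_lexicographic(contigs: list[str]) -> bool:
    """Return True if chr10 is listed before chr2, a telltale sign of plain string sorting."""
    pos = {name: i for i, name in enumerate(contigs)}
    # Accept names with or without the UCSC "chr" prefix.
    for two, ten in (("chr2", "chr10"), ("2", "10")):
        if two in pos and ten in pos:
            return pos[ten] < pos[two]
    # Without both markers present, this heuristic cannot tell.
    return False


numeric_order = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY", "chrM"]
string_sorted = sorted(numeric_order)

print(string_sorted[:4])                   # ['chr1', 'chr10', 'chr11', 'chr12']
print(looks_lexicographic(string_sorted))  # True
print(looks_lexicographic(numeric_order))  # False
```

Applied to the contig names from the final BAM header, a `True` result would suggest that the file reaching RealignerTargetCreator still carries a lexicographically sorted dictionary.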
Thanks, Frankie.