Question: HISAT seems to assign wrong XS tag value to some reads of stranded RNAseq data
1
gravatar for matthew.nicotra
24 days ago by
matthew.nicotra10 wrote:

Hi All,

I have some paired end stranded RNAseq data prepared using an Illumina TruSeq Stranded mRNA kit. I have mapped the reads to the genome using HISAT.

Based on my understanding of the SAM flags, concordant reads should be assigned XS values as follows:

If gene is on the forward strand of the genome:

read1 maps to reverse strand of genome, SAM flag = 83, XS:A:+
read2 maps to forward strand of genome, SAM flag = 163, XS:A:+

If gene is on the reverse strand of the genome:

read1 maps to forward strand of genome, SAM flag = 99, XS:A:-
read2 maps to reverse strand of genome, SAM flag = 147, XS:A:-

For some reason HISAT is assigning reads with a SAM flag = 147 an XS value of '+'

We've mapped the same data to the same genome with TopHat and the XS values are correct.

Is this a bug in HISAT?

Thanks,

Matt Nicotra

University of Pittsburgh

hisat software error rna-seq • 114 views
ADD COMMENTlink modified 23 days ago • written 24 days ago by matthew.nicotra10

Upon further investigation, it seems like HISAT2 is being run with the --rna-strandness parameter set to R and not RF, despite the fact that we selected paired end reads.

Here's the information from Galaxy for the HISAT run:

HISAT2
Dataset Information
Number: 1

Name:          HISAT2 on data 3, data 5, and data 6
Created:    Sat 18 Nov 2017 04:36:07 PM (UTC)
Filesize:   7.2 GB
Dbkey:  ?
Format: bam
Job Information
Galaxy Tool ID: toolshed.g2.bx.psu.edu/repos/iuc/hisat2/hisat2/2.0.5.2
Galaxy Tool Version:    2.0.5.2
Tool Version:   /galaxy/main/deps/_conda/envs/mulled-v1-2bb67013a57cac1e35f407d06d1f347baae35159f498496f1e36f84784069212/bin/hisat2-align-s version 2.0.5 64-bit Built on login-node03 Fri Nov 4 10:42:22 EDT 2016 Compiler: gcc version 4.8.2 (GCC) Options: -O3 -m64 -msse2 -funroll-loops -g3 -DPOPCNT_CAPABILITY Sizeof {int, long, long long, void*, size_t, off_t}: {4, 8, 8, 8, 8, 8}
Tool Standard Output:   stdout
Tool Standard Error:    stderr
Tool Exit Code: 0
History Content API ID: bbd44e69cb8906b5bd88044d7587709b
Job API ID: bbd44e69cb8906b53c323905b517dac3
History API ID: 64804927f4da22d5
UUID:   20dbfc90-3d3a-44b9-a109-5c2c173c1ba3
Tool Parameters
Input Parameter Value   Note for rerun
Input data format   fastq   
Single end or paired reads? paired  
Input is structured as  files   
Forward reads   6: INTERLEAVEDREADS.ECR.FQ.paired.A.fastq.gz    
Reverse reads   5: INTERLEAVEDREADS.ECR.FQ.paired.B.fastq.gz    
Paired-end options  advanced    
Disable alignments of individual mates  True    
Disable alignments of individual mates  True    
Skip reference strand of reference  False   
Write unaligned reads (in fastq format) to separate file(s) False   
Write aligned reads (in fastq format) to separate file(s)   False   
Source for the reference genome to align against    history 
Select the reference genome 3: primary.polished.fasta   
Primary alignments  1   
Maximum number of seeds that will be extended   Not available.  
Report secondary alignments False   
Alignment options   defaults    
Input options   defaults    
Scoring options advanced    
Function governing the minimum alignment score needed for an alignment to be considered "valid" (i.e. good enough to report)    Constant [f(x) = B] 
Constant term (B)   0.0 
Coefficient (A) -0.2    
Set match bonus 2   
Maximum mismatch penalty    6   
Minimum mismatch penalty    2   
Disallow soft-clipping  False   
Ambiguous read penalty  1   
Maximum soft-clipping penalty   2   
Minimum soft-clipping penalty   1   
Read gap open penalty   5   
Read gap extend penalty 3   
Reference gap open penalty  5   
Reference gap extend penalty    3   
Spliced alignment parameters    advanced    
Penalty for canonical splice sites  0   
Penalty for non-canonical splice sites  12  
Penalty function for long introns with canonical splice sites   Natural logarithm [f(x) = B + A * log(x)]   
Constant term (B)   -8.0    
Coefficient (A) 1.0 
Penalty function for long introns with non-canonical splice sites   Natural logarithm [f(x) = B + A * log(x)]   
Constant term (B)   -8.0    
Coefficient (A) 1.0 
Minimum intron length   20  
Maximum intron length   500000  
Specify strand-specific information First Strand (R/RF) 
Disable spliced alignment       
GTF file with known splice sites        
Transcriptome assembly reporting    Report alignments tailored specifically for Cufflinks.  
Paired alignment parameters defaults    
Job Resource Parameters no  
Inheritance Chain
HISAT2 on data 3, data 5, and data 6
↑
'HISAT2 on data 3, data 5, and data 6' in Wt-ARC_mapping
Job Dependencies
Dependency  Dependency Type Version
hisat2  conda   2.0.5
samtools    conda   1.4

But here's the information from the Header of the SAM/BAM file:

    @PG ID:hisat2 PN:hisat2 VN:2.0.5 CL:"/galaxy/main/deps/_conda/envs/mulled-v1-2bb67013a57cac1e35f407d06d1f347baae35159f498496f1e36f84784069212/bin/hisat2-align-s 
--wrapper basic-0 -p 6 -x genome --no-mixed --no-discordant -k 1 --ma 2 --mp 6,2 --np 1 --rdg 5,3 --rfg 5,3 --sp 2,1 
--score-min C,0.0,-0.2 --pen-cansplice 0 --pen-noncansplice 12 --pen-canintronlen G,-8.0,1.0 
--pen-noncanintronlen G,-8.0,1.0 --min-intronlen 20 --max-intronlen 500000 --dta-cufflinks --rna-strandness R -1 /tmp/26433.inpipe1 -2 /tmp/26433.inpipe2"

Shouldn't --rna-strandness be RF and not R?

ADD REPLYlink modified 23 days ago • written 23 days ago by matthew.nicotra10
0
gravatar for Jennifer Hillman Jackson
24 days ago by
United States
Jennifer Hillman Jackson23k wrote:

Hello,

Try setting the HISAT2 option to report alignments for Cufflinks. The option is explained in this tutorial: https://galaxyproject.org/tutorials/nt_rnaseq/#spliced-mapping-with-hisat

The only difference is that you will be selecting the Cufflinks output option instead of Stringtie.

Thanks, Jen, Galaxy team

ADD COMMENTlink written 24 days ago by Jennifer Hillman Jackson23k
0
gravatar for matthew.nicotra
24 days ago by
matthew.nicotra10 wrote:

Hi Jen,

We still seem to be having trouble. We've tried setting the HISAT2 option to report alignments for Cufflinks, Stringtie, and "default". In all cases the reads with a 147 get an XS:A:+ instead of XS:A:-

Is there another setting we are missing?

Thanks,

Matt

ADD COMMENTlink written 24 days ago by matthew.nicotra10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 101 users visited in the last hour