Poor overall alignment rates

16 months ago by

egonz340 • 10 wrote:

I am having problems aligning some data that I obtained from the NCBI GEO database. The accession number for the data is SRR1103937, and the following two links provide more information on the sample: About SRR1103937, About the sample. I am using hisat2 for the alignment. Ideally I would like to run hisat2 on a local server where I have installed hisat2, the SRA Toolkit, and htseq-count, but since I ran into the alignment problem I have been using Galaxy. These are the steps I took that led to the poor alignment from hisat2:

NCBI GEO provides the sample in SRA format, so I had to use the SRA Toolkit to convert from SRA to fastq. On my local server I do this with the command ./fastq-dump SRR1103937. On Galaxy I type in the accession number SRR1103937 and then select the gzip compressed fastq option, leaving the Advanced Options unchanged.
After obtaining the fastq file I use hisat2 for the alignment. On my local server I first have to build an index, so I go to the hisat2 website and obtain the H. sapiens, GRCh38 pre-built index titled genome. After running the command that builds the index, I use the command ./hisat2 -x ./grch38/genome -U SRR1103937.fastq -S SAMPLE1_aligned.sam to produce the aligned sam file. On Galaxy the settings I used were: FASTQ, Individual unpaired reads, SRR1103937, hg38, and all other options set to default.
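For reference, the two local steps above can be written down as a small script. This is only a sketch: it assumes fastq-dump and hisat2 are on the PATH (the post runs them as ./fastq-dump and ./hisat2), that fastq-dump writes SRR1103937.fastq into the working directory, and the function names are made up for illustration.

```python
import shutil
import subprocess


def build_commands(accession, index_prefix, sam_out):
    """Return the fastq-dump and hisat2 argument lists described in the post."""
    fastq = f"{accession}.fastq"  # assumed output name for single-end data
    return [
        ["fastq-dump", accession],
        ["hisat2", "-x", index_prefix, "-U", fastq, "-S", sam_out],
    ]


def run_pipeline(accession, index_prefix, sam_out):
    """Run both steps in order, stopping at the first failure."""
    for cmd in build_commands(accession, index_prefix, sam_out):
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} not found on PATH")
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for cmd in build_commands("SRR1103937", "./grch38/genome", "SAMPLE1_aligned.sam"):
        print(" ".join(cmd))
```

Running the file as is only prints the two commands; calling run_pipeline would actually execute them.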

Both methods produced the same overall alignment rate which was 12.59%. How can I improve the overall alignment rate?

rna-seq alignment galaxy • 757 views

modified 16 months ago by Jennifer Hillman Jackson ♦ 25k • written 16 months ago by egonz340 • 10

It's smallRNA-seq, so make sure you trim the reads.
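To illustrate what trimming means here: small RNA inserts are usually shorter than the read length, so the 3' end of each read runs into the sequencing adapter and the read no longer matches the genome. Below is a minimal sketch of exact-match 3' adapter trimming. The adapter sequence for this sample is not given in the thread, so it is a parameter that has to come from the library prep documentation; the function names, min_overlap, and min_length are illustrative assumptions. Dedicated trimmers such as cutadapt also tolerate mismatches, which this sketch does not.

```python
def trim_3p_adapter(seq, qual, adapter, min_overlap=5):
    """Cut a 3' adapter (and anything after it) from one read.

    Exact matching only. A partial adapter is recognised at the very end of
    the read if at least min_overlap adapter bases are present.
    """
    pos = seq.find(adapter)  # full adapter anywhere in the read
    if pos == -1:
        # Look for the longest adapter prefix that ends the read.
        upper = min(len(adapter) - 1, len(seq))
        for k in range(upper, min_overlap - 1, -1):
            if seq.endswith(adapter[:k]):
                pos = len(seq) - k
                break
    if pos == -1:
        return seq, qual  # no adapter found, read left unchanged
    return seq[:pos], qual[:pos]


def trim_fastq(in_path, out_path, adapter, min_length=15):
    """Trim every read in an uncompressed 4-line FASTQ file.

    Reads shorter than min_length after trimming are dropped.
    Returns the number of reads written.
    """
    kept = 0
    with open(in_path) as fin, open(out_path, "w") as fout:
        while True:
            header = fin.readline().rstrip("\n")
            if not header:
                break
            seq = fin.readline().rstrip("\n")
            plus = fin.readline().rstrip("\n")
            qual = fin.readline().rstrip("\n")
            seq, qual = trim_3p_adapter(seq, qual, adapter)
            if len(seq) >= min_length:
                fout.write(f"{header}\n{seq}\n{plus}\n{qual}\n")
                kept += 1
    return kept
```

The trimmed file would then be passed to hisat2 with -U in place of the original fastq.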

written 16 months ago by Devon Ryan • 1.9k

Okay, I will do this. Thank you.

modified 16 months ago • written 16 months ago by egonz340 • 10

16 months ago by

Jennifer Hillman Jackson ♦ 25k

United States

Jennifer Hillman Jackson ♦ 25k wrote:

Hello,

Devon is correct, some QA may help. You could also compare mapping rates against those from other alignment tools such as BWA or Bowtie2 with such short sequences.
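As a rough aid for that comparison: HISAT2 and Bowtie2 both end their alignment summary (printed to stderr) with a line of the form "12.59% overall alignment rate". The sketch below pulls that number out of saved summary text so runs can be ranked side by side. It assumes the summaries have already been captured as strings; the function names are made up for illustration. BWA does not print this line, so its rate would need to come from another source such as samtools flagstat.

```python
import re

# Matches the final summary line printed by HISAT2 and Bowtie2.
_RATE = re.compile(r"(\d+(?:\.\d+)?)% overall alignment rate")


def overall_alignment_rate(summary_text):
    """Return the overall alignment rate as a float, or None if absent."""
    match = _RATE.search(summary_text)
    return float(match.group(1)) if match else None


def compare_rates(summaries):
    """Rank runs by alignment rate, highest first.

    summaries maps a run label to its captured summary text.
    Runs with no recognisable rate are listed last with None.
    """
    rates = {name: overall_alignment_rate(text) for name, text in summaries.items()}
    return sorted(rates.items(), key=lambda kv: (kv[1] is None, -(kv[1] or 0.0)))
```

One way to capture a summary is to redirect stderr to a file when running the aligner and read that file back in.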

Or use alternative tools available in the Tool Shed (search with "miRNA") in your own local, Docker, or cloud Galaxy, and/or try another public Galaxy server (please see all Galaxy Choices here). In particular, these two public Galaxy servers have domain-specific tools installed for microRNA analysis:

Others have noted low mapping rates with GEO data in general. There are many posts and papers about this; here is one example: https://www.biostars.org/p/198254/

Thanks, Jen, Galaxy team

written 16 months ago by Jennifer Hillman Jackson ♦ 25k

Thank you, I will look into the information that you have provided.

written 16 months ago by egonz340 • 10
