Question: Poor overall alignment rates
4 months ago, egonz340 wrote:

I am having problems aligning some data that I obtained through the NCBI GEO database. The accession number for the data is SRR1103937, and the following two links provide more information on the sample: About SRR1103937, About the sample. I am using hisat2 for the alignment. Ideally I would like to run hisat2 on a local server where I have installed hisat2, the SRA Toolkit, and htseq-count, but since I ran into the alignment problem I have been using Galaxy. These are the steps I took that led to the poor alignment from hisat2:

  1. NCBI GEO provides the sample in SRA format, so I used the SRA Toolkit to convert it from SRA to FASTQ. On my local server I did this with the command ./fastq-dump SRR1103937. On Galaxy I entered the accession number SRR1103937 and selected the gzip compressed fastq option, leaving the Advanced Options unchanged.
  2. After obtaining the FASTQ file, I used hisat2 for the alignment. On my local server I first had to build an index, so I went to the hisat2 website and obtained the H. sapiens GRCh38 pre-built index titled genome. After running the command that builds the index, I used the command ./hisat2 -x ./grch38/genome -U SRR1103937.fastq -S SAMPLE1_aligned.sam to produce the aligned SAM file. On Galaxy the settings were: FASTQ, Individual unpaired reads, SRR1103937, hg38, with all other options left at their defaults.

Both methods produced the same overall alignment rate, 12.59%. How can I improve the overall alignment rate?
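
For reference, the local steps described above could be collected into a short shell sketch. It assumes fastq-dump and hisat2 are on the PATH (rather than invoked as ./fastq-dump and ./hisat2) and that the pre-built index was extracted to ./grch38/ with the prefix genome. The function names and the output file names (derived from the accession rather than SAMPLE1) are illustrative only.

```shell
# Sketch only: function names and the summary file name are hypothetical.
# Assumes fastq-dump (SRA Toolkit) and hisat2 are on PATH, and that the
# pre-built GRCh38 index lives in ./grch38/ with the prefix "genome".

# Step 1: convert the SRA run to FASTQ (writes <accession>.fastq).
convert_sra() {
    fastq-dump "$1"
}

# Step 2: align unpaired reads. hisat2 prints its alignment summary
# (including the overall alignment rate) to stderr, so keep a copy.
align_reads() {
    hisat2 -x ./grch38/genome -U "$1.fastq" -S "$1_aligned.sam" \
        2> "$1_hisat2_summary.txt"
}
```

Calling convert_sra SRR1103937 followed by align_reads SRR1103937 would repeat the run, and the saved summary file would then hold the reported overall alignment rate.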


It's smallRNA-seq, so make sure you trim the reads.
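
A rough sketch of what that could look like on the command line: first tabulate read lengths in the FASTQ, then trim the 3' adapter with cutadapt. The adapter sequence below is the Illumina TruSeq Small RNA 3' adapter, and it is only an assumption that this library used it. The 18 nt minimum length and the function and file names are likewise illustrative.

```shell
# Sketch only; assumes cutadapt is installed. The adapter is the Illumina
# TruSeq Small RNA 3' adapter, which may NOT match this library: check the
# GEO/SRA record for the actual kit before relying on it.
ADAPTER="TGGAATTCTCGGGTGCCAAGG"

# Print "length count" pairs for a 4-line-per-record FASTQ file.
read_length_hist() {
    awk 'NR % 4 == 2 { n[length($0)]++ }
         END { for (len in n) print len, n[len] }' "$1" | sort -n
}

# Remove the 3' adapter and drop reads shorter than 18 nt after trimming.
# Usage: trim_reads input.fastq output.fastq
trim_reads() {
    cutadapt -a "$ADAPTER" -m 18 -o "$2" "$1"
}
```

If read_length_hist SRR1103937.fastq shows every read at one fixed length, the reads probably still carry adapter sequence, which would be consistent with a low alignment rate for small RNA inserts.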

written 4 months ago by Devon Ryan

Okay, I will do this. Thank you.

written 4 months ago by egonz340
4 months ago, Jennifer Hillman Jackson (United States) wrote:

Hello,

Devon is correct: some QA may help. With such short sequences, you could also compare mapping rates against those from other alignment tools such as BWA or Bowtie2.
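
One way to make that comparison, sketched under assumptions: align the same (ideally trimmed) reads with Bowtie2 and pull the overall rate out of each tool's summary. Both hisat2 and Bowtie2 print a line ending in "overall alignment rate" to stderr. The index prefix ./bt2/genome and the file and function names are hypothetical; a Bowtie2 index would need to be built or downloaded separately.

```shell
# Sketch only: index prefix, file names, and function names are hypothetical.
# Assumes bowtie2 is on PATH and a Bowtie2 index exists at ./bt2/genome.

# Align unpaired reads with Bowtie2, saving the stderr summary to a file.
# Usage: align_bowtie2 reads.fastq out.sam summary.txt
align_bowtie2() {
    bowtie2 -x ./bt2/genome -U "$1" -S "$2" 2> "$3"
}

# Print the percentage from a saved hisat2 or Bowtie2 summary file.
overall_rate() {
    awk '/overall alignment rate/ { print $1 }' "$1"
}
```

Running overall_rate on the summary saved from each aligner gives two numbers that can be compared directly for the same input reads.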

Alternatively, use the tools available in the Tool Shed (search with "miRNA") in your own local, docker, or cloud Galaxy, and/or try another public Galaxy server (please see all Galaxy Choices here). In particular, these two public Galaxy servers have domain-specific tools installed for microRNA analysis:

Others have noted low mapping rates with GEO data in general. There are many posts and papers about this; here is one example: https://www.biostars.org/p/198254/

Thanks, Jen, Galaxy team


Thank you. I will look into the information you have provided.

written 4 months ago by egonz340