I am having problems aligning some data that I obtained from the NCBI GEO database. The accession number for the run I am using is SRR1103937, and the following two links provide more information on the sample: About SRR1103937, About the sample. I am using hisat2 for the alignment. Ideally I would like to run hisat2 on a local server, where I have installed hisat2, the SRA Toolkit, and htseq-count, but since running into the alignment problem I have also been using Galaxy. These are the steps I took that led to the poor alignment from hisat2:
- NCBI GEO provides the sample in sra format, so I had to use the SRA Toolkit to convert it from sra to fastq. On my local server I did this with the command ./fastq-dump SRR1103937. On Galaxy I entered the accession number SRR1103937 and selected the gzip compressed fastq option, leaving the Advanced Options unchanged.
- After obtaining the fastq file I used hisat2 for the alignment. On my local server I first needed an index, so I went to the hisat2 website and obtained the H. sapiens GRCh38 pre-built index titled genome. After running the command that builds the index, I used the command ./hisat2 -x ./grch38/genome -U SRR1103937.fastq -S SAMPLE1_aligned.sam to produce the aligned sam file. On Galaxy the settings I used were: FASTQ, Individual unpaired reads, SRR1103937, hg38, with all other options left at their defaults.
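For reference, the local steps above can be put together as one small script. This is only a rough sketch: it assumes fastq-dump and hisat2 can be found on the PATH (I actually ran them as ./fastq-dump and ./hisat2 from their install directories), the run_step helper is just a convenience I made up so the script does not fail when a tool is missing, and the summary file name hisat2_summary.txt is hypothetical.

```shell
#!/bin/sh
# Sketch of the local steps; accession, index path, and file names come from the post.
ACC="SRR1103937"                 # SRA run accession
INDEX="./grch38/genome"          # basename of the pre-built GRCh38 index
FASTQ="${ACC}.fastq"             # file written by fastq-dump
SAM="SAMPLE1_aligned.sam"        # alignment output
SUMMARY="hisat2_summary.txt"     # hypothetical name for the saved alignment summary

# Hypothetical helper: run a tool only if it can be found,
# otherwise report that the step was skipped.
run_step() {
  if command -v "$1" >/dev/null 2>&1; then
    "$@"
  else
    echo "skipping (not found): $*"
  fi
}

# 1. Convert the SRA run to FASTQ.
run_step fastq-dump "$ACC"

# 2. Align the unpaired reads. hisat2 prints its alignment summary to stderr,
#    so stderr is redirected to keep the "overall alignment rate" line.
run_step hisat2 -x "$INDEX" -U "$FASTQ" -S "$SAM" 2> "$SUMMARY"
```

After a run, the last line of hisat2_summary.txt should be the overall alignment rate reported by hisat2.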
Both methods produced the same overall alignment rate of 12.59%. How can I improve the overall alignment rate?