Question: What to do when the alignment rate is low even though the genomic data and RNA-seq data are from the same strain
2.6 years ago by
New Delhi, India, ICGEB / JNU / IIT (BHU)
mayankg.it.bhu0 wrote:

Hello

While doing RNA-seq analysis, I mapped the reads for each condition to the reference genome (of the same strain of Geobacillus sp.) with TopHat and got a quite low overall alignment rate (lower than 60% in each condition). In the alignment summary below, for example, I cannot understand why the alignment rate is low even though the genomic data and the RNA-seq data are from the same strain. Can anyone please help me interpret the following alignment summary? Is something wrong with the RNA-seq data?

Also, the mapped BAM files are 6G (size on disk), whereas the unmapped BAM files are less than 100M.

Left reads:

      Input     :  13923415
       Mapped   :   7248369 (52.1% of input)
        of these:   6893771 (95.1%) have multiple alignments (306448 have >20)

Right reads:

      Input     :  13923415
       Mapped   :   7103432 (51.0% of input)
        of these:   6748616 (95.0%) have multiple alignments (306338 have >20)

51.5% overall read mapping rate.

Aligned pairs: 5267947

 of these:   4923439 (93.5%) have multiple alignments
               29026 ( 0.6%) are discordant alignments

37.6% concordant pair alignment rate
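
As a sanity check, the two headline percentages can be re-derived from the raw counts in the summary. This is only a minimal Python sketch: the variable names are made up for illustration, and the formula for the concordant rate (aligned pairs minus discordant pairs, divided by input pairs) is an assumption that happens to reproduce the reported figure, not something taken from the TopHat documentation.

```python
# Counts copied from the alignment summary above.
left_input, left_mapped = 13923415, 7248369
right_input, right_mapped = 13923415, 7103432
aligned_pairs, discordant_pairs = 5267947, 29026

# Overall rate: mapped reads from both mates over all input reads.
overall_rate = (left_mapped + right_mapped) / (left_input + right_input)

# Assumption: concordant rate = (aligned pairs - discordant pairs) / input pairs.
concordant_rate = (aligned_pairs - discordant_pairs) / left_input

# Share of mapped left reads that have more than one alignment.
left_multi_rate = 6893771 / left_mapped

print(f"overall read mapping rate:      {overall_rate:.1%}")
print(f"concordant pair alignment rate: {concordant_rate:.1%}")
print(f"left reads with multiple hits:  {left_multi_rate:.1%}")
```

The recomputed values agree with the summary, so the report is internally consistent and the low rate reflects the reads themselves rather than a reporting glitch.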

Best Regards

Mayank

modified 2.6 years ago by Jennifer Hillman Jackson25k • written 2.6 years ago by mayankg.it.bhu0

Hi Mayank, were you able to improve your mapping percentage? I am also stuck with the same problem. I have tried trimming, clipping, etc., but no luck yet.

written 24 months ago by computationalvarun20
2.6 years ago by
United States
Jennifer Hillman Jackson25k wrote:

Hi Mayank,

Thanks for posting the question to Biostars.

I've seen this occur for a few reasons, some of which can be mitigated:

  1. The input fastq sequence has incorrect quality score scaling. In most cases like this, data that is really fastqillumina was assigned the fastqsanger datatype (directly or by using the wrong Fastq Groomer options). Double check your data with this method: https://wiki.galaxyproject.org/Support#FASTQ_Datatype_QA

  2. The input sequence length is not at least twice as long as the setting for "Minimum length of read segments". This option is found on the tool form under TopHat settings to use -> Full parameter list.

  3. Too much trimming or other QA. Often RNA-seq data can be mapped successfully with very little manipulation (as is the case with most expression data of any type, in my opinion - but others' opinions may differ!). If you did QA, consider relaxing the parameters to preserve more of the sequence, or try a test run with very little or no trimming.

  4. Mixed up samples or mixed up forward/reverse reads entered on the tool form. It happens - use the re-run button to double check what was entered (assuming the samples are labeled correctly in Galaxy - going upstream to confirm this might be needed).

  5. Finally, there could be an inherent data problem, either in library prep or downstream in sequencing. This is the last thing to check, after the informatics steps above are confirmed to be good. TopHat settings can sometimes be adjusted to help improve overall mapping and concordant pairs. It could be worth reviewing the manual for how the parameters interact and running a few tests to see if the results can be improved.
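
Checks 1 and 2 above can also be done outside Galaxy with a few lines of Python. The following is a rough sketch under stated assumptions, not the method from the wiki link: the function names are hypothetical, the offset guess is a simple heuristic on the lowest quality character (fastqsanger is Phred+33, fastqillumina is Phred+64), and the segment length of 25 is assumed to be the TopHat default, so substitute whatever value is set on your tool form.

```python
import io


def read_fastq(handle, limit=10000):
    """Yield (sequence, quality) for the first `limit` four-line FASTQ records."""
    for _ in range(limit):
        header = handle.readline()
        if not header:
            break
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\n")
        yield seq, qual


def guess_phred_offset(quality_strings):
    """Guess the quality offset (33 or 64) from FASTQ quality strings.

    Heuristic: characters below ASCII 59 (';') do not occur in +64 encoded
    data, so seeing one implies Phred+33. If the lowest character is 64 or
    above, Phred+64 is likely (but not certain). Otherwise return None.
    """
    lowest = min((ord(ch) for q in quality_strings for ch in q), default=None)
    if lowest is None:
        return None
    if lowest < 59:
        return 33
    if lowest >= 64:
        return 64
    return None


def count_short_reads(read_lengths, segment_length=25):
    """Count reads shorter than twice the segment length (assumed default 25)."""
    return sum(1 for n in read_lengths if n < 2 * segment_length)


# Tiny made-up record just to show the calls; use open("reads.fastq") in practice.
demo = io.StringIO("@read1\nACGTACGTAC\n+\nIIIIIHHH##\n")
records = list(read_fastq(demo))
print("guessed offset:", guess_phred_offset(q for _, q in records))
print("reads too short:", count_short_reads(len(s) for s, _ in records))
```

If the guessed offset disagrees with the datatype assigned in Galaxy, that points to reason 1; a non-zero count of short reads points to reason 2.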

Best, Jen, Galaxy team

written 2.6 years ago by Jennifer Hillman Jackson25k

Hey Jennifer,

I have Illumina HiSeq1000 sequencing data (paired-end RNA-seq data, number of cycles: 2 x 100). I have not performed any trimming on the reads, but the quality control reports show high fluctuation over the first 10 bp in the per base sequence content plot (graph1); the sequence length is 100 bp. There are also some abnormalities in the sequence duplication levels (graph2). Are these two going to affect the alignment rate?

I am sure that there is no mix-up.

Best regards, Mayank

graph1 http://postimg.org/image/jleuyit8x/

graph2 http://postimg.org/image/iifrwquq9/

modified 2.6 years ago • written 2.6 years ago by mayankg.it.bhu0