Question: cuffdiff error and time
2.6 years ago by
1603.neha70
1603.neha70 wrote:

Hi, I followed a pipeline of grooming the SRA data, then running TopHat and Cufflinks. The Cuffdiff job is taking too long. I want to know whether there is an error in my input to Cuffdiff. What should I check in the Cufflinks result? And how can I tell whether the TopHat result is satisfactory?

rnaseq analyse output cuffdiff • 894 views
2.6 years ago by
yena.oh70
Canada
yena.oh70 wrote:

Hi,

To check whether TopHat worked (i.e. the reads aligned correctly to the reference genome), you can try the following:

  1. Visualize regions of interest in the genome. This can be done with Trackster, or with browsers such as Integrated Genome Browser (IGB) or Integrative Genomics Viewer (IGV): expand your TopHat "accepted_hits" dataset and click either "display with IGV" or "display in IGB View."

  2. Check your mapping statistics by opening the "align_summary" file.

Yena

2.6 years ago by
yena.oh70
Canada
yena.oh70 wrote:

This means that TopHat did not work. Only 1226 of the 19118751 input reads aligned to the reference genome, which rounds to 0.0% of input.
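To make the arithmetic explicit, here is a minimal Python sketch that pulls the two counts out of the summary text quoted in this thread and recomputes the percentage. The function name `mapping_rate` and the regular expressions are my own assumptions about the layout, not part of TopHat, so adjust them if your align_summary looks different.

```python
import re

# Summary text as quoted in this thread (single-end run).
SUMMARY = """Reads:
    Input : 19118751
    Mapped : 1226 ( 0.0% of input)
    of these: 518 (42.3%) have multiple alignments (427 have >20)
0.0% overall read mapping rate."""


def mapping_rate(summary_text):
    """Return (input_reads, mapped_reads, percent_mapped) from summary text."""
    # Allow optional spaces around the colon, since the spacing varies.
    input_match = re.search(r"Input\s*:\s*(\d+)", summary_text)
    mapped_match = re.search(r"Mapped\s*:\s*(\d+)", summary_text)
    if input_match is None or mapped_match is None:
        raise ValueError("could not find Input/Mapped counts in the summary")
    total = int(input_match.group(1))
    mapped = int(mapped_match.group(1))
    # Guard against an empty input file.
    percent = 100.0 * mapped / total if total else 0.0
    return total, mapped, percent


total, mapped, percent = mapping_rate(SUMMARY)
print(f"{mapped} of {total} reads mapped ({percent:.4f}%)")
```

The unrounded value is well under 0.01%, which is why the summary shows 0.0%.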

Double check that you provided the correct reference genome. Did you run a quality check on your reads (e.g. FastQC)? This will show whether the reads are of good quality, so you can decide whether you need to trim the reads or filter out poor-quality ones. Every part of the FastQC results is described at the link below:

http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Module

Yena

2.6 years ago by
1603.neha70
1603.neha70 wrote:

Hi, thanks a lot. My result says:

  Reads:
    Input : 19118751
    Mapped : 1226 ( 0.0% of input)
    of these: 518 (42.3%) have multiple alignments (427 have >20)
  0.0% overall read mapping rate.

What does it mean? Thanks.


Something is probably wrong with the input to map this poorly. Also double check:

  1. The data represents spliced reads.

  2. The FASTQ inputs are true pairs and are entered on the tool form in forward/reverse read order.

  3. The target reference genome mapped against is the right one. If it is a custom genome, double check the formatting: https://wiki.galaxyproject.org/Support#Custom_reference_genome

  4. The FASTQ data has quality scores scaled correctly as fastqsanger. Here is how to check: https://wiki.galaxyproject.org/Support#FASTQ_Datatype_QA

  5. QA was not overly zealous, resulting in lost reads or sequence content. If you clipped, try mapping the original reads and compare, then adjust.

  6. "Minimum length of read segments" (full parameters) is one half the length of the shortest sequence mapped (or that is expected to map).

One of these reasons is behind most poor read mapping results (from a usage perspective). Content/sequencing errors are upstream. Run FastQC to get a bead on overall read quality. QA might fix this or you may need to check in with the lab that did the sequencing.
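For points 4 and 6 above, a rough local check can be sketched in Python. It is only a heuristic: it assumes plain four-line FASTQ records (no wrapped sequences), the helper names `fastq_stats` and `guess_encoding` are made up for this example, and the cut-offs are the commonly quoted ASCII ranges for the different quality encodings, so treat the answer as a hint rather than a verdict.

```python
import gzip
import sys


def fastq_stats(path, max_reads=100000):
    """Scan up to max_reads records; return (min_read_len, min_qual, max_qual).

    min_qual and max_qual are the lowest and highest ASCII codes seen on
    quality lines. Assumes exactly four lines per record.
    """
    opener = gzip.open if path.endswith(".gz") else open
    min_len, lo, hi = None, None, None
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i // 4 >= max_reads:
                break
            text = line.rstrip("\r\n")
            if i % 4 == 1:  # sequence line
                if min_len is None or len(text) < min_len:
                    min_len = len(text)
            elif i % 4 == 3 and text:  # quality line
                codes = [ord(c) for c in text]
                lo = min(codes) if lo is None else min(lo, min(codes))
                hi = max(codes) if hi is None else max(hi, max(codes))
    return min_len, lo, hi


def guess_encoding(lo):
    """Guess the quality encoding from the lowest ASCII code observed."""
    if lo is None:
        return "unknown (no quality lines read)"
    if lo < 59:
        return "Phred+33 (fastqsanger)"
    if lo < 64:
        return "Solexa+64 (not fastqsanger)"
    # High-quality Phred+33 data can also land here, hence "probably".
    return "probably Phred+64 (not fastqsanger)"


if __name__ == "__main__" and len(sys.argv) > 1:
    shortest, lo, hi = fastq_stats(sys.argv[1])
    print("shortest read:", shortest)
    print("quality ASCII range:", lo, "-", hi)
    print("encoding guess:", guess_encoding(lo))
    if shortest:
        # Point 6: half the length of the shortest read.
        print("segment length to try:", shortest // 2)
```

Run it as a script with the path to your FASTQ file as the only argument. If the guess is not fastqsanger, the quality scores likely need converting before mapping.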

Good luck! Jen, Galaxy team

written 2.6 years ago by Jennifer Hillman Jackson
2.6 years ago by
1603.neha70
1603.neha70 wrote:

Ma'am, 1. The link you provided is not working. 2. The FASTQ input is single-end reads. 3. The reference genome is the human genome, hg38.

Thanks


Sorry about that. http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/

written 2.6 years ago by yena.oh