Question: Tophat Mapping
杨继文 wrote:
Hi,

After mapping RNA-Seq paired-end reads with Tophat, I can see that most of the reads fall into the expected regions. However, I still see many reads mapped to non-coding regions (locations that don't contain any exons). I am wondering whether these "non-coding reads" will be included when cufflinks calculates transcript/gene expression. Dying to know your opinion.

Another question: how can I find out the number of reads mapped to a certain exon?

Thanks
rna-seq cufflinks • 1.2k views
modified 6.2 years ago by Jeremy Goecks • written 6.2 years ago by 杨继文
Jeremy Goecks wrote:
Reads will only be included if they map to assembled/known transcripts.

As for the number of reads mapped to a certain exon: this isn't possible because a single read may map to multiple exons and/or transcripts. Cufflinks assigns reads probabilistically when their mapping cannot be uniquely determined. See http://cufflinks.cbcb.umd.edu/faq.html#count and http://cufflinks.cbcb.umd.edu/howitworks.html for details.

Best, J.
written 6.2 years ago by Jeremy Goecks
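To make "assigns reads probabilistically" a little more concrete, here is a toy sketch in Python. It is not Cufflinks' actual model (which is described at the links above and also accounts for things like transcript length and fragment size); it only illustrates the general idea of splitting an ambiguous read among the transcripts it is compatible with, in proportion to their current abundance estimates, and iterating. The function name `assign_reads` and the transcript labels are made up for illustration.

```python
def assign_reads(read_compat, n_iter=50):
    """Toy expectation-maximization over read-to-transcript compatibility.

    read_compat: list of sets; each set holds the transcripts one read
    is compatible with. Returns the expected (fractional) read count
    per transcript. Illustration only, not the Cufflinks algorithm.
    """
    transcripts = sorted({t for compat in read_compat for t in compat})
    # Start from equal relative abundances.
    abundance = {t: 1.0 / len(transcripts) for t in transcripts}
    counts = {t: 0.0 for t in transcripts}
    for _ in range(n_iter):
        counts = {t: 0.0 for t in transcripts}
        for compat in read_compat:
            if not compat:
                continue  # read compatible with nothing: not counted
            total = sum(abundance[t] for t in compat)
            for t in compat:
                # Split the read in proportion to current abundances.
                counts[t] += abundance[t] / total
        n = sum(counts.values())
        abundance = {t: c / n for t, c in counts.items()}
    return counts


if __name__ == "__main__":
    # Two reads unique to A, one unique to B, one shared between A and B.
    reads = [{"A"}, {"A"}, {"A", "B"}, {"B"}]
    for name, value in assign_reads(reads).items():
        print(name, round(value, 3))
```

In this example the shared read ends up counted mostly toward A, because A has more uniquely assigned reads. This is also why a plain per-exon integer count is not well defined once reads are ambiguous.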
Dear all,

I need some (lots of) suggestions and help. First and most important: I am working on bacterial RNA-seq (Illumina reads). My analysis steps are as follows:

Step 1. The FASTQ sequence data was groomed.
Step 2. I mapped the reads with Bowtie using default parameters. I am using a reference genome FASTA file from my history, because the reference genome is not available on Galaxy.
Step 3. I sorted the Bowtie output file using a workflow (Jeremy Goecks's workflow), link below:
https://main.g2.bx.psu.edu/u/jeremy/p/transcriptome-analysis-faq#faq2
Step 4. This sorting gave me concatenated files.
Step 5. The concatenated files were used to run Cufflinks, which gave me an assembled transcripts file.
Step 6. The assembled transcripts files from Step 5 were used for Cuffmerge.
Step 7. For Cuffdiff, the transcript GTF file generated in Step 6 and the concatenated files from Step 4 were used.

Now my questions are:

Is this workflow acceptable for bacterial transcriptome analysis?
Should I filter the SAM file? If yes, at which step?
Should I convert the SAM file to a BAM file? If yes, at which step?
Is it OK to use the FASTA of the reference genome for mapping, or should it be converted to another format? If yes, what should the workflow be?
If anyone has experience with Bowtie parameters for mapping bacterial RNA-seq data, suggestions are very much welcome.

Thanking you all
written 6.2 years ago by Ateequr Rehman
Well, it depends on what transcript annotation file you pass to cuffdiff.

The relevant cufflinks option is --GTF, which the manual describes as: "Tells Cufflinks to use the supplied reference annotation (a GFF file) to estimate isoform expression. It will not assemble novel transcripts, and the program will ignore alignments not structurally compatible with any reference transcript."[1]

In Galaxy language, this is the option "Use Reference Annotation:" with "Use reference annotation" selected. The two other choices, "No" or "Use reference annotation as guide", will allow cufflinks to estimate unknown transcripts. If you later use cuffmerge to produce the transcript annotation from your cufflinks runs and use it for cuffdiff, the "non-coding reads" will almost certainly pollute your transcript expression estimates.

[1] http://cufflinks.cbcb.umd.edu/manual.html

Jeremy, do you have a workflow to estimate what percent of the reads are mapping to unknown expressed regions? I would like to be able to produce this estimate before I decide which transcript annotation to pass to cuffdiff. I would expect a small percent of reads to map outside of known expressed regions, but if this number is too big, I would like to check for potential problems with my library.

Regards,
Carlos
written 6.2 years ago by Carlos Borroto
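For anyone running cufflinks outside Galaxy, the three Galaxy choices mentioned above correspond, as far as I understand the manual, to passing -G/--GTF, passing -g/--GTF-guide, or passing no annotation at all. Below is a small Python sketch that only builds the command line for each mode. The helper name `cufflinks_command`, the mode labels, and the file names are hypothetical; check the flags against the manual for your cufflinks version before relying on them.

```python
def cufflinks_command(alignments, mode="de_novo", annotation=None,
                      out_dir="cufflinks_out"):
    """Build a cufflinks argument list for one of three annotation modes.

    mode (labels are my own, not cufflinks terminology):
      "reference_only": -G, quantify annotated transcripts only
      "guided":         -g, annotation guides assembly, novel allowed
      "de_novo":        no annotation passed
    """
    cmd = ["cufflinks", "-o", out_dir]
    if mode in ("reference_only", "guided"):
        if annotation is None:
            raise ValueError("mode %r needs an annotation file" % mode)
        cmd += ["-G" if mode == "reference_only" else "-g", annotation]
    elif mode != "de_novo":
        raise ValueError("unknown mode: %r" % mode)
    cmd.append(alignments)  # aligned reads (SAM/BAM) go last
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for mode in ("reference_only", "guided", "de_novo"):
        print(" ".join(cufflinks_command("accepted_hits.bam", mode,
                                         annotation="genes.gtf")))
```

The list form can be handed to `subprocess.run` directly. Only the "reference_only" mode makes cufflinks ignore alignments that are not compatible with a reference transcript.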
Here's a simple approach, assuming mapped reads are in BAM format:

1. BAM --> SAM
2. SAM --> Interval
3. Intersect reads (as intervals) with the known annotation, not allowing for any overlap.

Best, J.
written 6.2 years ago by Jeremy Goecks
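Outside Galaxy, the same idea (find the reads that overlap no known annotation) can be sketched in a few lines of Python once the reads and annotated features are available as intervals. This is a minimal illustration, not the Galaxy tools themselves: the function names are made up, intervals are assumed to be 0-based and half-open (chrom, start, end), strand is ignored, and the linear scan is only suitable for small inputs.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share any base."""
    return a_start < b_end and b_start < a_end


def fraction_outside(reads, features):
    """Fraction of reads that overlap no annotated feature.

    reads, features: iterables of (chrom, start, end) tuples,
    0-based half-open coordinates (an assumption of this sketch).
    """
    by_chrom = {}
    for chrom, start, end in features:
        by_chrom.setdefault(chrom, []).append((start, end))

    reads = list(reads)
    if not reads:
        return 0.0
    outside = sum(
        1
        for chrom, start, end in reads
        if not any(overlaps(start, end, s, e)
                   for s, e in by_chrom.get(chrom, []))
    )
    return outside / len(reads)


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    features = [("chr1", 90, 200), ("chr1", 500, 600)]
    reads = [("chr1", 100, 150), ("chr1", 480, 530),
             ("chr1", 900, 950), ("chr2", 10, 60)]
    print(fraction_outside(reads, features))
```

For real data sets an interval tree or a dedicated interval tool would be a better fit than the nested loop used here.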