Question: Help with interpreting RNA-seq output
3.9 years ago, bwb012 (United States) wrote:

I am new to Galaxy and have been trying to figure out how to align RNA-seq reads to a genome in order to observe differential gene expression. I have been experimenting with Drosophila melanogaster reads from the EBI SRA site.

Side question: Most experiments are labeled as paired-end, but when I go to download the data there is usually only one file (some do have two). Does a single file mean that both ends are in the same file, and if so, in what format? Are the two ends joined next to each other in one long read, or stacked on top of each other as separate entries?

So I download the file (I have tried this with paired-end reads from several different experiments on the EBI SRA site), run FASTQ Groomer, and then run TopHat2 or Bowtie2 on the reads, using the built-in Drosophila melanogaster genome (dm3) from the Galaxy website. I then run Cufflinks on the output, which produces tables giving the loci of the genes the reads mapped to. What I want is to turn those loci into actual gene names. What is the best way to do this? Is there another (or better) Galaxy tool?

When I have tried to look at differential gene expression between two files (one with reads from healthy Drosophila larvae and one with reads from fungus-infected Drosophila larvae), the loci produced by Cufflinks do not match up between the two files. To clarify, when I take all the loci from the healthy larvae and compare them to all the loci from the infected larvae, there are only around 8 exact matches. This would imply that, out of all the genes expressed in the two groups of larvae, only 8 are exactly the same, which cannot be correct. Is there some error in my methods that is causing these incorrect results? Or perhaps I am just not interpreting the output correctly: is it normal to get a bunch of fragmented loci that all belong to the same gene, so that I would have to go in and manually identify which locus corresponds to which gene?

Thank you in advance for assistance with this issue. Any help with any of these questions would be greatly appreciated. 

3.9 years ago, tom.bair (United States) wrote:

Side question: Any of those is possible, I guess, but with only one file I would guess the two mates are subsequent entries in the FASTQ (i.e. interleaved). You may want to split them into two files, which is the more typical layout.
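If the single file does turn out to be interleaved, splitting it can be sketched roughly as below. This is only an illustration under stated assumptions: mates strictly alternate (mate 1, mate 2, mate 1, ...), every record is exactly four lines, and the function name `split_interleaved_fastq` and the example reads are made up. It is worth checking the read names first (for example, names ending in /1 and /2) to confirm the file really is laid out this way.

```python
from itertools import islice


def split_interleaved_fastq(lines):
    """Split interleaved FASTQ lines into two lists of 4-line records.

    Assumption: records alternate mate 1, mate 2, mate 1, ... and each
    record is exactly 4 lines (no wrapped sequence or quality lines).
    """
    it = iter(lines)
    mate1, mate2 = [], []
    index = 0
    while True:
        record = list(islice(it, 4))
        if not record:
            break
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        # Even-numbered records go to mate 1, odd-numbered to mate 2.
        (mate1 if index % 2 == 0 else mate2).append(record)
        index += 1
    if len(mate1) != len(mate2):
        raise ValueError("odd number of records; file is probably not interleaved")
    return mate1, mate2


# Made-up example: one read pair stored as consecutive entries.
example = [
    "@read1/1", "ACGTACGT", "+", "IIIIIIII",
    "@read1/2", "TTGGCCAA", "+", "IIIIIIII",
]
first, second = split_interleaved_fastq(example)
print(len(first), "mate-1 records,", len(second), "mate-2 records")
```

Each returned list can then be written to its own file, one record after another, to get the usual two-file paired-end input.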

 

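1. TopHat would be more appropriate, since it accounts for intronic regions; Bowtie will not align reads across introns.

2. If you specify a GFF or GTF transcript file in Cuffdiff, it will give you gene names.

3. I think this will also be cleared up by using a GTF file: with a reference annotation, expression is reported per annotated gene in both samples, rather than per locus assembled separately from each sample's reads.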
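To illustrate why exact locus matches are rare while the underlying genes can still agree, here is a rough sketch that looks up gene names for a locus by coordinate overlap with a GTF. It is not how Cufflinks or Cuffdiff do it internally. The assumptions: loci are strings of the form chrom:start-end, the GTF has tab-separated exon lines with a gene_name or gene_id attribute, coordinates are compared as inclusive intervals (small off-by-one differences between tools are ignored), and the gene names and coordinates below are invented for the example.

```python
import re


def parse_locus(locus):
    """Parse a 'chrom:start-end' string into (chrom, start, end)."""
    chrom, span = locus.rsplit(":", 1)
    start, end = span.split("-")
    return chrom, int(start), int(end)


def genes_from_gtf(gtf_lines):
    """Collect each gene's overall span from GTF exon lines, keyed by gene name."""
    genes = {}
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        chrom, start, end, attrs = fields[0], int(fields[3]), int(fields[4]), fields[8]
        # Prefer gene_name; fall back to gene_id if it is absent.
        match = re.search(r'gene_name "([^"]+)"', attrs) or re.search(r'gene_id "([^"]+)"', attrs)
        if not match:
            continue
        name = match.group(1)
        if name in genes:
            _, s, e = genes[name]
            genes[name] = (chrom, min(s, start), max(e, end))
        else:
            genes[name] = (chrom, start, end)
    return genes


def genes_for_locus(locus, genes):
    """Return the sorted names of genes whose span overlaps the locus."""
    chrom, start, end = parse_locus(locus)
    return sorted(
        name for name, (c, s, e) in genes.items()
        if c == chrom and s <= end and start <= e
    )


# Invented annotation: geneA has two exons, geneB has one.
gtf = [
    'chr2L\tdemo\texon\t100\t300\t.\t+\t.\tgene_id "g1"; gene_name "geneA";',
    'chr2L\tdemo\texon\t400\t600\t.\t+\t.\tgene_id "g1"; gene_name "geneA";',
    'chr2L\tdemo\texon\t900\t1200\t.\t-\t.\tgene_id "g2"; gene_name "geneB";',
]
genes = genes_from_gtf(gtf)

# Loci from two samples that are not string-identical but hit the same gene.
print(genes_for_locus("chr2L:100-500", genes))
print(genes_for_locus("chr2L:150-620", genes))
```

The two example loci differ as strings, so an exact comparison would call them a mismatch, yet both overlap the same annotated gene. Supplying the GTF to the tools avoids having to do this kind of lookup by hand.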

 

 
