Help with interpreting RNA-seq output

4.4 years ago by

bwb012 • 20

United States

I am new to Galaxy and have been trying to work out how to align RNA-seq reads to a genome so that I can look at differential gene expression. I have been experimenting with Drosophila melanogaster reads from the EBI SRA site.

Side question: Most experiments are labeled as paired-end, but when I go to download the data there is usually only one file (some do have two). Does a single file mean that both ends are in the same file, and if so, in what format? Are the two mates joined next to each other into one long read, or stacked one after the other as separate records?
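One way to test the "stacked" possibility is to look at the read names in the file. The sketch below is only a rough check, and it rests on assumptions that may not hold for a given download: records are plain 4-line FASTQ, and mates are named with `/1` and `/2` suffixes. The function names are made up for illustration.

```python
from itertools import islice

def read_names(fastq_lines):
    """Yield the read name of each 4-line FASTQ record (header without '@')."""
    it = iter(fastq_lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            return
        # Keep only the first whitespace-delimited token of the header line.
        yield record[0].strip()[1:].split()[0]

def looks_interleaved(fastq_lines, max_pairs=1000):
    """Return True if the first records alternate name/1, name/2 (assumed naming)."""
    names = list(islice(read_names(fastq_lines), 2 * max_pairs))
    if len(names) < 2 or len(names) % 2:
        return False
    for first, second in zip(names[0::2], names[1::2]):
        if not (first.endswith("/1") and second.endswith("/2")):
            return False
        if first[:-2] != second[:-2]:
            return False
    return True
```

With a real file this could be called as `looks_interleaved(open("reads.fastq"))`, where `reads.fastq` is a placeholder name. A result of False does not rule out paired data; it only means the naming assumption above was not met.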

My workflow is as follows. I download the file (I have tried this with paired-end reads from several different experiments on the EBI SRA site), run FASTQ Groomer, and then run TopHat2 or Bowtie2 on the reads, using the built-in Drosophila melanogaster genome (dm3) on the Galaxy website. I then run Cufflinks on the output, which produces tables giving the loci (genomic positions) of the genes the reads mapped to. What I want is to convert those loci into actual gene names. What is the best way to do this? Is there another (or better) Galaxy tool?
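To make the "loci to gene names" step concrete, here is a minimal sketch of what such a lookup would do outside Galaxy. It assumes a GTF annotation for the same genome build, with `gene_name` or `gene_id` attributes, and loci written as `chrom:start-end`. The function names are hypothetical, and off-by-one differences between coordinate conventions are ignored.

```python
import re
from collections import defaultdict

def load_genes(gtf_lines):
    """Collect (start, end, name) per chromosome from GTF lines."""
    genes = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        chrom, start, end, attrs = fields[0], int(fields[3]), int(fields[4]), fields[8]
        # Prefer gene_name; fall back to gene_id if no name is present.
        m = re.search(r'gene_name "([^"]+)"', attrs) or re.search(r'gene_id "([^"]+)"', attrs)
        if m:
            genes[chrom].append((start, end, m.group(1)))
    return genes

def genes_at_locus(locus, genes):
    """Return sorted names of annotated genes overlapping 'chrom:start-end'."""
    chrom, span = locus.rsplit(":", 1)
    start, end = (int(x) for x in span.split("-"))
    return sorted({name for s, e, name in genes.get(chrom, [])
                   if s <= end and start <= e})
```

Each locus from the Cufflinks table would be passed to `genes_at_locus` together with the result of `load_genes(open("annotation.gtf"))`, where `annotation.gtf` is a placeholder file name. A locus can overlap zero, one, or several annotated genes, so the function returns a list.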

When I try to look at differential gene expression between two files, one with reads from healthy Drosophila larvae and one with reads from fungus-infected Drosophila larvae, the loci produced by Cufflinks do not match up between the two files. To clarify: when I compare all the loci from the healthy larvae with all the loci from the infected larvae, there are only around 8 exact matches. This would imply that, out of all the genes expressed in the two groups of larvae, only 8 are the same, which cannot be correct. Is there an error in my method that is causing these results? Or am I just not interpreting the output correctly: is it normal to get many fragmented loci that all belong to the same gene, so that I would have to go in and manually identify which locus corresponds to which gene?
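The comparison described above treats two loci as the same only when their coordinates are identical. As a sketch of a looser comparison, the code below pairs loci from the two samples whenever their intervals overlap on the same chromosome. It assumes loci are strings of the form `chrom:start-end`; the function names are hypothetical, and overlap is only a rough stand-in for "same gene".

```python
def parse_locus(locus):
    """Split 'chrom:start-end' into (chrom, start, end)."""
    chrom, span = locus.rsplit(":", 1)
    start, end = (int(x) for x in span.split("-"))
    return chrom, start, end

def overlapping_pairs(loci_a, loci_b):
    """Return (a, b) pairs whose intervals overlap on the same chromosome."""
    by_chrom = {}
    for locus in loci_b:
        chrom, start, end = parse_locus(locus)
        by_chrom.setdefault(chrom, []).append((start, end, locus))
    pairs = []
    for locus in loci_a:
        chrom, start, end = parse_locus(locus)
        for s, e, other in by_chrom.get(chrom, []):
            if s <= end and start <= e:  # the two intervals share at least one base
                pairs.append((locus, other))
    return pairs
```

If the two samples were assembled separately, locus boundaries could differ by a few bases even for the same gene, in which case an exact string comparison would find almost nothing while an overlap comparison would find many pairs.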

Thank you in advance for assistance with this issue. Any help with any of these questions would be greatly appreciated.

sequencing tophat2 rna cufflinks bowtie2 • 1.8k views

modified 4.4 years ago by tom.bair • 10 • written 4.4 years ago by bwb012 • 20