Hi, I am following the basic RNA-seq pipeline. I ran QC and grooming on my SRA datasets, aligned them with TopHat, and passed the result to Cufflinks for assembly. I would like to know what the gene IDs and transcript IDs in the Cufflinks output mean. My reference genome is human hg38. Also, all of the genes and transcripts are reported on chr1. How is that possible? I am unable to interpret the result. Please help.
Hello,
There is a lot going on here, so let's break it down:
For Cufflinks gene/transcript identifiers: these can come from a reference annotation, be generated automatically, or often be some combination of the two (depending on other settings). The manual can help (scroll and click into Cufflinks; Cuffdiff also has useful info): http://cole-trapnell-lab.github.io/cufflinks/manual/
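If it helps to see which kind of identifier you have, here is a minimal Python sketch that reads the attribute column of a Cufflinks-style GTF and tallies transcripts by ID type. It assumes that automatically generated gene IDs start with `CUFF.` (the default prefix as far as I know; adjust `auto_prefix` if your run used a different label) and that anything else came from the reference annotation. The function names and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Matches key "value"; pairs in the 9th (attribute) column of a GTF line.
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def parse_gtf_attributes(attr_field):
    """Return the GTF attribute column as a dict of key -> value."""
    return dict(ATTR_RE.findall(attr_field))


def classify_ids(gtf_lines, auto_prefix="CUFF."):
    """Count transcript records by whether gene_id looks auto-generated.

    ASSUMPTION: auto-generated IDs start with `auto_prefix`; everything
    else is treated as coming from a reference annotation.
    """
    counts = Counter()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "transcript":
            continue  # only count one record per transcript, not exons
        gene_id = parse_gtf_attributes(cols[8]).get("gene_id", "")
        kind = "auto" if gene_id.startswith(auto_prefix) else "reference"
        counts[kind] += 1
    return counts


# Hypothetical sample records, not real output.
sample = [
    'chr1\tCufflinks\ttranscript\t100\t900\t1000\t+\t.\t'
    'gene_id "CUFF.1"; transcript_id "CUFF.1.1"; FPKM "12.5";',
    'chr1\tCufflinks\texon\t100\t400\t1000\t+\t.\t'
    'gene_id "CUFF.1"; transcript_id "CUFF.1.1"; exon_number "1";',
    'chr1\tCufflinks\ttranscript\t2000\t5000\t1000\t-\t.\t'
    'gene_id "GENE_A"; transcript_id "TX_A1"; FPKM "3.1";',
]

print(classify_ids(sample))
```

To run it on your own result, download the assembled transcripts dataset and pass the open file handle to `classify_ids` in place of `sample`.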
If all the hits in TopHat and the subsequent genes/transcripts map to chr1, two things could be going on.
One: The sequences only map to that single chromosome. This could be due to how the library/sequence prep was done (targeting a specific region) OR the data could have been pre-filtered for this content (especially possible if the source fastq was from a tutorial).
Two: Both TopHat and Cufflinks can filter results based on supplied annotation. The manual will explain more, but if this was done, it could explain the limited output.
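Before digging into either explanation, it can be worth confirming that the output really is limited to chr1. A rough Python sketch, assuming the Cufflinks result is a standard tab-separated GTF with the chromosome in the first column and the feature type in the third (the function name and sample lines are hypothetical):

```python
from collections import Counter


def transcripts_per_chromosome(gtf_lines):
    """Count 'transcript' records per chromosome (first GTF column)."""
    counts = Counter()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        cols = line.split("\t")
        if len(cols) >= 3 and cols[2] == "transcript":
            counts[cols[0]] += 1
    return counts


# Hypothetical sample records, not real output.
gtf_sample = [
    'chr1\tCufflinks\ttranscript\t100\t900\t1000\t+\t.\tgene_id "CUFF.1";',
    'chr1\tCufflinks\texon\t100\t400\t1000\t+\t.\tgene_id "CUFF.1";',
    'chr1\tCufflinks\ttranscript\t2000\t5000\t1000\t-\t.\tgene_id "CUFF.2";',
    'chr2\tCufflinks\ttranscript\t50\t700\t1000\t+\t.\tgene_id "CUFF.3";',
]

for chrom, n in sorted(transcripts_per_chromosome(gtf_sample).items()):
    print(chrom, n)

# For a real file (the path is an assumption):
# with open("transcripts.gtf") as handle:
#     print(transcripts_per_chromosome(handle))
```

If only chr1 shows up here, the same check on the TopHat alignments (for example with `samtools idxstats` on the indexed BAM) would tell you whether the restriction was already present at the mapping step or was introduced later.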
Good luck and hope this helps! Jen, Galaxy team