Hi all,
I successfully ran my RNA-seq data through the entire Tuxedo suite on Galaxy. I would like to analyze my data using three different gene enrichment analysis programs. Does anyone have any experience with this?
Here are a couple things I'd like to know:
1) What columns need to be extracted from the cuffdiff output?
2) What is the best way to transfer my list of differentially expressed genes from Galaxy into a gene enrichment analysis tool?
Thank you in advance!
Hello,
The contents and column values for each Cuffdiff output file are described in the manual. You most likely want to use the "Differential NNN" files, although the tracking files will be useful, too:
http://cufflinks.cbcb.umd.edu/manual.html#cuffdiff_output
Which fields to pick depends on what sort of input you submitted to Cuffdiff (a reference GTF/GFF3 with transcript identifiers? with gene symbols?) and what the downstream tool requires (coordinates, or a known transcript/gene name?). Did you choose to do discovery, or did you limit the analysis to known transcripts? In other words, will all of your results have a name that is "known" and linked to GO terms, disease terms, or whatever you choose to summarize on? Once you know what data you have, see the tools in the group "Phenotype Association" for example tools and review the input format they require. The Tool Shed and Public Servers may have others of interest:
https://wiki.galaxyproject.org/PublicGalaxyServers
http://toolshed.g2.bx.psu.edu
Use tools such as "Cut", "Filter", and "Group" to manipulate data. Click on the pencil icon to adjust attributes. Tabular output is usually best, although some tools may require a more specific type of tabular data, such as BED or interval format.
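
If it helps to see the Filter-then-Cut idea outside of Galaxy, here is a rough Python sketch of the same steps. It assumes the gene-level differential expression file has the column names listed in the Cuffdiff manual (gene, status, q_value, significant, and so on). The function name, the 0.05 cutoff, and the sample rows are made up for illustration, so check them against your own file. Inside Galaxy, roughly the same result should come from "Filter" with a condition like c14=='yes' followed by "Cut" on c3, assuming your file uses the standard column order.

```python
import csv
import io

# Column names as listed in the Cuffdiff manual for the gene-level
# differential expression file. Verify against your own header line.
HEADER = ["test_id", "gene_id", "gene", "locus", "sample_1", "sample_2",
          "status", "value_1", "value_2", "log2(fold_change)",
          "test_stat", "p_value", "q_value", "significant"]

# Made-up rows, for illustration only (not real results).
ROWS = [
    ["XLOC_000001", "XLOC_000001", "GeneA", "chr1:100-200", "q1", "q2",
     "OK", "10", "40", "2", "3.1", "0.0001", "0.001", "yes"],
    ["XLOC_000002", "XLOC_000002", "GeneB", "chr1:300-400", "q1", "q2",
     "OK", "10", "11", "0.14", "0.2", "0.6", "0.8", "no"],
    ["XLOC_000003", "XLOC_000003", "-", "chr2:100-200", "q1", "q2",
     "OK", "5", "50", "3.3", "4.0", "0.0001", "0.001", "yes"],
    ["XLOC_000004", "XLOC_000004", "GeneC", "chr2:300-400", "q1", "q2",
     "NOTEST", "0", "0", "0", "0", "1", "1", "no"],
]
SAMPLE = "\n".join("\t".join(row) for row in [HEADER] + ROWS)


def significant_genes(diff_text, max_q=0.05):
    """Return named, significant genes from Cuffdiff-style tabular text.

    Hypothetical helper: max_q is an arbitrary example threshold.
    """
    reader = csv.DictReader(io.StringIO(diff_text), delimiter="\t")
    genes = []
    for row in reader:
        # Keep only tests that completed and were flagged significant.
        if row["status"] != "OK" or row["significant"] != "yes":
            continue
        if float(row["q_value"]) > max_q:
            continue
        name = row["gene"]
        # Novel loci from discovery mode have no known name ("-"),
        # so an enrichment tool cannot look them up; skip them.
        if name == "-":
            continue
        if name not in genes:
            genes.append(name)
    return genes


if __name__ == "__main__":
    # One gene name per line is what many enrichment tools accept.
    print("\n".join(significant_genes(SAMPLE)))
```

With a real file you would read its text from disk and pass it to significant_genes() in place of SAMPLE. Whether you output the gene column or gene_id depends on which identifier the enrichment tool recognizes.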
Tools will note the expected input format (in Galaxy or not). This wiki covers many common formats:
https://wiki.galaxyproject.org/Learn/Datatypes
Best, Jen, Galaxy team